All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: kill BUG_ON in do_relocation
@ 2016-09-14 16:27 Liu Bo
  2016-09-14 17:13 ` Josef Bacik
  2016-09-23 21:05 ` [PATCH v2] " Liu Bo
  0 siblings, 2 replies; 15+ messages in thread
From: Liu Bo @ 2016-09-14 16:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

While updating btree, we try to push items between sibling
nodes/leaves in order to keep height as low as possible.
But we don't memset the original places with zero when
pushing items so that we could end up leaving stale content
in nodes/leaves.  One may read the above stale content by
increasing btree blocks' @nritems.

One case I've come across is that in fs tree, a leaf has two
parent nodes, hence running balance ends up with processing
this leaf with two parent nodes, but it can only reach the
valid parent node through btrfs_search_slot, so it'd be like,

do_relocation
    for P in all parent nodes of block A:
        if !P->eb:
            btrfs_search_slot(key);   --> get path from P to A.
        if lowest:
            BUG_ON(A->bytenr != bytenr of A recorded in P);
        btrfs_cow_block(P, A);   --> change A's bytenr in P.

After btrfs_cow_block, P has the new bytenr of A, but with the
same @key, we get the same path again, and get panic by BUG_ON.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/relocation.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 680e234..bce8fea 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2723,7 +2723,14 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 
 		bytenr = btrfs_node_blockptr(upper->eb, slot);
 		if (lowest) {
-			BUG_ON(bytenr != node->bytenr);
+			if (bytenr != node->bytenr) {
+				btrfs_err(root->fs_info,
+		"lowest leaf/node mismatch: bytenr %llu node->bytenr %llu slot %d upper %llu",
+					  bytenr, node->bytenr, slot,
+					  upper->eb->start);
+				err = -EEXIST;
+				goto next;
+			}
 		} else {
 			if (node->eb->start == bytenr)
 				goto next;
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 16:27 [PATCH] Btrfs: kill BUG_ON in do_relocation Liu Bo
@ 2016-09-14 17:13 ` Josef Bacik
  2016-09-14 17:29   ` Chris Mason
  2016-09-14 18:16   ` Liu Bo
  2016-09-23 21:05 ` [PATCH v2] " Liu Bo
  1 sibling, 2 replies; 15+ messages in thread
From: Josef Bacik @ 2016-09-14 17:13 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs; +Cc: David Sterba

On 09/14/2016 12:27 PM, Liu Bo wrote:
> While updating btree, we try to push items between sibling
> nodes/leaves in order to keep height as low as possible.
> But we don't memset the original places with zero when
> pushing items so that we could end up leaving stale content
> in nodes/leaves.  One may read the above stale content by
> increasing btree blocks' @nritems.
>

Ok this sounds really bad.  Is this as bad as I think it sounds?  We should 
probably fix this like right now right?

> One case I've come across is that in fs tree, a leaf has two
> parent nodes, hence running balance ends up with processing
> this leaf with two parent nodes, but it can only reach the
> valid parent node through btrfs_search_slot, so it'd be like,
>
> do_relocation
>     for P in all parent nodes of block A:
>         if !P->eb:
>             btrfs_search_slot(key);   --> get path from P to A.
>         if lowest:
>             BUG_ON(A->bytenr != bytenr of A recorded in P);
>         btrfs_cow_block(P, A);   --> change A's bytenr in P.
>
> After btrfs_cow_block, P has the new bytenr of A, but with the
> same @key, we get the same path again, and get panic by BUG_ON.
>

Ok so this happens because of the problem you described above right?  Because 
this shouldn't actually happen.  We should go down the next parent and still get 
to the original block where A->bytenr == node->bytenr.  So I'm all for killing 
this BUG_ON(), but the problem description is wrong.  We need to keep this 
scenario from happening in the first place.  And then we kill this BUG_ON() 
because it can happen if there is FS corruption or something.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 17:13 ` Josef Bacik
@ 2016-09-14 17:29   ` Chris Mason
  2016-09-14 17:31     ` Josef Bacik
  2016-09-14 18:16   ` Liu Bo
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Mason @ 2016-09-14 17:29 UTC (permalink / raw)
  To: Josef Bacik, Liu Bo, linux-btrfs; +Cc: David Sterba



On 09/14/2016 01:13 PM, Josef Bacik wrote:
> On 09/14/2016 12:27 PM, Liu Bo wrote:
>> While updating btree, we try to push items between sibling
>> nodes/leaves in order to keep height as low as possible.
>> But we don't memset the original places with zero when
>> pushing items so that we could end up leaving stale content
>> in nodes/leaves.  One may read the above stale content by
>> increasing btree blocks' @nritems.
>>
>
> Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> should probably fix this like right now right?

He's bumping @nritems with a fuzzer I think?  As in this happens when 
someone forces it (or via some other bug) but not in normal operations.

-chris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 17:29   ` Chris Mason
@ 2016-09-14 17:31     ` Josef Bacik
  2016-09-14 18:19       ` Liu Bo
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2016-09-14 17:31 UTC (permalink / raw)
  To: Chris Mason, Liu Bo, linux-btrfs; +Cc: David Sterba

On 09/14/2016 01:29 PM, Chris Mason wrote:
>
>
> On 09/14/2016 01:13 PM, Josef Bacik wrote:
>> On 09/14/2016 12:27 PM, Liu Bo wrote:
>>> While updating btree, we try to push items between sibling
>>> nodes/leaves in order to keep height as low as possible.
>>> But we don't memset the original places with zero when
>>> pushing items so that we could end up leaving stale content
>>> in nodes/leaves.  One may read the above stale content by
>>> increasing btree blocks' @nritems.
>>>
>>
>> Ok this sounds really bad.  Is this as bad as I think it sounds?  We
>> should probably fix this like right now right?
>
> He's bumping @nritems with a fuzzer I think?  As in this happens when someone
> forces it (or via some other bug) but not in normal operations.
>

Oh ok if this happens with a fuzzer than this is fine, but I'd rather do -EIO so 
we know this is something bad with the fs.  And change the changelog to make it 
explicit that this is the result of fs corruption, not normal operation.  Then 
you can add

Reviewed-by: Josef Bacik <jbacik@fb.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 17:13 ` Josef Bacik
  2016-09-14 17:29   ` Chris Mason
@ 2016-09-14 18:16   ` Liu Bo
  1 sibling, 0 replies; 15+ messages in thread
From: Liu Bo @ 2016-09-14 18:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, David Sterba

On Wed, Sep 14, 2016 at 01:13:34PM -0400, Josef Bacik wrote:
> On 09/14/2016 12:27 PM, Liu Bo wrote:
> > While updating btree, we try to push items between sibling
> > nodes/leaves in order to keep height as low as possible.
> > But we don't memset the original places with zero when
> > pushing items so that we could end up leaving stale content
> > in nodes/leaves.  One may read the above stale content by
> > increasing btree blocks' @nritems.
> > 
> 
> Ok this sounds really bad.  Is this as bad as I think it sounds?  We should
> probably fix this like right now right?

Yeah, I'm preparing two patches to address it.

> 
> > One case I've come across is that in fs tree, a leaf has two
> > parent nodes, hence running balance ends up with processing
> > this leaf with two parent nodes, but it can only reach the
> > valid parent node through btrfs_search_slot, so it'd be like,
> > 
> > do_relocation
> >     for P in all parent nodes of block A:
> >         if !P->eb:
> >             btrfs_search_slot(key);   --> get path from P to A.
> >         if lowest:
> >             BUG_ON(A->bytenr != bytenr of A recorded in P);
> >         btrfs_cow_block(P, A);   --> change A's bytenr in P.
> > 
> > After btrfs_cow_block, P has the new bytenr of A, but with the
> > same @key, we get the same path again, and get panic by BUG_ON.
> > 
> 
> Ok so this happens because of the problem you described above right?
> Because this shouldn't actually happen.  We should go down the next parent
> and still get to the original block where A->bytenr == node->bytenr.

After bumping block 55279616's @nritems from 320 to 492,

fs tree key (FS_TREE ROOT_ITEM 0) 
node 55230464 level 2 items 4 free 489 generation 11 owner 5
fs uuid 03737dfb-8087-4923-b058-2ec629bf39bd
chunk uuid d586e037-9d50-4332-9b3a-fa2344d210e1
        key (256 INODE_ITEM 0) block 55279616 (3374) gen 11 nr 0
        key (257 DIR_INDEX 24879) block 56410112 (3443) gen 11 nr 1
        key (10404 INODE_ITEM 0) block 60391424 (3686) gen 11 nr 2
        key (34068 INODE_ITEM 0) block 55246848 (3372) gen 11 nr 3
node 55279616 level 1 items 492 free 1 generation 11 owner 5
        ...
        key (257 DIR_INDEX 24625) block 44335104 (2706) gen 9 nr 319
        key (2100 INODE_ITEM 0) block 30425088 (1857) gen 7 nr 320
        key (2148 INODE_ITEM 0) block 30441472 (1858) gen 7 nr 321
node 56410112 level 1 items 311 free 182 generation 11 owner 5
fs uuid 03737dfb-8087-4923-b058-2ec629bf39bd
chunk uuid d586e037-9d50-4332-9b3a-fa2344d210e1
        ...
        key (2052 INODE_ITEM 0) block 30408704 (1856) gen 7 nr 137
        key (2100 INODE_ITEM 0) block 30425088 (1857) gen 7 nr 138
        key (2148 INODE_ITEM 0) block 30441472 (1858) gen 7 nr 139
        ...

If we search fs tree with disk key (2100 INODE_ITEM 0), we always get
into node block 56410112, not node block 55279616 after bin_search in
the top level, so leaf block 30408704 never gets both parent.

>  So I'm
> all for killing this BUG_ON(), but the problem description is wrong.  We
> need to keep this scenario from happening in the first place.  And then we
> kill this BUG_ON() because it can happen if there is FS corruption or
> something.  Thanks,

We really should, we can prevent it from happening by checking btree
node's validation although it is not as easy as checking a leaf.

The description is what I got from debugging, just wanna show how we
come to the BUG_ON since it's not straighforward at all.

But yes, I'll update the description that this problem is due to fs
corruption.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 17:31     ` Josef Bacik
@ 2016-09-14 18:19       ` Liu Bo
  2016-09-15 19:01         ` Liu Bo
  0 siblings, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-09-14 18:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, linux-btrfs, David Sterba

On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
> On 09/14/2016 01:29 PM, Chris Mason wrote:
> > 
> > 
> > On 09/14/2016 01:13 PM, Josef Bacik wrote:
> > > On 09/14/2016 12:27 PM, Liu Bo wrote:
> > > > While updating btree, we try to push items between sibling
> > > > nodes/leaves in order to keep height as low as possible.
> > > > But we don't memset the original places with zero when
> > > > pushing items so that we could end up leaving stale content
> > > > in nodes/leaves.  One may read the above stale content by
> > > > increasing btree blocks' @nritems.
> > > > 
> > > 
> > > Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> > > should probably fix this like right now right?
> > 
> > He's bumping @nritems with a fuzzer I think?  As in this happens when someone
> > forces it (or via some other bug) but not in normal operations.
> > 
> 
> Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
> -EIO so we know this is something bad with the fs. 

-EIO may be more appropriate to be given while reading btree blocks and
checking their validation?

> And change the changelog
> to make it explicit that this is the result of fs corruption, not normal
> operation.  Then you can add
> 
> Reviewed-by: Josef Bacik <jbacik@fb.com>

OK, make sense.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-15 19:01         ` Liu Bo
@ 2016-09-15 18:58           ` Chris Mason
  2016-09-19 18:01             ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2016-09-15 18:58 UTC (permalink / raw)
  To: bo.li.liu, Josef Bacik; +Cc: linux-btrfs, David Sterba



On 09/15/2016 03:01 PM, Liu Bo wrote:
> On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:
>> On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
>>> On 09/14/2016 01:29 PM, Chris Mason wrote:
>>>>
>>>>
>>>> On 09/14/2016 01:13 PM, Josef Bacik wrote:
>>>>> On 09/14/2016 12:27 PM, Liu Bo wrote:
>>>>>> While updating btree, we try to push items between sibling
>>>>>> nodes/leaves in order to keep height as low as possible.
>>>>>> But we don't memset the original places with zero when
>>>>>> pushing items so that we could end up leaving stale content
>>>>>> in nodes/leaves.  One may read the above stale content by
>>>>>> increasing btree blocks' @nritems.
>>>>>>
>>>>>
>>>>> Ok this sounds really bad.  Is this as bad as I think it sounds?  We
>>>>> should probably fix this like right now right?
>>>>
>>>> He's bumping @nritems with a fuzzer I think?  As in this happens when someone
>>>> forces it (or via some other bug) but not in normal operations.
>>>>
>>>
>>> Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
>>> -EIO so we know this is something bad with the fs.
>>
>> -EIO may be more appropriate to be given while reading btree blocks and
>> checking their validation?
>
> Looks like EIO doesn't fit into this case, either, do we have any errno
> representing 'corrupted filesystem'?

That's EIO.  Sometimes the EIO is big enough we have to abort, but 
really the abort is just adding bonus.

-chris


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 18:19       ` Liu Bo
@ 2016-09-15 19:01         ` Liu Bo
  2016-09-15 18:58           ` Chris Mason
  0 siblings, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-09-15 19:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, linux-btrfs, David Sterba

On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:
> On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
> > On 09/14/2016 01:29 PM, Chris Mason wrote:
> > > 
> > > 
> > > On 09/14/2016 01:13 PM, Josef Bacik wrote:
> > > > On 09/14/2016 12:27 PM, Liu Bo wrote:
> > > > > While updating btree, we try to push items between sibling
> > > > > nodes/leaves in order to keep height as low as possible.
> > > > > But we don't memset the original places with zero when
> > > > > pushing items so that we could end up leaving stale content
> > > > > in nodes/leaves.  One may read the above stale content by
> > > > > increasing btree blocks' @nritems.
> > > > > 
> > > > 
> > > > Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> > > > should probably fix this like right now right?
> > > 
> > > He's bumping @nritems with a fuzzer I think?  As in this happens when someone
> > > forces it (or via some other bug) but not in normal operations.
> > > 
> > 
> > Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
> > -EIO so we know this is something bad with the fs. 
> 
> -EIO may be more appropriate to be given while reading btree blocks and
> checking their validation?

Looks like EIO doesn't fit into this case, either, do we have any errno
representing 'corrupted filesystem'?

Thanks,

-liubo

> 
> > And change the changelog
> > to make it explicit that this is the result of fs corruption, not normal
> > operation.  Then you can add
> > 
> > Reviewed-by: Josef Bacik <jbacik@fb.com>
> 
> OK, make sense.
> 
> Thanks,
> 
> -liubo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-15 18:58           ` Chris Mason
@ 2016-09-19 18:01             ` David Sterba
  2016-09-19 23:11               ` Liu Bo
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2016-09-19 18:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: bo.li.liu, Josef Bacik, linux-btrfs, David Sterba

On Thu, Sep 15, 2016 at 02:58:12PM -0400, Chris Mason wrote:
> 
> 
> On 09/15/2016 03:01 PM, Liu Bo wrote:
> > On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:
> >> On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
> >>> On 09/14/2016 01:29 PM, Chris Mason wrote:
> >>>>
> >>>>
> >>>> On 09/14/2016 01:13 PM, Josef Bacik wrote:
> >>>>> On 09/14/2016 12:27 PM, Liu Bo wrote:
> >>>>>> While updating btree, we try to push items between sibling
> >>>>>> nodes/leaves in order to keep height as low as possible.
> >>>>>> But we don't memset the original places with zero when
> >>>>>> pushing items so that we could end up leaving stale content
> >>>>>> in nodes/leaves.  One may read the above stale content by
> >>>>>> increasing btree blocks' @nritems.
> >>>>>>
> >>>>>
> >>>>> Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> >>>>> should probably fix this like right now right?
> >>>>
> >>>> He's bumping @nritems with a fuzzer I think?  As in this happens when someone
> >>>> forces it (or via some other bug) but not in normal operations.
> >>>>
> >>>
> >>> Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
> >>> -EIO so we know this is something bad with the fs.
> >>
> >> -EIO may be more appropriate to be given while reading btree blocks and
> >> checking their validation?
> >
> > Looks like EIO doesn't fit into this case, either, do we have any errno
> > representing 'corrupted filesystem'?
> 
> That's EIO.  Sometimes the EIO is big enough we have to abort, but 
> really the abort is just adding bonus.

I think we misuse the EIO where we should really return EFSCORRUPTED
that's an alias for EUCLEAN, looking at xfs or ext4. EIO should be
really a message that the hardware is bad.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-19 18:01             ` David Sterba
@ 2016-09-19 23:11               ` Liu Bo
  2016-09-20  8:03                 ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-09-19 23:11 UTC (permalink / raw)
  To: dsterba; +Cc: Chris Mason, Josef Bacik, linux-btrfs

On Mon, Sep 19, 2016 at 08:01:05PM +0200, David Sterba wrote:
> On Thu, Sep 15, 2016 at 02:58:12PM -0400, Chris Mason wrote:
> > 
> > 
> > On 09/15/2016 03:01 PM, Liu Bo wrote:
> > > On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:
> > >> On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
> > >>> On 09/14/2016 01:29 PM, Chris Mason wrote:
> > >>>>
> > >>>>
> > >>>> On 09/14/2016 01:13 PM, Josef Bacik wrote:
> > >>>>> On 09/14/2016 12:27 PM, Liu Bo wrote:
> > >>>>>> While updating btree, we try to push items between sibling
> > >>>>>> nodes/leaves in order to keep height as low as possible.
> > >>>>>> But we don't memset the original places with zero when
> > >>>>>> pushing items so that we could end up leaving stale content
> > >>>>>> in nodes/leaves.  One may read the above stale content by
> > >>>>>> increasing btree blocks' @nritems.
> > >>>>>>
> > >>>>>
> > >>>>> Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> > >>>>> should probably fix this like right now right?
> > >>>>
> > >>>> He's bumping @nritems with a fuzzer I think?  As in this happens when someone
> > >>>> forces it (or via some other bug) but not in normal operations.
> > >>>>
> > >>>
> > >>> Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
> > >>> -EIO so we know this is something bad with the fs.
> > >>
> > >> -EIO may be more appropriate to be given while reading btree blocks and
> > >> checking their validation?
> > >
> > > Looks like EIO doesn't fit into this case, either, do we have any errno
> > > representing 'corrupted filesystem'?
> > 
> > That's EIO.  Sometimes the EIO is big enough we have to abort, but 
> > really the abort is just adding bonus.
> 
> I think we misuse the EIO where we should really return EFSCORRUPTED
> that's an alias for EUCLEAN, looking at xfs or ext4. EIO should be
> really a message that the hardware is bad.

I love this idea, but one quick question, when returning EUCLEAN, what
message do users get? 

"#define EUCLEAN         117     /* Structure needs cleaning */"

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-19 23:11               ` Liu Bo
@ 2016-09-20  8:03                 ` David Sterba
  2016-09-20 17:59                   ` Liu Bo
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2016-09-20  8:03 UTC (permalink / raw)
  To: Liu Bo; +Cc: Chris Mason, Josef Bacik, linux-btrfs

On Mon, Sep 19, 2016 at 04:11:44PM -0700, Liu Bo wrote:
> > > That's EIO.  Sometimes the EIO is big enough we have to abort, but 
> > > really the abort is just adding bonus.
> > 
> > I think we misuse the EIO where we should really return EFSCORRUPTED
> > that's an alias for EUCLEAN, looking at xfs or ext4. EIO should be
> > really a message that the hardware is bad.
> 
> I love this idea, but one quick question, when returning EUCLEAN, what
> message do users get? 
> 
> "#define EUCLEAN         117     /* Structure needs cleaning */"

strerror(EUCLEAN) -> "Structure needs cleaning"

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-20  8:03                 ` David Sterba
@ 2016-09-20 17:59                   ` Liu Bo
  2016-09-21  8:14                     ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-09-20 17:59 UTC (permalink / raw)
  To: dsterba; +Cc: Chris Mason, Josef Bacik, linux-btrfs

On Tue, Sep 20, 2016 at 10:03:43AM +0200, David Sterba wrote:
> On Mon, Sep 19, 2016 at 04:11:44PM -0700, Liu Bo wrote:
> > > > That's EIO.  Sometimes the EIO is big enough we have to abort, but 
> > > > really the abort is just adding bonus.
> > > 
> > > I think we misuse the EIO where we should really return EFSCORRUPTED
> > > that's an alias for EUCLEAN, looking at xfs or ext4. EIO should be
> > > really a message that the hardware is bad.
> > 
> > I love this idea, but one quick question, when returning EUCLEAN, what
> > message do users get? 
> > 
> > "#define EUCLEAN         117     /* Structure needs cleaning */"
> 
> strerror(EUCLEAN) -> "Structure needs cleaning"

Hmm, if I was the user, I'm not sure how to deal with "Structure needs cleaning", so still need to take a glance at dmesg log.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Btrfs: kill BUG_ON in do_relocation
  2016-09-20 17:59                   ` Liu Bo
@ 2016-09-21  8:14                     ` David Sterba
  0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2016-09-21  8:14 UTC (permalink / raw)
  To: Liu Bo; +Cc: Chris Mason, Josef Bacik, linux-btrfs

On Tue, Sep 20, 2016 at 10:59:59AM -0700, Liu Bo wrote:
> On Tue, Sep 20, 2016 at 10:03:43AM +0200, David Sterba wrote:
> > On Mon, Sep 19, 2016 at 04:11:44PM -0700, Liu Bo wrote:
> > > > > That's EIO.  Sometimes the EIO is big enough we have to abort, but 
> > > > > really the abort is just adding bonus.
> > > > 
> > > > I think we misuse the EIO where we should really return EFSCORRUPTED
> > > > that's an alias for EUCLEAN, looking at xfs or ext4. EIO should be
> > > > really a message that the hardware is bad.
> > > 
> > > I love this idea, but one quick question, when returning EUCLEAN, what
> > > message do users get? 
> > > 
> > > "#define EUCLEAN         117     /* Structure needs cleaning */"
> > 
> > strerror(EUCLEAN) -> "Structure needs cleaning"
> 
> Hmm, if I was the user, I'm not sure how to deal with "Structure needs
> cleaning", so still need to take a glance at dmesg log.

I understand that, it's not descriptive at all. We could remap it to EIO
once it goes outside of btrfs module, so it would be only internal
error.  The commit that introduces it to xfs is 10 years old,
da2f4d679c8070ba5b6a920281e495917b293aa0 and mentions something like
that.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] Btrfs: kill BUG_ON in do_relocation
  2016-09-14 16:27 [PATCH] Btrfs: kill BUG_ON in do_relocation Liu Bo
  2016-09-14 17:13 ` Josef Bacik
@ 2016-09-23 21:05 ` Liu Bo
  2016-10-11 14:25   ` David Sterba
  1 sibling, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-09-23 21:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Chris Mason, Josef Bacik

While updating btree, we try to push items between sibling
nodes/leaves in order to keep height as low as possible.
But we don't memset the original places with zero when
pushing items so that we could end up leaving stale content
in nodes/leaves.  One may read the above stale content by
increasing btree blocks' @nritems.

One case I've come across is that in fs tree, a leaf has two
parent nodes, hence running balance ends up with processing
this leaf with two parent nodes, but it can only reach the
valid parent node through btrfs_search_slot, so it'd be like,

do_relocation
    for P in all parent nodes of block A:
        if !P->eb:
            btrfs_search_slot(key);   --> get path from P to A.
        if lowest:
            BUG_ON(A->bytenr != bytenr of A recorded in P);
        btrfs_cow_block(P, A);   --> change A's bytenr in P.

After btrfs_cow_block, P has the new bytenr of A, but with the
same @key, we get the same path again, and get panic by BUG_ON.

Note that this is only happening in a corrupted fs, for a
regular fs in which we have correct @nritems so that we won't
read stale content in any case.

Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: - use new internal error EFSCORRUPTED as "Filesystem is corrupted",
      suggested by David Sterba.

 fs/btrfs/ctree.h      | 2 ++
 fs/btrfs/relocation.c | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 67d71c0..8b23410 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -127,6 +127,8 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 #define BTRFS_OLD_BACKREF_REV		0
 #define BTRFS_MIXED_BACKREF_REV		1
 
+#define EFSCORRUPTED			EUCLEAN /* Filesystem is corrupted */
+
 /*
  * every tree block (leaf or node) starts with this header.
  */
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c0c13dc..6f8b952 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2712,7 +2712,14 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 
 		bytenr = btrfs_node_blockptr(upper->eb, slot);
 		if (lowest) {
-			BUG_ON(bytenr != node->bytenr);
+			if (bytenr != node->bytenr) {
+				btrfs_err(root->fs_info,
+		"lowest leaf/node mismatch: bytenr %llu node->bytenr %llu slot %d upper %llu",
+					  bytenr, node->bytenr, slot,
+					  upper->eb->start);
+				err = -EFSCORRUPTED;
+				goto next;
+			}
 		} else {
 			if (node->eb->start == bytenr)
 				goto next;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] Btrfs: kill BUG_ON in do_relocation
  2016-09-23 21:05 ` [PATCH v2] " Liu Bo
@ 2016-10-11 14:25   ` David Sterba
  0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2016-10-11 14:25 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, David Sterba, Chris Mason, Josef Bacik

On Fri, Sep 23, 2016 at 02:05:04PM -0700, Liu Bo wrote:
> While updating btree, we try to push items between sibling
> nodes/leaves in order to keep height as low as possible.
> But we don't memset the original places with zero when
> pushing items so that we could end up leaving stale content
> in nodes/leaves.  One may read the above stale content by
> increasing btree blocks' @nritems.
> 
> One case I've come across is that in fs tree, a leaf has two
> parent nodes, hence running balance ends up with processing
> this leaf with two parent nodes, but it can only reach the
> valid parent node through btrfs_search_slot, so it'd be like,
> 
> do_relocation
>     for P in all parent nodes of block A:
>         if !P->eb:
>             btrfs_search_slot(key);   --> get path from P to A.
>         if lowest:
>             BUG_ON(A->bytenr != bytenr of A recorded in P);
>         btrfs_cow_block(P, A);   --> change A's bytenr in P.
> 
> After btrfs_cow_block, P has the new bytenr of A, but with the
> same @key, we get the same path again, and get panic by BUG_ON.
> 
> Note that this is only happening in a corrupted fs, for a
> regular fs in which we have correct @nritems so that we won't
> read stale content in any case.
> 
> Reviewed-by: Josef Bacik <jbacik@fb.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v2: - use new internal error EFSCORRUPTED as "Filesystem is corrupted",
>       suggested by David Sterba.

Sorry I steered it to EFSCORRUPTED, we should introduce the error code
separately and audit the call chains. I'll drop the parts and change it
back to EIO.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-10-11 14:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 16:27 [PATCH] Btrfs: kill BUG_ON in do_relocation Liu Bo
2016-09-14 17:13 ` Josef Bacik
2016-09-14 17:29   ` Chris Mason
2016-09-14 17:31     ` Josef Bacik
2016-09-14 18:19       ` Liu Bo
2016-09-15 19:01         ` Liu Bo
2016-09-15 18:58           ` Chris Mason
2016-09-19 18:01             ` David Sterba
2016-09-19 23:11               ` Liu Bo
2016-09-20  8:03                 ` David Sterba
2016-09-20 17:59                   ` Liu Bo
2016-09-21  8:14                     ` David Sterba
2016-09-14 18:16   ` Liu Bo
2016-09-23 21:05 ` [PATCH v2] " Liu Bo
2016-10-11 14:25   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.