* [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees @ 2019-10-04 9:31 Qu Wenruo 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Qu Wenruo @ 2019-10-04 9:31 UTC (permalink / raw) To: linux-btrfs There is a false alerts of tree-checker when running fstests/btrfs/063 in a loop. The bug is caused by commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM"). For the full error analyse, please check the first patch. The first patch will give it a quick fix, so that it can be addressed in v5.4 release cycle. The 2nd patch is a more proper patch, with refactor to reduce duplicated code and add the check to INODE_REF item. But it's pretty large (+72, -41), not sure if it's suitbale for late -rc. Also current write-time tree checker error message is too silent, can't be caught by fstests nor a quick glance of dmesg. And it doesn't contain enough info to debug. So to enhance the error message, and make it more noisy, the 3rd patch will enhance the error message. Qu Wenruo (3): btrfs: tree-checker: Fix false alerts on log trees btrfs: tree-checker: Refactor prev_key check for ino into a function btrfs: Enhance the error outputting for write time tree checker fs/btrfs/disk-io.c | 2 + fs/btrfs/tree-checker.c | 111 ++++++++++++++++++++++++++-------------- 2 files changed, 74 insertions(+), 39 deletions(-) -- 2.23.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo @ 2019-10-04 9:31 ` Qu Wenruo 2019-10-04 13:52 ` Nikolay Borisov 2019-10-04 14:15 ` Filipe Manana 2019-10-04 9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo ` (2 subsequent siblings) 3 siblings, 2 replies; 10+ messages in thread From: Qu Wenruo @ 2019-10-04 9:31 UTC (permalink / raw) To: linux-btrfs; +Cc: David Sterba [BUG] When running btrfs/063 in a loop, we got the following random write time tree checker error: BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307 BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610 BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176 item 0 key (305 1 0) itemoff 16123 itemsize 160 inode generation 0 size 0 mode 40777 item 1 key (305 12 257) itemoff 16111 itemsize 12 item 2 key (307 108 0) itemoff 16058 itemsize 53 <<< extent data disk bytenr 0 nr 0 extent data offset 0 nr 614400 ram 671744 item 3 key (307 108 614400) itemoff 16005 itemsize 53 extent data disk bytenr 195342336 nr 57344 extent data offset 0 nr 53248 ram 57344 item 4 key (307 108 667648) itemoff 15952 itemsize 53 extent data disk bytenr 194048000 nr 4096 extent data offset 0 nr 4096 ram 4096 [...] BTRFS error (device dm-4): block=33095680 write time tree block corruption detected BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction) BTRFS info (device dm-4): forced readonly BTRFS warning (device dm-4): Skipping commit of aborted transaction. BTRFS info (device dm-4): use zlib compression, level 3 BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure [CAUSE] Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items should have previous key with the same objectid as ino. But it's only true for fs trees. For log-tree, we can get above log tree block where an EXTENT_DATA item has no previous key with the same ino. As log tree only records modified items, it won't record unmodified items like INODE_ITEM. So this triggers write time tree check warning. [FIX] As a quick fix, check header owner to skip the previous key if it's not fs tree (log tree doesn't count as fs tree). This fix is only to be merged as a quick fix. There will be a more comprehensive fix to refactor the common check into one function. Reported-by: David Sterba <dsterba@suse.com> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/tree-checker.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index b8f82d9be9f0..5e34cd5e3e2e 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, * But if objectids mismatch, it means we have a missing * INODE_ITEM. */ - if (slot > 0 && prev_key->objectid != key->objectid) { + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && + prev_key->objectid != key->objectid) { file_extent_err(leaf, slot, "invalid previous key objectid, have %llu expect %llu", prev_key->objectid, key->objectid); @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf, u32 cur = 0; /* Same check as in check_extent_data_item() */ - if (slot > 0 && prev_key->objectid != key->objectid) { + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && + prev_key->objectid != key->objectid) { dir_item_err(leaf, slot, "invalid previous key objectid, have %llu expect %llu", prev_key->objectid, key->objectid); -- 2.23.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo @ 2019-10-04 13:52 ` Nikolay Borisov 2019-10-04 14:13 ` Filipe Manana 2019-10-04 14:15 ` Filipe Manana 1 sibling, 1 reply; 10+ messages in thread From: Nikolay Borisov @ 2019-10-04 13:52 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: David Sterba, Filipe Manana On 4.10.19 г. 12:31 ч., Qu Wenruo wrote: > [BUG] > When running btrfs/063 in a loop, we got the following random write time > tree checker error: > > BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307 > BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610 > BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176 > item 0 key (305 1 0) itemoff 16123 itemsize 160 > inode generation 0 size 0 mode 40777 > item 1 key (305 12 257) itemoff 16111 itemsize 12 > item 2 key (307 108 0) itemoff 16058 itemsize 53 <<< > extent data disk bytenr 0 nr 0 > extent data offset 0 nr 614400 ram 671744 > item 3 key (307 108 614400) itemoff 16005 itemsize 53 > extent data disk bytenr 195342336 nr 57344 > extent data offset 0 nr 53248 ram 57344 > item 4 key (307 108 667648) itemoff 15952 itemsize 53 > extent data disk bytenr 194048000 nr 4096 > extent data offset 0 nr 4096 ram 4096 > [...] > BTRFS error (device dm-4): block=33095680 write time tree block corruption detected > BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction) > BTRFS info (device dm-4): forced readonly > BTRFS warning (device dm-4): Skipping commit of aborted transaction. > BTRFS info (device dm-4): use zlib compression, level 3 > BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure > > [CAUSE] > Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items > should have previous key with the same objectid as ino. > > But it's only true for fs trees. For log-tree, we can get above log tree > block where an EXTENT_DATA item has no previous key with the same ino. > As log tree only records modified items, it won't record unmodified > items like INODE_ITEM. > > So this triggers write time tree check warning. > > [FIX] > As a quick fix, check header owner to skip the previous key if it's not > fs tree (log tree doesn't count as fs tree). > > This fix is only to be merged as a quick fix. > There will be a more comprehensive fix to refactor the common check into > one function. > > Reported-by: David Sterba <dsterba@suse.com> > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > Signed-off-by: Qu Wenruo <wqu@suse.com> It's not entirely clear why this bug manifests. My tests show that when we write extents we always update the inode's c/m time so it's always dirtied hence it's logged. OTOH when punching a hole the same thing is valid. Filipe, under what conditions should it be possible to log an EXTENT_DATA item without first logging the inode it belongs to? It seems using the usual write paths (e.g. buffered write and punchole) that's impossible? > --- > fs/btrfs/tree-checker.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c > index b8f82d9be9f0..5e34cd5e3e2e 100644 > --- a/fs/btrfs/tree-checker.c > +++ b/fs/btrfs/tree-checker.c > @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, > * But if objectids mismatch, it means we have a missing > * INODE_ITEM. > */ > - if (slot > 0 && prev_key->objectid != key->objectid) { > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > + prev_key->objectid != key->objectid) { > file_extent_err(leaf, slot, > "invalid previous key objectid, have %llu expect %llu", > prev_key->objectid, key->objectid); > @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf, > u32 cur = 0; > > /* Same check as in check_extent_data_item() */ > - if (slot > 0 && prev_key->objectid != key->objectid) { > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > + prev_key->objectid != key->objectid) { > dir_item_err(leaf, slot, > "invalid previous key objectid, have %llu expect %llu", > prev_key->objectid, key->objectid); > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 13:52 ` Nikolay Borisov @ 2019-10-04 14:13 ` Filipe Manana 2019-10-04 14:19 ` Nikolay Borisov 0 siblings, 1 reply; 10+ messages in thread From: Filipe Manana @ 2019-10-04 14:13 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Qu Wenruo, linux-btrfs, David Sterba On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov <nborisov@suse.com> wrote: > > > > On 4.10.19 г. 12:31 ч., Qu Wenruo wrote: > > [BUG] > > When running btrfs/063 in a loop, we got the following random write time > > tree checker error: > > > > BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307 > > BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610 > > BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176 > > item 0 key (305 1 0) itemoff 16123 itemsize 160 > > inode generation 0 size 0 mode 40777 > > item 1 key (305 12 257) itemoff 16111 itemsize 12 > > item 2 key (307 108 0) itemoff 16058 itemsize 53 <<< > > extent data disk bytenr 0 nr 0 > > extent data offset 0 nr 614400 ram 671744 > > item 3 key (307 108 614400) itemoff 16005 itemsize 53 > > extent data disk bytenr 195342336 nr 57344 > > extent data offset 0 nr 53248 ram 57344 > > item 4 key (307 108 667648) itemoff 15952 itemsize 53 > > extent data disk bytenr 194048000 nr 4096 > > extent data offset 0 nr 4096 ram 4096 > > [...] > > BTRFS error (device dm-4): block=33095680 write time tree block corruption detected > > BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction) > > BTRFS info (device dm-4): forced readonly > > BTRFS warning (device dm-4): Skipping commit of aborted transaction. > > BTRFS info (device dm-4): use zlib compression, level 3 > > BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure > > > > [CAUSE] > > Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > > assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items > > should have previous key with the same objectid as ino. > > > > But it's only true for fs trees. For log-tree, we can get above log tree > > block where an EXTENT_DATA item has no previous key with the same ino. > > As log tree only records modified items, it won't record unmodified > > items like INODE_ITEM. > > > > So this triggers write time tree check warning. > > > > [FIX] > > As a quick fix, check header owner to skip the previous key if it's not > > fs tree (log tree doesn't count as fs tree). > > > > This fix is only to be merged as a quick fix. > > There will be a more comprehensive fix to refactor the common check into > > one function. > > > > Reported-by: David Sterba <dsterba@suse.com> > > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > > Signed-off-by: Qu Wenruo <wqu@suse.com> > > > It's not entirely clear why this bug manifests. My tests show that when > we write extents we always update the inode's c/m time so it's always > dirtied hence it's logged. OTOH when punching a hole the same thing is > valid. > > Filipe, under what conditions should it be possible to log an > EXTENT_DATA item without first logging the inode it belongs to? It seems > using the usual write paths (e.g. buffered write and punchole) that's > impossible? The tests you did are pointless, none of those operations write to a log tree, only fsync does that. This change is perfectly fine. Logging (fsync) always logs the inode item since commit [1] (2015), however it might do so after logging extents and other items, and in between that, if writeback for the log tree leaf happens we get that error from the tree-checker. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5 > > > --- > > fs/btrfs/tree-checker.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c > > index b8f82d9be9f0..5e34cd5e3e2e 100644 > > --- a/fs/btrfs/tree-checker.c > > +++ b/fs/btrfs/tree-checker.c > > @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, > > * But if objectids mismatch, it means we have a missing > > * INODE_ITEM. > > */ > > - if (slot > 0 && prev_key->objectid != key->objectid) { > > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > > + prev_key->objectid != key->objectid) { > > file_extent_err(leaf, slot, > > "invalid previous key objectid, have %llu expect %llu", > > prev_key->objectid, key->objectid); > > @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf, > > u32 cur = 0; > > > > /* Same check as in check_extent_data_item() */ > > - if (slot > 0 && prev_key->objectid != key->objectid) { > > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > > + prev_key->objectid != key->objectid) { > > dir_item_err(leaf, slot, > > "invalid previous key objectid, have %llu expect %llu", > > prev_key->objectid, key->objectid); > > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 14:13 ` Filipe Manana @ 2019-10-04 14:19 ` Nikolay Borisov 0 siblings, 0 replies; 10+ messages in thread From: Nikolay Borisov @ 2019-10-04 14:19 UTC (permalink / raw) To: fdmanana; +Cc: David Sterba, WenRuo Qu, linux-btrfs On 4.10.19 г. 17:13 ч., Filipe Manana wrote: > On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov <nborisov@suse.com> wrote: >> >> >> >> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote: >>> [BUG] >>> When running btrfs/063 in a loop, we got the following random write time >>> tree checker error: >>> >>> BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307 >>> BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610 >>> BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176 >>> item 0 key (305 1 0) itemoff 16123 itemsize 160 >>> inode generation 0 size 0 mode 40777 >>> item 1 key (305 12 257) itemoff 16111 itemsize 12 >>> item 2 key (307 108 0) itemoff 16058 itemsize 53 <<< >>> extent data disk bytenr 0 nr 0 >>> extent data offset 0 nr 614400 ram 671744 >>> item 3 key (307 108 614400) itemoff 16005 itemsize 53 >>> extent data disk bytenr 195342336 nr 57344 >>> extent data offset 0 nr 53248 ram 57344 >>> item 4 key (307 108 667648) itemoff 15952 itemsize 53 >>> extent data disk bytenr 194048000 nr 4096 >>> extent data offset 0 nr 4096 ram 4096 >>> [...] >>> BTRFS error (device dm-4): block=33095680 write time tree block corruption detected >>> BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction) >>> BTRFS info (device dm-4): forced readonly >>> BTRFS warning (device dm-4): Skipping commit of aborted transaction. >>> BTRFS info (device dm-4): use zlib compression, level 3 >>> BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure >>> >>> [CAUSE] >>> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") >>> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items >>> should have previous key with the same objectid as ino. >>> >>> But it's only true for fs trees. For log-tree, we can get above log tree >>> block where an EXTENT_DATA item has no previous key with the same ino. >>> As log tree only records modified items, it won't record unmodified >>> items like INODE_ITEM. >>> >>> So this triggers write time tree check warning. >>> >>> [FIX] >>> As a quick fix, check header owner to skip the previous key if it's not >>> fs tree (log tree doesn't count as fs tree). >>> >>> This fix is only to be merged as a quick fix. >>> There will be a more comprehensive fix to refactor the common check into >>> one function. >>> >>> Reported-by: David Sterba <dsterba@suse.com> >>> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >> >> >> It's not entirely clear why this bug manifests. My tests show that when >> we write extents we always update the inode's c/m time so it's always >> dirtied hence it's logged. OTOH when punching a hole the same thing is >> valid. >> >> Filipe, under what conditions should it be possible to log an >> EXTENT_DATA item without first logging the inode it belongs to? It seems >> using the usual write paths (e.g. buffered write and punchole) that's >> impossible? > > The tests you did are pointless, none of those operations write to a > log tree, only fsync does that. You were quick to judge, I tried: xfs_io -f -c "fpunch 1m 4k" -c "fsync" /media/foo (foo was a 4m, fully sycned file) Similar command with the just writing in the middle of the file i.e not changing isize. > > This change is perfectly fine. Logging (fsync) always logs the inode > item since commit [1] (2015), > however it might do so after logging extents and other items, and in > between that, if writeback for > the log tree leaf happens we get that error from the tree-checker. Fair enough, however that clarification about the sequence of events should be in the changelog. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5 > >> >>> --- >>> fs/btrfs/tree-checker.c | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c >>> index b8f82d9be9f0..5e34cd5e3e2e 100644 >>> --- a/fs/btrfs/tree-checker.c >>> +++ b/fs/btrfs/tree-checker.c >>> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, >>> * But if objectids mismatch, it means we have a missing >>> * INODE_ITEM. >>> */ >>> - if (slot > 0 && prev_key->objectid != key->objectid) { >>> + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && >>> + prev_key->objectid != key->objectid) { >>> file_extent_err(leaf, slot, >>> "invalid previous key objectid, have %llu expect %llu", >>> prev_key->objectid, key->objectid); >>> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf, >>> u32 cur = 0; >>> >>> /* Same check as in check_extent_data_item() */ >>> - if (slot > 0 && prev_key->objectid != key->objectid) { >>> + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && >>> + prev_key->objectid != key->objectid) { >>> dir_item_err(leaf, slot, >>> "invalid previous key objectid, have %llu expect %llu", >>> prev_key->objectid, key->objectid); >>> > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo 2019-10-04 13:52 ` Nikolay Borisov @ 2019-10-04 14:15 ` Filipe Manana 2019-10-07 15:31 ` David Sterba 1 sibling, 1 reply; 10+ messages in thread From: Filipe Manana @ 2019-10-04 14:15 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, David Sterba On Fri, Oct 4, 2019 at 11:27 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > When running btrfs/063 in a loop, we got the following random write time > tree checker error: > > BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307 > BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610 > BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176 > item 0 key (305 1 0) itemoff 16123 itemsize 160 > inode generation 0 size 0 mode 40777 > item 1 key (305 12 257) itemoff 16111 itemsize 12 > item 2 key (307 108 0) itemoff 16058 itemsize 53 <<< > extent data disk bytenr 0 nr 0 > extent data offset 0 nr 614400 ram 671744 > item 3 key (307 108 614400) itemoff 16005 itemsize 53 > extent data disk bytenr 195342336 nr 57344 > extent data offset 0 nr 53248 ram 57344 > item 4 key (307 108 667648) itemoff 15952 itemsize 53 > extent data disk bytenr 194048000 nr 4096 > extent data offset 0 nr 4096 ram 4096 > [...] > BTRFS error (device dm-4): block=33095680 write time tree block corruption detected > BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction) > BTRFS info (device dm-4): forced readonly > BTRFS warning (device dm-4): Skipping commit of aborted transaction. > BTRFS info (device dm-4): use zlib compression, level 3 > BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure > > [CAUSE] > Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items > should have previous key with the same objectid as ino. > > But it's only true for fs trees. For log-tree, we can get above log tree > block where an EXTENT_DATA item has no previous key with the same ino. > As log tree only records modified items, it won't record unmodified > items like INODE_ITEM. > > So this triggers write time tree check warning. > > [FIX] > As a quick fix, check header owner to skip the previous key if it's not > fs tree (log tree doesn't count as fs tree). > > This fix is only to be merged as a quick fix. > There will be a more comprehensive fix to refactor the common check into > one function. > > Reported-by: David Sterba <dsterba@suse.com> > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") So this is bogus, since that commit is not in Linus' tree, and once it gets there its ID changes. More likely, this will get squashed into that commit in misc-next since we are still far from the 5.5 merge window. > Signed-off-by: Qu Wenruo <wqu@suse.com> Anyway, the change looks fine to me. Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > --- > fs/btrfs/tree-checker.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c > index b8f82d9be9f0..5e34cd5e3e2e 100644 > --- a/fs/btrfs/tree-checker.c > +++ b/fs/btrfs/tree-checker.c > @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, > * But if objectids mismatch, it means we have a missing > * INODE_ITEM. > */ > - if (slot > 0 && prev_key->objectid != key->objectid) { > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > + prev_key->objectid != key->objectid) { > file_extent_err(leaf, slot, > "invalid previous key objectid, have %llu expect %llu", > prev_key->objectid, key->objectid); > @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf, > u32 cur = 0; > > /* Same check as in check_extent_data_item() */ > - if (slot > 0 && prev_key->objectid != key->objectid) { > + if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && > + prev_key->objectid != key->objectid) { > dir_item_err(leaf, slot, > "invalid previous key objectid, have %llu expect %llu", > prev_key->objectid, key->objectid); > -- > 2.23.0 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees 2019-10-04 14:15 ` Filipe Manana @ 2019-10-07 15:31 ` David Sterba 0 siblings, 0 replies; 10+ messages in thread From: David Sterba @ 2019-10-07 15:31 UTC (permalink / raw) To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs, David Sterba On Fri, Oct 04, 2019 at 03:15:51PM +0100, Filipe Manana wrote: > On Fri, Oct 4, 2019 at 11:27 AM Qu Wenruo <wqu@suse.com> wrote: > > Reported-by: David Sterba <dsterba@suse.com> > > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM") > > So this is bogus, since that commit is not in Linus' tree, and once it > gets there its ID changes. > More likely, this will get squashed into that commit in misc-next > since we are still far from the 5.5 merge window. You're right, squashing it in is preferred in this case. Split fixes have bitten us in the past so if we can afford to rebase the devel queue a single complete patch is preferred. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > > Anyway, the change looks fine to me. > > Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks, I can add rev-by to "btrfs: tree-checker: Try to detect missing INODE_ITEM" as well if you want. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function 2019-10-04 9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo @ 2019-10-04 9:31 ` Qu Wenruo 2019-10-04 9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo 2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba 3 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2019-10-04 9:31 UTC (permalink / raw) To: linux-btrfs Refactor the check for prev_key->objectid of the following key types into one function, check_prev_ino(): - EXTENT_DATA - INODE_REF - DIR_INDEX - DIR_ITEM - XATTR_ITEM Despite the refactor, also add the check of prev_key for INODE_REF. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/tree-checker.c | 113 +++++++++++++++++++++++++--------------- 1 file changed, 72 insertions(+), 41 deletions(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 5e34cd5e3e2e..73678393340a 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -125,6 +125,74 @@ static u64 file_extent_end(struct extent_buffer *leaf, return end; } +/* + * Customized reported for dir_item, only important new info is key->objectid, + * which represents inode number + */ +__printf(3, 4) +__cold +static void dir_item_err(const struct extent_buffer *eb, int slot, + const char *fmt, ...) +{ + const struct btrfs_fs_info *fs_info = eb->fs_info; + struct btrfs_key key; + struct va_format vaf; + va_list args; + + btrfs_item_key_to_cpu(eb, &key, slot); + va_start(args, fmt); + + vaf.fmt = fmt; + vaf.va = &args; + + btrfs_crit(fs_info, + "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", + btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, + key.objectid, &vaf); + va_end(args); +} + +/* + * This functions checks prev_key->objectid, to ensure current key and prev_key + * shares the same objectid as ino. + * + * This is to detect missing INODE_ITEM in subvolume trees. + * + * Return true if everything is OK or we don't need to check. + * Return false if anything is wrong. + */ +static bool check_prev_ino(struct extent_buffer *leaf, + struct btrfs_key *key, int slot, + struct btrfs_key *prev_key) +{ + /* No prev key, skip check */ + if (slot == 0) + return true; + + /* Only these key->types needs to be checked */ + ASSERT(key->type == BTRFS_XATTR_ITEM_KEY || + key->type == BTRFS_INODE_REF_KEY || + key->type == BTRFS_DIR_INDEX_KEY || + key->type == BTRFS_DIR_ITEM_KEY || + key->type == BTRFS_EXTENT_DATA_KEY); + + /* + * Only subvolume trees along with their reloc trees needs this check. + * Things like log tree doesn't follow this ino requirement. + */ + if (!is_fstree(btrfs_header_owner(leaf))) + return true; + + if (key->objectid == prev_key->objectid) + return true; + + /* Error found */ + dir_item_err(leaf, slot, + "invalid previous key objectid, have %llu expect %llu", + prev_key->objectid, key->objectid); + return false; +} static int check_extent_data_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot, struct btrfs_key *prev_key) @@ -148,13 +216,8 @@ static int check_extent_data_item(struct extent_buffer *leaf, * But if objectids mismatch, it means we have a missing * INODE_ITEM. */ - if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && - prev_key->objectid != key->objectid) { - file_extent_err(leaf, slot, - "invalid previous key objectid, have %llu expect %llu", - prev_key->objectid, key->objectid); + if (!check_prev_ino(leaf, key, slot, prev_key)) return -EUCLEAN; - } fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); @@ -285,34 +348,6 @@ static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key, return 0; } -/* - * Customized reported for dir_item, only important new info is key->objectid, - * which represents inode number - */ -__printf(3, 4) -__cold -static void dir_item_err(const struct extent_buffer *eb, int slot, - const char *fmt, ...) -{ - const struct btrfs_fs_info *fs_info = eb->fs_info; - struct btrfs_key key; - struct va_format vaf; - va_list args; - - btrfs_item_key_to_cpu(eb, &key, slot); - va_start(args, fmt); - - vaf.fmt = fmt; - vaf.va = &args; - - btrfs_crit(fs_info, - "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV", - btrfs_header_level(eb) == 0 ? "leaf" : "node", - btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, - key.objectid, &vaf); - va_end(args); -} - static int check_dir_item(struct extent_buffer *leaf, struct btrfs_key *key, struct btrfs_key *prev_key, int slot) @@ -322,14 +357,8 @@ static int check_dir_item(struct extent_buffer *leaf, u32 item_size = btrfs_item_size_nr(leaf, slot); u32 cur = 0; - /* Same check as in check_extent_data_item() */ - if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) && - prev_key->objectid != key->objectid) { - dir_item_err(leaf, slot, - "invalid previous key objectid, have %llu expect %llu", - prev_key->objectid, key->objectid); + if (!check_prev_ino(leaf, key, slot, prev_key)) return -EUCLEAN; - } di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); while (cur < item_size) { u32 name_len; @@ -1266,6 +1295,8 @@ static int check_inode_ref(struct extent_buffer *leaf, unsigned long ptr; unsigned long end; + if (!check_prev_ino(leaf, key, slot, prev_key)) + return -EUCLEAN; /* namelen can't be 0, so item_size == sizeof() is also invalid */ if (btrfs_item_size_nr(leaf, slot) <= sizeof(*iref)) { inode_ref_err(fs_info, leaf, slot, -- 2.23.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker 2019-10-04 9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo 2019-10-04 9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo @ 2019-10-04 9:31 ` Qu Wenruo 2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba 3 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2019-10-04 9:31 UTC (permalink / raw) To: linux-btrfs Unlike read time tree checker error, write time error can't be inspected by "btrfs ins dump-tree", so we need extra info to determine what's going wrong. The patch will add the following output for write time tree checker error: - The content of the offending tree block To help determining if it's a false alert. - Kernel WARN_ON() for debug build This is helpful for us to detect unexpected write time tree checker error, especially fstests could catch the dmesg. Since the WARN_ON() is only triggered for write time tree checker, test cases utilizing dm-error won't trigger this WARN_ON(), thus no extra noise. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 16dc60b4966d..a0925b4e00af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -545,9 +545,11 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page) ret = btrfs_check_leaf_full(eb); if (ret < 0) { + btrfs_print_tree(eb, 0); btrfs_err(fs_info, "block=%llu write time tree block corruption detected", eb->start); + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); return ret; } write_extent_buffer(eb, result, 0, csum_size); -- 2.23.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees 2019-10-04 9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo ` (2 preceding siblings ...) 2019-10-04 9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo @ 2019-10-07 16:46 ` David Sterba 3 siblings, 0 replies; 10+ messages in thread From: David Sterba @ 2019-10-07 16:46 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Oct 04, 2019 at 05:31:30PM +0800, Qu Wenruo wrote: > There is a false alerts of tree-checker when running fstests/btrfs/063 > in a loop. > > The bug is caused by commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect > missing INODE_ITEM"). > For the full error analyse, please check the first patch. > > The first patch will give it a quick fix, so that it can be addressed in > v5.4 release cycle. > > The 2nd patch is a more proper patch, with refactor to reduce duplicated > code and add the check to INODE_REF item. > But it's pretty large (+72, -41), not sure if it's suitbale for late > -rc. > > Also current write-time tree checker error message is too silent, can't > be caught by fstests nor a quick glance of dmesg. And it doesn't contain > enough info to debug. > > So to enhance the error message, and make it more noisy, the 3rd patch > will enhance the error message. > > Qu Wenruo (3): > btrfs: tree-checker: Fix false alerts on log trees > btrfs: tree-checker: Refactor prev_key check for ino into a function > btrfs: Enhance the error outputting for write time tree checker Patch 1 folded to the original patch and 2 and 2 now in misc-next, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-10-07 16:46 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-04 9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo 2019-10-04 9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo 2019-10-04 13:52 ` Nikolay Borisov 2019-10-04 14:13 ` Filipe Manana 2019-10-04 14:19 ` Nikolay Borisov 2019-10-04 14:15 ` Filipe Manana 2019-10-07 15:31 ` David Sterba 2019-10-04 9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo 2019-10-04 9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo 2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).