All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: fdmanana@gmail.com
Cc: David Sterba <dsterba@suse.com>, WenRuo Qu <wqu@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
Date: Fri, 4 Oct 2019 17:19:45 +0300	[thread overview]
Message-ID: <886d611e-e95e-a7ac-6025-9a302771a72e@suse.com> (raw)
In-Reply-To: <CAL3q7H43Qsz9cy_EULphP=L=FjpPpiY2KwScHYCSbncJogujjg@mail.gmail.com>



On 4.10.19 г. 17:13 ч., Filipe Manana wrote:
> On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
>>> [BUG]
>>> When running btrfs/063 in a loop, we got the following random write time
>>> tree checker error:
>>>
>>>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
>>>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
>>>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
>>>           item 0 key (305 1 0) itemoff 16123 itemsize 160
>>>                   inode generation 0 size 0 mode 40777
>>>           item 1 key (305 12 257) itemoff 16111 itemsize 12
>>>           item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>>>                   extent data disk bytenr 0 nr 0
>>>                   extent data offset 0 nr 614400 ram 671744
>>>           item 3 key (307 108 614400) itemoff 16005 itemsize 53
>>>                   extent data disk bytenr 195342336 nr 57344
>>>                   extent data offset 0 nr 53248 ram 57344
>>>           item 4 key (307 108 667648) itemoff 15952 itemsize 53
>>>                   extent data disk bytenr 194048000 nr 4096
>>>                   extent data offset 0 nr 4096 ram 4096
>>>         [...]
>>>   BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
>>>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
>>>   BTRFS info (device dm-4): forced readonly
>>>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>>>   BTRFS info (device dm-4): use zlib compression, level 3
>>>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
>>>
>>> [CAUSE]
>>> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
>>> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
>>> should have previous key with the same objectid as ino.
>>>
>>> But it's only true for fs trees. For log-tree, we can get above log tree
>>> block where an EXTENT_DATA item has no previous key with the same ino.
>>> As log tree only records modified items, it won't record unmodified
>>> items like INODE_ITEM.
>>>
>>> So this triggers write time tree check warning.
>>>
>>> [FIX]
>>> As a quick fix, check header owner to skip the previous key if it's not
>>> fs tree (log tree doesn't count as fs tree).
>>>
>>> This fix is only to be merged as a quick fix.
>>> There will be a more comprehensive fix to refactor the common check into
>>> one function.
>>>
>>> Reported-by: David Sterba <dsterba@suse.com>
>>> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>>
>> It's not entirely clear why this bug manifests. My tests show that when
>> we write extents we always update the inode's c/m time so it's always
>> dirtied hence it's logged. OTOH when punching a hole the same thing is
>> valid.
>>
>> Filipe, under what conditions should it be possible to log an
>> EXTENT_DATA item without first logging the inode it belongs to? It seems
>> using the usual write paths (e.g. buffered write and punchole) that's
>> impossible?
> 
> The tests you did are pointless, none of those operations write to a
> log tree, only fsync does that.

You were quick to judge, I tried:
xfs_io -f -c "fpunch 1m 4k" -c "fsync" /media/foo (foo was a 4m, fully
sycned file)

Similar command with the just writing in the middle of the file i.e not
changing isize.

> 
> This change is perfectly fine. Logging (fsync) always logs the inode
> item since commit [1] (2015),
> however it might do so after logging extents and other items, and in
> between that, if writeback for
> the log tree leaf happens we get that error from the tree-checker.

Fair enough, however that clarification about the sequence of events
should be in the changelog.

> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5
> 
>>
>>> ---
>>>  fs/btrfs/tree-checker.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>>> index b8f82d9be9f0..5e34cd5e3e2e 100644
>>> --- a/fs/btrfs/tree-checker.c
>>> +++ b/fs/btrfs/tree-checker.c
>>> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
>>>        * But if objectids mismatch, it means we have a missing
>>>        * INODE_ITEM.
>>>        */
>>> -     if (slot > 0 && prev_key->objectid != key->objectid) {
>>> +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
>>> +         prev_key->objectid != key->objectid) {
>>>               file_extent_err(leaf, slot,
>>>               "invalid previous key objectid, have %llu expect %llu",
>>>                               prev_key->objectid, key->objectid);
>>> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
>>>       u32 cur = 0;
>>>
>>>       /* Same check as in check_extent_data_item() */
>>> -     if (slot > 0 && prev_key->objectid != key->objectid) {
>>> +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
>>> +         prev_key->objectid != key->objectid) {
>>>               dir_item_err(leaf, slot,
>>>               "invalid previous key objectid, have %llu expect %llu",
>>>                            prev_key->objectid, key->objectid);
>>>
> 
> 
> 

  reply	other threads:[~2019-10-04 14:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
2019-10-04 13:52   ` Nikolay Borisov
2019-10-04 14:13     ` Filipe Manana
2019-10-04 14:19       ` Nikolay Borisov [this message]
2019-10-04 14:15   ` Filipe Manana
2019-10-07 15:31     ` David Sterba
2019-10-04  9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo
2019-10-04  9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo
2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=886d611e-e95e-a7ac-6025-9a302771a72e@suse.com \
    --to=nborisov@suse.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.