All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Balance loops: what we know so far
Date: Wed, 13 May 2020 19:23:40 +0800	[thread overview]
Message-ID: <6fcccf0b-108d-75d2-ad53-3f7837478319@gmx.com> (raw)
In-Reply-To: <20200513052452.GY10769@hungrycats.org>


[-- Attachment #1.1: Type: text/plain, Size: 14969 bytes --]



On 2020/5/13 下午1:24, Zygo Blaxell wrote:
> On Wed, May 13, 2020 at 10:28:37AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/5/12 下午10:11, Zygo Blaxell wrote:
>>> On Tue, May 12, 2020 at 09:43:06AM -0400, Zygo Blaxell wrote:
>>>> On Mon, May 11, 2020 at 04:31:32PM +0800, Qu Wenruo wrote:
>>>>> Hi Zygo,
>>>>>
>>>>> Would you like to test this diff?
>>>>>
>>>>> Although I haven't find a solid reason yet, there is another report and
>>>>> with the help from the reporter, it turns out that balance hangs at
>>>>> relocating DATA_RELOC tree block.
>>>>>
>>>>> After some more digging, DATA_RELOC tree doesn't need REF_COW bit at all
>>>>> since we can't create snapshot for data reloc tree.
>>>>>
>>>>> By removing the REF_COW bit, we could ensure that data reloc tree always
>>>>> get cowed for relocation (just like extent tree), this would hugely
>>>>> reduce the complexity for data reloc tree.
>>>>>
>>>>> Not sure if this would help, but it passes my local balance run.
>>>>
>>>> I ran it last night.  It did 30804 loops during a metadata block group
>>>> balance, and is now looping on a data block group as I write this.
>>
>> OK, not that surprised the patch doesn't help.
>> But still, the patch itself could still make sense for removing the
>> REFCOW bit for data reloc tree.
>>
>> But I'm interesting in that, after that 30804 loops, it found its way to
>> next block group?!
>>
>>>
>>> Here's the block group that is failing, and some poking around with it:
>>
>> In fact, such poking would be the most valuable part.
>>
>>>
>>> 	root@tester:~# ~/share/python-btrfs/examples/show_block_group_contents.py 4368594108416 /media/testfs/
>>> 	block group vaddr 4368594108416 length 1073741824 flags DATA used 530509824 used_pct 49
>>> 	extent vaddr 4368594108416 length 121053184 refs 1 gen 1318394 flags DATA
>>> 	    inline shared data backref parent 4374646833152 count 1
>>> 	extent vaddr 4368715161600 length 120168448 refs 1 gen 1318394 flags DATA
>>> 	    inline shared data backref parent 4374801383424 count 1
>>> 	extent vaddr 4368835330048 length 127623168 refs 1 gen 1318394 flags DATA
>>> 	    inline shared data backref parent 4374801383424 count 1
>>> 	extent vaddr 4368962953216 length 124964864 refs 1 gen 1318394 flags DATA
>>> 	    inline shared data backref parent 4374801383424 count 1
>>> 	extent vaddr 4369182420992 length 36700160 refs 1 gen 1321064 flags DATA
>>> 	    inline extent data backref root 257 objectid 257 offset 822607872 count 1
>>
>> One interesting thing is, there are 5 extents during the loop.
>> The first 4 looks like they belong to data reloc tree, which means they
>> have been swapped, waiting to be cleaned up.
>>
>> The last one belongs to root 257, and looks like it hadn't been relocated.
>>
>>>
>>> The extent data backref is unusual--during loops, I don't usually see those.
>>> And...as I write this, it disappeared (it was part of the bees hash table, and
>>> was overwritten).  Now there are 4 extents reported in the balance loop (note:
>>> I added a loop counter to the log message):
>>
>> Then it means the last one get properly relocated.
> 
> No.  In these cases the extent is removed by other filesystem activity.
> If balance gets stuck looping, it can never break out of a loop if it
> is the only writer on the filesystem.  I've left it looping for days,
> it makes no progress.
> 
> When balance is looping, it is always stuck waiting until the extents
> are removed by something else.  In this particular case, the extent was
> overwritten by another process removing the extent's last reference so
> it was no longer part of the block group any more.  It is possible to
> break a balance loop by simply deleting all the files with extents in
> the block group.
> 
> I wrote some scripts that dump out the extents in the looping block
> group, find the files they belong to, and run the defrag ioctl on them,
> thereby removing all the extents in the block group so the balance loop
> will end, without deleting the data.

This should definitely be fixed.

>  I used the script for a while and
> was able to balance hundreds of block groups more than I would have been
> able to without the script; however, the script couldn't run defrag on
> extents that were not reachable through open() (e.g. extents referenced
> by a deleted snapshot),

And data reloc tree.

> so it couldn't work around the balance loops in
> all cases.

For the single data extent hanging you are able to reproduce, can you
send me a binary dump of that fs when it's hanging?

Thanks,
Qu

> 
>> The cleanup for the first 4 doesn't happen properly.
>>>
>>> 	[Tue May 12 09:44:22 2020] BTRFS info (device dm-0): found 5 extents, loops 378, stage: update data pointers
>>> 	[Tue May 12 09:44:23 2020] BTRFS info (device dm-0): found 5 extents, loops 379, stage: update data pointers
>>> 	[Tue May 12 09:44:24 2020] BTRFS info (device dm-0): found 5 extents, loops 380, stage: update data pointers
>>> 	[Tue May 12 09:44:26 2020] BTRFS info (device dm-0): found 5 extents, loops 381, stage: update data pointers
>>> 	[Tue May 12 09:44:27 2020] BTRFS info (device dm-0): found 5 extents, loops 382, stage: update data pointers
>>> 	[Tue May 12 10:04:49 2020] BTRFS info (device dm-0): found 5 extents, loops 383, stage: update data pointers
>>> 	[Tue May 12 10:04:53 2020] BTRFS info (device dm-0): found 4 extents, loops 384, stage: update data pointers
>>> 	[Tue May 12 10:04:58 2020] BTRFS info (device dm-0): found 4 extents, loops 385, stage: update data pointers
>>> 	[Tue May 12 10:05:00 2020] BTRFS info (device dm-0): found 4 extents, loops 386, stage: update data pointers
>>> 	[Tue May 12 10:05:00 2020] BTRFS info (device dm-0): found 4 extents, loops 387, stage: update data pointers
>>> 	[Tue May 12 10:05:01 2020] BTRFS info (device dm-0): found 4 extents, loops 388, stage: update data pointers
>>>
>>> Some of the extents that remain are confusing python-btrfs a little:
>>>
>>> 	root@tester:~# ~/share/python-btrfs/examples/show_block_group_data_extent_filenames.py 4368594108416 /media/testfs/
>>> 	block group vaddr 4368594108416 length 1073741824 flags DATA used 530509824 used_pct 49
>>> 	extent vaddr 4368594108416 length 121053184 refs 1 gen 1318394 flags DATA
>>> 	Traceback (most recent call last):
>>> 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 52, in <module>
>>> 	    inodes, bytes_missed = logical_to_ino_fn(fs.fd, extent.vaddr)
>>> 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 28, in find_out_about_v1_or_v2
>>> 	    inodes, bytes_missed = using_v2(fd, vaddr)
>>> 	  File "/root/share/python-btrfs/examples/show_block_group_data_extent_filenames.py", line 17, in using_v2
>>> 	    inodes, bytes_missed = btrfs.ioctl.logical_to_ino_v2(fd, vaddr, ignore_offset=True)
>>> 	  File "/media/share/python-btrfs/examples/btrfs/ioctl.py", line 565, in logical_to_ino_v2
>>> 	    return _logical_to_ino(fd, vaddr, bufsize, ignore_offset, _v2=True)
>>> 	  File "/media/share/python-btrfs/examples/btrfs/ioctl.py", line 581, in _logical_to_ino
>>> 	    fcntl.ioctl(fd, IOC_LOGICAL_INO_V2, args)
>>
>> I'm a little surprised about the it's using logical ino ioctl, not just
>> TREE_SEARCH.
>>
>> I guess if we could get a plain tree search based one (it only search
>> commit root, which is exactly balance based on), it would be easier to
>> do the digging.
>>
>>> 	OSError: [Errno 22] Invalid argument
>>>
>>> 	root@tester:~# btrfs ins log 4368594108416 /media/testfs/
>>> 	/media/testfs//snap-1589258042/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//current/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589249822/testhost/var/log/messages.6.lzma
>>> 	ERROR: ino paths ioctl: No such file or directory
>>> 	/media/testfs//snap-1589249547/testhost/var/log/messages.6.lzma
>>> 	ERROR: ino paths ioctl: No such file or directory
>>> 	/media/testfs//snap-1589248407/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589256422/testhost/var/log/messages.6.lzma
>>> 	ERROR: ino paths ioctl: No such file or directory
>>> 	/media/testfs//snap-1589251322/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589251682/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589253842/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589246727/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589258582/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589244027/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589245227/testhost/var/log/messages.6.lzma
>>> 	ERROR: ino paths ioctl: No such file or directory
>>> 	ERROR: ino paths ioctl: No such file or directory
>>> 	/media/testfs//snap-1589246127/testhost/var/log/messages.6.lzma
>>> 	/media/testfs//snap-1589247327/testhost/var/log/messages.6.lzma
>>> 	ERROR: ino paths ioctl: No such file or directory
>>>
>>> Hmmm, I wonder if there's a problem with deleted snapshots?
>>
>> Yes, also what I'm guessing.
>>
>> The cleanup of data reloc tree doesn't look correct to me.
>>
>> Thanks for the new clues,
>> Qu
>>
>>>  I have those
>>> nearly continuously in my test environment, which is creating and deleting
>>> snapshots all the time.
>>>
>>> 	root@tester:~# btrfs ins log 4368594108416 -P /media/testfs/
>>> 	inode 20838190 offset 0 root 10347
>>> 	inode 20838190 offset 0 root 8013
>>> 	inode 20838190 offset 0 root 10332
>>> 	inode 20838190 offset 0 root 10330
>>> 	inode 20838190 offset 0 root 10331
>>> 	inode 20838190 offset 0 root 10328
>>> 	inode 20838190 offset 0 root 10329
>>> 	inode 20838190 offset 0 root 10343
>>> 	inode 20838190 offset 0 root 10333
>>> 	inode 20838190 offset 0 root 10334
>>> 	inode 20838190 offset 0 root 10336
>>> 	inode 20838190 offset 0 root 10338
>>> 	inode 20838190 offset 0 root 10325
>>> 	inode 20838190 offset 0 root 10349
>>> 	inode 20838190 offset 0 root 10320
>>> 	inode 20838190 offset 0 root 10321
>>> 	inode 20838190 offset 0 root 10322
>>> 	inode 20838190 offset 0 root 10323
>>> 	inode 20838190 offset 0 root 10324
>>> 	inode 20838190 offset 0 root 10326
>>> 	inode 20838190 offset 0 root 10327
>>> 	root@tester:~# btrfs sub list -d /media/testfs/
>>> 	ID 10201 gen 1321166 top level 0 path DELETED
>>> 	ID 10210 gen 1321166 top level 0 path DELETED
>>> 	ID 10230 gen 1321166 top level 0 path DELETED
>>> 	ID 10254 gen 1321166 top level 0 path DELETED
>>> 	ID 10257 gen 1321166 top level 0 path DELETED
>>> 	ID 10274 gen 1321166 top level 0 path DELETED
>>> 	ID 10281 gen 1321166 top level 0 path DELETED
>>> 	ID 10287 gen 1321166 top level 0 path DELETED
>>> 	ID 10296 gen 1321166 top level 0 path DELETED
>>> 	ID 10298 gen 1321166 top level 0 path DELETED
>>> 	ID 10299 gen 1321166 top level 0 path DELETED
>>> 	ID 10308 gen 1321166 top level 0 path DELETED
>>> 	ID 10311 gen 1321166 top level 0 path DELETED
>>> 	ID 10313 gen 1321166 top level 0 path DELETED
>>> 	ID 10315 gen 1321166 top level 0 path DELETED
>>> 	ID 10317 gen 1321166 top level 0 path DELETED
>>> 	ID 10322 gen 1321166 top level 0 path DELETED
>>> 	ID 10323 gen 1321166 top level 0 path DELETED
>>> 	ID 10327 gen 1321166 top level 0 path DELETED
>>> 	ID 10328 gen 1321166 top level 0 path DELETED
>>> 	ID 10330 gen 1321166 top level 0 path DELETED
>>> 	ID 10333 gen 1321166 top level 0 path DELETED
>>>
>>>
>>>>> Thanks,
>>>>> Qu
>>>>
>>>>> From 82f3b96a68561b2de9712262cb652192b8ea9b1b Mon Sep 17 00:00:00 2001
>>>>> From: Qu Wenruo <wqu@suse.com>
>>>>> Date: Mon, 11 May 2020 16:27:43 +0800
>>>>> Subject: [PATCH] btrfs: Remove the REF_COW bit for data reloc tree
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>>  fs/btrfs/disk-io.c    | 9 ++++++++-
>>>>>  fs/btrfs/inode.c      | 6 ++++--
>>>>>  fs/btrfs/relocation.c | 3 ++-
>>>>>  3 files changed, 14 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 56675d3cd23a..cb90966a8aab 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -1418,9 +1418,16 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
>>>>>  	if (ret)
>>>>>  		goto fail;
>>>>>  
>>>>> -	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
>>>>> +	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
>>>>> +	    root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
>>>>>  		set_bit(BTRFS_ROOT_REF_COWS, &root->state);
>>>>>  		btrfs_check_and_init_root_item(&root->root_item);
>>>>> +	} else if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) {
>>>>> +		/*
>>>>> +		 * Data reloc tree won't be snapshotted, thus it's COW only
>>>>> +		 * tree, it's needed to set TRACK_DIRTY bit for it.
>>>>> +		 */
>>>>> +		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>>>>>  	}
>>>>>  
>>>>>  	btrfs_init_free_ino_ctl(root);
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index 5d567082f95a..71841535c7ca 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -4129,7 +4129,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>>>>  	 * extent just the way it is.
>>>>>  	 */
>>>>>  	if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
>>>>> -	    root == fs_info->tree_root)
>>>>> +	    root == fs_info->tree_root ||
>>>>> +	    root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
>>>>>  		btrfs_drop_extent_cache(BTRFS_I(inode), ALIGN(new_size,
>>>>>  					fs_info->sectorsize),
>>>>>  					(u64)-1, 0);
>>>>> @@ -4334,7 +4335,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>>>>  
>>>>>  		if (found_extent &&
>>>>>  		    (test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
>>>>> -		     root == fs_info->tree_root)) {
>>>>> +		     root == fs_info->tree_root ||
>>>>> +		     root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)) {
>>>>>  			struct btrfs_ref ref = { 0 };
>>>>>  
>>>>>  			bytes_deleted += extent_num_bytes;
>>>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>>>> index f25deca18a5d..a85dd5d465f6 100644
>>>>> --- a/fs/btrfs/relocation.c
>>>>> +++ b/fs/btrfs/relocation.c
>>>>> @@ -1087,7 +1087,8 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
>>>>>  		 * if we are modifying block in fs tree, wait for readpage
>>>>>  		 * to complete and drop the extent cache
>>>>>  		 */
>>>>> -		if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>>>>> +		if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID &&
>>>>> +		    root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) {
>>>>>  			if (first) {
>>>>>  				inode = find_next_inode(root, key.objectid);
>>>>>  				first = 0;
>>>>> -- 
>>>>> 2.26.2
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-05-13 11:23 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-11 21:14 Balance loops: what we know so far Zygo Blaxell
2020-04-27  7:07 ` Qu Wenruo
2020-04-28  4:55   ` Zygo Blaxell
2020-04-28  9:54     ` Qu Wenruo
2020-04-28 14:51       ` Zygo Blaxell
2020-04-29  5:34         ` Qu Wenruo
2020-04-29 12:23           ` Sebastian Döring
2020-05-04 18:54       ` Andrea Gelmini
2020-05-04 23:48         ` Qu Wenruo
2020-05-05  9:10           ` Andrea Gelmini
2020-05-06  5:58             ` Qu Wenruo
2020-05-06 18:24               ` Andrea Gelmini
2020-05-07  9:59                 ` Andrea Gelmini
2020-05-08  6:33                 ` Qu Wenruo
2020-05-11  8:31     ` Qu Wenruo
2020-05-12 13:43       ` Zygo Blaxell
2020-05-12 14:11         ` Zygo Blaxell
2020-05-13  2:28           ` Qu Wenruo
2020-05-13  5:02             ` Zygo Blaxell
2020-05-13  6:36               ` Qu Wenruo
2020-05-13  5:24             ` Zygo Blaxell
2020-05-13 11:23               ` Qu Wenruo [this message]
2020-05-13 12:21                 ` Zygo Blaxell
2020-05-14  8:08                   ` Qu Wenruo
2020-05-14  8:55                     ` Qu Wenruo
2020-05-14 17:44                       ` Zygo Blaxell
2020-05-14 23:43                         ` Qu Wenruo
2020-05-15  6:57                         ` Qu Wenruo
2020-05-15 15:17                           ` Zygo Blaxell
2020-05-18  5:25                             ` Qu Wenruo
2020-05-20  7:27                             ` Qu Wenruo
2020-05-21  3:26                               ` Zygo Blaxell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6fcccf0b-108d-75d2-ad53-3f7837478319@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.