From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Balance loops: what we know so far
Date: Fri, 15 May 2020 07:43:04 +0800	[thread overview]
Message-ID: <fc456c8b-358d-89d9-77d4-4c8efa5ebf63@gmx.com> (raw)
In-Reply-To: <20200514174409.GC10769@hungrycats.org>



On 2020/5/15 1:44 AM, Zygo Blaxell wrote:
> On Thu, May 14, 2020 at 04:55:18PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/5/14 4:08 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/5/13 8:21 PM, Zygo Blaxell wrote:
>>>> On Wed, May 13, 2020 at 07:23:40PM +0800, Qu Wenruo wrote:
>>>>>
>>>>>
>>> [...]
>>>>
>>>> Kernel log:
>>>>
>>>> 	[96199.614869][ T9676] BTRFS info (device dm-0): balance: start -d
>>>> 	[96199.616086][ T9676] BTRFS info (device dm-0): relocating block group 4396679168000 flags data
>>>> 	[96199.782217][ T9676] BTRFS info (device dm-0): relocating block group 4395605426176 flags data
>>>> 	[96199.971118][ T9676] BTRFS info (device dm-0): relocating block group 4394531684352 flags data
>>>> 	[96220.858317][ T9676] BTRFS info (device dm-0): found 13 extents, loops 1, stage: move data extents
>>>> 	[...]
>>>> 	[121403.509718][ T9676] BTRFS info (device dm-0): found 13 extents, loops 131823, stage: update data pointers
>>>> 	(qemu) stop
>>>>
>>>> btrfs-image URL:
>>>>
>>>> 	http://www.furryterror.org/~zblaxell/tmp/.fsinqz/image.bin
>>>>
>>> The image shows several very strange results.
>>>
>>> For one, although we're relocating block group 4394531684352, the
>>> previous two block groups didn't really get relocated.
>>>
>>> There are still extents there, all belonging to the data reloc tree.
>>>
>>> Furthermore, the data reloc tree inode 620 should have been evicted
>>> when the previous block group relocation finished.
>>>
>>> So I suspect something went wrong in the data reloc tree; would you
>>> please try the following diff?
>>> (Either on a vanilla kernel or on top of my previous useless patch)
>>
>> Oh, my previous testing patch did a wrong inode put for the data reloc
>> tree, so it could itself lead to exactly this situation.
>>
>> Thankfully the v2 sent upstream has that problem fixed.
>>
>> So we're back where we started: still no faster way to reproduce the
>> problem...
> 
> Can we attack the problem by logging kernel activity?  Like can we
> log whenever we add or remove items from the data reloc tree, or
> why we don't?

Sure; the only (and tiny) problem is the amount of log output we'd be
adding.

I have no problem adding such debugging, either as plain pr_info()
calls or as trace events, although I prefer plain pr_info().
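For example, something like the following would already be enough (a
sketch only; the helper name, its parameters, and its call sites are
placeholders for illustration, not actual relocation.c code):

	/*
	 * Hypothetical debug helper, to be called wherever we add or
	 * remove file extent items in the data reloc tree.
	 */
	static void data_reloc_debug(const char *op, u64 ino, u64 start,
				     u64 len)
	{
		pr_info("btrfs: data reloc %s: ino=%llu start=%llu len=%llu\n",
			op, ino, start, len);
	}

That would let us correlate the "found N extents" loop messages with the
actual item churn in the data reloc tree.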

> 
> I can get a filesystem with a single data block group and a single
> (visible) extent that loops, and somehow it's so easy to do that that I'm
> having problems making filesystems _not_ do it.  What can we do with that?

Even without my data reloc tree patch?

> 
> What am I (and everyone else with this problem) doing that you are not?

I just can't reproduce the bug; that's the biggest problem.

> Usually that difference is "I'm running bees" but we're running out of
> bugs related to LOGICAL_INO and the dedupe ioctl, and I think other people
> are reporting the problem without running bees.  I'm also running balance
> cancels, which seem to increase the repro rate (though they might just
> be increasing the number of balances tested per day, and there could be
> just a fixed percentage of balances that loop).

I'd rather not add cancel into the test case, as it may add more
complexity and may be a different problem.

> 
> I will see if I can build a standalone kvm image that generates balance
> loops on blank disks.  If I'm successful, you can download it and then
> run all the experiments you want.

That's great.
Looking forward to it.

Thanks,
Qu
> 
> I also want to see if reverting the extended reloc tree lifespan patch
> (d2311e698578 "btrfs: relocation: Delay reloc tree deletion after
> merge_reloc_roots") stops the looping on misc-next.  I found that
> reverting that patch stops the balance looping on 5.1.21 in an earlier
> experiment.  Maybe there are two bugs here, and we've already fixed one,
> but the symptom won't go away because some second bug has appeared.
> 
> 
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Qu
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index 9afc1a6928cf..ef9e18bab6f6 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -3498,6 +3498,7 @@ struct inode *create_reloc_inode(struct btrfs_fs_info *fs_info,
>>>         BTRFS_I(inode)->index_cnt = group->start;
>>>
>>>         err = btrfs_orphan_add(trans, BTRFS_I(inode));
>>> +       WARN_ON(atomic_read(&inode->i_count) != 1);
>>>  out:
>>>         btrfs_put_root(root);
>>>         btrfs_end_transaction(trans);
>>> @@ -3681,6 +3682,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
>>>  out:
>>>         if (err && rw)
>>>                 btrfs_dec_block_group_ro(rc->block_group);
>>> +       WARN_ON(atomic_read(&rc->data_inode->i_count) != 1);
>>>         iput(rc->data_inode);
>>>         btrfs_put_block_group(rc->block_group);
>>>         free_reloc_control(rc);
>>
> 
> 
> 




Thread overview: 32+ messages
2020-04-11 21:14 Balance loops: what we know so far Zygo Blaxell
2020-04-27  7:07 ` Qu Wenruo
2020-04-28  4:55   ` Zygo Blaxell
2020-04-28  9:54     ` Qu Wenruo
2020-04-28 14:51       ` Zygo Blaxell
2020-04-29  5:34         ` Qu Wenruo
2020-04-29 12:23           ` Sebastian Döring
2020-05-04 18:54       ` Andrea Gelmini
2020-05-04 23:48         ` Qu Wenruo
2020-05-05  9:10           ` Andrea Gelmini
2020-05-06  5:58             ` Qu Wenruo
2020-05-06 18:24               ` Andrea Gelmini
2020-05-07  9:59                 ` Andrea Gelmini
2020-05-08  6:33                 ` Qu Wenruo
2020-05-11  8:31     ` Qu Wenruo
2020-05-12 13:43       ` Zygo Blaxell
2020-05-12 14:11         ` Zygo Blaxell
2020-05-13  2:28           ` Qu Wenruo
2020-05-13  5:02             ` Zygo Blaxell
2020-05-13  6:36               ` Qu Wenruo
2020-05-13  5:24             ` Zygo Blaxell
2020-05-13 11:23               ` Qu Wenruo
2020-05-13 12:21                 ` Zygo Blaxell
2020-05-14  8:08                   ` Qu Wenruo
2020-05-14  8:55                     ` Qu Wenruo
2020-05-14 17:44                       ` Zygo Blaxell
2020-05-14 23:43                         ` Qu Wenruo [this message]
2020-05-15  6:57                         ` Qu Wenruo
2020-05-15 15:17                           ` Zygo Blaxell
2020-05-18  5:25                             ` Qu Wenruo
2020-05-20  7:27                             ` Qu Wenruo
2020-05-21  3:26                               ` Zygo Blaxell
