From: Qu Wenruo <wqu@suse.com>
To: "David Arendt" <admin@prnet.org>,
	"Qu Wenruo" <quwenruo.btrfs@gmx.com>,
	"Stéphane Lesimple" <stephane_btrfs2@lesimple.fr>,
	linux-btrfs@vger.kernel.org
Subject: Re: 5.6-5.10 balance regression?
Date: Mon, 28 Dec 2020 15:48:47 +0800
Message-ID: <da42984a-1f75-153a-b7fd-145e0d66b6d4@suse.com>
In-Reply-To: <344c4bfd-3189-2bf5-9282-2f7b3cb23972@prnet.org>



On 2020/12/28 3:38 PM, David Arendt wrote:
> Hi,
> 
> unfortunately the problem is no longer reproducible, probably due to
> writes happening in the meantime. If you still want a btrfs-image, I can
> create one (unfortunately only without data, as there is confidential
> data in it), but as the problem is currently no longer reproducible, I
> think it probably won't help.

That's fine, at least you got your fs back to normal.

I tried several small balances locally but could not reproduce it, thus I
guess it may be related to a certain tree layout.

Anyway, I'll wait for another small enough and reproducible report.

Thanks,
Qu

> 
> Thanks in advance,
> David Arendt
> 
> On 12/28/20 1:06 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/12/27 9:11 PM, David Arendt wrote:
>>> Hi,
>>>
>>> last week I had the same problem on a btrfs filesystem after updating to
>>> kernel 5.10.1. I have never had this problem before kernel 5.10.x.
>>> 5.9.x did not show any problem.
>>>
>>> Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on
>>> devid 1
>>> Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on
>>> devid 1 with status: 0
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start
>>> -dusage=10
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block
>>> group 71694286848 flags data
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058
>>> extents, stage: move data extents
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended
>>> with status: -2
>>>
>>> This is not a multidevice volume but a volume consisting of a single
>>> partition.
>>>
>>> xxx ~ # btrfs fi df /u00
>>> Data, single: total=10.01GiB, used=24GiB
>>> System, single: total=4.00MiB, used=16.00KiB
>>> Metadata, single: total=2.76GiB, used=10GiB
>>> GlobalReserve, single: total=47.17MiB, used=0.00B
>>>
>>> xxx ~ # btrfs device usage /u00
>>> /dev/sda2, ID: 1
>>>     Device size:            19.81GiB
>>>     Device slack:              0.00B
>>>     Data,single:            10.01GiB
>>>     Metadata,single:         2.76GiB
>>>     System,single:           4.00MiB
>>>     Unallocated:             7.04GiB
>>
>> This seems small enough, thus a btrfs-image dump would help.
>>
>> Although there is a limitation with a btrfs-image dump: since it only
>> contains metadata, when we try to balance data to reproduce the bug, it
>> would easily hit data csum errors and make the balance exit early.
>>
>> If possible, would you please try to take a dump with this branch?
>> https://github.com/adam900710/btrfs-progs/tree/image_data_dump
>>
>> It provides a new option for btrfs-image, -d, which will also dump the
>> data.
>>
>> Also, please keep in mind that a -d dump will contain data from your fs,
>> thus if it contains confidential info, please use the regular btrfs-image.
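>>
>> Just as an illustration (the output path below is only a placeholder,
>> and it assumes the -d option from that branch combines with the usual
>> btrfs-image options):
>>
>> # ./btrfs-image -d -c 9 /dev/sda2 /tmp/sda2-with-data.img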
>>
>> Thanks,
>> Qu
>>>
>>>
>>> On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
>>>> Hello,
>>>>
>>>> As part of the maintenance routine of one of my raid1 filesystems, a
>>>> few days ago I was in the process of replacing a 10T drive with a 16T
>>>> one.
>>>> So I first added the new 16T drive to the FS (btrfs dev add), then
>>>> started a btrfs dev del.
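>>>>
>>>> In commands, that was essentially the following (the old drive's
>>>> device-mapper name and the mount point are placeholders here; the new
>>>> drive's name is the actual one):
>>>>
>>>> # btrfs device add /dev/mapper/luks-new16T /mnt/raid1
>>>> # btrfs device delete /dev/mapper/luks-old10T /mnt/raid1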
>>>>
>>>> After a few days of balancing the block groups out of the old 10T 
>>>> drive,
>>>> the balance aborted when around 500 GiB of data was still to be moved
>>>> out of the drive:
>>>>
>>>> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> Of course this also cancelled the device deletion, so after that the
>>>> device was still part of the FS. I then tried to do a balance manually,
>>>> in an attempt to reproduce the issue:
>>>>
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start
>>>> -ddevid=5,limit=1
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> There was of course still plenty of room on the FS, as I had added a
>>>> new 16T drive
>>>> (a btrfs fi usage is further down this email), so it struck me as odd.
>>>> So, I tried to lower the redundancy temporarily, expecting the balance
>>>> of this block group to complete immediately, given that there was
>>>> already a copy of this data present on another drive:
>>>>
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start
>>>> -dconvert=single,soft,devid=5,limit=1
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> That didn't work.
>>>> I also tried mounting the FS in degraded mode, with the drive I wanted
>>>> to remove absent, and used btrfs dev del missing, but the balance still
>>>> failed with the same error on the same block group.
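>>>>
>>>> Roughly, that sequence was (the mount source and mount point here are
>>>> only illustrative):
>>>>
>>>> # mount -o degraded /dev/mapper/luks-10Ta /mnt/raid1
>>>> # btrfs device delete missing /mnt/raid1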
>>>>
>>>> So, as I had only been running 5.10.1 for a few days, I tried an older
>>>> kernel, 5.6.17, and retried the balance once again (still with the
>>>> drive voluntarily missing):
>>>>
>>>> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
>>>> [ 413.188814] BTRFS info (device dm-10): using free space tree
>>>> [ 413.188815] BTRFS info (device dm-10): has skinny extents
>>>> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid
>>>> 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
>>>> [ 424.159486] BTRFS info (device dm-10): balance: start
>>>> -dconvert=single,soft,devid=5,limit=1
>>>> [ 424.772640] BTRFS info (device dm-10): relocating block group
>>>> 11115169841152 flags data|raid1
>>>> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage:
>>>> move data extents
>>>> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage:
>>>> update data pointers
>>>> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>>>>
>>>> The problematic block group was balanced successfully this time.
>>>>
>>>> I balanced a few more block groups successfully (without the
>>>> -dconvert=single option), then decided to reboot under 5.10 just to see
>>>> if I would hit this issue again.
>>>> I didn't: the btrfs dev del worked correctly after the last 500G or so
>>>> of data was moved out of the drive.
>>>>
>>>> This is the output of btrfs fi usage after I successfully balanced the
>>>> problematic block group under the 5.6.17 kernel. Notice the multiple
>>>> data profiles, which is expected as I used the -dconvert balance option,
>>>> and also the fact that apparently 3 chunks were allocated on new16T for
>>>> this, even though only 1 seems to be used. We can tell because this is
>>>> the first and only time the balance succeeded with the -dconvert option,
>>>> hence these chunks are all under "data,single":
>>>>
>>>> Overall:
>>>> Device size:        41.89TiB
>>>> Device allocated:   21.74TiB
>>>> Device unallocated: 20.14TiB
>>>> Device missing:      9.09TiB
>>>> Used:               21.71TiB
>>>> Free (estimated):   10.08TiB (min: 10.07TiB)
>>>> Data ratio:             2.00
>>>> Metadata ratio:         2.00
>>>> Global reserve:    512.00MiB (used: 0.00B)
>>>> Multiple profiles:       yes (data)
>>>>
>>>> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
>>>> /dev/mapper/luks-new16T     3.00GiB
>>>>
>>>> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
>>>> /dev/mapper/luks-10Ta       7.14TiB
>>>> /dev/mapper/luks-10Tb       7.10TiB
>>>> missing                   482.00GiB
>>>> /dev/mapper/luks-new16T     6.95TiB
>>>>
>>>> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
>>>> /dev/mapper/luks-10Tb      36.00GiB
>>>> /dev/mapper/luks-ssd-mdata 36.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
>>>> /dev/mapper/luks-10Ta      32.00MiB
>>>> /dev/mapper/luks-10Tb      32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-10Ta       1.95TiB
>>>> /dev/mapper/luks-10Tb       1.96TiB
>>>> missing                     8.62TiB
>>>> /dev/mapper/luks-ssd-mdata 11.29GiB
>>>> /dev/mapper/luks-new16T     7.60TiB
>>>>
>>>> I wasn't going to send an email to this ML because I knew I had nothing
>>>> left to reproduce the issue now that it was "fixed", but now I think I'm
>>>> bumping into the same issue on another FS, while rebalancing data after
>>>> adding a drive, which happens to be the old 10T drive from the FS above.
>>>>
>>>> The btrfs fi usage of this second FS is as follows:
>>>>
>>>> Overall:
>>>> Device size:        25.50TiB
>>>> Device allocated:   22.95TiB
>>>> Device unallocated:  2.55TiB
>>>> Device missing:        0.00B
>>>> Used:               22.36TiB
>>>> Free (estimated):    3.14TiB (min: 1.87TiB)
>>>> Data ratio:             1.00
>>>> Metadata ratio:         2.00
>>>> Global reserve:    512.00MiB (used: 0.00B)
>>>> Multiple profiles:        no
>>>>
>>>> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
>>>> /dev/mapper/luks-12T        10.91TiB
>>>> /dev/mapper/luks-3Ta         2.73TiB
>>>> /dev/mapper/luks-3Tb         2.73TiB
>>>> /dev/mapper/luks-10T         6.52TiB
>>>>
>>>> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
>>>> /dev/mapper/luks-ssd-mdata2 32.00GiB
>>>> /dev/mapper/luks-10T        32.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
>>>> /dev/mapper/luks-3Tb        32.00MiB
>>>> /dev/mapper/luks-10T        32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-12T        45.00MiB
>>>> /dev/mapper/luks-ssd-mdata2  4.00GiB
>>>> /dev/mapper/luks-3Ta         1.02MiB
>>>> /dev/mapper/luks-3Tb         2.97GiB
>>>> /dev/mapper/luks-10T         2.54TiB
>>>>
>>>> I can reproduce the problem reliably:
>>>>
>>>> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
>>>> ERROR: error during balancing '/tank': No such file or directory
>>>> There may be more info in syslog - try dmesg | tail
>>>>
>>>> [145979.563045] BTRFS info (device dm-10): balance: start
>>>> -dvrange=34625344765952..34625344765953
>>>> [145979.585572] BTRFS info (device dm-10): relocating block group
>>>> 34625344765952 flags data|raid1
>>>> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage:
>>>> move data extents
>>>> [146002.236115] BTRFS info (device dm-10): balance: ended with 
>>>> status: -2
>>>>
>>>> If anybody is interested in looking into this, this time I can leave
>>>> the FS in this state. The issue is reproducible, and I can live without
>>>> completing the balance for the next few weeks or even months, as I don't
>>>> think I'll need the currently unallocatable space soon.
>>>>
>>>> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
>>>> If it's of any use, I can drop it somewhere (51G).
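>>>>
>>>> That is, something along the lines of (source device and output file
>>>> name are just examples):
>>>>
>>>> # btrfs-image -c 9 -t 4 -s -w /dev/mapper/luks-12T /backup/tank-meta.img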
>>>>
>>>> I could try to bisect manually to find which version between 5.6.x and
>>>> 5.10.1 started behaving like this, but after the first success I won't
>>>> know how to reproduce the issue a second time, as I'm not 100% sure it
>>>> can be done solely with the btrfs-image.
>>>>
>>>> Note that another user seems to have encountered a similar issue in
>>>> July with 5.8:
>>>> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>>>>
>>>> Regards,
>>>>
>>>> Stéphane Lesimple.
>>>
>>>
> 

