* 5.6-5.10 balance regression?
@ 2020-12-27 12:11 Stéphane Lesimple
  2020-12-27 13:11 ` David Arendt
  0 siblings, 1 reply; 13+ messages in thread
From: Stéphane Lesimple @ 2020-12-27 12:11 UTC (permalink / raw)
  To: linux-btrfs

Hello,

As part of the maintenance routine of one of my raid1 FS, a few days ago I was in the process
of replacing a 10T drive with a 16T one.
So I first added the new 16T drive to the FS (btrfs dev add), then started a btrfs dev del.
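
For reference, the sequence was essentially the following (the old drive's device
name and the mount point are placeholders here, only new16T matches the real name):

  # add the new drive to the raid1 filesystem, then remove the old one;
  # "device remove" relocates every block group off the old drive before dropping it
  btrfs device add /dev/mapper/luks-new16T /mnt/nas
  btrfs device remove /dev/mapper/luks-old10T /mnt/nas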

After a few days of balancing the block groups out of the old 10T drive,
the balance aborted when around 500 GiB of data was still to be moved
out of the drive:

Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2

Of course this also cancelled the device deletion, so after that the
device was still part of the FS. I then tried to do a balance manually,
in an attempt to reproduce the issue:

Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start -ddevid=5,limit=1
Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2

There was of course still plenty of room on the FS, as I had just added a new 16T drive
(a btrfs fi usage is further down this email), so it struck me as odd.
So, I tried to lower the redundancy temporarily, expecting the balance of this block group to
complete immediately given that there was already a copy of this data present on another drive:

Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=5,limit=1
Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2

That didn't work.
I also tried mounting the FS in degraded mode, with the drive I wanted to remove missing,
and using btrfs dev del missing, but the balance still failed with the same error on the same block group.
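
Roughly, with the mount point again being a placeholder:

  # mount with the old drive absent, then ask btrfs to drop the missing device;
  # this also triggers the relocation of the block groups that still reference it
  mount -o degraded /dev/mapper/luks-10Tb /mnt/nas
  btrfs device remove missing /mnt/nas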

So, as I had only been running 5.10.1 for a few days, I tried an older kernel, 5.6.17,
and retried the balance once again (still with the drive voluntarily missing):

[ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
[ 413.188814] BTRFS info (device dm-10): using free space tree
[ 413.188815] BTRFS info (device dm-10): has skinny extents
[ 413.189674] BTRFS warning (device dm-10): devid 5 uuid 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
[ 424.159486] BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=5,limit=1
[ 424.772640] BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
[ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage: move data extents
[ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage: update data pointers
[ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0

The problematic block group was balanced successfully this time.

I balanced a few more block groups successfully (without the -dconvert=single option),
then decided to reboot under 5.10 just to see if I would hit this issue again.
I didn't: the btrfs dev del completed correctly after the last 500G or so of data
was moved out of the drive.

This is the output of btrfs fi usage after I successfully balanced the
problematic block group under the 5.6.17 kernel. Notice the multiple
data profiles, which are expected as I used the -dconvert balance option,
and also the fact that apparently 3 chunks were allocated on new16T for
this, even though only 1 seems to be used. We can tell because this is the
first and only time the balance succeeded with the -dconvert option,
hence these chunks are all under "Data,single":

Overall:
Device size:        41.89TiB
Device allocated:   21.74TiB
Device unallocated: 20.14TiB
Device missing:      9.09TiB
Used:               21.71TiB
Free (estimated):   10.08TiB (min: 10.07TiB)
Data ratio:             2.00
Metadata ratio:         2.00
Global reserve:    512.00MiB (used: 0.00B)
Multiple profiles:       yes (data)

Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
/dev/mapper/luks-new16T     3.00GiB

Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
/dev/mapper/luks-10Ta       7.14TiB
/dev/mapper/luks-10Tb       7.10TiB
missing                   482.00GiB
/dev/mapper/luks-new16T     6.95TiB

Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
/dev/mapper/luks-10Tb      36.00GiB
/dev/mapper/luks-ssd-mdata 36.00GiB

System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
/dev/mapper/luks-10Ta      32.00MiB
/dev/mapper/luks-10Tb      32.00MiB

Unallocated:
/dev/mapper/luks-10Ta       1.95TiB
/dev/mapper/luks-10Tb       1.96TiB
missing                     8.62TiB
/dev/mapper/luks-ssd-mdata 11.29GiB
/dev/mapper/luks-new16T     7.60TiB

I wasn't going to send an email to this ML because I had no way left
to reproduce the issue now that it was "fixed", but now I think I'm bumping
into the same issue on another FS, while rebalancing data after adding a drive,
which happens to be the old 10T drive of the FS above.

The btrfs fi usage of this second FS is as follows:

Overall:
Device size:        25.50TiB
Device allocated:   22.95TiB
Device unallocated:  2.55TiB
Device missing:        0.00B
Used:               22.36TiB
Free (estimated):    3.14TiB (min: 1.87TiB)
Data ratio:             1.00
Metadata ratio:         2.00
Global reserve:    512.00MiB (used: 0.00B)
Multiple profiles:        no

Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
/dev/mapper/luks-12T        10.91TiB
/dev/mapper/luks-3Ta         2.73TiB
/dev/mapper/luks-3Tb         2.73TiB
/dev/mapper/luks-10T         6.52TiB

Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
/dev/mapper/luks-ssd-mdata2 32.00GiB
/dev/mapper/luks-10T        32.00GiB

System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
/dev/mapper/luks-3Tb        32.00MiB
/dev/mapper/luks-10T        32.00MiB

Unallocated:
/dev/mapper/luks-12T        45.00MiB
/dev/mapper/luks-ssd-mdata2  4.00GiB
/dev/mapper/luks-3Ta         1.02MiB
/dev/mapper/luks-3Tb         2.97GiB
/dev/mapper/luks-10T         2.54TiB

I can reproduce the problem reliably:

# btrfs bal start -dvrange=34625344765952..34625344765953 /tank
ERROR: error during balancing '/tank': No such file or directory
There may be more info in syslog - try dmesg | tail

[145979.563045] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
[145979.585572] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
[145990.396585] BTRFS info (device dm-10): found 167 extents, stage: move data extents
[146002.236115] BTRFS info (device dm-10): balance: ended with status: -2

If anybody is interested in looking into this, this time I can leave the FS in this state.
The issue is reproducible, and I can live without completing the balance for the next few weeks
or even months, as I don't think I'll need the currently unallocatable space soon.

I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
If it's of any use, I can drop it somewhere (51G).
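
For completeness, the image was taken along these lines, and could be restored onto a
scratch device on the receiving end (the image path and target device are examples):

  # -c 9: best compression, -t 4: 4 threads, -s: sanitize file names, -w: walk all trees
  btrfs-image -c 9 -t 4 -s -w /dev/mapper/luks-12T /data/tank-metadata.img
  # restore the metadata-only dump for offline inspection
  btrfs-image -r /data/tank-metadata.img /dev/sdX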

I could try to bisect manually to find which version between 5.6.x and 5.10.1 started to behave
like this, but after the first success, I won't know how to reproduce the issue a second time, as
I'm not 100% sure it can be done solely with the btrfs-image.
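
If I were to bisect, it would look roughly like the usual kernel bisect workflow below,
assuming a reliable reproducer were available after each boot (which is the problem):

  cd linux
  git bisect start v5.10 v5.6        # bad version first, then the last known good one
  make olddefconfig && make -j"$(nproc)" && make modules_install install
  # reboot into the freshly built kernel, retry the failing balance, then mark it:
  git bisect good                    # or "git bisect bad", and repeat until done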

Note that another user seems to have encountered a similar issue in July with 5.8:
https://www.spinics.net/lists/linux-btrfs/msg103188.html

Regards,

Stéphane Lesimple.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-27 12:11 5.6-5.10 balance regression? Stéphane Lesimple
@ 2020-12-27 13:11 ` David Arendt
  2020-12-28  0:06   ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: David Arendt @ 2020-12-27 13:11 UTC (permalink / raw)
  To: Stéphane Lesimple, linux-btrfs

Hi,

last week I had the same problem on a btrfs filesystem after updating to 
kernel 5.10.1. I have never had this problem before kernel 5.10.x.
5.9.x did not show any problem.

Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on 
devid 1
Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on 
devid 1 with status: 0
Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start 
-dusage=10
Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block 
group 71694286848 flags data
Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058 
extents, stage: move data extents
Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended 
with status: -2

This is not a multidevice volume but a volume consisting of a single 
partition.

xxx ~ # btrfs fi df /u00
Data, single: total=10.01GiB, used=9.24GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=2.76GiB, used=1.10GiB
GlobalReserve, single: total=47.17MiB, used=0.00B

xxx ~ # btrfs device usage /u00
/dev/sda2, ID: 1
    Device size:            19.81GiB
    Device slack:              0.00B
    Data,single:            10.01GiB
    Metadata,single:         2.76GiB
    System,single:           4.00MiB
    Unallocated:             7.04GiB


On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
> Hello,
>
> As part of the maintenance routine of one of my raid1 FS, a few days ago I was in the process
> of replacing a 10T drive with a 16T one.
> So I first added the new 16T drive to the FS (btrfs dev add), then started a btrfs dev del.
>
> After a few days of balancing the block groups out of the old 10T drive,
> the balance aborted when around 500 GiB of data was still to be moved
> out of the drive:
>
> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>
> Of course this also cancelled the device deletion, so after that the
> device was still part of the FS. I then tried to do a balance manually,
> in an attempt to reproduce the issue:
>
> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start -ddevid=5,limit=1
> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>
> There were of course still plenty of room on the FS, as I added a new 16T drive
> (a btrfs fi usage is further down this email), so it struck me as odd.
> So, I tried to lower the redundancy temporarily, expecting the balance of this block group to
> complete immediately given that there were already a copy of this data present on another drive:
>
> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=5,limit=1
> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>
> That didn't work.
> I also tried to mount the FS in degraded mode, with the drive I wanted to remove missing,
> using btrfs dev del missing, but the balance still failed with the same error on the same block group.
>
> So, as I was running 5.10.1 just for a few days, I tried an older kernel: 5.6.17,
> and retried the balance once again (with still the drive voluntarily missing):
>
> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
> [ 413.188814] BTRFS info (device dm-10): using free space tree
> [ 413.188815] BTRFS info (device dm-10): has skinny extents
> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
> [ 424.159486] BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=5,limit=1
> [ 424.772640] BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage: move data extents
> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage: update data pointers
> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>
> The problematic block group was balanced successfully this time.
>
> I balanced a few more successfully (without the -dconvert=single option),
> then decided to reboot under 5.10 just to see if I would hit this issue again.
> I didn't: the btrfs dev del worked correctly after the last 500G or so data
> was moved out of the drive.
>
> This is the output of btrfs fi usage after I successfully balanced the
> problematic block group under the 5.6.17 kernel. Notice the multiple
> data profile, which is expected as I used the -dconvert balance option,
> and also the fact that apparently 3 chunks were allocated on new16T for
> this, even if only 1 seem to be used. We can tell because this is the
> first and only time the balance succeeded with the -dconvert option,
> hence these chunks are all under "data,single":
>
> Overall:
> Device size:        41.89TiB
> Device allocated:   21.74TiB
> Device unallocated: 20.14TiB
> Device missing:      9.09TiB
> Used:               21.71TiB
> Free (estimated):   10.08TiB (min: 10.07TiB)
> Data ratio:             2.00
> Metadata ratio:         2.00
> Global reserve:    512.00MiB (used: 0.00B)
> Multiple profiles:       yes (data)
>
> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
> /dev/mapper/luks-new16T     3.00GiB
>
> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
> /dev/mapper/luks-10Ta       7.14TiB
> /dev/mapper/luks-10Tb       7.10TiB
> missing                   482.00GiB
> /dev/mapper/luks-new16T     6.95TiB
>
> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
> /dev/mapper/luks-10Tb      36.00GiB
> /dev/mapper/luks-ssd-mdata 36.00GiB
>
> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
> /dev/mapper/luks-10Ta      32.00MiB
> /dev/mapper/luks-10Tb      32.00MiB
>
> Unallocated:
> /dev/mapper/luks-10Ta       1.95TiB
> /dev/mapper/luks-10Tb       1.96TiB
> missing                     8.62TiB
> /dev/mapper/luks-ssd-mdata 11.29GiB
> /dev/mapper/luks-new16T     7.60TiB
>
> I wasn't going to send an email to this ML because I knew I had nothing
> to reproduce the issue now that it was "fixed", but now I think I'm bumping
> into the same issue on another FS, while rebalancing data after adding a drive,
> which happens to be the old 10T drive of the FS above.
>
> The btrfs fi usage of this second FS is as follows:
>
> Overall:
> Device size:        25.50TiB
> Device allocated:   22.95TiB
> Device unallocated:  2.55TiB
> Device missing:        0.00B
> Used:               22.36TiB
> Free (estimated):    3.14TiB (min: 1.87TiB)
> Data ratio:             1.00
> Metadata ratio:         2.00
> Global reserve:    512.00MiB (used: 0.00B)
> Multiple profiles:        no
>
> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
> /dev/mapper/luks-12T        10.91TiB
> /dev/mapper/luks-3Ta         2.73TiB
> /dev/mapper/luks-3Tb         2.73TiB
> /dev/mapper/luks-10T         6.52TiB
>
> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
> /dev/mapper/luks-ssd-mdata2 32.00GiB
> /dev/mapper/luks-10T        32.00GiB
>
> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
> /dev/mapper/luks-3Tb        32.00MiB
> /dev/mapper/luks-10T        32.00MiB
>
> Unallocated:
> /dev/mapper/luks-12T        45.00MiB
> /dev/mapper/luks-ssd-mdata2  4.00GiB
> /dev/mapper/luks-3Ta         1.02MiB
> /dev/mapper/luks-3Tb         2.97GiB
> /dev/mapper/luks-10T         2.54TiB
>
> I can reproduce the problem reliably:
>
> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
> ERROR: error during balancing '/tank': No such file or directory
> There may be more info in syslog - try dmesg | tail
>
> [145979.563045] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
> [145979.585572] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage: move data extents
> [146002.236115] BTRFS info (device dm-10): balance: ended with status: -2
>
> If anybody is interested in looking into this, this time I can leave the FS in this state.
> The issue is reproducible, and I can live without completing the balance for the next weeks
> or even months, as I don't think I'll need the currently unallocatable space soon.
>
> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
> If it's of any use, I can drop it somewhere (51G).
>
> I could try to bisect manually to find which version between 5.6.x and 5.10.1 started to behave
> like this, but on the first success, I won't know how to reproduce the issue a second time, as
> I'm not 100% sure it can be done solely with the btrfs-image.
>
> Note that another user seems to have encountered a similar issue in July with 5.8:
> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>
> Regards,
>
> Stéphane Lesimple.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-27 13:11 ` David Arendt
@ 2020-12-28  0:06   ` Qu Wenruo
  2020-12-28  7:38     ` David Arendt
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2020-12-28  0:06 UTC (permalink / raw)
  To: David Arendt, Stéphane Lesimple, linux-btrfs



On 2020/12/27 9:11 PM, David Arendt wrote:
> Hi,
>
> last week I had the same problem on a btrfs filesystem after updating to
> kernel 5.10.1. I have never had this problem before kernel 5.10.x.
> 5.9.x did not show any problem.
>
> Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on
> devid 1
> Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on
> devid 1 with status: 0
> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start
> -dusage=10
> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block
> group 71694286848 flags data
> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058
> extents, stage: move data extents
> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended
> with status: -2
>
> This is not a multidevice volume but a volume consisting of a single
> partition.
>
> xxx ~ # btrfs fi df /u00
> Data, single: total=10.01GiB, used=9.24GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=2.76GiB, used=1.10GiB
> GlobalReserve, single: total=47.17MiB, used=0.00B
>
> xxx ~ # btrfs device usage /u00
> /dev/sda2, ID: 1
>     Device size:            19.81GiB
>     Device slack:              0.00B
>     Data,single:            10.01GiB
>     Metadata,single:         2.76GiB
>     System,single:           4.00MiB
>     Unallocated:             7.04GiB

This seems small enough, thus a btrfs-image dump would help.

There is a limitation with a btrfs-image dump, though: since it only contains
metadata, when we try to balance data on the restored image to reproduce the bug,
it would easily hit data csum errors and abort the balance early.

If possible, would you please try to take a dump with this branch?
https://github.com/adam900710/btrfs-progs/tree/image_data_dump

It provides a new option for btrfs-image, -d, which will also take the data.

Also, please keep in mind that a -d dump will contain the data of your fs,
thus if it contains confidential info, please use the regular btrfs-image instead.
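
In case it helps, building that branch and taking the dump should look roughly like
this (the -d option only exists in the branch above; paths are just examples):

  git clone -b image_data_dump https://github.com/adam900710/btrfs-progs.git
  cd btrfs-progs
  ./autogen.sh && ./configure --disable-documentation && make
  # -d: also dump the data blocks, so a balance can be replayed on the restored image
  ./btrfs-image -c 9 -d /dev/sda2 /mnt/scratch/u00-with-data.img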

Thanks,
Qu
>
>
> On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
>> Hello,
>>
>> As part of the maintenance routine of one of my raid1 FS, a few days
>> ago I was in the process
>> of replacing a 10T drive with a 16T one.
>> So I first added the new 16T drive to the FS (btrfs dev add), then
>> started a btrfs dev del.
>>
>> After a few days of balancing the block groups out of the old 10T drive,
>> the balance aborted when around 500 GiB of data was still to be moved
>> out of the drive:
>>
>> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating
>> block group 11115169841152 flags data|raid1
>> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264
>> extents, stage: move data extents
>> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended
>> with status: -2
>>
>> Of course this also cancelled the device deletion, so after that the
>> device was still part of the FS. I then tried to do a balance manually,
>> in an attempt to reproduce the issue:
>>
>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start
>> -ddevid=5,limit=1
>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating
>> block group 11115169841152 flags data|raid1
>> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264
>> extents, stage: move data extents
>> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended
>> with status: -2
>>
>> There were of course still plenty of room on the FS, as I added a new
>> 16T drive
>> (a btrfs fi usage is further down this email), so it struck me as odd.
>> So, I tried to lower the redundancy temporarily, expecting the balance
>> of this block group to
>> complete immediately given that there were already a copy of this data
>> present on another drive:
>>
>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start
>> -dconvert=single,soft,devid=5,limit=1
>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating
>> block group 11115169841152 flags data|raid1
>> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264
>> extents, stage: move data extents
>> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended
>> with status: -2
>>
>> That didn't work.
>> I also tried to mount the FS in degraded mode, with the drive I wanted
>> to remove missing,
>> using btrfs dev del missing, but the balance still failed with the
>> same error on the same block group.
>>
>> So, as I was running 5.10.1 just for a few days, I tried an older
>> kernel: 5.6.17,
>> and retried the balance once again (with still the drive voluntarily
>> missing):
>>
>> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
>> [ 413.188814] BTRFS info (device dm-10): using free space tree
>> [ 413.188815] BTRFS info (device dm-10): has skinny extents
>> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid
>> 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
>> [ 424.159486] BTRFS info (device dm-10): balance: start
>> -dconvert=single,soft,devid=5,limit=1
>> [ 424.772640] BTRFS info (device dm-10): relocating block group
>> 11115169841152 flags data|raid1
>> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage:
>> move data extents
>> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage:
>> update data pointers
>> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>>
>> The problematic block group was balanced successfully this time.
>>
>> I balanced a few more successfully (without the -dconvert=single option),
>> then decided to reboot under 5.10 just to see if I would hit this
>> issue again.
>> I didn't: the btrfs dev del worked correctly after the last 500G or so
>> data
>> was moved out of the drive.
>>
>> This is the output of btrfs fi usage after I successfully balanced the
>> problematic block group under the 5.6.17 kernel. Notice the multiple
>> data profile, which is expected as I used the -dconvert balance option,
>> and also the fact that apparently 3 chunks were allocated on new16T for
>> this, even if only 1 seem to be used. We can tell because this is the
>> first and only time the balance succeeded with the -dconvert option,
>> hence these chunks are all under "data,single":
>>
>> Overall:
>> Device size:        41.89TiB
>> Device allocated:   21.74TiB
>> Device unallocated: 20.14TiB
>> Device missing:      9.09TiB
>> Used:               21.71TiB
>> Free (estimated):   10.08TiB (min: 10.07TiB)
>> Data ratio:             2.00
>> Metadata ratio:         2.00
>> Global reserve:    512.00MiB (used: 0.00B)
>> Multiple profiles:       yes (data)
>>
>> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
>> /dev/mapper/luks-new16T     3.00GiB
>>
>> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
>> /dev/mapper/luks-10Ta       7.14TiB
>> /dev/mapper/luks-10Tb       7.10TiB
>> missing                   482.00GiB
>> /dev/mapper/luks-new16T     6.95TiB
>>
>> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
>> /dev/mapper/luks-10Tb      36.00GiB
>> /dev/mapper/luks-ssd-mdata 36.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
>> /dev/mapper/luks-10Ta      32.00MiB
>> /dev/mapper/luks-10Tb      32.00MiB
>>
>> Unallocated:
>> /dev/mapper/luks-10Ta       1.95TiB
>> /dev/mapper/luks-10Tb       1.96TiB
>> missing                     8.62TiB
>> /dev/mapper/luks-ssd-mdata 11.29GiB
>> /dev/mapper/luks-new16T     7.60TiB
>>
>> I wasn't going to send an email to this ML because I knew I had nothing
>> to reproduce the issue now that it was "fixed", but now I think I'm
>> bumping
>> into the same issue on another FS, while rebalancing data after adding
>> a drive,
>> which happens to be the old 10T drive of the FS above.
>>
>> The btrfs fi usage of this second FS is as follows:
>>
>> Overall:
>> Device size:        25.50TiB
>> Device allocated:   22.95TiB
>> Device unallocated:  2.55TiB
>> Device missing:        0.00B
>> Used:               22.36TiB
>> Free (estimated):    3.14TiB (min: 1.87TiB)
>> Data ratio:             1.00
>> Metadata ratio:         2.00
>> Global reserve:    512.00MiB (used: 0.00B)
>> Multiple profiles:        no
>>
>> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
>> /dev/mapper/luks-12T        10.91TiB
>> /dev/mapper/luks-3Ta         2.73TiB
>> /dev/mapper/luks-3Tb         2.73TiB
>> /dev/mapper/luks-10T         6.52TiB
>>
>> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
>> /dev/mapper/luks-ssd-mdata2 32.00GiB
>> /dev/mapper/luks-10T        32.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
>> /dev/mapper/luks-3Tb        32.00MiB
>> /dev/mapper/luks-10T        32.00MiB
>>
>> Unallocated:
>> /dev/mapper/luks-12T        45.00MiB
>> /dev/mapper/luks-ssd-mdata2  4.00GiB
>> /dev/mapper/luks-3Ta         1.02MiB
>> /dev/mapper/luks-3Tb         2.97GiB
>> /dev/mapper/luks-10T         2.54TiB
>>
>> I can reproduce the problem reliably:
>>
>> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
>> ERROR: error during balancing '/tank': No such file or directory
>> There may be more info in syslog - try dmesg | tail
>>
>> [145979.563045] BTRFS info (device dm-10): balance: start
>> -dvrange=34625344765952..34625344765953
>> [145979.585572] BTRFS info (device dm-10): relocating block group
>> 34625344765952 flags data|raid1
>> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage:
>> move data extents
>> [146002.236115] BTRFS info (device dm-10): balance: ended with status: -2
>>
>> If anybody is interested in looking into this, this time I can leave
>> the FS in this state.
>> The issue is reproducible, and I can live without completing the
>> balance for the next weeks
>> or even months, as I don't think I'll need the currently unallocatable
>> space soon.
>>
>> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
>> If it's of any use, I can drop it somewhere (51G).
>>
>> I could try to bisect manually to find which version between 5.6.x and
>> 5.10.1 started to behave
>> like this, but on the first success, I won't know how to reproduce the
>> issue a second time, as
>> I'm not 100% sure it can be done solely with the btrfs-image.
>>
>> Note that another user seems to have encountered a similar issue in July
>> with 5.8:
>> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>>
>> Regards,
>>
>> Stéphane Lesimple.
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28  0:06   ` Qu Wenruo
@ 2020-12-28  7:38     ` David Arendt
  2020-12-28  7:48       ` Qu Wenruo
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: David Arendt @ 2020-12-28  7:38 UTC (permalink / raw)
  To: Qu Wenruo, Stéphane Lesimple, linux-btrfs

Hi,

unfortunately the problem is no longer reproducible, probably due to 
writes happening in the meantime. If you still want a btrfs-image, I can 
create one (unfortunately only without data as there is confidential 
data in it), but as the problem is currently no longer reproducible, I 
think it probably won't help.

Thanks in advance,
David Arendt

On 12/28/20 1:06 AM, Qu Wenruo wrote:
>
>
> On 2020/12/27 9:11 PM, David Arendt wrote:
>> Hi,
>>
>> last week I had the same problem on a btrfs filesystem after updating to
>> kernel 5.10.1. I have never had this problem before kernel 5.10.x.
>> 5.9.x did not show any problem.
>>
>> Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on
>> devid 1
>> Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on
>> devid 1 with status: 0
>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start
>> -dusage=10
>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block
>> group 71694286848 flags data
>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058
>> extents, stage: move data extents
>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended
>> with status: -2
>>
>> This is not a multidevice volume but a volume consisting of a single
>> partition.
>>
>> xxx ~ # btrfs fi df /u00
>> Data, single: total=10.01GiB, used=9.24GiB
>> System, single: total=4.00MiB, used=16.00KiB
>> Metadata, single: total=2.76GiB, used=1.10GiB
>> GlobalReserve, single: total=47.17MiB, used=0.00B
>>
>> xxx ~ # btrfs device usage /u00
>> /dev/sda2, ID: 1
>>     Device size:            19.81GiB
>>     Device slack:              0.00B
>>     Data,single:            10.01GiB
>>     Metadata,single:         2.76GiB
>>     System,single:           4.00MiB
>>     Unallocated:             7.04GiB
>
> This seems small enough, thus a btrfs-image dump would help.
>
> Although there is a limit for btrfs-image dump, since it only contains
> metadata, when we try to balance data to reproduce the bug, it would
> easily cause data csum error and exit convert.
>
> If possible, would you please try to take a dump with this branch?
> https://github.com/adam900710/btrfs-progs/tree/image_data_dump
>
> It provides a new option for btrfs-image, -d, which will also take the 
> data.
>
> Also, please keep in mind that, -d dump will contain data of your fs,
> thus if it contains confidential info, please use regular btrfs-image.
>
> Thanks,
> Qu
>>
>>
>> On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
>>> Hello,
>>>
>>> As part of the maintenance routine of one of my raid1 FS, a few days
>>> ago I was in the process
>>> of replacing a 10T drive with a 16T one.
>>> So I first added the new 16T drive to the FS (btrfs dev add), then
>>> started a btrfs dev del.
>>>
>>> After a few days of balancing the block groups out of the old 10T 
>>> drive,
>>> the balance aborted when around 500 GiB of data was still to be moved
>>> out of the drive:
>>>
>>> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating
>>> block group 11115169841152 flags data|raid1
>>> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264
>>> extents, stage: move data extents
>>> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended
>>> with status: -2
>>>
>>> Of course this also cancelled the device deletion, so after that the
>>> device was still part of the FS. I then tried to do a balance manually,
>>> in an attempt to reproduce the issue:
>>>
>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start
>>> -ddevid=5,limit=1
>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating
>>> block group 11115169841152 flags data|raid1
>>> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264
>>> extents, stage: move data extents
>>> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended
>>> with status: -2
>>>
>>> There were of course still plenty of room on the FS, as I added a new
>>> 16T drive
>>> (a btrfs fi usage is further down this email), so it struck me as odd.
>>> So, I tried to lower the redundancy temporarily, expecting the balance
>>> of this block group to
>>> complete immediately given that there were already a copy of this data
>>> present on another drive:
>>>
>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start
>>> -dconvert=single,soft,devid=5,limit=1
>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating
>>> block group 11115169841152 flags data|raid1
>>> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264
>>> extents, stage: move data extents
>>> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended
>>> with status: -2
>>>
>>> That didn't work.
>>> I also tried to mount the FS in degraded mode, with the drive I wanted
>>> to remove missing,
>>> using btrfs dev del missing, but the balance still failed with the
>>> same error on the same block group.
>>>
>>> So, as I was running 5.10.1 just for a few days, I tried an older
>>> kernel: 5.6.17,
>>> and retried the balance once again (with still the drive voluntarily
>>> missing):
>>>
>>> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
>>> [ 413.188814] BTRFS info (device dm-10): using free space tree
>>> [ 413.188815] BTRFS info (device dm-10): has skinny extents
>>> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid
>>> 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
>>> [ 424.159486] BTRFS info (device dm-10): balance: start
>>> -dconvert=single,soft,devid=5,limit=1
>>> [ 424.772640] BTRFS info (device dm-10): relocating block group
>>> 11115169841152 flags data|raid1
>>> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage:
>>> move data extents
>>> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage:
>>> update data pointers
>>> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>>>
>>> The problematic block group was balanced successfully this time.
>>>
>>> I balanced a few more successfully (without the -dconvert=single 
>>> option),
>>> then decided to reboot under 5.10 just to see if I would hit this
>>> issue again.
>>> I didn't: the btrfs dev del worked correctly after the last 500G or so
>>> data
>>> was moved out of the drive.
>>>
>>> This is the output of btrfs fi usage after I successfully balanced the
>>> problematic block group under the 5.6.17 kernel. Notice the multiple
>>> data profile, which is expected as I used the -dconvert balance option,
>>> and also the fact that apparently 3 chunks were allocated on new16T for
>>> this, even if only 1 seem to be used. We can tell because this is the
>>> first and only time the balance succeeded with the -dconvert option,
>>> hence these chunks are all under "data,single":
>>>
>>> Overall:
>>> Device size:        41.89TiB
>>> Device allocated:   21.74TiB
>>> Device unallocated: 20.14TiB
>>> Device missing:      9.09TiB
>>> Used:               21.71TiB
>>> Free (estimated):   10.08TiB (min: 10.07TiB)
>>> Data ratio:             2.00
>>> Metadata ratio:         2.00
>>> Global reserve:    512.00MiB (used: 0.00B)
>>> Multiple profiles:       yes (data)
>>>
>>> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
>>> /dev/mapper/luks-new16T     3.00GiB
>>>
>>> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
>>> /dev/mapper/luks-10Ta       7.14TiB
>>> /dev/mapper/luks-10Tb       7.10TiB
>>> missing                   482.00GiB
>>> /dev/mapper/luks-new16T     6.95TiB
>>>
>>> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
>>> /dev/mapper/luks-10Tb      36.00GiB
>>> /dev/mapper/luks-ssd-mdata 36.00GiB
>>>
>>> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
>>> /dev/mapper/luks-10Ta      32.00MiB
>>> /dev/mapper/luks-10Tb      32.00MiB
>>>
>>> Unallocated:
>>> /dev/mapper/luks-10Ta       1.95TiB
>>> /dev/mapper/luks-10Tb       1.96TiB
>>> missing                     8.62TiB
>>> /dev/mapper/luks-ssd-mdata 11.29GiB
>>> /dev/mapper/luks-new16T     7.60TiB
>>>
>>> I wasn't going to send an email to this ML because I knew I had nothing
>>> to reproduce the issue now that it was "fixed", but now I think I'm
>>> bumping
>>> into the same issue on another FS, while rebalancing data after adding
>>> a drive,
>>> which happens to be the old 10T drive of the FS above.
>>>
>>> The btrfs fi usage of this second FS is as follows:
>>>
>>> Overall:
>>> Device size:        25.50TiB
>>> Device allocated:   22.95TiB
>>> Device unallocated:  2.55TiB
>>> Device missing:        0.00B
>>> Used:               22.36TiB
>>> Free (estimated):    3.14TiB (min: 1.87TiB)
>>> Data ratio:             1.00
>>> Metadata ratio:         2.00
>>> Global reserve:    512.00MiB (used: 0.00B)
>>> Multiple profiles:        no
>>>
>>> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
>>> /dev/mapper/luks-12T        10.91TiB
>>> /dev/mapper/luks-3Ta         2.73TiB
>>> /dev/mapper/luks-3Tb         2.73TiB
>>> /dev/mapper/luks-10T         6.52TiB
>>>
>>> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
>>> /dev/mapper/luks-ssd-mdata2 32.00GiB
>>> /dev/mapper/luks-10T        32.00GiB
>>>
>>> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
>>> /dev/mapper/luks-3Tb        32.00MiB
>>> /dev/mapper/luks-10T        32.00MiB
>>>
>>> Unallocated:
>>> /dev/mapper/luks-12T        45.00MiB
>>> /dev/mapper/luks-ssd-mdata2  4.00GiB
>>> /dev/mapper/luks-3Ta         1.02MiB
>>> /dev/mapper/luks-3Tb         2.97GiB
>>> /dev/mapper/luks-10T         2.54TiB
>>>
>>> I can reproduce the problem reliably:
>>>
>>> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
>>> ERROR: error during balancing '/tank': No such file or directory
>>> There may be more info in syslog - try dmesg | tail
>>>
>>> [145979.563045] BTRFS info (device dm-10): balance: start
>>> -dvrange=34625344765952..34625344765953
>>> [145979.585572] BTRFS info (device dm-10): relocating block group
>>> 34625344765952 flags data|raid1
>>> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage:
>>> move data extents
>>> [146002.236115] BTRFS info (device dm-10): balance: ended with 
>>> status: -2
>>>
>>> If anybody is interested in looking into this, this time I can leave
>>> the FS in this state.
>>> The issue is reproducible, and I can live without completing the
>>> balance for the next weeks
>>> or even months, as I don't think I'll need the currently unallocatable
>>> space soon.
>>>
>>> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
>>> If it's of any use, I can drop it somewhere (51G).
>>>
>>> I could try to bisect manually to find which version between 5.6.x and
>>> 5.10.1 started to behave
>>> like this, but on the first success, I won't know how to reproduce the
>>> issue a second time, as
>>> I'm not 100% sure it can be done solely with the btrfs-image.
>>>
>>> Note that another user seems to have encountered a similar issue in July
>>> with 5.8:
>>> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>>>
>>> Regards,
>>>
>>> Stéphane Lesimple.
>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28  7:38     ` David Arendt
@ 2020-12-28  7:48       ` Qu Wenruo
  2020-12-28 17:43       ` Stéphane Lesimple
  2020-12-28 19:58       ` Stéphane Lesimple
  2 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-12-28  7:48 UTC (permalink / raw)
  To: David Arendt, Qu Wenruo, Stéphane Lesimple, linux-btrfs



On 2020/12/28 3:38 PM, David Arendt wrote:
> Hi,
> 
> unfortunately the problem is no longer reproducible, probably due to 
> writes happening in the meantime. If you still want a btrfs-image, I can 
> create one (unfortunately only without data as there is confidential 
> data in it), but as the problem is currently no longer reproducible, I 
> think it probably won't help.

That's fine, at least you got your fs back to normal.

I tried several small balances locally and could not reproduce it, thus I guess it 
may be related to a certain tree layout.
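
Something as simple as the following, for instance, does not seem to be enough to
trigger it (only a rough sketch, sizes and paths are arbitrary):

  # small scratch filesystem, some data, then a filtered balance like in the reports
  truncate -s 20G /tmp/btrfs-test.img
  mkfs.btrfs -f /tmp/btrfs-test.img
  mkdir -p /mnt/test && mount -o loop /tmp/btrfs-test.img /mnt/test
  for i in $(seq 1 50); do dd if=/dev/urandom of=/mnt/test/f$i bs=1M count=100; done
  rm -f /mnt/test/f1? /mnt/test/f3?   # free some space inside existing block groups
  btrfs balance start -dusage=10 /mnt/test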

Anyway, I'll wait for another small enough and reproducible report.

Thanks,
Qu

> 
> Thanks in advance,
> David Arendt
> 
> On 12/28/20 1:06 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/12/27 9:11 PM, David Arendt wrote:
>>> Hi,
>>>
>>> last week I had the same problem on a btrfs filesystem after updating to
>>> kernel 5.10.1. I have never had this problem before kernel 5.10.x.
>>> 5.9.x did not show any problem.
>>>
>>> Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on
>>> devid 1
>>> Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on
>>> devid 1 with status: 0
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start
>>> -dusage=10
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block
>>> group 71694286848 flags data
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058
>>> extents, stage: move data extents
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended
>>> with status: -2
>>>
>>> This is not a multidevice volume but a volume consisting of a single
>>> partition.
>>>
>>> xxx ~ # btrfs fi df /u00
>>> Data, single: total=10.01GiB, used=9.24GiB
>>> System, single: total=4.00MiB, used=16.00KiB
>>> Metadata, single: total=2.76GiB, used=1.10GiB
>>> GlobalReserve, single: total=47.17MiB, used=0.00B
>>>
>>> xxx ~ # btrfs device usage /u00
>>> /dev/sda2, ID: 1
>>>     Device size:            19.81GiB
>>>     Device slack:              0.00B
>>>     Data,single:            10.01GiB
>>>     Metadata,single:         2.76GiB
>>>     System,single:           4.00MiB
>>>     Unallocated:             7.04GiB
>>
>> This seems small enough, thus a btrfs-image dump would help.
>>
>> Although there is a limit for btrfs-image dump, since it only contains
>> metadata, when we try to balance data to reproduce the bug, it would
>> easily cause data csum error and exit convert.
>>
>> If possible, would you please try to take a dump with this branch?
>> https://github.com/adam900710/btrfs-progs/tree/image_data_dump
>>
>> It provides a new option for btrfs-image, -d, which will also take the 
>> data.
>>
>> Also, please keep in mind that, -d dump will contain data of your fs,
>> thus if it contains confidential info, please use regular btrfs-image.
>>
>> Thanks,
>> Qu
>>>
>>>
>>> On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
>>>> Hello,
>>>>
>>>> As part of the maintenance routine of one of my raid1 FS, a few days
>>>> ago I was in the process
>>>> of replacing a 10T drive with a 16T one.
>>>> So I first added the new 16T drive to the FS (btrfs dev add), then
>>>> started a btrfs dev del.
>>>>
>>>> After a few days of balancing the block groups out of the old 10T 
>>>> drive,
>>>> the balance aborted when around 500 GiB of data was still to be moved
>>>> out of the drive:
>>>>
>>>> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> Of course this also cancelled the device deletion, so after that the
>>>> device was still part of the FS. I then tried to do a balance manually,
>>>> in an attempt to reproduce the issue:
>>>>
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start
>>>> -ddevid=5,limit=1
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> There were of course still plenty of room on the FS, as I added a new
>>>> 16T drive
>>>> (a btrfs fi usage is further down this email), so it struck me as odd.
>>>> So, I tried to lower the redundancy temporarily, expecting the balance
>>>> of this block group to
>>>> complete immediately given that there were already a copy of this data
>>>> present on another drive:
>>>>
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start
>>>> -dconvert=single,soft,devid=5,limit=1
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating
>>>> block group 11115169841152 flags data|raid1
>>>> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264
>>>> extents, stage: move data extents
>>>> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended
>>>> with status: -2
>>>>
>>>> That didn't work.
>>>> I also tried to mount the FS in degraded mode, with the drive I wanted
>>>> to remove missing,
>>>> using btrfs dev del missing, but the balance still failed with the
>>>> same error on the same block group.
>>>>
>>>> So, as I was running 5.10.1 just for a few days, I tried an older
>>>> kernel: 5.6.17,
>>>> and retried the balance once again (with still the drive voluntarily
>>>> missing):
>>>>
>>>> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
>>>> [ 413.188814] BTRFS info (device dm-10): using free space tree
>>>> [ 413.188815] BTRFS info (device dm-10): has skinny extents
>>>> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid
>>>> 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
>>>> [ 424.159486] BTRFS info (device dm-10): balance: start
>>>> -dconvert=single,soft,devid=5,limit=1
>>>> [ 424.772640] BTRFS info (device dm-10): relocating block group
>>>> 11115169841152 flags data|raid1
>>>> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage:
>>>> move data extents
>>>> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage:
>>>> update data pointers
>>>> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>>>>
>>>> The problematic block group was balanced successfully this time.
>>>>
>>>> I balanced a few more successfully (without the -dconvert=single option),
>>>> then decided to reboot under 5.10 just to see if I would hit this
>>>> issue again.
>>>> I didn't: the btrfs dev del worked correctly after the last 500G or so
>>>> data
>>>> was moved out of the drive.
>>>>
>>>> This is the output of btrfs fi usage after I successfully balanced the
>>>> problematic block group under the 5.6.17 kernel. Notice the multiple
>>>> data profile, which is expected as I used the -dconvert balance option,
>>>> and also the fact that apparently 3 chunks were allocated on new16T for
>>>> this, even if only 1 seem to be used. We can tell because this is the
>>>> first and only time the balance succeeded with the -dconvert option,
>>>> hence these chunks are all under "data,single":
>>>>
>>>> Overall:
>>>> Device size:        41.89TiB
>>>> Device allocated:   21.74TiB
>>>> Device unallocated: 20.14TiB
>>>> Device missing:      9.09TiB
>>>> Used:               21.71TiB
>>>> Free (estimated):   10.08TiB (min: 10.07TiB)
>>>> Data ratio:             2.00
>>>> Metadata ratio:         2.00
>>>> Global reserve:    512.00MiB (used: 0.00B)
>>>> Multiple profiles:       yes (data)
>>>>
>>>> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
>>>> /dev/mapper/luks-new16T     3.00GiB
>>>>
>>>> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
>>>> /dev/mapper/luks-10Ta       7.14TiB
>>>> /dev/mapper/luks-10Tb       7.10TiB
>>>> missing                   482.00GiB
>>>> /dev/mapper/luks-new16T     6.95TiB
>>>>
>>>> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
>>>> /dev/mapper/luks-10Tb      36.00GiB
>>>> /dev/mapper/luks-ssd-mdata 36.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
>>>> /dev/mapper/luks-10Ta      32.00MiB
>>>> /dev/mapper/luks-10Tb      32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-10Ta       1.95TiB
>>>> /dev/mapper/luks-10Tb       1.96TiB
>>>> missing                     8.62TiB
>>>> /dev/mapper/luks-ssd-mdata 11.29GiB
>>>> /dev/mapper/luks-new16T     7.60TiB
>>>>
>>>> I wasn't going to send an email to this ML because I knew I had nothing
>>>> to reproduce the issue now that it was "fixed", but now I think I'm
>>>> bumping
>>>> into the same issue on another FS, while rebalancing data after adding
>>>> a drive,
>>>> which happens to be the old 10T drive of the FS above.
>>>>
>>>> The btrfs fi usage of this second FS is as follows:
>>>>
>>>> Overall:
>>>> Device size:        25.50TiB
>>>> Device allocated:   22.95TiB
>>>> Device unallocated:  2.55TiB
>>>> Device missing:        0.00B
>>>> Used:               22.36TiB
>>>> Free (estimated):    3.14TiB (min: 1.87TiB)
>>>> Data ratio:             1.00
>>>> Metadata ratio:         2.00
>>>> Global reserve:    512.00MiB (used: 0.00B)
>>>> Multiple profiles:        no
>>>>
>>>> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
>>>> /dev/mapper/luks-12T        10.91TiB
>>>> /dev/mapper/luks-3Ta         2.73TiB
>>>> /dev/mapper/luks-3Tb         2.73TiB
>>>> /dev/mapper/luks-10T         6.52TiB
>>>>
>>>> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
>>>> /dev/mapper/luks-ssd-mdata2 32.00GiB
>>>> /dev/mapper/luks-10T        32.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
>>>> /dev/mapper/luks-3Tb        32.00MiB
>>>> /dev/mapper/luks-10T        32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-12T        45.00MiB
>>>> /dev/mapper/luks-ssd-mdata2  4.00GiB
>>>> /dev/mapper/luks-3Ta         1.02MiB
>>>> /dev/mapper/luks-3Tb         2.97GiB
>>>> /dev/mapper/luks-10T         2.54TiB
>>>>
>>>> I can reproduce the problem reliably:
>>>>
>>>> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
>>>> ERROR: error during balancing '/tank': No such file or directory
>>>> There may be more info in syslog - try dmesg | tail
>>>>
>>>> [145979.563045] BTRFS info (device dm-10): balance: start
>>>> -dvrange=34625344765952..34625344765953
>>>> [145979.585572] BTRFS info (device dm-10): relocating block group
>>>> 34625344765952 flags data|raid1
>>>> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage:
>>>> move data extents
>>>> [146002.236115] BTRFS info (device dm-10): balance: ended with 
>>>> status: -2
>>>>
>>>> If anybody is interested in looking into this, this time I can leave
>>>> the FS in this state.
>>>> The issue is reproducible, and I can live without completing the
>>>> balance for the next weeks
>>>> or even months, as I don't think I'll need the currently unallocatable
>>>> space soon.
>>>>
>>>> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
>>>> If it's of any use, I can drop it somewhere (51G).
>>>>
>>>> I could try to bisect manually to find which version between 5.6.x and
>>>> 5.10.1 started to behave
>>>> like this, but on the first success, I won't know how to reproduce the
>>>> issue a second time, as
>>>> I'm not 100% sure it can be done solely with the btrfs-image.
>>>>
>>>> Note that another user seems to have encountered a similar issue in July
>>>> with 5.8:
>>>> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>>>>
>>>> Regards,
>>>>
>>>> Stéphane Lesimple.
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28  7:38     ` David Arendt
  2020-12-28  7:48       ` Qu Wenruo
@ 2020-12-28 17:43       ` Stéphane Lesimple
  2020-12-28 19:58       ` Stéphane Lesimple
  2 siblings, 0 replies; 13+ messages in thread
From: Stéphane Lesimple @ 2020-12-28 17:43 UTC (permalink / raw)
  To: Qu Wenruo, David Arendt, Qu Wenruo, linux-btrfs

>> unfortunately the problem is no longer reproducible, probably due to
>> writes happening in the meantime. If you still want a btrfs-image, I can
>> create one (unfortunately only without data as there is confidential
>> data in it), but as the problem is currently no longer reproducible, I
>> think it probably won't help.
> 
> That's fine, at least you get your fs back to normal.
> 
> I tried several small balance locally, not reproduced, thus I guess it
> may be related to certain tree layout.
> 
> Anyway, I'll wait for another small enough and reproducible report.

This is still reproducible on my FS, and I have the btrfs-image.
I can easily upload it somewhere, but of course I understand downloading
an image of 51G can be impractical.

Another way might be: as I know which block group is causing the problem,
as per the dmesg, maybe I can dump only the part of the metadata relevant
to this block group?
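
For instance, something like this might already capture the interesting part, assuming
the extent tree around that block group is what matters (the bytenr is the one from
the dmesg above, the device path is an example):

  # dump the extent tree and keep the area around the failing block group's bytenr
  btrfs inspect-internal dump-tree -t extent /dev/mapper/luks-12T \
      | grep -B 1 -A 3 '34625344765952' > /tmp/bg-34625344765952.txt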

In any case I can run commands on this system, compile a custom btrfs-progs
or a custom kernel with whatever you want me to try, and reboot as many times
as necessary (this is not a production server).

I know it fails in relocate_block_group(), which returns -2; I'm currently
adding a couple of printk()s here and there to try to pinpoint that better.

Regards,

Stéphane.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28  7:38     ` David Arendt
  2020-12-28  7:48       ` Qu Wenruo
  2020-12-28 17:43       ` Stéphane Lesimple
@ 2020-12-28 19:58       ` Stéphane Lesimple
  2020-12-28 23:39         ` Qu Wenruo
  2 siblings, 1 reply; 13+ messages in thread
From: Stéphane Lesimple @ 2020-12-28 19:58 UTC (permalink / raw)
  To: Qu Wenruo, David Arendt, Qu Wenruo, linux-btrfs

> I know it fails in relocate_block_group(), which returns -2, I'm currently
> adding a couple printk's here and there to try to pinpoint that better.

Okay, so btrfs_relocate_block_group() starts with stage MOVE_DATA_EXTENTS, which
completes successfully, as relocate_block_group() returns 0:

BTRFS info (device <unknown>): relocate_block_group: prepare_to_realocate = 0
BTRFS info (device <unknown>): relocate_block_group loop: progress = 1, btrfs_start_transaction = ok
[...]
BTRFS info (device <unknown>): relocate_block_group loop: progress = 168, btrfs_start_transaction = ok
BTRFS info (device <unknown>): relocate_block_group: returning err = 0
BTRFS info (device dm-10): stage = move data extents, relocate_block_group = 0
BTRFS info (device dm-10): found 167 extents, stage: move data extents

Then it proceeds to the UPDATE_DATA_PTRS stage and calls relocate_block_group()
again. This time it'll fail at the 92nd iteration of the loop:

BTRFS info (device <unknown>): relocate_block_group loop: progress = 92, btrfs_start_transaction = ok
BTRFS info (device <unknown>): relocate_block_group loop: extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret = 0
BTRFS info (device <unknown>): add_data_references: btrfs_find_all_leafs = 0
BTRFS info (device <unknown>): add_data_references loop: read_tree_block ok
BTRFS info (device <unknown>): add_data_references loop: delete_v1_space_cache = -2
BTRFS info (device <unknown>): relocate_block_group loop: add_data_references = -2

Then the -ENOENT goes all the way up the call stack and aborts the balance.

So it fails in delete_v1_space_cache(), though it is worth noting that the
FS we're talking about is actually using space_cache v2.
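
For the record, one way to double-check that the free space tree (space_cache v2)
is in use on a filesystem, for example:

  # compat_ro_flags 0x3 means FREE_SPACE_TREE | FREE_SPACE_TREE_VALID, i.e. space_cache v2
  btrfs inspect-internal dump-super /dev/mapper/luks-12T | grep -i -A 2 compat_ro_flags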

Does it help? Shall I dig deeper?

Regards,

Stéphane.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28 19:58       ` Stéphane Lesimple
@ 2020-12-28 23:39         ` Qu Wenruo
  2020-12-29  0:44           ` Qu Wenruo
  2020-12-29  9:31           ` Stéphane Lesimple
  0 siblings, 2 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-12-28 23:39 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo, David Arendt, linux-btrfs



On 2020/12/29 3:58 AM, Stéphane Lesimple wrote:
>> I know it fails in relocate_block_group(), which returns -2, I'm currently
>> adding a couple printk's here and there to try to pinpoint that better.
>
> Okay, so btrfs_relocate_block_group() starts with stage MOVE_DATA_EXTENTS, which
> completes successfully, as relocate_block_group() returns 0:
>
> BTRFS info (device <unknown>): relocate_block_group: prepare_to_realocate = 0
> BTRFS info (device <unknown>): relocate_block_group loop: progress = 1, btrfs_start_transaction = ok
> [...]
> BTRFS info (device <unknown>): relocate_block_group loop: progress = 168, btrfs_start_transaction = ok
> BTRFS info (device <unknown>): relocate_block_group: returning err = 0
> BTRFS info (device dm-10): stage = move data extents, relocate_block_group = 0
> BTRFS info (device dm-10): found 167 extents, stage: move data extents
>
> Then it proceeds to the UPDATE_DATA_PTRS stage and calls relocate_block_group()
> again. This time it'll fail at the 92th iteration of the loop:
>
> BTRFS info (device <unknown>): relocate_block_group loop: progress = 92, btrfs_start_transaction = ok
> BTRFS info (device <unknown>): relocate_block_group loop: extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret = 0
> BTRFS info (device <unknown>): add_data_references: btrfs_find_all_leafs = 0
> BTRFS info (device <unknown>): add_data_references loop: read_tree_block ok
> BTRFS info (device <unknown>): add_data_references loop: delete_v1_space_cache = -2

Damn it, if we find no v1 space cache for the block group, it means
we're fine to continue...

> BTRFS info (device <unknown>): relocate_block_group loop: add_data_references = -2
>
> Then the -ENOENT goes all the way up the call stack and aborts the balance.
>
> So it fails in delete_v1_space_cache(), though it is worth noting that the
> FS we're talking about is actually using space_cache v2.

Space cache v2, no wonder no v1 space cache.

>
> Does it help? Shall I dig deeper?

You're already at the point!

Mind me to craft a fix with your signed-off-by?

Thanks,
Qu

>
> Regards,
>
> Stéphane.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28 23:39         ` Qu Wenruo
@ 2020-12-29  0:44           ` Qu Wenruo
  2020-12-29  0:59             ` David Arendt
  2020-12-29  9:42             ` Martin Steigerwald
  2020-12-29  9:31           ` Stéphane Lesimple
  1 sibling, 2 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-12-29  0:44 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo, David Arendt, linux-btrfs



On 2020/12/29 上午7:39, Qu Wenruo wrote:
>
>
> On 2020/12/29 上午3:58, Stéphane Lesimple wrote:
>>> I know it fails in relocate_block_group(), which returns -2, I'm
>>> currently
>>> adding a couple printk's here and there to try to pinpoint that better.
>>
>> Okay, so btrfs_relocate_block_group() starts with stage
>> MOVE_DATA_EXTENTS, which
>> completes successfully, as relocate_block_group() returns 0:
>>
>> BTRFS info (device <unknown>): relocate_block_group:
>> prepare_to_realocate = 0
>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>> 1, btrfs_start_transaction = ok
>> [...]
>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>> 168, btrfs_start_transaction = ok
>> BTRFS info (device <unknown>): relocate_block_group: returning err = 0
>> BTRFS info (device dm-10): stage = move data extents,
>> relocate_block_group = 0
>> BTRFS info (device dm-10): found 167 extents, stage: move data extents
>>
>> Then it proceeds to the UPDATE_DATA_PTRS stage and calls
>> relocate_block_group()
>> again. This time it'll fail at the 92th iteration of the loop:
>>
>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>> 92, btrfs_start_transaction = ok
>> BTRFS info (device <unknown>): relocate_block_group loop:
>> extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret = 0
>> BTRFS info (device <unknown>): add_data_references:
>> btrfs_find_all_leafs = 0
>> BTRFS info (device <unknown>): add_data_references loop:
>> read_tree_block ok
>> BTRFS info (device <unknown>): add_data_references loop:
>> delete_v1_space_cache = -2
>
> Damn it, if we find no v1 space cache for the block group, it means
> we're fine to continue...
>
>> BTRFS info (device <unknown>): relocate_block_group loop:
>> add_data_references = -2
>>
>> Then the -ENOENT goes all the way up the call stack and aborts the
>> balance.
>>
>> So it fails in delete_v1_space_cache(), though it is worth noting that
>> the
>> FS we're talking about is actually using space_cache v2.
>
> Space cache v2, no wonder no v1 space cache.
>
>>
>> Does it help? Shall I dig deeper?
>
> You're already at the point!
>
> Mind me to craft a fix with your signed-off-by?

The problem is more complex than I thought, but still we at least have
some workaround.

Firstly, this happens when an old fs gets v2 space cache enabled but
still has some v1 space cache left.

A newer kernel mounting with v2 should clean up v1 properly, but older
kernels don't do that cleaning, thus leaving some v1 cache behind.

Then we run btrfs balance on such an old fs, leading to the -ENOENT error.
We can't just ignore the error, as we have no way to relocate such
leftover v1 cache (normally we delete it completely, but with v2 cache
enabled, we can't).

So all I can do for now is add a warning message for this situation.

To solve your problem, I also submitted a patch to btrfs-progs, to force
v1 space cache cleaning even if the fs has v2 space cache enabled.

Or, you can disable the v2 space cache first, using "btrfs check
--clear-space-cache v2", then "btrfs check --clear-space-cache v1", and
finally mount the fs with "space_cache=v2" again.

To verify there is no v1 space cache left, you can run the following
command:

# btrfs ins dump-tree -t root <device> | grep EXTENT_DATA

It should output nothing.

Then please try again and see whether you can balance all your data.
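
Putting the whole procedure together, it would look roughly like this
(an untested sketch; btrfs check needs the fs unmounted, and <device> and
<mountpoint> are placeholders):

# umount <mountpoint>
# btrfs check --clear-space-cache v2 <device>
# btrfs check --clear-space-cache v1 <device>
# btrfs ins dump-tree -t root <device> | grep EXTENT_DATA
# mount -o space_cache=v2 <device> <mountpoint>

The dump-tree grep should print nothing before the final mount.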

Thanks,
Qu

>
> Thanks,
> Qu
>
>>
>> Regards,
>>
>> Stéphane.
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-29  0:44           ` Qu Wenruo
@ 2020-12-29  0:59             ` David Arendt
  2020-12-29  4:36               ` Qu Wenruo
  2020-12-29  9:42             ` Martin Steigerwald
  1 sibling, 1 reply; 13+ messages in thread
From: David Arendt @ 2020-12-29  0:59 UTC (permalink / raw)
  To: Qu Wenruo, Stéphane Lesimple, Qu Wenruo, linux-btrfs

Hi,

Just for information: on my system the error appeared on a filesystem
using space cache v1. I think my problem might therefore be unrelated to
this one. If it happens again, I will try to collect more information.
Maybe I should try a clear_cache to ensure that the space cache is not
incorrect.
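
Something like a one-shot mount with the clear_cache option, I suppose
(just a sketch; <device> and <mountpoint> stand in for my actual fs):

# umount <mountpoint>
# mount -o clear_cache <device> <mountpoint>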

Bye,
David Arendt

On 12/29/20 1:44 AM, Qu Wenruo wrote:
>
>
> On 2020/12/29 上午7:39, Qu Wenruo wrote:
>>
>>
>> On 2020/12/29 上午3:58, Stéphane Lesimple wrote:
>>>> I know it fails in relocate_block_group(), which returns -2, I'm
>>>> currently
>>>> adding a couple printk's here and there to try to pinpoint that 
>>>> better.
>>>
>>> Okay, so btrfs_relocate_block_group() starts with stage
>>> MOVE_DATA_EXTENTS, which
>>> completes successfully, as relocate_block_group() returns 0:
>>>
>>> BTRFS info (device <unknown>): relocate_block_group:
>>> prepare_to_realocate = 0
>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>> 1, btrfs_start_transaction = ok
>>> [...]
>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>> 168, btrfs_start_transaction = ok
>>> BTRFS info (device <unknown>): relocate_block_group: returning err = 0
>>> BTRFS info (device dm-10): stage = move data extents,
>>> relocate_block_group = 0
>>> BTRFS info (device dm-10): found 167 extents, stage: move data extents
>>>
>>> Then it proceeds to the UPDATE_DATA_PTRS stage and calls
>>> relocate_block_group()
>>> again. This time it'll fail at the 92th iteration of the loop:
>>>
>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>> 92, btrfs_start_transaction = ok
>>> BTRFS info (device <unknown>): relocate_block_group loop:
>>> extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret 
>>> = 0
>>> BTRFS info (device <unknown>): add_data_references:
>>> btrfs_find_all_leafs = 0
>>> BTRFS info (device <unknown>): add_data_references loop:
>>> read_tree_block ok
>>> BTRFS info (device <unknown>): add_data_references loop:
>>> delete_v1_space_cache = -2
>>
>> Damn it, if we find no v1 space cache for the block group, it means
>> we're fine to continue...
>>
>>> BTRFS info (device <unknown>): relocate_block_group loop:
>>> add_data_references = -2
>>>
>>> Then the -ENOENT goes all the way up the call stack and aborts the
>>> balance.
>>>
>>> So it fails in delete_v1_space_cache(), though it is worth noting that
>>> the
>>> FS we're talking about is actually using space_cache v2.
>>
>> Space cache v2, no wonder no v1 space cache.
>>
>>>
>>> Does it help? Shall I dig deeper?
>>
>> You're already at the point!
>>
>> Mind me to craft a fix with your signed-off-by?
>
> The problem is more complex than I thought, but still we at least have
> some workaround.
>
> Firstly, this happens when an old fs get v2 space cache enabled, but
> still has v1 space cache left.
>
> Newer v2 mount should cleanup v1 properly, but older kernel doesn't do
> the proper cleaning, thus left some v1 cache.
>
> Then we call btrfs balance on such old fs, leading to the -ENOENT error.
> We can't ignore the error, as we have no way to relocate such left over
> v1 cache (normally we delete it completely, but with v2 cache, we can't).
>
> So what I can do is only to add a warning message to the problem.
>
> To solve your problem, I also submitted a patch to btrfs-progs, to force
> v1 space cache cleaning even if the fs has v2 space cache enabled.
>
> Or, you can disable v2 space cache first, using "btrfs check
> --clear-space-cache v2" first, then "btrfs check --clear-space-cache
> v1", and finally mount the fs with "space_cache=v2" again.
>
> To verify there is no space cache v1 left, you can run the following
> command to verify:
>
> # btrfs ins dump-tree -t root <device> | grep EXTENT_DATA
>
> It should output nothing.
>
> Then please try if you can balance all your data.
>
> Thanks,
> Qu
>
>>
>> Thanks,
>> Qu
>>
>>>
>>> Regards,
>>>
>>> Stéphane.
>>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-29  0:59             ` David Arendt
@ 2020-12-29  4:36               ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-12-29  4:36 UTC (permalink / raw)
  To: David Arendt, Qu Wenruo, Stéphane Lesimple, linux-btrfs



On 2020/12/29 上午8:59, David Arendt wrote:
> Hi,
> 
> Just for information: On my system the error appeared on a filesystem 
> using space cache v1. I think my problem might then be unrelated to this 
> one.

Then this is more interesting.

There are two locations which can return -ENOENT in delete_v1_space_cache():

- No file extent found
   This means something is wrong in the backref walk.
   I don't believe that's even possible, or qgroup and balance would be
   completely broken.

- delete_block_group_cache() failed to grab the free space cache inode
   There is another possibility: we have the free space cache inode in the
   commit root (which data relocation reads from), but it's not in our
   current root.
   In that case, -ENOENT is safe to ignore.

I guess you may have hit the 2nd case, as your next balance finished
without problems.

> If it will happen again, I will try to collect more information. 
> Maybe a should try a clear_cache to ensure that the space cache is not 
> wrong.

Clear_cache itself won't remove all the existing cache; it just removes
the cache for a block group when that block group gets dirty.

Thus we use btrfs check to remove the free space cache completely.
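
For a v1-only filesystem that would be, roughly (a sketch; the fs has to
be unmounted, and <device>/<mountpoint> are placeholders):

# umount <mountpoint>
# btrfs check --clear-space-cache v1 <device>
# mount <device> <mountpoint>

The v1 cache should then be rebuilt gradually after the next mount.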

Thanks,
Qu

> 
> Bye,
> David Arendt
> 
> On 12/29/20 1:44 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/12/29 上午7:39, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/12/29 上午3:58, Stéphane Lesimple wrote:
>>>>> I know it fails in relocate_block_group(), which returns -2, I'm
>>>>> currently
>>>>> adding a couple printk's here and there to try to pinpoint that 
>>>>> better.
>>>>
>>>> Okay, so btrfs_relocate_block_group() starts with stage
>>>> MOVE_DATA_EXTENTS, which
>>>> completes successfully, as relocate_block_group() returns 0:
>>>>
>>>> BTRFS info (device <unknown>): relocate_block_group:
>>>> prepare_to_realocate = 0
>>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>>> 1, btrfs_start_transaction = ok
>>>> [...]
>>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>>> 168, btrfs_start_transaction = ok
>>>> BTRFS info (device <unknown>): relocate_block_group: returning err = 0
>>>> BTRFS info (device dm-10): stage = move data extents,
>>>> relocate_block_group = 0
>>>> BTRFS info (device dm-10): found 167 extents, stage: move data extents
>>>>
>>>> Then it proceeds to the UPDATE_DATA_PTRS stage and calls
>>>> relocate_block_group()
>>>> again. This time it'll fail at the 92th iteration of the loop:
>>>>
>>>> BTRFS info (device <unknown>): relocate_block_group loop: progress =
>>>> 92, btrfs_start_transaction = ok
>>>> BTRFS info (device <unknown>): relocate_block_group loop:
>>>> extents_found = 92, item_size(53) >= sizeof(*ei)(24), flags = 1, ret 
>>>> = 0
>>>> BTRFS info (device <unknown>): add_data_references:
>>>> btrfs_find_all_leafs = 0
>>>> BTRFS info (device <unknown>): add_data_references loop:
>>>> read_tree_block ok
>>>> BTRFS info (device <unknown>): add_data_references loop:
>>>> delete_v1_space_cache = -2
>>>
>>> Damn it, if we find no v1 space cache for the block group, it means
>>> we're fine to continue...
>>>
>>>> BTRFS info (device <unknown>): relocate_block_group loop:
>>>> add_data_references = -2
>>>>
>>>> Then the -ENOENT goes all the way up the call stack and aborts the
>>>> balance.
>>>>
>>>> So it fails in delete_v1_space_cache(), though it is worth noting that
>>>> the
>>>> FS we're talking about is actually using space_cache v2.
>>>
>>> Space cache v2, no wonder no v1 space cache.
>>>
>>>>
>>>> Does it help? Shall I dig deeper?
>>>
>>> You're already at the point!
>>>
>>> Mind me to craft a fix with your signed-off-by?
>>
>> The problem is more complex than I thought, but still we at least have
>> some workaround.
>>
>> Firstly, this happens when an old fs get v2 space cache enabled, but
>> still has v1 space cache left.
>>
>> Newer v2 mount should cleanup v1 properly, but older kernel doesn't do
>> the proper cleaning, thus left some v1 cache.
>>
>> Then we call btrfs balance on such old fs, leading to the -ENOENT error.
>> We can't ignore the error, as we have no way to relocate such left over
>> v1 cache (normally we delete it completely, but with v2 cache, we can't).
>>
>> So what I can do is only to add a warning message to the problem.
>>
>> To solve your problem, I also submitted a patch to btrfs-progs, to force
>> v1 space cache cleaning even if the fs has v2 space cache enabled.
>>
>> Or, you can disable v2 space cache first, using "btrfs check
>> --clear-space-cache v2" first, then "btrfs check --clear-space-cache
>> v1", and finally mount the fs with "space_cache=v2" again.
>>
>> To verify there is no space cache v1 left, you can run the following
>> command to verify:
>>
>> # btrfs ins dump-tree -t root <device> | grep EXTENT_DATA
>>
>> It should output nothing.
>>
>> Then please try if you can balance all your data.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Regards,
>>>>
>>>> Stéphane.
>>>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-28 23:39         ` Qu Wenruo
  2020-12-29  0:44           ` Qu Wenruo
@ 2020-12-29  9:31           ` Stéphane Lesimple
  1 sibling, 0 replies; 13+ messages in thread
From: Stéphane Lesimple @ 2020-12-29  9:31 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs

>> Mind me to craft a fix with your signed-off-by?

Sure!

> The problem is more complex than I thought, but still we at least have
> some workaround.
> 
> Firstly, this happens when an old fs get v2 space cache enabled, but
> still has v1 space cache left.
> 
> Newer v2 mount should cleanup v1 properly, but older kernel doesn't do
> the proper cleaning, thus left some v1 cache.
> 
> Then we call btrfs balance on such old fs, leading to the -ENOENT error.
> We can't ignore the error, as we have no way to relocate such left over
> v1 cache (normally we delete it completely, but with v2 cache, we can't).
> 
> So what I can do is only to add a warning message to the problem.
> 
> To solve your problem, I also submitted a patch to btrfs-progs, to force
> v1 space cache cleaning even if the fs has v2 space cache enabled.
> 
> Or, you can disable v2 space cache first, using "btrfs check
> --clear-space-cache v2" first, then "btrfs check --clear-space-cache
> v1", and finally mount the fs with "space_cache=v2" again.
> 
> To verify there is no space cache v1 left, you can run the following
> command to verify:
> 
> # btrfs ins dump-tree -t root <device> | grep EXTENT_DATA
> 
> It should output nothing.
> 
> Then please try if you can balance all your data.

Your analysis is correct: I do have v1 leftovers, as I commented on
the [PATCH] you've sent.

Now, fixing the FS:
# btrfs check --clear-space-cache v2 /dev/mapper/luks-tank-mdata
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-tank-mdata
UUID: 428b20da-dcb1-403e-b407-ba984fd07ebd
Clear free space cache v2
Segmentation fault

Wow, okay. That's unexpected.

# btrfs --version
btrfs-progs v5.9 

(gdb) r
Starting program: /usr/local/bin/btrfs check --clear-space-cache v2 /dev/mapper/luks-tank-mdata
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-tank-mdata
UUID: 428b20da-dcb1-403e-b407-ba984fd07ebd
Clear free space cache v2

Program received signal SIGSEGV, Segmentation fault.
balance_level (level=<optimized out>, path=0x555555649490, root=0x555555645da0, trans=<optimized out>) at kernel-shared/ctree.c:930
930                             root_sub_used(root, right->len);
(gdb) bt
#0  balance_level (level=<optimized out>, path=0x555555649490, root=0x555555645da0, trans=<optimized out>) at kernel-shared/ctree.c:930
#1  btrfs_search_slot (trans=trans@entry=0x55555e8b4d30, root=root@entry=0x555555645da0, key=key@entry=0x7fffffffe000, p=p@entry=0x555555649490, ins_len=ins_len@entry=-1, cow=cow@entry=1)
    at kernel-shared/ctree.c:1320
#2  0x00005555555e3da7 in clear_free_space_tree (root=0x555555645da0, trans=0x55555e8b4d30) at kernel-shared/free-space-tree.c:1161
#3  btrfs_clear_free_space_tree (fs_info=<optimized out>) at kernel-shared/free-space-tree.c:1201
#4  0x000055555558cd5f in do_clear_free_space_cache (clear_version=clear_version@entry=2) at check/main.c:9872
#5  0x000055555559acce in cmd_check (cmd=0x555555638900 <cmd_struct_check>, argc=<optimized out>, argv=0x7fffffffe490) at check/main.c:10194
#6  0x000055555556ae88 in cmd_execute (argv=0x7fffffffe490, argc=4, cmd=0x555555638900 <cmd_struct_check>) at cmds/commands.h:125
#7  main (argc=4, argv=0x7fffffffe490) at btrfs.c:402
(gdb) 

Can v1 leftovers provoke this?

The patch you've sent for btrfs-progs might fix my problem as I wouldn't need
to remove space_cache v2 first, so I may not hit this bug, but if you're interested
in looking into this one too, we might kill one bird with two stones!

I'm leaving my FS as-is while waiting for your reply.

Regards,

Stéphane.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.6-5.10 balance regression?
  2020-12-29  0:44           ` Qu Wenruo
  2020-12-29  0:59             ` David Arendt
@ 2020-12-29  9:42             ` Martin Steigerwald
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Steigerwald @ 2020-12-29  9:42 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo, David Arendt, linux-btrfs, Qu Wenruo

Qu Wenruo - 29.12.20, 01:44:07 CET:
> So what I can do is only to add a warning message to the problem.
> 
> To solve your problem, I also submitted a patch to btrfs-progs, to
> force v1 space cache cleaning even if the fs has v2 space cache
> enabled.
> 
> Or, you can disable v2 space cache first, using "btrfs check
> --clear-space-cache v2" first, then "btrfs check --clear-space-cache
> v1", and finally mount the fs with "space_cache=v2" again.
> 
> To verify there is no space cache v1 left, you can run the following
> command to verify:
> 
> # btrfs ins dump-tree -t root <device> | grep EXTENT_DATA
> 
> It should output nothing.

I have v1 space_cache leftovers on filesystems which use v2 space_cache
as well, so…

is the fully working way to completely switch to space_cache v2, for any
BTRFS filesystem that still has v1 space cache, what you wrote above?

Or would it be more straightforward than that with a newer kernel?

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-12-29  9:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-27 12:11 5.6-5.10 balance regression? Stéphane Lesimple
2020-12-27 13:11 ` David Arendt
2020-12-28  0:06   ` Qu Wenruo
2020-12-28  7:38     ` David Arendt
2020-12-28  7:48       ` Qu Wenruo
2020-12-28 17:43       ` Stéphane Lesimple
2020-12-28 19:58       ` Stéphane Lesimple
2020-12-28 23:39         ` Qu Wenruo
2020-12-29  0:44           ` Qu Wenruo
2020-12-29  0:59             ` David Arendt
2020-12-29  4:36               ` Qu Wenruo
2020-12-29  9:42             ` Martin Steigerwald
2020-12-29  9:31           ` Stéphane Lesimple
