linux-btrfs.vger.kernel.org archive mirror
* RE: Problems balancing BTRFS
@ 2019-11-22 12:37 devel
  2019-11-22 13:10 ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: devel @ 2019-11-22 12:37 UTC (permalink / raw)
  To: linux-btrfs

So, I've been discussing this on IRC, but it looks like more sage advice is needed.


So, a quick history. A BTRFS filesystem was initially created with 2x 6TB
and 2x 1TB drives. This was configured with RAID1 metadata/system and
RAID5 data.

Recently 2 more 6TB drives were used to replace the 2x 1TB ones. One was
added to the filesystem and then the original 1TB drive was deleted; the
other was a direct replace.
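
Roughly, the operations were along these lines (the device names here are
placeholders, not the exact ones used):

   # one pair: add the new drive, then remove the old 1TB drive
   btrfs device add /dev/sdNEW1 /mnt/media
   btrfs device delete /dev/sdOLD1 /mnt/media
   # the other pair: a direct in-place replacement
   btrfs replace start /dev/sdOLD2 /dev/sdNEW2 /mnt/media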


The filesystem was then expanded to fill the new space with


>    btrfs fi resize 4:max /mnt/media/


The filesystem was very unbalanced, and this is what it currently looks
like after an attempt to rebalance it failed:


btrfs fi show
Label: none  uuid: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
    Total devices 4 FS bytes used 2.80TiB
    devid    1 size 5.46TiB used 1.20TiB path /dev/sdb
    devid    2 size 5.46TiB used 1.20TiB path /dev/sdc
    devid    4 size 5.46TiB used 826.03GiB path /dev/sde
    devid    5 size 5.46TiB used 826.03GiB path /dev/sdd


btrfs fi usage  /mnt/media/
WARNING: RAID56 detected, not implemented
Overall:
    Device size:          21.83TiB
    Device allocated:           8.06GiB
    Device unallocated:          21.82TiB
    Device missing:             0.00B
    Used:               6.26GiB
    Free (estimated):             0.00B    (min: 8.00EiB)
    Data ratio:                  0.00
    Metadata ratio:              2.00
    Global reserve:         512.00MiB    (used: 0.00B)

Data,RAID5: Size:2.80TiB, Used:2.80TiB
   /dev/sdb       1.20TiB
   /dev/sdc       1.20TiB
   /dev/sdd     822.00GiB
   /dev/sde     822.00GiB

Metadata,RAID1: Size:4.00GiB, Used:3.13GiB
   /dev/sdd       4.00GiB
   /dev/sde       4.00GiB

System,RAID1: Size:32.00MiB, Used:256.00KiB
   /dev/sdd      32.00MiB
   /dev/sde      32.00MiB

Unallocated:
   /dev/sdb       4.26TiB
   /dev/sdc       4.26TiB
   /dev/sdd       4.65TiB
   /dev/sde       4.65TiB


A scrub is clean

btrfs scrub status /mnt/media/
UUID:             6abaa68a-2670-4d8b-8d2a-fd7321df9242
Scrub started:    Thu Nov 21 11:30:49 2019
Status:           finished
Duration:         17:20:11
Total to scrub:   2.80TiB
Rate:             47.10MiB/s
Error summary:    no errors found


A readonly fs check is clean
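
The check was run roughly like this (read-only, and forced because the fs
was mounted):

   btrfs check --readonly --force /dev/sdb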


Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sdb
UUID: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
[1/7] checking root items                      (0:00:13 elapsed, 373111
items checked)
[2/7] checking extents                         (0:04:18 elapsed, 205334
items checked)
[3/7] checking free space cache                (0:00:37 elapsed, 1233
items checked)
[4/7] checking fs roots                        (0:00:10 elapsed, 10714
items checked)
[5/7] checking csums (without verifying data)  (0:00:02 elapsed, 414138
items checked)
[6/7] checking root refs                       (0:00:00 elapsed, 90
items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 3079343529984 bytes used, no error found
total csum bytes: 3003340776
total tree bytes: 3362635776
total fs tree bytes: 177717248
total extent tree bytes: 35635200
btree space waste bytes: 153780830
file data blocks allocated: 3077709344768
 referenced 3077349277696



A full balance is now failing



[Fri Nov 22 11:31:27 2019] BTRFS info (device sdb): relocating block
group 8808400289792 flags data|raid5
[Fri Nov 22 11:32:07 2019] BTRFS info (device sdb): found 74 extents
[Fri Nov 22 11:32:24 2019] BTRFS info (device sdb): found 74 extents
[Fri Nov 22 11:32:43 2019] BTRFS info (device sdb): relocating block
group 8805179064320 flags data|raid5
[Fri Nov 22 11:33:24 2019] BTRFS info (device sdb): found 61 extents
[Fri Nov 22 11:33:44 2019] BTRFS info (device sdb): found 61 extents
[Fri Nov 22 11:33:52 2019] BTRFS info (device sdb): relocating block
group 8801957838848 flags data|raid5
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
[Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
-9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
[Fri Nov 22 11:34:02 2019] BTRFS info (device sdb): balance: ended with
status: -5


Any idea how to proceed from here? The drives are relatively new and
there were no issues with the 2x 6TB + 2x 1TB setup.


Thanks in advance.


-- 
==

Don Alex. 



* Re: Problems balancing BTRFS
  2019-11-22 12:37 Problems balancing BTRFS devel
@ 2019-11-22 13:10 ` Qu Wenruo
  2019-11-22 13:20   ` devel
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-11-22 13:10 UTC (permalink / raw)
  To: devel, linux-btrfs





On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
> So been discussing this on IRC but looks like more sage advice is needed.

You're not the only one hitting the bug. (Not sure if that makes you
feel a little better)
> 
> 
> So quick history. A BTRFS filesystem was initially created with 2x 6Tb
> and 2x1Tb drives. This was configure in a RAID 1 Meta/system   and RAID
> 5 data configuration
> 
> Recently 2 more 6Tb drives were used to replace the 2 1Tb ones. One was
> Added to the filesystem and then the original 1Tb was deleted, the other
> was just a direct replace.
> 
> 
> The filesystem was then expanded to fill the new space with
> 
> 
>>    btrfs fi resize 4:max /mnt/media/
> 
> 
> The filesystem was very unbalanced and currently looks like this after
> an attempt to rebalance it failed.
> 
> 
> btrfs fi show
> Label: none  uuid: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
>     Total devices 4 FS bytes used 2.80TiB
>     devid    1 size 5.46TiB used 1.20TiB path /dev/sdb
>     devid    2 size 5.46TiB used 1.20TiB path /dev/sdc
>     devid    4 size 5.46TiB used 826.03GiB path /dev/sde
>     devid    5 size 5.46TiB used 826.03GiB path /dev/sdd
> 
> 
> btrfs fi usage  /mnt/media/
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:          21.83TiB
>     Device allocated:           8.06GiB
>     Device unallocated:          21.82TiB
>     Device missing:             0.00B
>     Used:               6.26GiB
>     Free (estimated):             0.00B    (min: 8.00EiB)
>     Data ratio:                  0.00
>     Metadata ratio:              2.00
>     Global reserve:         512.00MiB    (used: 0.00B)
> 
> Data,RAID5: Size:2.80TiB, Used:2.80TiB
>    /dev/sdb       1.20TiB
>    /dev/sdc       1.20TiB
>    /dev/sdd     822.00GiB
>    /dev/sde     822.00GiB
> 
> Metadata,RAID1: Size:4.00GiB, Used:3.13GiB
>    /dev/sdd       4.00GiB
>    /dev/sde       4.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:256.00KiB
>    /dev/sdd      32.00MiB
>    /dev/sde      32.00MiB
> 
> Unallocated:
>    /dev/sdb       4.26TiB
>    /dev/sdc       4.26TiB
>    /dev/sdd       4.65TiB
>    /dev/sde       4.65TiB
> 
> 
> A scrub is clean
> 
> btrfs scrub status /mnt/media/
> UUID:             6abaa68a-2670-4d8b-8d2a-fd7321df9242
> Scrub started:    Thu Nov 21 11:30:49 2019
> Status:           finished
> Duration:         17:20:11
> Total to scrub:   2.80TiB
> Rate:             47.10MiB/s
> Error summary:    no errors found
> 
> 
> A readonly fs check is clean
> 
> 
> Opening filesystem to check...
> WARNING: filesystem mounted, continuing because of --force
> Checking filesystem on /dev/sdb
> UUID: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
> [1/7] checking root items                      (0:00:13 elapsed, 373111
> items checked)
> [2/7] checking extents                         (0:04:18 elapsed, 205334
> items checked)
> [3/7] checking free space cache                (0:00:37 elapsed, 1233
> items checked)
> [4/7] checking fs roots                        (0:00:10 elapsed, 10714
> items checked)
> [5/7] checking csums (without verifying data)  (0:00:02 elapsed, 414138
> items checked)
> [6/7] checking root refs                       (0:00:00 elapsed, 90
> items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 3079343529984 bytes used, no error found
> total csum bytes: 3003340776
> total tree bytes: 3362635776
> total fs tree bytes: 177717248
> total extent tree bytes: 35635200
> btree space waste bytes: 153780830
> file data blocks allocated: 3077709344768
>  referenced 3077349277696
> 
> 
> 
> A full balance is now failing
> 
> 
> 
> [Fri Nov 22 11:31:27 2019] BTRFS info (device sdb): relocating block
> group 8808400289792 flags data|raid5
> [Fri Nov 22 11:32:07 2019] BTRFS info (device sdb): found 74 extents
> [Fri Nov 22 11:32:24 2019] BTRFS info (device sdb): found 74 extents
> [Fri Nov 22 11:32:43 2019] BTRFS info (device sdb): relocating block
> group 8805179064320 flags data|raid5
> [Fri Nov 22 11:33:24 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 11:33:44 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 11:33:52 2019] BTRFS info (device sdb): relocating block
> group 8801957838848 flags data|raid5
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 11:34:02 2019] BTRFS info (device sdb): balance: ended with
> status: -5

The csum error is from the data reloc tree, which is a tree that records
the new (relocated) data.
So the good news is, your old data is not corrupted, and since we hit the
EIO before switching tree blocks, the corrupted data is just deleted.

I have also seen the bug on a single device, with DUP metadata and
SINGLE data, so I believe there is something wrong with the data reloc tree.
The problem here is, I can't find a way to reproduce it, so it will take
us longer to debug.


Apart from that, have you seen any other problem? Especially ENOSPC
(which needs the enospc_debug mount option).
The only time I hit this bug, I was debugging an ENOSPC bug in relocation.

Thanks,
Qu

> 
> 
> Any idea how to proceed from here? The drives are relatively new and
> there were no issues in the 2x6Tb+ 2x1Tb
> 
> 
> Thanks in advance.
> 
> 




* Re: Problems balancing BTRFS
  2019-11-22 13:10 ` Qu Wenruo
@ 2019-11-22 13:20   ` devel
  2019-11-22 13:56     ` Qu Wenruo
  2019-11-23 16:20     ` Chris Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: devel @ 2019-11-22 13:20 UTC (permalink / raw)
  To: linux-btrfs

On 22/11/2019 13:10, Qu Wenruo wrote:
>
> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>> So been discussing this on IRC but looks like more sage advice is needed.
> You're not the only one hitting the bug. (Not sure if that makes you
> feel a little better)



Hehe.. well, it always helps to know you are not slowly going crazy on your own.

>>
>> The csum error is from data reloc tree, which is a tree to record the
>> new (relocated) data.
>> So the good news is, your old data is not corrupted, and since we hit
>> EIO before switching tree blocks, the corrupted data is just deleted.
>>
>> And I have also seen the bug just using single device, with DUP meta and
>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>> The problem here is, I can't find a way to reproduce it, so it will take
>> us a longer time to debug.
>>
>>
>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>> enospc_debug mount option).
>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>

As far as I can tell, the rest of the filesystem works normally; as shown
above, it scrubs clean, etc. I have not actively added much new data, since
the whole point is to balance the fs so that a scrub does not take 18 hours.


So really I am not sure what to do. The error only seems to appear during
a balance, which as far as I know is a much-needed regular maintenance
tool to keep a fs healthy, and is why it is part of the
btrfsmaintenance tools.

Are there some other tests I can run to try and isolate the problem?


Thanks.

-- 
==

D LoCascio

Director

RooSoft Ltd



* Re: Problems balancing BTRFS
  2019-11-22 13:20   ` devel
@ 2019-11-22 13:56     ` Qu Wenruo
  2019-11-22 14:07       ` devel
  2019-11-23 16:20     ` Chris Murphy
  1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-11-22 13:56 UTC (permalink / raw)
  To: devel, linux-btrfs





On 2019/11/22 下午9:20, devel@roosoft.ltd.uk wrote:
> On 22/11/2019 13:10, Qu Wenruo wrote:
>>
>> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>>> So been discussing this on IRC but looks like more sage advice is needed.
>> You're not the only one hitting the bug. (Not sure if that makes you
>> feel a little better)
> 
> 
> 
> Hehe.. well always help to know you are not slowly going crazy by oneself.
> 
>>>
>>> The csum error is from data reloc tree, which is a tree to record the
>>> new (relocated) data.
>>> So the good news is, your old data is not corrupted, and since we hit
>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>
>>> And I have also seen the bug just using single device, with DUP meta and
>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>> The problem here is, I can't find a way to reproduce it, so it will take
>>> us a longer time to debug.
>>>
>>>
>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>> enospc_debug mount option).
>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>
> 
> As far as I can tell the rest of the filesystem works normally. Like I
> show scrubs clean etc.. I have not actively added much new data since
> the whole point is to balance the fs so a scrub does not take 18 hours.

Sorry, my point here is: would you like to try the balance again with
the "enospc_debug" mount option?

As for balance, we can hit ENOSPC without it being reported when there is
a more serious problem, like the EIO you hit.
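
For reference, a minimal sketch of what I mean, assuming the fs is mounted
at /mnt/media as in your earlier output:

   # add the debug option on the live mount, then retry the balance
   mount -o remount,enospc_debug /mnt/media
   btrfs balance start /mnt/media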

> 
> 
> So really I am not sure what to do. It only seems to appear during a
> balance, which as far as I know is a much needed regular maintenance
> tool to keep a fs healthy, which is why it is part of the
> btrfsmaintenance tools 

You don't need to be that nervous just because a balance isn't possible.

Nowadays, balance is not really necessary any more.
In the old days, balance was the only way to delete empty block groups,
but now empty block groups are removed automatically, so balance is only
there to address unbalanced disk usage or to convert profiles.

In your case, although it's not comfortable to have imbalanced disk
usage, it won't hurt too much.

So for now, you can just disable balance and call it a day.
As long as you're still writing into that fs, the fs should become more
and more balanced.

> 
> Are there some other tests to try and isolate what the problem appears
> to be?

Forgot to mention: is it always reproducible? And always on the same
block group?

Thanks,
Qu

> 
> 
> Thanks.
> 




* Re: Problems balancing BTRFS
  2019-11-22 13:56     ` Qu Wenruo
@ 2019-11-22 14:07       ` devel
  2019-11-22 15:32         ` devel
  0 siblings, 1 reply; 9+ messages in thread
From: devel @ 2019-11-22 14:07 UTC (permalink / raw)
  To: linux-btrfs

On 22/11/2019 13:56, Qu Wenruo wrote:
>
> On 2019/11/22 下午9:20, devel@roosoft.ltd.uk wrote:
>> On 22/11/2019 13:10, Qu Wenruo wrote:
>>> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>>>> So been discussing this on IRC but looks like more sage advice is needed.
>>> You're not the only one hitting the bug. (Not sure if that makes you
>>> feel a little better)
>>
>>
>> Hehe.. well always help to know you are not slowly going crazy by oneself.
>>
>>>> The csum error is from data reloc tree, which is a tree to record the
>>>> new (relocated) data.
>>>> So the good news is, your old data is not corrupted, and since we hit
>>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>>
>>>> And I have also seen the bug just using single device, with DUP meta and
>>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>>> The problem here is, I can't find a way to reproduce it, so it will take
>>>> us a longer time to debug.
>>>>
>>>>
>>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>>> enospc_debug mount option).
>>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>>
>> As far as I can tell the rest of the filesystem works normally. Like I
>> show scrubs clean etc.. I have not actively added much new data since
>> the whole point is to balance the fs so a scrub does not take 18 hours.
> Sorry my point here is, would you like to try balance again with
> "enospc_debug" mount option?
>
> As for balance, we can hit ENOSPC without showing it as long as we have
> a more serious problem, like the EIO you hit.


Oh I see .. Sure I can start the balance again.


>>
>> So really I am not sure what to do. It only seems to appear during a
>> balance, which as far as I know is a much needed regular maintenance
>> tool to keep a fs healthy, which is why it is part of the
>> btrfsmaintenance tools 
> You don't need to be that nervous just for not being able to balance.
>
> Nowadays, balance is no longer that much necessary.
> In the old days, balance is the only way to delete empty block groups,
> but now empty block groups will be removed automatically, so balance is
> only here to address unbalanced disk usage or convert.
>
> For your case, although it's not comfortable to have imbalanced disk
> usages, but that won't hurt too much.


Well, going from 1TB to 6TB devices means there is a lot of weighting
going the wrong way. Initially there was only ~200GB on each of the new
disks, which was just unacceptable; it was getting better until I hit
this balance issue. But I am wary of putting on too much new data in
case the failure is symptomatic of something else.



> So for now, you can just disable balance and call it a day.
> As long as you're still writing into that fs, the fs should become more
> and more balanced.
>
>> Are there some other tests to try and isolate what the problem appears
>> to be?
> Forgot to mention, is that always reproducible? And always one the same
> block group?
>
> Thanks,
> Qu


So far, yes. The balance always fails at the same ino and offset, making
it impossible to continue.


Let me run it with debug on and get back to you.


Thanks.






* Re: Problems balancing BTRFS
  2019-11-22 14:07       ` devel
@ 2019-11-22 15:32         ` devel
  2019-11-23  0:09           ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: devel @ 2019-11-22 15:32 UTC (permalink / raw)
  To: linux-btrfs

On 22/11/2019 14:07, devel@roosoft.ltd.uk wrote:
> On 22/11/2019 13:56, Qu Wenruo wrote:
>> On 2019/11/22 下午9:20, devel@roosoft.ltd.uk wrote:
>>> On 22/11/2019 13:10, Qu Wenruo wrote:
>>>> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>>>>> So been discussing this on IRC but looks like more sage advice is needed.
>>>> You're not the only one hitting the bug. (Not sure if that makes you
>>>> feel a little better)
>>>
>>> Hehe.. well always help to know you are not slowly going crazy by oneself.
>>>
>>>>> The csum error is from data reloc tree, which is a tree to record the
>>>>> new (relocated) data.
>>>>> So the good news is, your old data is not corrupted, and since we hit
>>>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>>>
>>>>> And I have also seen the bug just using single device, with DUP meta and
>>>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>>>> The problem here is, I can't find a way to reproduce it, so it will take
>>>>> us a longer time to debug.
>>>>>
>>>>>
>>>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>>>> enospc_debug mount option).
>>>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>>>
>>> As far as I can tell the rest of the filesystem works normally. Like I
>>> show scrubs clean etc.. I have not actively added much new data since
>>> the whole point is to balance the fs so a scrub does not take 18 hours.
>> Sorry my point here is, would you like to try balance again with
>> "enospc_debug" mount option?
>>
>> As for balance, we can hit ENOSPC without showing it as long as we have
>> a more serious problem, like the EIO you hit.
>
> Oh I see .. Sure I can start the balance again.
>
>
>>> So really I am not sure what to do. It only seems to appear during a
>>> balance, which as far as I know is a much needed regular maintenance
>>> tool to keep a fs healthy, which is why it is part of the
>>> btrfsmaintenance tools 
>> You don't need to be that nervous just for not being able to balance.
>>
>> Nowadays, balance is no longer that much necessary.
>> In the old days, balance is the only way to delete empty block groups,
>> but now empty block groups will be removed automatically, so balance is
>> only here to address unbalanced disk usage or convert.
>>
>> For your case, although it's not comfortable to have imbalanced disk
>> usages, but that won't hurt too much.
>
> Well going from 1Tb to 6Tb devices means there is a lot of weighting
> going the wrong way. Initially there was only ~ 200Gb on each of the new
> disks and so that was just unacceptable it was getting better until I
> hit this balance issue. But I am wary of putting too much new data
> unless it is symptomatic of something else.
>
>
>
>> So for now, you can just disable balance and call it a day.
>> As long as you're still writing into that fs, the fs should become more
>> and more balanced.
>>
>>> Are there some other tests to try and isolate what the problem appears
>>> to be?
>> Forgot to mention, is that always reproducible? And always one the same
>> block group?
>>
>> Thanks,
>> Qu
>
> So far yes. The balance will always fall at the same ino and offset
> making it impossible to continue.
>
>
> Let me run it with debug on and get back to you.
>
>
> Thanks.
>
>
>
>




OK so I mounted the fs with enospc_debug


> /dev/sdb on /mnt/media type btrfs
(rw,relatime,space_cache,enospc_debug,subvolid=1001,subvol=/media)


Re-ran the balance and it got a little further, but then errored out again.


However, I don't see any more info in dmesg:

[Fri Nov 22 15:13:40 2019] BTRFS info (device sdb): relocating block
group 8963019112448 flags data|raid5
[Fri Nov 22 15:14:22 2019] BTRFS info (device sdb): found 61 extents
[Fri Nov 22 15:14:41 2019] BTRFS info (device sdb): found 61 extents
[Fri Nov 22 15:14:59 2019] BTRFS info (device sdb): relocating block
group 8801957838848 flags data|raid5
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
[Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
-9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
[Fri Nov 22 15:15:13 2019] BTRFS info (device sdb): balance: ended with
status: -5


What should I do now to get more information on the issue?


Thanks.



-- 
==

D LoCascio

Director

RooSoft Ltd



* Re: Problems balancing BTRFS
  2019-11-22 15:32         ` devel
@ 2019-11-23  0:09           ` Qu Wenruo
  2019-11-23 12:53             ` devel
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-11-23  0:09 UTC (permalink / raw)
  To: devel, linux-btrfs





On 2019/11/22 下午11:32, devel@roosoft.ltd.uk wrote:
> On 22/11/2019 14:07, devel@roosoft.ltd.uk wrote:
>> On 22/11/2019 13:56, Qu Wenruo wrote:
>>> On 2019/11/22 下午9:20, devel@roosoft.ltd.uk wrote:
>>>> On 22/11/2019 13:10, Qu Wenruo wrote:
>>>>> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>>>>>> So been discussing this on IRC but looks like more sage advice is needed.
>>>>> You're not the only one hitting the bug. (Not sure if that makes you
>>>>> feel a little better)
>>>>
>>>> Hehe.. well always help to know you are not slowly going crazy by oneself.
>>>>
>>>>>> The csum error is from data reloc tree, which is a tree to record the
>>>>>> new (relocated) data.
>>>>>> So the good news is, your old data is not corrupted, and since we hit
>>>>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>>>>
>>>>>> And I have also seen the bug just using single device, with DUP meta and
>>>>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>>>>> The problem here is, I can't find a way to reproduce it, so it will take
>>>>>> us a longer time to debug.
>>>>>>
>>>>>>
>>>>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>>>>> enospc_debug mount option).
>>>>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>>>>
>>>> As far as I can tell the rest of the filesystem works normally. Like I
>>>> show scrubs clean etc.. I have not actively added much new data since
>>>> the whole point is to balance the fs so a scrub does not take 18 hours.
>>> Sorry my point here is, would you like to try balance again with
>>> "enospc_debug" mount option?
>>>
>>> As for balance, we can hit ENOSPC without showing it as long as we have
>>> a more serious problem, like the EIO you hit.
>>
>> Oh I see .. Sure I can start the balance again.
>>
>>
>>>> So really I am not sure what to do. It only seems to appear during a
>>>> balance, which as far as I know is a much needed regular maintenance
>>>> tool to keep a fs healthy, which is why it is part of the
>>>> btrfsmaintenance tools 
>>> You don't need to be that nervous just for not being able to balance.
>>>
>>> Nowadays, balance is no longer that much necessary.
>>> In the old days, balance is the only way to delete empty block groups,
>>> but now empty block groups will be removed automatically, so balance is
>>> only here to address unbalanced disk usage or convert.
>>>
>>> For your case, although it's not comfortable to have imbalanced disk
>>> usages, but that won't hurt too much.
>>
>> Well going from 1Tb to 6Tb devices means there is a lot of weighting
>> going the wrong way. Initially there was only ~ 200Gb on each of the new
>> disks and so that was just unacceptable it was getting better until I
>> hit this balance issue. But I am wary of putting too much new data
>> unless it is symptomatic of something else.
>>
>>
>>
>>> So for now, you can just disable balance and call it a day.
>>> As long as you're still writing into that fs, the fs should become more
>>> and more balanced.
>>>
>>>> Are there some other tests to try and isolate what the problem appears
>>>> to be?
>>> Forgot to mention, is that always reproducible? And always one the same
>>> block group?
>>>
>>> Thanks,
>>> Qu
>>
>> So far yes. The balance will always fall at the same ino and offset
>> making it impossible to continue.
>>
>>
>> Let me run it with debug on and get back to you.
>>
>>
>> Thanks.
>>
>>
>>
>>
> 
> 
> 
> 
> OK so I mounted the fs with enospc_debug
> 
> 
>> /dev/sdb on /mnt/media type btrfs
> (rw,relatime,space_cache,enospc_debug,subvolid=1001,subvol=/media)
> 
> 
> Re- ran a balance and it did a little more. but then errored out again..
> 
> 
> However I don't see any more info in dmesg..

OK, so this is not the ENOSPC bug I'm chasing.

> 
> [Fri Nov 22 15:13:40 2019] BTRFS info (device sdb): relocating block
> group 8963019112448 flags data|raid5
> [Fri Nov 22 15:14:22 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 15:14:41 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 15:14:59 2019] BTRFS info (device sdb): relocating block
> group 8801957838848 flags data|raid5
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 15:15:13 2019] BTRFS info (device sdb): balance: ended with
> status: -5
> 
> 
> What should I do now to get more information on the issue ?

Not exactly.

But I have an idea to see if it's really a certain block group causing
the problem.

1. Get the block group/chunk list.
   You can go the traditional way, by using "btrfs ins dump-tree", or use
   a more advanced tool to get the block group/chunk list.

   If you go the manual way, it's something like:
   # btrfs ins dump-tree -t chunk <device>
   item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 290455552) itemoff 15785
itemsize 80
                length 1073741824 owner 2 stripe_len 65536 type DATA
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 1 sub_stripes 1
                        stripe 0 devid 1 offset 290455552
                        dev_uuid b929fabe-c291-4fd8-a01e-c67259d202ed


   In the above case, 290455552 is the chunk's logical bytenr, and
   1073741824 is the length. Record them all.

2. Use the vrange filter.
   Btrfs balance can balance only certain block groups; you can use
   vrange=290455552..1364197375 to relocate the block group above.

   So you can try to relocate block groups one by one manually.
   I recommend relocating block group 8801957838848 first, as it looks
   like the offending one (see the command sketch below).

   If you can relocate that block group manually, then there must be
   something wrong with the multi-block-group relocation sequence.
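
A rough sketch of the two steps, using the mount point and the block group
from your log (the end of the vrange just needs to fall inside the block
group, so start+1 is enough):

   # 1. list the logical start addresses of all chunks/block groups
   btrfs inspect-internal dump-tree -t chunk /dev/sdb | grep CHUNK_ITEM
   # 2. relocate only the suspect block group
   btrfs balance start -dvrange=8801957838848..8801957838849 /mnt/media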

Thanks,
Qu
> 
> 
> Thank.
> 
> 
> 




* Re: Problems balancing BTRFS
  2019-11-23  0:09           ` Qu Wenruo
@ 2019-11-23 12:53             ` devel
  0 siblings, 0 replies; 9+ messages in thread
From: devel @ 2019-11-23 12:53 UTC (permalink / raw)
  To: linux-btrfs

On 23/11/2019 00:09, Qu Wenruo wrote:
>
> On 2019/11/22 下午11:32, devel@roosoft.ltd.uk wrote:
>> On 22/11/2019 14:07, devel@roosoft.ltd.uk wrote:
>>> On 22/11/2019 13:56, Qu Wenruo wrote:
>>>> On 2019/11/22 下午9:20, devel@roosoft.ltd.uk wrote:
>>>>> On 22/11/2019 13:10, Qu Wenruo wrote:
>>>>>> On 2019/11/22 下午8:37, devel@roosoft.ltd.uk wrote:
>>>>>>> So been discussing this on IRC but looks like more sage advice is needed.
>>>>>> You're not the only one hitting the bug. (Not sure if that makes you
>>>>>> feel a little better)
>>>>> Hehe.. well always help to know you are not slowly going crazy by oneself.
>>>>>
>>>>>>> The csum error is from data reloc tree, which is a tree to record the
>>>>>>> new (relocated) data.
>>>>>>> So the good news is, your old data is not corrupted, and since we hit
>>>>>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>>>>>
>>>>>>> And I have also seen the bug just using single device, with DUP meta and
>>>>>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>>>>>> The problem here is, I can't find a way to reproduce it, so it will take
>>>>>>> us a longer time to debug.
>>>>>>>
>>>>>>>
>>>>>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>>>>>> enospc_debug mount option).
>>>>>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>>>>>
>>>>> As far as I can tell the rest of the filesystem works normally. Like I
>>>>> show scrubs clean etc.. I have not actively added much new data since
>>>>> the whole point is to balance the fs so a scrub does not take 18 hours.
>>>> Sorry my point here is, would you like to try balance again with
>>>> "enospc_debug" mount option?
>>>>
>>>> As for balance, we can hit ENOSPC without showing it as long as we have
>>>> a more serious problem, like the EIO you hit.
>>> Oh I see .. Sure I can start the balance again.
>>>
>>>
>>>>> So really I am not sure what to do. It only seems to appear during a
>>>>> balance, which as far as I know is a much needed regular maintenance
>>>>> tool to keep a fs healthy, which is why it is part of the
>>>>> btrfsmaintenance tools 
>>>> You don't need to be that nervous just for not being able to balance.
>>>>
>>>> Nowadays, balance is no longer that much necessary.
>>>> In the old days, balance is the only way to delete empty block groups,
>>>> but now empty block groups will be removed automatically, so balance is
>>>> only here to address unbalanced disk usage or convert.
>>>>
>>>> For your case, although it's not comfortable to have imbalanced disk
>>>> usages, but that won't hurt too much.
>>> Well going from 1Tb to 6Tb devices means there is a lot of weighting
>>> going the wrong way. Initially there was only ~ 200Gb on each of the new
>>> disks and so that was just unacceptable it was getting better until I
>>> hit this balance issue. But I am wary of putting too much new data
>>> unless it is symptomatic of something else.
>>>
>>>
>>>
>>>> So for now, you can just disable balance and call it a day.
>>>> As long as you're still writing into that fs, the fs should become more
>>>> and more balanced.
>>>>
>>>>> Are there some other tests to try and isolate what the problem appears
>>>>> to be?
>>>> Forgot to mention, is that always reproducible? And always one the same
>>>> block group?
>>>>
>>>> Thanks,
>>>> Qu
>>> So far yes. The balance will always fall at the same ino and offset
>>> making it impossible to continue.
>>>
>>>
>>> Let me run it with debug on and get back to you.
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>
>>
>>
>> OK so I mounted the fs with enospc_debug
>>
>>
>>> /dev/sdb on /mnt/media type btrfs
>> (rw,relatime,space_cache,enospc_debug,subvolid=1001,subvol=/media)
>>
>>
>> Re- ran a balance and it did a little more. but then errored out again..
>>
>>
>> However I don't see any more info in dmesg..
> OK, not that ENOSPC bug I'm chasing.
>
>> [Fri Nov 22 15:13:40 2019] BTRFS info (device sdb): relocating block
>> group 8963019112448 flags data|raid5
>> [Fri Nov 22 15:14:22 2019] BTRFS info (device sdb): found 61 extents
>> [Fri Nov 22 15:14:41 2019] BTRFS info (device sdb): found 61 extents
>> [Fri Nov 22 15:14:59 2019] BTRFS info (device sdb): relocating block
>> group 8801957838848 flags data|raid5
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
>> [Fri Nov 22 15:15:05 2019] BTRFS warning (device sdb): csum failed root
>> -9 ino 305 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
>> [Fri Nov 22 15:15:13 2019] BTRFS info (device sdb): balance: ended with
>> status: -5
>>
>>
>> What should I do now to get more information on the issue ?
> Not exactly.
>
> But I have an idea to see if it's really a certain block group causing
> the problem.
>
> 1. Get the block group/chunk list.
>    You can go the traditional way, by using "btrfs ins dump-tree" or
>    more advanced tool to get block group/chunk list.
>
>    If you go the manual way, it's something like:
>    # btrfs ins dump-tree -t chunk <device>
>    item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 290455552) itemoff 15785
> itemsize 80
>                 length 1073741824 owner 2 stripe_len 65536 type DATA
>                 io_align 65536 io_width 65536 sector_size 4096
>                 num_stripes 1 sub_stripes 1
>                         stripe 0 devid 1 offset 290455552
>                         dev_uuid b929fabe-c291-4fd8-a01e-c67259d202ed
>
>
>    In above case, 290455552 is the chunk's logical bytenr, and
>    1073741824 is the length. Record them all.
>
> 2. Use vrange filter.
>    Btrfs balance can balance certain block groups only, you can use
>    vrange=290455552..1364197375 to relocate the block group above.
>
>    So you can try to relocate block groups one by one manually.
>    I recommend to relocate block group 8801957838848 first, as it looks
>    like to be the offending one.
>
>    If you can manually relocate that block group manually, then it must
>    be something wrong with multiple block groups relocation sequence.
>
> Thanks,
> Qu
>>
>> Thank.
>>
>>
>>

OK, just a follow-up. As you can see, the original metadata was RAID1 and
sitting on only 2 drives. That is how it currently works (though changes
are in the works, so I gather); however, I was not happy with that, so I
decided to balance it with mconvert=raid10 instead and use the other 2
drives as well. That worked with no issues at all. So I then tried another
normal data balance, stepping the usage filter from 5 to 95 in increments
of 5. Not until it hit 95 did it actually do anything, and then it moved
just 2 block groups, which took about 2 minutes, and that was it. No more
balancing needed.
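
For reference, what I ran was roughly the following (from memory, so treat
the exact invocations as approximate):

   # convert metadata to raid10 so it spans all four drives
   btrfs balance start -mconvert=raid10 /mnt/media
   # then step the data balance up through usage thresholds
   for u in $(seq 5 5 95); do btrfs balance start -dusage=$u /mnt/media; done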


So I am not sure exactly what the issue was, but I suspect that replacing
the drive did not also take over its place in the metadata pool, which
left some devices with no metadata on them at all, and all sorts of
weirdness ensued. Given that scrub passed, check passed, and all devices
are now being used for system, metadata and data, I have started writing
more data to it.. and as expected it is starting to balance across the 4
devices on its own now.


So keep this oddity for reference, but as far as I can see, converting
the metadata from RAID1 to RAID10 solved my issues.


Thanks for all the pointers guys.. Appreciate not feeling alone on this.


Cheers



Don Alex





* Re: Problems balancing BTRFS
  2019-11-22 13:20   ` devel
  2019-11-22 13:56     ` Qu Wenruo
@ 2019-11-23 16:20     ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2019-11-23 16:20 UTC (permalink / raw)
  To: devel; +Cc: linux-btrfs

On Fri, Nov 22, 2019 at 6:41 AM <devel@roosoft.ltd.uk> wrote:
>
> So really I am not sure what to do. It only seems to appear during a
> balance, which as far as I know is a much needed regular maintenance
> tool to keep a fs healthy, which is why it is part of the
> btrfsmaintenance tools

I wouldn't say balance is a needed regular maintenance item, let alone
much needed. The issue in your case is that the allocation is
inefficient. If I'm parsing the output correctly, it looks as if the
block groups could be, in effect, two sets of two-device raid5. But
that's the worst-case scenario; we'd need to see the chunk tree to
know how inefficient it is, but it's no worse than if the data were
raid1. And yes, it increases the time for scrubbing as well.
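
If you want a rough idea, something along these lines (device name as
earlier in the thread) gives a quick histogram of chunk stripe widths,
which would show how many block groups only span two devices:

   # counts identical "num_stripes X sub_stripes Y" lines across all chunks
   btrfs inspect-internal dump-tree -t chunk /dev/sdb | grep num_stripes | sort | uniq -c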

-- 
Chris Murphy



Thread overview: 9+ messages
2019-11-22 12:37 Problems balancing BTRFS devel
2019-11-22 13:10 ` Qu Wenruo
2019-11-22 13:20   ` devel
2019-11-22 13:56     ` Qu Wenruo
2019-11-22 14:07       ` devel
2019-11-22 15:32         ` devel
2019-11-23  0:09           ` Qu Wenruo
2019-11-23 12:53             ` devel
2019-11-23 16:20     ` Chris Murphy
