* Re: Cannot balance FS (No space left on device)
[not found] <CAKzrAgSGRQk_wEairoCUhK6GDCFOVbVWJLub4M_fu7uHC-pO0w@mail.gmail.com>
@ 2016-06-15 10:59 ` ojab //
2016-06-15 12:41 ` E V
0 siblings, 1 reply; 14+ messages in thread
From: ojab // @ 2016-06-15 10:59 UTC (permalink / raw)
To: linux-btrfs
On Fri, Jun 10, 2016 at 2:58 PM, ojab // <ojab@ojab.ru> wrote:
> [Please CC me since I'm not subscribed to the list]
So I'm still playing w/ btrfs and again I have 'No space left on
device' during balance:
>$ sudo /usr/bin/btrfs balance start --full-balance /mnt/xxx/
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail
>$ sudo dmesg -T | grep BTRFS | tail
>[Wed Jun 15 10:28:53 2016] BTRFS info (device sdc1): relocating block group 13043037372416 flags 9
>[Wed Jun 15 10:28:53 2016] BTRFS info (device sdc1): relocating block group 13041963630592 flags 20
>[Wed Jun 15 10:29:54 2016] BTRFS info (device sdc1): found 25155 extents
>[Wed Jun 15 10:29:54 2016] BTRFS info (device sdc1): relocating block group 13040889888768 flags 20
>[Wed Jun 15 10:30:50 2016] BTRFS info (device sdc1): found 63700 extents
>[Wed Jun 15 10:30:50 2016] BTRFS info (device sdc1): relocating block group 13040856334336 flags 18
>[Wed Jun 15 10:30:51 2016] BTRFS info (device sdc1): found 9 extents
>[Wed Jun 15 10:30:52 2016] BTRFS info (device sdc1): relocating block group 13039782592512 flags 20
>[Wed Jun 15 10:32:08 2016] BTRFS info (device sdc1): found 61931 extents
>[Wed Jun 15 10:32:08 2016] BTRFS info (device sdc1): 896 enospc errors during balance
>$ sudo /usr/bin/btrfs balance start -dusage=75 /mnt/xxx/
>Done, had to relocate 1 out of 901 chunks
>$ sudo /usr/bin/btrfs balance start -dusage=76 /mnt/xxx/
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail
>$ sudo /usr/bin/btrfs fi usage /mnt/xxx/
>Overall:
> Device size: 1.98TiB
> Device allocated: 1.85TiB
> Device unallocated: 135.06GiB
> Device missing: 0.00B
> Used: 1.85TiB
> Free (estimated): 135.68GiB (min: 68.15GiB)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 512.00MiB (used: 0.00B)
>
>Data,RAID0: Size:1.84TiB, Used:1.84TiB
> /dev/sdb1 895.27GiB
> /dev/sdc1 895.27GiB
> /dev/sdd1 37.27GiB
> /dev/sdd2 37.27GiB
> /dev/sde1 11.27GiB
> /dev/sde2 11.27GiB
>
>Metadata,RAID1: Size:4.00GiB, Used:2.21GiB
> /dev/sdb1 2.00GiB
> /dev/sdc1 2.00GiB
> /dev/sde1 2.00GiB
> /dev/sde2 2.00GiB
>
>System,RAID1: Size:32.00MiB, Used:160.00KiB
> /dev/sde1 32.00MiB
> /dev/sde2 32.00MiB
>
>Unallocated:
> /dev/sdb1 34.25GiB
> /dev/sdc1 34.25GiB
> /dev/sdd1 1.11MiB
> /dev/sdd2 1.05MiB
> /dev/sde1 33.28GiB
> /dev/sde2 33.28GiB
>$ sudo /usr/bin/btrfs fi show /mnt/xxx/
>Label: none uuid: 8a65465d-1a8c-4f80-abc6-c818c38567c3
> Total devices 6 FS bytes used 1.84TiB
> devid 1 size 931.51GiB used 897.27GiB path /dev/sdc1
> devid 2 size 931.51GiB used 897.27GiB path /dev/sdb1
> devid 3 size 37.27GiB used 37.27GiB path /dev/sdd1
> devid 4 size 37.27GiB used 37.27GiB path /dev/sdd2
> devid 5 size 46.58GiB used 13.30GiB path /dev/sde1
> devid 6 size 46.58GiB used 13.30GiB path /dev/sde2
show_usage.py output can be found here:
https://gist.github.com/ojab/a24ce373ce5bede001140c572879fce8
Balance always fails with '896 enospc errors during balance' message
in dmesg. I don't quite understand the logic: there is plenty of
space on four devices, so why is `btrfs` apparently trying to use the
sdd[1-2] partitions? Is it a bug or intended behaviour?
What is the proper way of fixing such an issue in general, adding more
devices and rebalancing? How can I determine how many devices should
be added and their capacity?
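As a rough way to approach the last question, here is a minimal Python
sketch that counts how many devices still have room for another chunk.
It takes the "Unallocated:" section pasted above as input; the 1 GiB
threshold and the helper names are illustrative assumptions, not the
rules the btrfs allocator actually applies.

```python
import re

# "Unallocated:" section copied from the `btrfs fi usage` output above.
UNALLOCATED = """\
/dev/sdb1 34.25GiB
/dev/sdc1 34.25GiB
/dev/sdd1 1.11MiB
/dev/sdd2 1.05MiB
/dev/sde1 33.28GiB
/dev/sde2 33.28GiB
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Convert a size string such as '34.25GiB' to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KiB|MiB|GiB|TiB)", text)
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return float(match.group(1)) * UNITS[match.group(2)]


def devices_with_room(listing, needed_bytes):
    """Return the device paths whose unallocated space is >= needed_bytes."""
    devices = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        path, size = fields
        if parse_size(size) >= needed_bytes:
            devices.append(path)
    return devices


# Hypothetical threshold: 1 GiB of unallocated space per device.
roomy = devices_with_room(UNALLOCATED, 1024**3)
print(len(roomy), roomy)
```

With the numbers above it reports four devices (sdb1, sdc1, sde1, sde2).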
I'm still on 4.6.2 vanilla kernel and using btrfs-progs-4.6.
//wbr ojab
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-15 10:59 ` Cannot balance FS (No space left on device) ojab //
@ 2016-06-15 12:41 ` E V
2016-06-15 19:29 ` ojab //
0 siblings, 1 reply; 14+ messages in thread
From: E V @ 2016-06-15 12:41 UTC (permalink / raw)
To: ojab //; +Cc: linux-btrfs
In my experience phantom ENOSPC messages are frequently due to the
free space cache being corrupt. Mounting with nospace_cache or
space_cache=v2 may help.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-15 12:41 ` E V
@ 2016-06-15 19:29 ` ojab //
0 siblings, 0 replies; 14+ messages in thread
From: ojab // @ 2016-06-15 19:29 UTC (permalink / raw)
To: E V; +Cc: linux-btrfs
On Wed, Jun 15, 2016 at 12:41 PM, E V <eliventer@gmail.com> wrote:
> In my experience phantom ENOSPC messages are frequently due to the
> free space cache being corrupt. Mounting with nospace_cache or
> space_cache=v2 may help.
Unfortunately this is not the case.
//wbr ojab
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-07-02 19:03 ` Chris Murphy
@ 2016-07-04 8:32 ` ojab //
0 siblings, 0 replies; 14+ messages in thread
From: ojab // @ 2016-07-04 8:32 UTC (permalink / raw)
To: Chris Murphy
Cc: Hans van Kranenburg, Austin S. Hemmelgarn, Henk Slager, linux-btrfs
On Sat, Jul 2, 2016 at 7:03 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, Jul 2, 2016 at 9:07 AM, Hans van Kranenburg
> <hans.van.kranenburg@mendix.com> wrote:
>
>>
>> Also, the behaviour of *always* creating a new empty block group before
>> starting to work (which makes it impossible to free up space on a fully
>> allocated filesystem with balance) got reverted in:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da
>>
>> This patch is in 4.5 and 4.7-rc, but *not* in 4.6.
>
> Upstream it first appears in 4.5.7.
>
> --
> Chris Murphy
And looks like this patch also fixed my `balance` issue, yay. Thanks.
//wbr ojab
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-07-02 15:07 ` Hans van Kranenburg
@ 2016-07-02 19:03 ` Chris Murphy
2016-07-04 8:32 ` ojab //
0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2016-07-02 19:03 UTC (permalink / raw)
To: Hans van Kranenburg
Cc: Austin S. Hemmelgarn, ojab //, Henk Slager, linux-btrfs
On Sat, Jul 2, 2016 at 9:07 AM, Hans van Kranenburg
<hans.van.kranenburg@mendix.com> wrote:
>
> Also, the behaviour of *always* creating a new empty block group before
> starting to work (which makes it impossible to free up space on a fully
> allocated filesystem with balance) got reverted in:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da
>
> This patch is in 4.5 and 4.7-rc, but *not* in 4.6.
Upstream it first appears in 4.5.7.
--
Chris Murphy
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-13 12:33 ` Austin S. Hemmelgarn
@ 2016-07-02 15:07 ` Hans van Kranenburg
2016-07-02 19:03 ` Chris Murphy
0 siblings, 1 reply; 14+ messages in thread
From: Hans van Kranenburg @ 2016-07-02 15:07 UTC (permalink / raw)
To: Austin S. Hemmelgarn, ojab //; +Cc: Henk Slager, linux-btrfs
On 06/13/2016 02:33 PM, Austin S. Hemmelgarn wrote:
> On 2016-06-10 18:39, Hans van Kranenburg wrote:
>> On 06/11/2016 12:10 AM, ojab // wrote:
>>> On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
>>> <hans.van.kranenburg@mendix.com> wrote:
>>>> You can work around it by either adding two disks (like Henk said),
>>>> or by temporarily converting some chunks to single. Just enough to
>>>> get some free space on the first two disks to get a balance going
>>>> that can fill the third one. You don't have to convert all of your
>>>> data or metadata to single!
>>>>
>>>> Something like:
>>>>
>>>> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
>>>
>>> Unfortunately it fails even if I set limit=1:
>>>> $ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>>>> Dumping filters: flags 0x1, state 0x0, force is off
>>>> DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>>>> ERROR: error during balancing '/mnt/xxx/': No space left on device
>>>> There may be more info in syslog - try dmesg | tail
>>
>> Ah, apparently the balance operation *always* wants to allocate some new
>> empty space before starting to look more close at the task you give it...
> No, that's not exactly true. It seems to be a rather common fallacy
> right now that balance repacks data into existing chunks, which is
> absolutely false. What a balance does is to send everything selected by
> the filters through the allocator again, and specifically prevent any
> existing chunks from being used to satisfy the allocation. When you
> have 5 data chunks that are 20% used and run 'balance -dlimit=20', it
> doesn't pack that all into the first chunk, it allocates a new chunk,
> and then packs it all into that, then frees all the other chunks. This
> behavior is actually a pretty important property when adding or removing
> devices or converting between profiles, because it's what forces things
> into the new configuration of the filesystem.
>
> In an ideal situation, the limit filters should make it repack into
> existing chunks when specified alone, but currently that's not how it
> works, and I kind of doubt that that will ever be how it works.
I have to disagree with you here, based on what I see happening. Two
examples will follow, providing some pudding for the proof.
Also, the behaviour of *always* creating a new empty block group before
starting to work (which makes it impossible to free up space on a fully
allocated filesystem with balance) got reverted in:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da
This patch is in 4.5 and 4.7-rc, but *not* in 4.6.
Script used to provide block group output, using python-btrfs:
-# cat show_block_groups.py
#!/usr/bin/python
from __future__ import print_function
import btrfs
import sys

fs = btrfs.FileSystem(sys.argv[1])
# Print the block group item that belongs to each chunk.
for chunk in fs.chunks():
    print(fs.block_group(chunk.vaddr, chunk.length))
Example 1:
-# uname -a
Linux ichiban 4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 (2016-05-13) x86_64 GNU/Linux
-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 837120000 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 87319117824 length 1073741824 flags DATA used 1070030848 used_pct 100
block group vaddr 88392859648 length 1073741824 flags DATA used 1057267712 used_pct 98
block group vaddr 89466601472 length 1073741824 flags DATA used 1066360832 used_pct 99
block group vaddr 90540343296 length 268435456 flags METADATA used 238256128 used_pct 89
block group vaddr 90808778752 length 268435456 flags METADATA used 226082816 used_pct 84
block group vaddr 91077214208 length 268435456 flags METADATA used 242548736 used_pct 90
block group vaddr 91345649664 length 268435456 flags METADATA used 218415104 used_pct 81
block group vaddr 91614085120 length 268435456 flags METADATA used 223723520 used_pct 83
block group vaddr 91882520576 length 268435456 flags METADATA used 68272128 used_pct 25
block group vaddr 92150956032 length 1073741824 flags DATA used 1048154112 used_pct 98
block group vaddr 93224697856 length 1073741824 flags DATA used 800985088 used_pct 75
block group vaddr 94298439680 length 1073741824 flags DATA used 62197760 used_pct 6
block group vaddr 95372181504 length 1073741824 flags DATA used 49541120 used_pct 5
block group vaddr 96445923328 length 1073741824 flags DATA used 142856192 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 102051840 used_pct 10
Now do a balance, to remove the least used block group:
1st terminal:
-# watch -d './show_block_groups.py /'
2nd terminal:
-# btrfs balance start -v -dusage=5 /
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 17 chunks
After:
-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 837120000 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 87319117824 length 1073741824 flags DATA used 1070030848 used_pct 100
block group vaddr 88392859648 length 1073741824 flags DATA used 1057267712 used_pct 98
block group vaddr 89466601472 length 1073741824 flags DATA used 1066360832 used_pct 99
block group vaddr 90540343296 length 268435456 flags METADATA used 236830720 used_pct 88
block group vaddr 90808778752 length 268435456 flags METADATA used 224100352 used_pct 83
block group vaddr 91077214208 length 268435456 flags METADATA used 248299520 used_pct 92
block group vaddr 91345649664 length 268435456 flags METADATA used 218333184 used_pct 81
block group vaddr 91614085120 length 268435456 flags METADATA used 223117312 used_pct 83
block group vaddr 91882520576 length 268435456 flags METADATA used 66551808 used_pct 25
block group vaddr 92150956032 length 1073741824 flags DATA used 1048154112 used_pct 98
block group vaddr 93224697856 length 1073741824 flags DATA used 800985088 used_pct 75
block group vaddr 94298439680 length 1073741824 flags DATA used 62033920 used_pct 6
block group vaddr 96445923328 length 1073741824 flags DATA used 142331904 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 152297472 used_pct 14
block group vaddr 98593406976 length 1073741824 flags DATA used 0 used_pct 0
First, the new empty block group is created, after that (using the
watch) I can see the data from 95372181504 moving into an existing bg at
97519665152. The empty one is left behind.
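The before/after comparison can also be done mechanically. Below is a
minimal Python sketch that parses two pasted listings and reports which
DATA block groups disappeared, appeared, or changed in used bytes. It
only assumes the "block group vaddr ... used_pct ..." text format shown
above; the function names are made up for this example, and only the
last few DATA block groups from the listings are included as sample
input.

```python
import re

LINE = re.compile(
    r"block group vaddr (\d+) length (\d+) flags (\w+) used (\d+) used_pct (\d+)"
)


def parse(output):
    """Map vaddr -> used bytes for every DATA block group in the text."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(output.split())
    return {
        int(m.group(1)): int(m.group(4))
        for m in LINE.finditer(flat)
        if m.group(3) == "DATA"
    }


def diff(before, after):
    """Return (removed vaddrs, added vaddrs, {vaddr: change in used bytes})."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {
        vaddr: after[vaddr] - before[vaddr]
        for vaddr in before
        if vaddr in after and after[vaddr] != before[vaddr]
    }
    return removed, added, changed


BEFORE = """
block group vaddr 94298439680 length 1073741824 flags DATA used 62197760
used_pct 6
block group vaddr 95372181504 length 1073741824 flags DATA used 49541120
used_pct 5
block group vaddr 96445923328 length 1073741824 flags DATA used
142856192 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used
102051840 used_pct 10
"""

AFTER = """
block group vaddr 94298439680 length 1073741824 flags DATA used 62033920
used_pct 6
block group vaddr 96445923328 length 1073741824 flags DATA used
142331904 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used
152297472 used_pct 14
block group vaddr 98593406976 length 1073741824 flags DATA used 0 used_pct 0
"""

removed, added, changed = diff(parse(BEFORE), parse(AFTER))
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

On this sample it reports 95372181504 as removed, 98593406976 as added,
and a gain of 50245632 bytes in 97519665152.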
Second example:
-# uname -a
Linux mekker 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1 (2016-06-20) x86_64 GNU/Linux
-# ./show_block_groups.py /
block group vaddr 21630025728 length 33554432 flags SYSTEM used 4096 used_pct 0
block group vaddr 21663580160 length 268435456 flags METADATA used 108011520 used_pct 40
block group vaddr 21932015616 length 268435456 flags METADATA used 171769856 used_pct 64
block group vaddr 22200451072 length 268435456 flags METADATA used 89567232 used_pct 33
block group vaddr 22468886528 length 1073741824 flags DATA used 1059094528 used_pct 99
block group vaddr 24616370176 length 1073741824 flags DATA used 1024077824 used_pct 95
block group vaddr 25690112000 length 1073741824 flags DATA used 661626880 used_pct 62
block group vaddr 27837595648 length 1073741824 flags DATA used 824950784 used_pct 77
block group vaddr 28911337472 length 1073741824 flags DATA used 939896832 used_pct 88
block group vaddr 31058821120 length 1073741824 flags DATA used 816013312 used_pct 76
block group vaddr 32132562944 length 1073741824 flags DATA used 984100864 used_pct 92
block group vaddr 33206304768 length 1073741824 flags DATA used 541122560 used_pct 50
block group vaddr 36427530240 length 268435456 flags METADATA used 79302656 used_pct 30
block group vaddr 58528366592 length 1073741824 flags DATA used 579461120 used_pct 54
block group vaddr 69265784832 length 1073741824 flags DATA used 462090240 used_pct 43
block group vaddr 70339526656 length 1073741824 flags DATA used 700502016 used_pct 65
block group vaddr 71413268480 length 1073741824 flags DATA used 255000576 used_pct 24
block group vaddr 72487010304 length 1073741824 flags DATA used 348327936 used_pct 32
block group vaddr 73560752128 length 1073741824 flags DATA used 476127232 used_pct 44
block group vaddr 75708235776 length 1073741824 flags DATA used 301572096 used_pct 28
block group vaddr 76781977600 length 1073741824 flags DATA used 476241920 used_pct 44
block group vaddr 77855719424 length 1073741824 flags DATA used 844894208 used_pct 79
Now, let's do a balance that will remove the bg at 71413268480:
-# btrfs balance start -v -dusage=25 .
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=25
Done, had to relocate 1 out of 22 chunks
Result:
-# ./show_block_groups.py /
block group vaddr 21630025728 length 33554432 flags SYSTEM used 4096 used_pct 0
block group vaddr 21663580160 length 268435456 flags METADATA used 107319296 used_pct 40
block group vaddr 21932015616 length 268435456 flags METADATA used 175788032 used_pct 65
block group vaddr 22200451072 length 268435456 flags METADATA used 89026560 used_pct 33
block group vaddr 22468886528 length 1073741824 flags DATA used 1059090432 used_pct 99
block group vaddr 24616370176 length 1073741824 flags DATA used 1061240832 used_pct 99
block group vaddr 25690112000 length 1073741824 flags DATA used 879472640 used_pct 82
block group vaddr 27837595648 length 1073741824 flags DATA used 824950784 used_pct 77
block group vaddr 28911337472 length 1073741824 flags DATA used 939896832 used_pct 88
block group vaddr 31058821120 length 1073741824 flags DATA used 816013312 used_pct 76
block group vaddr 32132562944 length 1073741824 flags DATA used 984100864 used_pct 92
block group vaddr 33206304768 length 1073741824 flags DATA used 541122560 used_pct 50
block group vaddr 36427530240 length 268435456 flags METADATA used 76374016 used_pct 28
block group vaddr 58528366592 length 1073741824 flags DATA used 579461120 used_pct 54
block group vaddr 69265784832 length 1073741824 flags DATA used 462090240 used_pct 43
block group vaddr 70339526656 length 1073741824 flags DATA used 700502016 used_pct 65
block group vaddr 72487010304 length 1073741824 flags DATA used 348327936 used_pct 32
block group vaddr 73560752128 length 1073741824 flags DATA used 476127232 used_pct 44
block group vaddr 75708235776 length 1073741824 flags DATA used 301572096 used_pct 28
block group vaddr 76781977600 length 1073741824 flags DATA used 476241920 used_pct 44
block group vaddr 77855719424 length 1073741824 flags DATA used 844886016 used_pct 79
No new empty block group, yay. Data from the 24% filled at 71413268480
moved into existing block groups 24616370176 and 25690112000.
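As a sanity check on that claim, the arithmetic can be written out as a
small Python sketch. All byte counts are copied from the two listings
above; the variable names are only for illustration. Since this is a
live root filesystem, a small mismatch from unrelated writes is to be
expected.

```python
# Used bytes of the block group that the balance removed (71413268480).
source_used = 255000576

# (used before, used after) for the two block groups that grew.
targets = {
    24616370176: (1024077824, 1061240832),
    25690112000: (661626880, 879472640),
}

# Bytes gained by each target block group.
gained = {vaddr: after - before for vaddr, (before, after) in targets.items()}
total_gained = sum(gained.values())

print(gained)
print("total gained:", total_gained)
print("difference vs. source:", total_gained - source_used)
```

The two targets gained 255008768 bytes in total, which is within 8192
bytes of the 255000576 bytes that were in the removed block group.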
--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-10 22:39 ` Hans van Kranenburg
@ 2016-06-13 12:33 ` Austin S. Hemmelgarn
2016-07-02 15:07 ` Hans van Kranenburg
0 siblings, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-13 12:33 UTC (permalink / raw)
To: Hans van Kranenburg, ojab //; +Cc: Henk Slager, linux-btrfs
On 2016-06-10 18:39, Hans van Kranenburg wrote:
> On 06/11/2016 12:10 AM, ojab // wrote:
>> On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
>> <hans.van.kranenburg@mendix.com> wrote:
>>> You can work around it by either adding two disks (like Henk said),
>>> or by temporarily converting some chunks to single. Just enough to
>>> get some free space on the first two disks to get a balance going
>>> that can fill the third one. You don't have to convert all of your
>>> data or metadata to single!
>>>
>>> Something like:
>>>
>>> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
>>
>> Unfortunately it fails even if I set limit=1:
>>> $ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>>> Dumping filters: flags 0x1, state 0x0, force is off
>>> DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>>> ERROR: error during balancing '/mnt/xxx/': No space left on device
>>> There may be more info in syslog - try dmesg | tail
>
> Ah, apparently the balance operation *always* wants to allocate some new
> empty space before starting to look more close at the task you give it...
No, that's not exactly true. It seems to be a rather common fallacy
right now that balance repacks data into existing chunks, which is
absolutely false. What a balance does is to send everything selected by
the filters through the allocator again, and specifically prevent any
existing chunks from being used to satisfy the allocation. When you
have 5 data chunks that are 20% used and run 'balance -dlimit=20', it
doesn't pack that all into the first chunk, it allocates a new chunk,
and then packs it all into that, then frees all the other chunks. This
behavior is actually a pretty important property when adding or removing
devices or converting between profiles, because it's what forces things
into the new configuration of the filesystem.
In an ideal situation, the limit filters should make it repack into
existing chunks when specified alone, but currently that's not how it
works, and I kind of doubt that that will ever be how it works.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-10 21:00 ` Henk Slager
2016-06-10 21:33 ` ojab //
@ 2016-06-12 22:00 ` ojab //
1 sibling, 0 replies; 14+ messages in thread
From: ojab // @ 2016-06-12 22:00 UTC (permalink / raw)
To: Henk Slager; +Cc: linux-btrfs
On Fri, Jun 10, 2016 at 9:00 PM, Henk Slager <eye1tm@gmail.com> wrote:
> I have seldom seen an fs so full, very regular numbers :)
>
> But can you provide the output of this script:
> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
>
> It gives better info w.r.t. devices and it is then easier to say what
> has to be done.
>
> But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
> both want 2 devices currently and there is only one device with place
> for your 2G chunks. So in theory you need 2 empty devices added for a
> balance to succeed. If you can allow reduced redundancy for some time,
> you could shrink the fs used space on hdd1 to half, same for the
> partition itself, add a hdd2 partition and add that to the fs. Or
> just add another HDD.
> Then your 50Gb of deletions could get into effect if you start
> balancing. Also have a look at the balance stripe filters I would say.
So after adding one more [100GB] disk I successfully ran `btrfs
balance` and then deleted the new disks without any issues.
Thanks for your help.
//wbr ojab
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-10 22:10 ` ojab //
@ 2016-06-10 22:39 ` Hans van Kranenburg
2016-06-13 12:33 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 14+ messages in thread
From: Hans van Kranenburg @ 2016-06-10 22:39 UTC (permalink / raw)
To: ojab //; +Cc: Henk Slager, linux-btrfs
On 06/11/2016 12:10 AM, ojab // wrote:
> On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
> <hans.van.kranenburg@mendix.com> wrote:
>> You can work around it by either adding two disks (like Henk said), or by
>> temporarily converting some chunks to single. Just enough to get some free
>> space on the first two disks to get a balance going that can fill the third
>> one. You don't have to convert all of your data or metadata to single!
>>
>> Something like:
>>
>> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
>
> Unfortunately it fails even if I set limit=1:
>> $ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>> Dumping filters: flags 0x1, state 0x0, force is off
>> DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>> ERROR: error during balancing '/mnt/xxx/': No space left on device
>> There may be more info in syslog - try dmesg | tail
Ah, apparently the balance operation *always* wants to allocate some new
empty space before starting to look more closely at the task you give it...
This means that it's trying to allocate a new set of RAID0 chunks
first... and that's exactly the opposite of what we want to accomplish here.
If you really can add only one extra device now, there's always a more
dirty way to get the job done.
What you can do for example is:
- partition the new disk in two partitions
- add them both to the filesystem (btrfs doesn't know both block devices
are on the same physical disk, ghehe)
- convert a small number of data blocks to single
- then device delete the third disk again so the single chunks move back
to the two first disks
- add the third disk back as one whole block device
- etc...
:D
Moo,
--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-10 21:56 ` Hans van Kranenburg
@ 2016-06-10 22:10 ` ojab //
2016-06-10 22:39 ` Hans van Kranenburg
0 siblings, 1 reply; 14+ messages in thread
From: ojab // @ 2016-06-10 22:10 UTC (permalink / raw)
To: Hans van Kranenburg; +Cc: Henk Slager, linux-btrfs
On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
<hans.van.kranenburg@mendix.com> wrote:
> You can work around it by either adding two disks (like Henk said), or by
> temporarily converting some chunks to single. Just enough to get some free
> space on the first two disks to get a balance going that can fill the third
> one. You don't have to convert all of your data or metadata to single!
>
> Something like:
>
> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
Unfortunately it fails even if I set limit=1:
>$ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>Dumping filters: flags 0x1, state 0x0, force is off
> DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail
//wbr ojab
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Cannot balance FS (No space left on device)
2016-06-10 21:33 ` ojab //
@ 2016-06-10 21:56 ` Hans van Kranenburg
2016-06-10 22:10 ` ojab //
0 siblings, 1 reply; 14+ messages in thread
From: Hans van Kranenburg @ 2016-06-10 21:56 UTC (permalink / raw)
To: ojab //, Henk Slager; +Cc: linux-btrfs
On 06/10/2016 11:33 PM, ojab // wrote:
> On Fri, Jun 10, 2016 at 9:00 PM, Henk Slager <eye1tm@gmail.com> wrote:
>> I have seldom seen an fs so full, very regular numbers :)
>>
>> But can you provide the output of this script:
>> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
>>
>> It gives better info w.r.t. devices and it is then easier to say what
>> has to be done.
>>
>> But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
>> both want 2 devices currently and there is only one device with place
>> for your 2G chunks. So in theory you need 2 empty devices added for a
>> balance to succeed. If you can allow reduced redundancy for some time,
>> you could shrink the fs used space on hdd1 to half, same for the
>> partition itself, add a hdd2 partition and add that to the fs. Or
>> just add another HDD.
>> Then your 50Gb of deletions could get into effect if you start
>> balancing. Also have a look at the balance stripe filters I would say.
>
> Output of show_usage.py:
> https://gist.githubusercontent.com/ojab/850276af6ff3aa566b8a3ce6ec444521/raw/4d77e02d556ed0edb0f9823259f145f65e80bc66/gistfile1.txt
> Looks like I only have smaller spare drives at the moment (largest is
> 100GB), is it ok to use? Or there is some minimal drive size needed
> for my setup?
You can work around it by either adding two disks (like Henk said), or
by temporarily converting some chunks to single. Just enough to get some
free space on the first two disks to get a balance going that can fill
the third one. You don't have to convert all of your data or metadata to
single!
Something like:
btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
Newly allocated chunks will go to the third disk, because it has the most
free space.
After this, you can convert the single data back to raid0:
btrfs balance start -v -dconvert=raid0,soft /mnt/xxx/
soft is important, because it only touches everything that is not raid0 yet.
And in the end there should be a few GB of free space on the first two
disks, so you can do the big balance to spread all data over the three
disks, just btrfs balance start -v -dusage=100 /mnt/xxx/
Review the commands before doing anything, as I haven't tested this
here. The man page for btrfs-balance contains all info :)
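To make the order of the three steps explicit, here is a minimal
dry-run sketch in Python. It only builds and prints the three balance
commands given above and does not execute anything; the function name
and the default limit of 10 are illustrative assumptions.

```python
import shlex


def balance_plan(mountpoint, limit=10):
    """Return the three balance commands from the steps above, in order."""
    return [
        # 1. Convert a limited number of data chunks to single.
        ["btrfs", "balance", "start", "-v",
         "-dconvert=single,limit=%d" % limit, mountpoint],
        # 2. Convert back to raid0; "soft" skips chunks that already are raid0.
        ["btrfs", "balance", "start", "-v", "-dconvert=raid0,soft", mountpoint],
        # 3. Big balance to spread all data over the disks.
        ["btrfs", "balance", "start", "-v", "-dusage=100", mountpoint],
    ]


# Dry run: print the commands so they can be reviewed before running them.
for step in balance_plan("/mnt/xxx/"):
    print(" ".join(shlex.quote(arg) for arg in step))
```

Printing instead of executing keeps the review step above in the loop:
each line can be checked against the man page and then run by hand.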
It is always nice to watch btrfs balance status, btrfs fi show, etc. in
another terminal while it is working, so you can see what is happening.
You can always stop it with btrfs balance cancel once you think it has
moved around enough data.
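As a rough model of why converting only a few chunks to single should be enough, here is a small Python sketch. It is not how the kernel actually relocates data; the function name convert_to_single, the rounded unallocated figures and the 1GiB per-device stripe size are assumptions made for illustration, and each relocated chunk is treated as completely full (the worst case).

```python
GIB = 1024 ** 3

def convert_to_single(unallocated, raid0_devices, target, chunks, stripe=GIB):
    """Model relocating `chunks` RAID0 chunks into single chunks on `target`.

    Assumption: every RAID0 chunk holds one stripe of `stripe` bytes on each
    device in `raid0_devices`, and all of its data is rewritten onto `target`.
    """
    state = dict(unallocated)
    needed = stripe * len(raid0_devices)   # worst case: the chunk was full
    for _ in range(chunks):
        if state[target] < needed:
            break                          # the target device is full too
        state[target] -= needed
        for dev in raid0_devices:
            state[dev] += stripe           # the old stripe is released
    return state

# Rounded figures: the two old disks are full, the new one is empty.
before = {"/dev/sdb1": 0, "/dev/sdc1": 0, "/dev/sdd1": 230 * GIB}
after = convert_to_single(before, ["/dev/sdb1", "/dev/sdc1"], "/dev/sdd1", chunks=10)

for dev, free in after.items():
    print(dev, free // GIB, "GiB unallocated")
```

In this model, limit=10 leaves about 10GiB unallocated on each of the first two disks, which is what lets raid0 chunks be allocated across all three devices afterwards.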
Moo,
--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
* Re: Cannot balance FS (No space left on device)
2016-06-10 21:00 ` Henk Slager
@ 2016-06-10 21:33 ` ojab //
2016-06-10 21:56 ` Hans van Kranenburg
2016-06-12 22:00 ` ojab //
1 sibling, 1 reply; 14+ messages in thread
From: ojab // @ 2016-06-10 21:33 UTC (permalink / raw)
To: Henk Slager; +Cc: linux-btrfs
On Fri, Jun 10, 2016 at 9:00 PM, Henk Slager <eye1tm@gmail.com> wrote:
> I have seldom seen an fs so full, very regular numbers :)
>
> But can you provide the output of this script:
> https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
>
> It gives better info w.r.t. devices and it is then easier to say what
> has to be done.
>
> But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
> both want 2 devices currently and there is only one device with place
> for your 2G chunks. So in theory you need 2 empty devices added for a
> balance to succeed. If you can allow reduced redundancy for some time,
> you could shrink the fs used space on hdd1 to half, same for the
> partition itself, add a hdd2 partition and add that to the fs. Or
> just add another HDD.
> Then your 50GB of deletions could take effect if you start
> balancing. I would also have a look at the balance stripe filters.
Output of show_usage.py:
https://gist.githubusercontent.com/ojab/850276af6ff3aa566b8a3ce6ec444521/raw/4d77e02d556ed0edb0f9823259f145f65e80bc66/gistfile1.txt
Looks like I only have smaller spare drives at the moment (the largest is
100GB). Are they OK to use, or is there some minimum drive size needed
for my setup?
//wbr ojab
* Re: Cannot balance FS (No space left on device)
2016-06-10 18:04 ojab //
@ 2016-06-10 21:00 ` Henk Slager
2016-06-10 21:33 ` ojab //
2016-06-12 22:00 ` ojab //
0 siblings, 2 replies; 14+ messages in thread
From: Henk Slager @ 2016-06-10 21:00 UTC (permalink / raw)
To: ojab //; +Cc: linux-btrfs
On Fri, Jun 10, 2016 at 8:04 PM, ojab // <ojab@ojab.ru> wrote:
> [Please CC me since I'm not subscribed to the list]
> Hi,
> I tried to run `/usr/bin/btrfs fi defragment -r` on my btrfs partition,
> but it failed with "No space left on device", and now I can't get any
> free space on that partition (deleting some files or adding a new device
> doesn't help). During the defrag I used the `space_cache=v2` mount option,
> but I have since remounted the FS with the `clear_cache` flag. I have also
> deleted about 50GB of files and added a new 250GB disk since then:
>
>>$ df -h /mnt/xxx/
>>Filesystem      Size  Used Avail Use% Mounted on
>>/dev/sdc1       2,1T  1,8T   37G  99% /mnt/xxx
>>$ sudo /usr/bin/btrfs fi show
>>Label: none uuid: 8a65465d-1a8c-4f80-abc6-c818c38567c3
>> Total devices 3 FS bytes used 1.78TiB
>> devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1
>> devid 2 size 931.51GiB used 931.51GiB path /dev/sdb1
>> devid 3 size 230.41GiB used 0.00B path /dev/sdd1
>>$ sudo /usr/bin/btrfs fi usage /mnt/xxx/
>>Overall:
>> Device size: 2.04TiB
>> Device allocated: 1.82TiB
>> Device unallocated: 230.41GiB
>> Device missing: 0.00B
>> Used: 1.78TiB
>> Free (estimated): 267.23GiB (min: 152.03GiB)
>> Data ratio: 1.00
>> Metadata ratio: 2.00
>> Global reserve: 512.00MiB (used: 0.00B)
>>
>>Data,RAID0: Size:1.81TiB, Used:1.78TiB
>> /dev/sdb1 928.48GiB
>> /dev/sdc1 928.48GiB
>>
>>Metadata,RAID1: Size:3.00GiB, Used:2.30GiB
>> /dev/sdb1 3.00GiB
>> /dev/sdc1 3.00GiB
>>
>>System,RAID1: Size:32.00MiB, Used:176.00KiB
>> /dev/sdb1 32.00MiB
>> /dev/sdc1 32.00MiB
>>
>>Unallocated:
>> /dev/sdb1 1.01MiB
>> /dev/sdc1 1.00MiB
>> /dev/sdd1 230.41GiB
>>$ sudo /usr/bin/btrfs balance start -dusage=66 /mnt/xxx/
>>Done, had to relocate 0 out of 935 chunks
>>$ sudo /usr/bin/btrfs balance start -dusage=67 /mnt/xxx/
>>ERROR: error during balancing '/mnt/xxx/': No space left on device
>>There may be more info in syslog - try dmesg | tail
>
> I assume that something is wrong with the metadata, since I can still copy
> files to the FS.
> I'm on a vanilla 4.6.2 kernel and using btrfs-progs-4.6; the btrfs-debugfs
> output can be found here:
> https://gist.githubusercontent.com/ojab/1a8b1f83341403a169a8e66995c7c3da/raw/61621d22f706d7543a93a3d005415543af9a0db0/gistfile1.txt.
> Any hints on what else I can try to fix the issue?
I have seldom seen an fs so full, very regular numbers :)
But can you provide the output of this script:
https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py
It gives better info w.r.t. devices and it is then easier to say what
has to be done.
But you have btrfs raid0 data (2 stripes) and raid1 metadata, and they
both want 2 devices currently and there is only one device with place
for your 2G chunks. So in theory you need 2 empty devices added for a
balance to succeed. If you can allow reduced redundancy for some time,
you could shrink the fs used space on hdd1 to half, same for the
partition itself, add a hdd2 partition and add that to the fs. Or
just add another HDD.
Then your 50GB of deletions could take effect if you start
balancing. I would also have a look at the balance stripe filters.
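To illustrate the point about both profiles wanting 2 devices, here is a minimal Python sketch. It is not the real btrfs chunk allocator: the MIN_DEVICES table and the 1GiB per-device stripe size are simplifying assumptions, and can_allocate is a made-up helper name. The unallocated figures are the ones from the fi usage output quoted above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed minimum number of devices a new chunk of each profile needs.
MIN_DEVICES = {"single": 1, "raid0": 2, "raid1": 2}

def can_allocate(profile, unallocated, stripe_size=GIB):
    """Return True if enough devices have room for one stripe each."""
    candidates = [dev for dev, free in unallocated.items() if free >= stripe_size]
    return len(candidates) >= MIN_DEVICES[profile]

# Unallocated space per device, from `btrfs fi usage`.
unallocated = {
    "/dev/sdb1": int(1.01 * MIB),
    "/dev/sdc1": int(1.00 * MIB),
    "/dev/sdd1": int(230.41 * GIB),
}

for profile in ("raid0", "raid1", "single"):
    print(profile, can_allocate(profile, unallocated))
```

Under these assumptions only a single chunk fits, because /dev/sdd1 is the only device with room for a stripe. Balance has to write relocated raid0 data into new raid0 chunks, so it runs out of space even with 230GiB unallocated.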
* Cannot balance FS (No space left on device)
@ 2016-06-10 18:04 ojab //
2016-06-10 21:00 ` Henk Slager
0 siblings, 1 reply; 14+ messages in thread
From: ojab // @ 2016-06-10 18:04 UTC (permalink / raw)
To: linux-btrfs
[Please CC me since I'm not subscribed to the list]
Hi,
I tried to run `/usr/bin/btrfs fi defragment -r` on my btrfs partition,
but it failed with "No space left on device", and now I can't get any
free space on that partition (deleting some files or adding a new device
doesn't help). During the defrag I used the `space_cache=v2` mount option,
but I have since remounted the FS with the `clear_cache` flag. I have also
deleted about 50GB of files and added a new 250GB disk since then:
>$ df -h /mnt/xxx/
>Filesystem      Size  Used Avail Use% Mounted on
>/dev/sdc1       2,1T  1,8T   37G  99% /mnt/xxx
>$ sudo /usr/bin/btrfs fi show
>Label: none uuid: 8a65465d-1a8c-4f80-abc6-c818c38567c3
> Total devices 3 FS bytes used 1.78TiB
> devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1
> devid 2 size 931.51GiB used 931.51GiB path /dev/sdb1
> devid 3 size 230.41GiB used 0.00B path /dev/sdd1
>$ sudo /usr/bin/btrfs fi usage /mnt/xxx/
>Overall:
> Device size: 2.04TiB
> Device allocated: 1.82TiB
> Device unallocated: 230.41GiB
> Device missing: 0.00B
> Used: 1.78TiB
> Free (estimated): 267.23GiB (min: 152.03GiB)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 512.00MiB (used: 0.00B)
>
>Data,RAID0: Size:1.81TiB, Used:1.78TiB
> /dev/sdb1 928.48GiB
> /dev/sdc1 928.48GiB
>
>Metadata,RAID1: Size:3.00GiB, Used:2.30GiB
> /dev/sdb1 3.00GiB
> /dev/sdc1 3.00GiB
>
>System,RAID1: Size:32.00MiB, Used:176.00KiB
> /dev/sdb1 32.00MiB
> /dev/sdc1 32.00MiB
>
>Unallocated:
> /dev/sdb1 1.01MiB
> /dev/sdc1 1.00MiB
> /dev/sdd1 230.41GiB
>$ sudo /usr/bin/btrfs balance start -dusage=66 /mnt/xxx/
>Done, had to relocate 0 out of 935 chunks
>$ sudo /usr/bin/btrfs balance start -dusage=67 /mnt/xxx/
>ERROR: error during balancing '/mnt/xxx/': No space left on device
>There may be more info in syslog - try dmesg | tail
I assume that something is wrong with the metadata, since I can still copy
files to the FS.
I'm on a vanilla 4.6.2 kernel and using btrfs-progs-4.6; the btrfs-debugfs
output can be found here:
https://gist.githubusercontent.com/ojab/1a8b1f83341403a169a8e66995c7c3da/raw/61621d22f706d7543a93a3d005415543af9a0db0/gistfile1.txt.
Any hints on what else I can try to fix the issue?
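As a sanity check on the numbers above, the "Free (estimated)" and "min" figures can be reproduced with a little arithmetic. The two formulas in this Python sketch are assumptions that happen to fit the reported values, not a statement of how btrfs-progs computes them.

```python
# Figures in GiB, copied from the `btrfs fi usage` output above.
device_unallocated = 230.41
free_estimated = 267.23
free_min_reported = 152.03
data_ratio = 1.00        # RAID0 data
metadata_ratio = 2.00    # RAID1 metadata

# Assumption: the estimate is the free space inside allocated data chunks
# plus the unallocated space divided by the data ratio.
free_in_data_chunks = free_estimated - device_unallocated / data_ratio

# Assumption: the pessimistic "min" divides the unallocated space by the
# largest ratio in use instead.
free_min_model = free_in_data_chunks + device_unallocated / metadata_ratio

print(f"free inside allocated data chunks: {free_in_data_chunks:.2f} GiB")
print(f"modelled minimum free space:       {free_min_model:.2f} GiB")
```

Under these assumptions roughly 36.8GiB is still free inside already-allocated data chunks, which lines up with the 37G that df reports. That would explain why copying files still works, while balance, which needs to allocate new chunks, fails.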
//wbr ojab
end of thread, other threads:[~2016-07-04 8:32 UTC | newest]
Thread overview: 14+ messages
[not found] <CAKzrAgSGRQk_wEairoCUhK6GDCFOVbVWJLub4M_fu7uHC-pO0w@mail.gmail.com>
2016-06-15 10:59 ` Cannot balance FS (No space left on device) ojab //
2016-06-15 12:41 ` E V
2016-06-15 19:29 ` ojab //
2016-06-10 18:04 ojab //
2016-06-10 21:00 ` Henk Slager
2016-06-10 21:33 ` ojab //
2016-06-10 21:56 ` Hans van Kranenburg
2016-06-10 22:10 ` ojab //
2016-06-10 22:39 ` Hans van Kranenburg
2016-06-13 12:33 ` Austin S. Hemmelgarn
2016-07-02 15:07 ` Hans van Kranenburg
2016-07-02 19:03 ` Chris Murphy
2016-07-04 8:32 ` ojab //
2016-06-12 22:00 ` ojab //