From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>, ojab // <ojab@ojab.ru>
Cc: Henk Slager <eye1tm@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Cannot balance FS (No space left on device)
Date: Sat, 2 Jul 2016 17:07:27 +0200
Message-ID: <5785a25d-0bf4-53c9-0b85-eb50be140062@mendix.com>
In-Reply-To: <57d127c6-8362-8d47-79c2-19f60d930858@gmail.com>

On 06/13/2016 02:33 PM, Austin S. Hemmelgarn wrote:
> On 2016-06-10 18:39, Hans van Kranenburg wrote:
>> On 06/11/2016 12:10 AM, ojab // wrote:
>>> On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
>>> <hans.van.kranenburg@mendix.com> wrote:
>>>> You can work around it by either adding two disks (like Henk said),
>>>> or by
>>>> temporarily converting some chunks to single. Just enough to get some
>>>> free
>>>> space on the first two disks to get a balance going that can fill the
>>>> third
>>>> one. You don't have to convert all of your data or metadata to single!
>>>>
>>>> Something like:
>>>>
>>>> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
>>>
>>> Unfortunately it fails even if I set limit=1:
>>>> $ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>>>> Dumping filters: flags 0x1, state 0x0, force is off
>>>>   DATA (flags 0x120): converting, target=281474976710656, soft is
>>>> off, limit=1
>>>> ERROR: error during balancing '/mnt/xxx/': No space left on device
>>>> There may be more info in syslog - try dmesg | tail
>>
>> Ah, apparently the balance operation *always* wants to allocate some new
>> empty space before starting to look more closely at the task you give it...
> No, that's not exactly true.  It seems to be a rather common fallacy
> right now that balance repacks data into existing chunks, which is
> absolutely false.  What a balance does is to send everything selected by
> the filters through the allocator again, and specifically prevent any
> existing chunks from being used to satisfy the allocation.  When you
> have 5 data chunks that are 20% used and run 'balance -dlimit=20', it
> doesn't pack that all into the first chunk, it allocates a new chunk,
> and then packs it all into that, then frees all the other chunks.  This
> behavior is actually a pretty important property when adding or removing
> devices or converting between profiles, because it's what forces things
> into the new configuration of the filesystem.
>
> In an ideal situation, the limit filters should make it repack into
> existing chunks when specified alone, but currently that's not how it
> works, and I kind of doubt that that will ever be how it works.

I have to disagree with you here, based on what I see happening. Two 
examples will follow, providing some pudding for the proof.

Also, the behaviour of *always* creating a new empty block group before 
starting to work (which makes it impossible to free up space on a fully 
allocated filesystem with balance) got reverted in:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da

This patch is in 4.5 and 4.7-rc, but *not* in 4.6.

Script used to provide block group output, using python-btrfs:

-# cat show_block_groups.py
#!/usr/bin/python

from __future__ import print_function
import btrfs
import sys

# Walk the chunk tree and print the block group item that belongs to
# every chunk in the filesystem.
fs = btrfs.FileSystem(sys.argv[1])
for chunk in fs.chunks():
    print(fs.block_group(chunk.vaddr, chunk.length))
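
Untested sketch, but handy in this context: the same loop, restricted to the data block groups that a balance with -dusage=<pct> should pick up. It assumes the block group objects expose the flags and used_pct attributes that show up in the printed output, and that the btrfs module exports a BLOCK_GROUP_DATA flag constant; also note the kernel computes the exact used/length ratio, not the rounded percentage printed here. Call it like the script above, with the percentage as a second argument.

#!/usr/bin/python

from __future__ import print_function
import btrfs
import sys

fs = btrfs.FileSystem(sys.argv[1])
max_pct = int(sys.argv[2])
for chunk in fs.chunks():
    bg = fs.block_group(chunk.vaddr, chunk.length)
    # Only show data block groups filled up to the given percentage.
    if bg.flags & btrfs.BLOCK_GROUP_DATA and bg.used_pct <= max_pct:
        print(bg)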

Example 1:

-# uname -a
Linux ichiban 4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 (2016-05-13) x86_64 GNU/Linux

-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 837120000 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 87319117824 length 1073741824 flags DATA used 1070030848 used_pct 100
block group vaddr 88392859648 length 1073741824 flags DATA used 1057267712 used_pct 98
block group vaddr 89466601472 length 1073741824 flags DATA used 1066360832 used_pct 99
block group vaddr 90540343296 length 268435456 flags METADATA used 238256128 used_pct 89
block group vaddr 90808778752 length 268435456 flags METADATA used 226082816 used_pct 84
block group vaddr 91077214208 length 268435456 flags METADATA used 242548736 used_pct 90
block group vaddr 91345649664 length 268435456 flags METADATA used 218415104 used_pct 81
block group vaddr 91614085120 length 268435456 flags METADATA used 223723520 used_pct 83
block group vaddr 91882520576 length 268435456 flags METADATA used 68272128 used_pct 25
block group vaddr 92150956032 length 1073741824 flags DATA used 1048154112 used_pct 98
block group vaddr 93224697856 length 1073741824 flags DATA used 800985088 used_pct 75
block group vaddr 94298439680 length 1073741824 flags DATA used 62197760 used_pct 6
block group vaddr 95372181504 length 1073741824 flags DATA used 49541120 used_pct 5
block group vaddr 96445923328 length 1073741824 flags DATA used 142856192 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 102051840 used_pct 10

Now do a balance, to remove the least used block group:

1st terminal:
-# watch -d './show_block_groups.py /'

2nd terminal:
-# btrfs balance start -v -dusage=5 /
Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 17 chunks

After:

-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 837120000 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 87319117824 length 1073741824 flags DATA used 1070030848 used_pct 100
block group vaddr 88392859648 length 1073741824 flags DATA used 1057267712 used_pct 98
block group vaddr 89466601472 length 1073741824 flags DATA used 1066360832 used_pct 99
block group vaddr 90540343296 length 268435456 flags METADATA used 236830720 used_pct 88
block group vaddr 90808778752 length 268435456 flags METADATA used 224100352 used_pct 83
block group vaddr 91077214208 length 268435456 flags METADATA used 248299520 used_pct 92
block group vaddr 91345649664 length 268435456 flags METADATA used 218333184 used_pct 81
block group vaddr 91614085120 length 268435456 flags METADATA used 223117312 used_pct 83
block group vaddr 91882520576 length 268435456 flags METADATA used 66551808 used_pct 25
block group vaddr 92150956032 length 1073741824 flags DATA used 1048154112 used_pct 98
block group vaddr 93224697856 length 1073741824 flags DATA used 800985088 used_pct 75
block group vaddr 94298439680 length 1073741824 flags DATA used 62033920 used_pct 6
block group vaddr 96445923328 length 1073741824 flags DATA used 142331904 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 152297472 used_pct 14
block group vaddr 98593406976 length 1073741824 flags DATA used 0 used_pct 0

First, the new empty block group (at 98593406976) is created; after that (using the 
watch) I can see the data from 95372181504 moving into the existing bg at 
97519665152. The empty one is left behind.
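
Instead of staring at watch output, a quick untested sketch to capture the same thing: snapshot the set of block group vaddrs, run the balance, and diff. This assumes fs.chunks() re-reads the live chunk tree on every call, which is how the searches behave for me.

#!/usr/bin/python

from __future__ import print_function
import btrfs
import sys

fs = btrfs.FileSystem(sys.argv[1])
# Remember which block groups exist right now...
before = set(chunk.vaddr for chunk in fs.chunks())
raw_input('run the balance now, then press enter... ')
# ...and compare against the situation afterwards.
after = set(chunk.vaddr for chunk in fs.chunks())
print('removed:', sorted(before - after))
print('created:', sorted(after - before))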

Second example:

-# uname -a
Linux mekker 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1 (2016-06-20) x86_64 GNU/Linux

-# ./show_block_groups.py /
block group vaddr 21630025728 length 33554432 flags SYSTEM used 4096 used_pct 0
block group vaddr 21663580160 length 268435456 flags METADATA used 108011520 used_pct 40
block group vaddr 21932015616 length 268435456 flags METADATA used 171769856 used_pct 64
block group vaddr 22200451072 length 268435456 flags METADATA used 89567232 used_pct 33
block group vaddr 22468886528 length 1073741824 flags DATA used 1059094528 used_pct 99
block group vaddr 24616370176 length 1073741824 flags DATA used 1024077824 used_pct 95
block group vaddr 25690112000 length 1073741824 flags DATA used 661626880 used_pct 62
block group vaddr 27837595648 length 1073741824 flags DATA used 824950784 used_pct 77
block group vaddr 28911337472 length 1073741824 flags DATA used 939896832 used_pct 88
block group vaddr 31058821120 length 1073741824 flags DATA used 816013312 used_pct 76
block group vaddr 32132562944 length 1073741824 flags DATA used 984100864 used_pct 92
block group vaddr 33206304768 length 1073741824 flags DATA used 541122560 used_pct 50
block group vaddr 36427530240 length 268435456 flags METADATA used 79302656 used_pct 30
block group vaddr 58528366592 length 1073741824 flags DATA used 579461120 used_pct 54
block group vaddr 69265784832 length 1073741824 flags DATA used 462090240 used_pct 43
block group vaddr 70339526656 length 1073741824 flags DATA used 700502016 used_pct 65
block group vaddr 71413268480 length 1073741824 flags DATA used 255000576 used_pct 24
block group vaddr 72487010304 length 1073741824 flags DATA used 348327936 used_pct 32
block group vaddr 73560752128 length 1073741824 flags DATA used 476127232 used_pct 44
block group vaddr 75708235776 length 1073741824 flags DATA used 301572096 used_pct 28
block group vaddr 76781977600 length 1073741824 flags DATA used 476241920 used_pct 44
block group vaddr 77855719424 length 1073741824 flags DATA used 844894208 used_pct 79

Now, let's do a balance that will remove the bg at 71413268480:

-# btrfs balance start -v -dusage=25 .
Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=25
Done, had to relocate 1 out of 22 chunks

Result:

-# ./show_block_groups.py /
block group vaddr 21630025728 length 33554432 flags SYSTEM used 4096 used_pct 0
block group vaddr 21663580160 length 268435456 flags METADATA used 107319296 used_pct 40
block group vaddr 21932015616 length 268435456 flags METADATA used 175788032 used_pct 65
block group vaddr 22200451072 length 268435456 flags METADATA used 89026560 used_pct 33
block group vaddr 22468886528 length 1073741824 flags DATA used 1059090432 used_pct 99
block group vaddr 24616370176 length 1073741824 flags DATA used 1061240832 used_pct 99
block group vaddr 25690112000 length 1073741824 flags DATA used 879472640 used_pct 82
block group vaddr 27837595648 length 1073741824 flags DATA used 824950784 used_pct 77
block group vaddr 28911337472 length 1073741824 flags DATA used 939896832 used_pct 88
block group vaddr 31058821120 length 1073741824 flags DATA used 816013312 used_pct 76
block group vaddr 32132562944 length 1073741824 flags DATA used 984100864 used_pct 92
block group vaddr 33206304768 length 1073741824 flags DATA used 541122560 used_pct 50
block group vaddr 36427530240 length 268435456 flags METADATA used 76374016 used_pct 28
block group vaddr 58528366592 length 1073741824 flags DATA used 579461120 used_pct 54
block group vaddr 69265784832 length 1073741824 flags DATA used 462090240 used_pct 43
block group vaddr 70339526656 length 1073741824 flags DATA used 700502016 used_pct 65
block group vaddr 72487010304 length 1073741824 flags DATA used 348327936 used_pct 32
block group vaddr 73560752128 length 1073741824 flags DATA used 476127232 used_pct 44
block group vaddr 75708235776 length 1073741824 flags DATA used 301572096 used_pct 28
block group vaddr 76781977600 length 1073741824 flags DATA used 476241920 used_pct 44
block group vaddr 77855719424 length 1073741824 flags DATA used 844886016 used_pct 79

No new empty block group, yay. Data from the 24%-filled bg at 71413268480 
moved into the existing block groups at 24616370176 and 25690112000.
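
As a final sanity check, another untested sketch that sums the free bytes inside the already-allocated data block groups, which is an upper bound on how much data a balance can repack without allocating any new chunk (same assumptions about the block group attributes and the BLOCK_GROUP_DATA constant as the sketch earlier in this mail):

#!/usr/bin/python

from __future__ import print_function
import btrfs
import sys

fs = btrfs.FileSystem(sys.argv[1])
free_bytes = 0
for chunk in fs.chunks():
    bg = fs.block_group(chunk.vaddr, chunk.length)
    # Count the unused bytes inside each allocated data block group.
    if bg.flags & btrfs.BLOCK_GROUP_DATA:
        free_bytes += bg.length - bg.used
print('free bytes inside existing data block groups:', free_bytes)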

-- 
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
