* how to run balance successfully (No space left on device)?
@ 2017-09-17 15:02 Tomasz Chmielewski
  2017-09-18  1:50 ` Duncan
  2017-09-18  8:20 ` Tomasz Chmielewski
  0 siblings, 2 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-09-17 15:02 UTC (permalink / raw)
  To: linux-btrfs

I'm trying to run balance on a 4.13.2 kernel without much luck:

# time btrfs balance start -v /var/lib/lxd -dusage=5 -musage=5
Dumping filters: flags 0x7, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=5
   METADATA (flags 0x2): balancing, usage=5
   SYSTEM (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 353 chunks

real    0m2.356s
user    0m0.005s
sys     0m0.175s


# time btrfs balance start -v /var/lib/lxd -dusage=0 -musage=0
Dumping filters: flags 0x7, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=0
   METADATA (flags 0x2): balancing, usage=0
   SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 353 chunks

real    0m0.076s
user    0m0.004s
sys     0m0.008s


# time btrfs balance start -v /var/lib/lxd
Dumping filters: flags 0x7, state 0x0, force is off
   DATA (flags 0x0): balancing
   METADATA (flags 0x0): balancing
   SYSTEM (flags 0x0): balancing
WARNING:

         Full balance without filters requested. This operation is very
         intense and takes potentially very long. It is recommended to
         use the balance filters to narrow down the balanced data.
         Use 'btrfs balance start --full-balance' option to skip this
         warning. The operation will start in 10 seconds.
         Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/var/lib/lxd': No space left on device
There may be more info in syslog - try dmesg | tail

real    284m58.541s
user    0m0.000s
sys     47m39.037s




# df -h /var/lib/lxd
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       424G  318G  105G  76% /var/lib/lxd


# btrfs fi df /var/lib/lxd
Data, RAID1: total=318.00GiB, used=313.82GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.00GiB, used=3.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
         Total devices 2 FS bytes used 316.98GiB
         devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
         devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3


# btrfs fi usage /var/lib/lxd
Overall:
     Device size:                 846.25GiB
     Device allocated:            646.06GiB
     Device unallocated:          200.19GiB
     Device missing:                  0.00B
     Used:                        633.97GiB
     Free (estimated):            104.28GiB      (min: 104.28GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:318.00GiB, Used:313.82GiB
    /dev/sda3     318.00GiB
    /dev/sdb3     318.00GiB

Metadata,RAID1: Size:5.00GiB, Used:3.17GiB
    /dev/sda3       5.00GiB
    /dev/sdb3       5.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
    /dev/sda3      32.00MiB
    /dev/sdb3      32.00MiB

Unallocated:
    /dev/sda3     100.10GiB
    /dev/sdb3     100.10GiB


Mount flags in /etc/fstab are:

LABEL=btrfs /var/lib/lxd btrfs defaults,noatime,space_cache=v2,device=/dev/sda3,device=/dev/sdb3,discard 0 0



Last pieces logged in dmesg:

[46867.225334] BTRFS info (device sda3): relocating block group 2996254998528 flags data|raid1
[46874.563631] BTRFS info (device sda3): found 9250 extents
[46894.827895] BTRFS info (device sda3): found 9250 extents
[46898.463053] BTRFS info (device sda3): found 201 extents
[46898.562564] BTRFS info (device sda3): relocating block group 2995181256704 flags data|raid1
[46903.555976] BTRFS info (device sda3): found 7299 extents
[46914.188044] BTRFS info (device sda3): found 7299 extents
[46914.303476] BTRFS info (device sda3): relocating block group 2947936616448 flags metadata|raid1
[46939.570810] BTRFS info (device sda3): found 42022 extents
[46945.053488] BTRFS info (device sda3): 2 enospc errors during balance



Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-09-17 15:02 how to run balance successfully (No space left on device)? Tomasz Chmielewski
@ 2017-09-18  1:50 ` Duncan
  2017-09-18  8:20 ` Tomasz Chmielewski
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2017-09-18  1:50 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Mon, 18 Sep 2017 00:02:46 +0900 as excerpted:

> I'm trying to run balance on a 4.13.2 kernel without much luck:
> 
> # time btrfs balance start -v /var/lib/lxd -dusage=5 -musage=5
> [works, but only 1 chunk balanced]

> # time btrfs balance start -v /var/lib/lxd -dusage=0 -musage=0
> [no chunks with 0 usage to balance]
> 
> 
> # time btrfs balance start -v /var/lib/lxd
> [...]
> ERROR: error during balancing '/var/lib/lxd': No space left on device

OK, that fails.  Let's see what your unallocated space looks like, 
below...

> # df -h /var/lib/lxd

FWIW, standard (aka util-linux) df is effectively useless in a situation 
such as this, as it really doesn't give you the information you need (it 
can say you have lots of space available, but if btrfs has all of it 
allocated into chunks, even if the chunks have space in them still, there 
can be problems).

And actually, (util-linux) df really doesn't give you a whole lot of 
useful information on a btrfs in enough cases that most list regulars 
tend to discount its output almost entirely.  The only thing it's really 
useful for is getting a reasonable idea as to whether your next major 
file operation can be expected to succeed or not -- if it says you have 
50 MB left and you're trying to put a new 1 GiB file on the btrfs, it's 
unlikely to work, but if it says you have 300 GiB left in a multi-TB 
multi-device filesystem, you might have 300, or 3000 (its estimates are 
deliberately on the pessimistic side).

For better numbers, always use the btrfs tools.  btrfs fi usage is the one 
I tend to use most, but btrfs dev usage can be very useful if you're more 
interested in a per-device listing, and btrfs fi show combined with btrfs 
fi df provides much the same information, tho it needs a bit more 
interpreting.

But you do provide them too. =:^)

> # btrfs fi df /var/lib/lxd
> Data, RAID1: total=318.00GiB, used=313.82GiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> Metadata, RAID1: total=5.00GiB, used=3.17GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

Looks reasonably healthy.  No global reserve used, good as that's a major 
indicator of problems, and data and metadata usage is reasonably close to 
totals -- no huge number of mostly empty allocated chunks.
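
The global-reserve check described above can be scripted. What follows is only a minimal sketch: it reads `btrfs fi df` output on stdin and reports whether any of the global reserve is in use. The function name `global_reserve_status` is made up for illustration, and the sample input is copied from the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads `btrfs fi df`
# output on stdin and reports whether the global reserve is in use.
global_reserve_status() {
    awk '
        /^GlobalReserve/ {
            # Line format: "GlobalReserve, single: total=512.00MiB, used=0.00B"
            split($0, parts, "used=")
            used = parts[2]
            # Strip the unit suffix so the value compares numerically.
            value = used
            sub(/[A-Za-z]+$/, "", value)
            if (value + 0 > 0)
                print "global reserve in use: " used
            else
                print "global reserve untouched"
        }
    '
}

# Sample input taken from the report quoted above.
global_reserve_status <<'EOF'
Data, RAID1: total=318.00GiB, used=313.82GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.00GiB, used=3.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
EOF
```

On a live system the same function could be fed directly, e.g. `btrfs fi df /var/lib/lxd | global_reserve_status`.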

> # btrfs fi show /var/lib/lxd
> Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
>          Total devices 2 FS bytes used 316.98GiB
>          devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
>          devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3

OK, given the ENOSPC error on balance above, those device lines are the 
real interesting numbers, and...

Healthy here too.  Very much so, in fact, as only 323 gigs out of 423 is 
allocated on each device -- 100 gigs not chunk-allocated and therefore 
free for chunk allocation on each device. =:^)

The ENOSPC is therefore a bug -- it shouldn't be happening.
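
As a rough cross-check of the per-device figures quoted above, the unallocated space is simply size minus used on each devid line. This is a minimal sketch, assuming both values are printed in GiB (other units would need extra handling); the function name `unallocated_per_device` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads the "devid" lines of `btrfs fi show` on stdin
# and prints size minus used for each device.
# Assumption: size and used are both reported in GiB.
unallocated_per_device() {
    awk '
        $1 == "devid" {
            # Fields: devid <id> size <S>GiB used <U>GiB path <device>
            size = $4
            used = $6
            sub(/GiB$/, "", size)
            sub(/GiB$/, "", used)
            printf "%s %.2fGiB unallocated\n", $8, size - used
        }
    '
}

# Device lines copied from the output quoted above.
unallocated_per_device <<'EOF'
         devid    1 size 423.13GiB used 323.03GiB path /dev/sda3
         devid    2 size 423.13GiB used 323.03GiB path /dev/sdb3
EOF
```

For the two devices above this gives 100.10GiB each, which is the space still free for new chunk allocation.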

And as it happens, AFAIK from reading the list, there's a currently known 
bug with over-reservation under certain circumstances that among other 
things, can (wrongly) trigger ENOSPC on balances, when there's plenty of 
space.

Also AFAIK, there's a patch on-list and (I think) in 4.14-rc1, that is I 
believe marked for stable as well, that will very likely fix your 
problem.  If it doesn't, there's another bug triggering similar symptoms.

But I'm not a dev and haven't been tracking the specific patch, so you'll 
need to either track it down (or wait to see if a dev or someone else 
points you at it) and apply it on your 4.13.x, or wait until it hits 
stable backports and you can get it there, or try 4.14-rc1 or wait until 
later/safer rcs or full release.

Meanwhile...

> # btrfs fi usage /var/lib/lxd
> Overall:
>      Device size:                 846.25GiB
>      Device allocated:            646.06GiB
>      Device unallocated:          200.19GiB
>      Device missing:                  0.00B
>      Used:                        633.97GiB
>      Free (estimated):            104.28GiB      (min: 104.28GiB)
>      Data ratio:                       2.00
>      Metadata ratio:                   2.00
>      Global reserve:              512.00MiB      (used: 0.00B)
> 
> Data,RAID1: Size:318.00GiB, Used:313.82GiB
>     /dev/sda3     318.00GiB
>     /dev/sdb3     318.00GiB
> 
> Metadata,RAID1: Size:5.00GiB, Used:3.17GiB
>     /dev/sda3       5.00GiB
>     /dev/sdb3       5.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:80.00KiB
>     /dev/sda3      32.00MiB
>     /dev/sdb3      32.00MiB
> 
> Unallocated:
>     /dev/sda3     100.10GiB
>     /dev/sdb3     100.10GiB

As I said above, btrfs fi usage output provides much of the same info, 
but in a much nicer format and with a bit more detail, than the 
combination of btrfs fi show and btrfs fi df.

This confirms the 100 gigs per device unallocated noted above, plenty for 
a balance if it's not bugging out, and data and metadata chunk usage in 
the same ballpark as the totals.  Everything looks healthy, which means 
an ENOSPC during balance /must/ be a bug, because it simply shouldn't be 
happening.

But chances are pretty good that once you get that patch integrated, 
whether by integrating it yourself to what you have currently, or by 
trying 4.14-rc1 or waiting until it hits release or stable, that bug will 
have been squashed! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: how to run balance successfully (No space left on device)?
  2017-09-17 15:02 how to run balance successfully (No space left on device)? Tomasz Chmielewski
  2017-09-18  1:50 ` Duncan
@ 2017-09-18  8:20 ` Tomasz Chmielewski
  2017-09-18  8:29   ` Andrei Borzenkov
  2017-10-31 14:18   ` Tomasz Chmielewski
  1 sibling, 2 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-09-18  8:20 UTC (permalink / raw)
  To: linux-btrfs

>> # df -h /var/lib/lxd
>> 
>> FWIW, standard (aka util-linux) df is effectively useless in a 
>> situation
>> such as this, as it really doesn't give you the information you need 
>> (it
>> can say you have lots of space available, but if btrfs has all of it
>> allocated into chunks, even if the chunks have space in them still, 
>> there
>> can be problems).

I see that here, on RAID-1, "df -h" shows pretty much the same amount of 
free space as "btrfs fi usage":

- "df -h" shows 105G free
- "btrfs fi usage" says: Free (estimated): 104.28GiB (min: 104.28GiB)



> But chances are pretty good that once you get that patch integrated,
> whether by integrating it yourself to what you have currently, or by
> trying 4.14-rc1 or waiting until it hits release or stable, that bug 
> will
> have been squashed! =:^)

OK, will wait for 4.14.


Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-09-18  8:20 ` Tomasz Chmielewski
@ 2017-09-18  8:29   ` Andrei Borzenkov
  2017-09-18  9:27     ` Tomasz Chmielewski
  2017-10-31 14:18   ` Tomasz Chmielewski
  1 sibling, 1 reply; 14+ messages in thread
From: Andrei Borzenkov @ 2017-09-18  8:29 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
>>> # df -h /var/lib/lxd
>>>
>>> FWIW, standard (aka util-linux) df is effectively useless in a situation
>>> such as this, as it really doesn't give you the information you need (it
>>> can say you have lots of space available, but if btrfs has all of it
>>> allocated into chunks, even if the chunks have space in them still, there
>>> can be problems).
>
>
> I see here on RAID-1, "df -h" it shows pretty much the same amount of free
> space as "btrfs fi show":
>
> - "df -h" shows 105G free
> - "btrfs fi show" says: Free (estimated):            104.28GiB      (min:
> 104.28GiB)
>

I think both use the same algorithm to compute free space (df in the
end just shows what the kernel returns). The problem is that this
algorithm itself is just an approximation in the general case. For a
uniform RAID1 profile it should be correct though.
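
The figures posted earlier in the thread are consistent with a simple estimate for a uniform RAID1 filesystem: unallocated space divided by the data ratio, plus the free space inside already-allocated data chunks. This is only a sketch of that arithmetic, not a claim about the exact kernel code; `free_estimate_gib` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper illustrating the arithmetic only.
# Arguments (all in GiB except the ratio):
#   $1 = device unallocated (raw, summed over devices)
#   $2 = data ratio (2.00 for RAID1)
#   $3 = data chunk size (logical)
#   $4 = data used (logical)
free_estimate_gib() {
    awk -v unalloc="$1" -v ratio="$2" -v size="$3" -v used="$4" \
        'BEGIN { printf "%.3f\n", unalloc / ratio + (size - used) }'
}

# Values from the "btrfs fi usage" output at the start of the thread.
free_estimate_gib 200.19 2.00 318.00 313.82   # prints 104.275
```

That result lines up with the 104.28GiB shown as Free (estimated), and with the 105G that df rounds to.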


* Re: how to run balance successfully (No space left on device)?
  2017-09-18  8:29   ` Andrei Borzenkov
@ 2017-09-18  9:27     ` Tomasz Chmielewski
  2017-09-18 13:44       ` Peter Becker
  2017-09-19  2:59       ` Duncan
  0 siblings, 2 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-09-18  9:27 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: linux-btrfs

On 2017-09-18 17:29, Andrei Borzenkov wrote:
> On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski <mangoo@wpkg.org> 
> wrote:
>>>> # df -h /var/lib/lxd
>>>> 
>>>> FWIW, standard (aka util-linux) df is effectively useless in a 
>>>> situation
>>>> such as this, as it really doesn't give you the information you need 
>>>> (it
>>>> can say you have lots of space available, but if btrfs has all of it
>>>> allocated into chunks, even if the chunks have space in them still, 
>>>> there
>>>> can be problems).
>> 
>> 
>> I see here on RAID-1, "df -h" it shows pretty much the same amount of 
>> free
>> space as "btrfs fi show":
>> 
>> - "df -h" shows 105G free
>> - "btrfs fi show" says: Free (estimated):            104.28GiB      
>> (min:
>> 104.28GiB)
>> 
> 
> I think both use the same algorithm to compute free space (df at the
> end just shows what kernel returns). The problem is that this
> algorithm itself is just approximation in general case. For uniform
> RAID1 profile it should be correct though.

And perhaps more important - can I assume that right now, with the 
latest stable kernel (4.13.2 right now), running "btrfs balance" is not 
safe and can lead to data corruption or loss?


Consider the following case:

- system admin runs btrfs balance on a filesystem with 100 GB free and 
assumes it is enough space to complete successfully

- btrfs balance fails due to some bug with "No space left on device"

- at the same time, a database using this filesystem will fail with "No 
space left on device", apt/rpm will fail a package upgrade, some program 
using temp space will fail, log collector will fail to catch some data, 
because of "No space left on device" and so on?



Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-09-18  9:27     ` Tomasz Chmielewski
@ 2017-09-18 13:44       ` Peter Becker
  2017-09-18 13:50         ` Tomasz Chmielewski
  2017-09-19  2:59       ` Duncan
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Becker @ 2017-09-18 13:44 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Andrei Borzenkov, linux-btrfs

I'm not sure if it would help, but maybe you could try adding an 8GB
(or larger) USB flash drive to the pool and then start the balance.
If it works out, you can remove the drive from the pool afterwards.


* Re: how to run balance successfully (No space left on device)?
  2017-09-18 13:44       ` Peter Becker
@ 2017-09-18 13:50         ` Tomasz Chmielewski
  0 siblings, 0 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-09-18 13:50 UTC (permalink / raw)
  To: Peter Becker; +Cc: Andrei Borzenkov, linux-btrfs

On 2017-09-18 22:44, Peter Becker wrote:
> i'm not sure if it would help, but maybe you could try adding an 8GB
> (or more) USB flash drive to the pool and try to start balance.
> if it works out, you can throw him out of the pool after that.

I really can't, it's an "online server".

But I've removed some 65 GB of data, so now there's 171 GB free, i.e. 
the filesystem is 60% used.

The balance still fails.


Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-09-18  9:27     ` Tomasz Chmielewski
  2017-09-18 13:44       ` Peter Becker
@ 2017-09-19  2:59       ` Duncan
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2017-09-19  2:59 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Mon, 18 Sep 2017 18:27:09 +0900 as excerpted:

> And perhaps more important - can I assume that right now, with the
> latest stable kernel (4.13.2 right now), running "btrfs balance" is not
> safe and can lead to data corruption or loss?
> 
> 
> Consider the following case:
> 
> - system admin runs btrfs balance on a filesystem with 100 GB free and
> assumes it is enough space to complete successfully
> 
> - btrfs balance fails due to some bug with "No space left on device"
> 
> - at the same time, a database using this filesystem will fail with "No
> space left on device", apt/rpm will fail a package upgrade, some program
> using temp space will fail, log collector will fail to catch some data,
> because of "No space left on device" and so on?

To the best of my knowledge that shouldn't be a problem, and certainly 
not one I'd worry about if you're following the sysadmin's first rule of 
backups: the true value of data to you is defined not by any claims but 
by the number of backups you consider it worth having of that data.  It 
follows that no backups means you've defined the data as worth less than 
the time/trouble/resources it would take to create at least that one 
backup.

The ENOSPC is because the internal calculation for the reserved-space 
requirement is buggy ATM, but AFAIK it's just that, an /internal/ 
calculation that goes way off and stops the operation before it can 
actually reserve the space, so it doesn't get to the point of affecting 
anything else.

Talking about which... I've not seen it mentioned in the bug discussion, 
but I wonder if doing a btrfs balance start -d, followed by another 
balance with -m replacing the -d, thus separating the data and metadata 
balances, might work around the problem.  At least you could know for 
sure which is causing it that way, and complete a balance of the other 
one.  And if that blocks on one or the other, you could split the job up 
further using the devid= and drange= filters (see the btrfs-balance 
manpage), doing only part of the filesystem at a time.  My speculation is 
that you should be able to divide the operation up enough so that even if 
the reserve space calculation is off, it'll still complete.
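
The staged approach suggested above could be laid out as follows. This is only a sketch: it prints the commands rather than running them, the function name `plan_split_balance` is made up for illustration, and the device IDs are assumed to come from `btrfs fi show`. A drange= filter could be added to the per-device steps in the same way (see the btrfs-balance manpage for its syntax).

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a staged balance plan instead of
# executing it, so each step can be reviewed and run by hand.
#   $1   = mount point
#   $2.. = device IDs to balance one at a time (optional)
plan_split_balance() {
    mnt=$1
    shift
    # Step 1: data chunks only.
    printf 'btrfs balance start -d %s\n' "$mnt"
    # Step 2: metadata chunks only.
    printf 'btrfs balance start -m %s\n' "$mnt"
    # Fallback: data chunks restricted to one device at a time.
    for devid in "$@"; do
        printf 'btrfs balance start -ddevid=%s %s\n' "$devid" "$mnt"
    done
}

plan_split_balance /var/lib/lxd 1 2
```

Running the steps one by one at least shows whether the data or the metadata balance is the one hitting ENOSPC.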

Meanwhile, I don't believe it's just balance that's affected, either, tho 
it's the most commonly reported.  By my understanding, any sufficiently 
large operation could trigger it, tho obviously a full btrfs balance is 
about the largest operation a btrfs is likely to have, so it stands to 
reason that would trigger it more reliably than common generic filesystem 
operations.

Of course if you're paranoid, you can refrain from doing balances until 
you know the bug is fixed, but then I'd have to ask, if you're that 
paranoid of a filesystem failure, why are you running the still 
stabilizing, not yet entirely stable and mature, btrfs, in the first 
place?  Seems a bit like the folks still running RHEL/CentOS 6 with their 
stable kernels because they want stability, yet choosing to run the still 
not entirely stable btrfs, definitely not entirely stable on that old a 
kernel, on top of them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: how to run balance successfully (No space left on device)?
  2017-09-18  8:20 ` Tomasz Chmielewski
  2017-09-18  8:29   ` Andrei Borzenkov
@ 2017-10-31 14:18   ` Tomasz Chmielewski
  2017-10-31 14:51     ` Tomasz Chmielewski
  2017-11-07  5:13     ` Tomasz Chmielewski
  1 sibling, 2 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-10-31 14:18 UTC (permalink / raw)
  To: linux-btrfs

On 2017-09-18 17:20, Tomasz Chmielewski wrote:
>>> # df -h /var/lib/lxd
>>> 
>>> FWIW, standard (aka util-linux) df is effectively useless in a 
>>> situation
>>> such as this, as it really doesn't give you the information you need 
>>> (it
>>> can say you have lots of space available, but if btrfs has all of it
>>> allocated into chunks, even if the chunks have space in them still, 
>>> there
>>> can be problems).
> 
> I see here on RAID-1, "df -h" it shows pretty much the same amount of
> free space as "btrfs fi show":
> 
> - "df -h" shows 105G free
> - "btrfs fi show" says: Free (estimated):            104.28GiB
> (min: 104.28GiB)
> 
> 
> 
>> But chances are pretty good that once you get that patch integrated,
>> whether by integrating it yourself to what you have currently, or by
>> trying 4.14-rc1 or waiting until it hits release or stable, that bug 
>> will
>> have been squashed! =:^)
> 
> OK, will wait for 4.14.

So I've tried to run balance with 4.14-rc6.

It succeeded on one server where it was failing with 4.13.x.


On a different server, however, it failed badly:

# time btrfs balance start /srv
WARNING:

         Full balance without filters requested. This operation is very
         intense and takes potentially very long. It is recommended to
         use the balance filters to narrow down the scope of balance.
         Use 'btrfs balance start --full-balance' option to skip this
         warning. The operation will start in 10 seconds.
         Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv': Read-only file system
There may be more info in syslog - try dmesg | tail

real    5194m41.749s
user    0m0.000s
sys     301m10.928s


[312304.050731] BTRFS info (device sda4): found 15073 extents
[313555.971253] BTRFS info (device sda4): relocating block group 1208022466560 flags data|raid1
[314963.506580] BTRFS: Transaction aborted (error -28)
[314963.506608] ------------[ cut here ]------------
[314963.506639] WARNING: CPU: 2 PID: 27854 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:3089 btrfs_run_delayed_refs+0x244/0x250 [btrfs]
[314963.506640] Modules linked in: vhost_net vhost tap xt_REDIRECT 
nf_nat_redirect xt_NFLOG nfnetlink_log nfnetlink xt_conntrack veth 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_filter ip6_tables xt_comment xt_CHECKSUM 
binfmt_misc iptable_mangle nf_log_ipv4 nf_log_common xt_LOG 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 
xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc btrfs 
zstd_compress shpchp intel_rapl lpc_ich x86_pkg_temp_thermal 
intel_powerclamp input_leds tpm_infineon ie31200_edac serio_raw coretemp 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel kvm_intel pcbc kvm 
aesni_intel irqbypass aes_x86_64 mac_hid crypto_simd glue_helper cryptd 
intel_cstate
[314963.506684]  eeepc_wmi asus_wmi sparse_keymap intel_rapl_perf 
wmi_bmof nfsd auth_rpcgss nfs_acl lockd grace sunrpc lp parport autofs4 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 e1000e ahci 
libahci ptp pps_core wmi video
[314963.506710] CPU: 2 PID: 27854 Comm: sadc Tainted: G        W       
4.14.0-041400rc6-generic #201710230731
[314963.506711] Hardware name: System manufacturer System Product 
Name/P8B WS, BIOS 0904 10/24/2011
[314963.506713] task: ffff8bc0fd39ae00 task.stack: ffffb28d49490000
[314963.506732] RIP: 0010:btrfs_run_delayed_refs+0x244/0x250 [btrfs]
[314963.506734] RSP: 0018:ffffb28d49493d30 EFLAGS: 00010286
[314963.506736] RAX: 0000000000000026 RBX: 00000000ffffffe4 RCX: 
0000000000000000
[314963.506737] RDX: 0000000000000000 RSI: ffff8bc8afa8dc98 RDI: 
ffff8bc8afa8dc98
[314963.506738] RBP: ffffb28d49493d88 R08: 0000000000000001 R09: 
000000000000242b
[314963.506740] R10: ffffb28d49493c20 R11: 0000000000000000 R12: 
ffff8bc883a81078
[314963.506741] R13: ffff8bc887eb0000 R14: ffff8bc1876ec400 R15: 
000000000018ba90
[314963.506743] FS:  00007f62a12d9700(0000) GS:ffff8bc8afa80000(0000) 
knlGS:0000000000000000
[314963.506744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[314963.506746] CR2: 00007f25f6f53880 CR3: 00000003cf4f7004 CR4: 
00000000000626e0
[314963.506747] Call Trace:
[314963.506773]  btrfs_commit_transaction+0x9b/0x8d0 [btrfs]
[314963.506799]  ? btrfs_wait_ordered_range+0x9c/0x110 [btrfs]
[314963.506821]  btrfs_sync_file+0x348/0x410 [btrfs]
[314963.506826]  vfs_fsync_range+0x4b/0xb0
[314963.506828]  do_fsync+0x3d/0x70
[314963.506831]  SyS_fdatasync+0x13/0x20
[314963.506834]  do_syscall_64+0x61/0x120
[314963.506838]  entry_SYSCALL64_slow_path+0x25/0x25
[314963.506840] RIP: 0033:0x7f62a0dfec30
[314963.506841] RSP: 002b:00007fffca89f288 EFLAGS: 00000246 ORIG_RAX: 
000000000000004b
[314963.506844] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 
00007f62a0dfec30
[314963.506845] RDX: 0000000000000000 RSI: 00007f62a10c47a0 RDI: 
0000000000000003
[314963.506846] RBP: 00007fffca89f400 R08: 00007f62a12d9700 R09: 
00007f62a12d9700
[314963.506847] R10: 00007fffca89f050 R11: 0000000000000246 R12: 
00000000ffffffff
[314963.506848] R13: 00007fffca89f440 R14: 00007fffca89f2a0 R15: 
00007fffca89f29c
[314963.506851] Code: fe ff 89 d9 ba 11 0c 00 00 48 c7 c6 40 48 67 c0 4c 
89 e7 e8 c5 bc 09 00 e9 b5 fe ff ff 89 de 48 c7 c7 f8 b4 67 c0 e8 2d 28 
51 d2 <0f> ff eb d3 e8 3a be 09 00 0f 1f 00 66 66 66 66 90 55 48 89 e5
[314963.506889] ---[ end trace b11381065314a695 ]---
[314963.506955] BTRFS: error (device sda4) in btrfs_run_delayed_refs:3089: errno=-28 No space left
[314963.507032] BTRFS info (device sda4): forced readonly
[314963.510570] BTRFS warning (device sda4): Skipping commit of aborted transaction.
[314963.510577] BTRFS: error (device sda4) in cleanup_transaction:1873: errno=-28 No space left
[314970.954768] mail[32290]: segfault at c0 ip 00007f6b507ae33b sp 00007ffec4849ac0 error 4 in libmailutils.so.4.0.0[7f6b50724000+b0000]
[314983.475988] BTRFS error (device sda4): pending csums is 167936




# btrfs fi show /srv
Label: 'btrfs'  uuid: 105b2e0c-8af2-45ee-b4c8-14ff0a3ca899
         Total devices 2 FS bytes used 2.31TiB
         devid    1 size 2.63TiB used 2.32TiB path /dev/sda4
         devid    2 size 2.63TiB used 2.32TiB path /dev/sdb4


# btrfs fi df /srv
Data, RAID1: total=2.30TiB, used=2.29TiB
System, RAID1: total=32.00MiB, used=384.00KiB
Metadata, RAID1: total=22.00GiB, used=19.61GiB
GlobalReserve, single: total=512.00MiB, used=481.56MiB



# btrfs fi usage /srv
Overall:
     Device size:                   5.25TiB
     Device allocated:              4.63TiB
     Device unallocated:          633.97GiB
     Device missing:                  0.00B
     Used:                          4.62TiB
     Free (estimated):            319.11GiB      (min: 319.11GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 481.56MiB)

Data,RAID1: Size:2.30TiB, Used:2.29TiB
    /dev/sda4       2.30TiB
    /dev/sdb4       2.30TiB

Metadata,RAID1: Size:22.00GiB, Used:19.61GiB
    /dev/sda4      22.00GiB
    /dev/sdb4      22.00GiB

System,RAID1: Size:32.00MiB, Used:384.00KiB
    /dev/sda4      32.00MiB
    /dev/sdb4      32.00MiB

Unallocated:
    /dev/sda4     316.99GiB
    /dev/sdb4     316.99GiB


Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-10-31 14:18   ` Tomasz Chmielewski
@ 2017-10-31 14:51     ` Tomasz Chmielewski
  2017-11-07  5:13     ` Tomasz Chmielewski
  1 sibling, 0 replies; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-10-31 14:51 UTC (permalink / raw)
  To: linux-btrfs

On 2017-10-31 23:18, Tomasz Chmielewski wrote:

> On a different server, however, it failed badly:
> 
> # time btrfs balance start /srv
> WARNING:
> 
>         Full balance without filters requested. This operation is very
>         intense and takes potentially very long. It is recommended to
>         use the balance filters to narrow down the scope of balance.
>         Use 'btrfs balance start --full-balance' option to skip this
>         warning. The operation will start in 10 seconds.
>         Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/srv': Read-only file system
> There may be more info in syslog - try dmesg | tail
> 
> [312304.050731] BTRFS info (device sda4): found 15073 extents
> [313555.971253] BTRFS info (device sda4): relocating block group
> 1208022466560 flags data|raid1
> [314963.506580] BTRFS: Transaction aborted (error -28)
> [314963.506608] ------------[ cut here ]------------
> [314963.506639] WARNING: CPU: 2 PID: 27854 at
> /home/kernel/COD/linux/fs/btrfs/extent-tree.c:3089
> btrfs_run_delayed_refs+0x244/0x250 [btrfs]

(...)

> [314963.506955] BTRFS: error (device sda4) in
> btrfs_run_delayed_refs:3089: errno=-28 No space left
> [314963.507032] BTRFS info (device sda4): forced readonly
> [314963.510570] BTRFS warning (device sda4): Skipping commit of
> aborted transaction.
> [314963.510577] BTRFS: error (device sda4) in
> cleanup_transaction:1873: errno=-28 No space left
> [314970.954768] mail[32290]: segfault at c0 ip 00007f6b507ae33b sp
> 00007ffec4849ac0 error 4 in libmailutils.so.4.0.0[7f6b50724000+b0000]
> [314983.475988] BTRFS error (device sda4): pending csums is 167936


And btrfs balance can be a real database killer :(


root@backupslave01:/var/log/mysql# tail -f mysql-error.log
InnoDB: Doing recovery: scanned up to log sequence number 2206178343424
InnoDB: Doing recovery: scanned up to log sequence number 2206183586304
InnoDB: Doing recovery: scanned up to log sequence number 2206188829184
InnoDB: Doing recovery: scanned up to log sequence number 2206194072064
InnoDB: Doing recovery: scanned up to log sequence number 2206199314944
InnoDB: Doing recovery: scanned up to log sequence number 2206204557824
InnoDB: Doing recovery: scanned up to log sequence number 2206209800704
InnoDB: Doing recovery: scanned up to log sequence number 2206215043584
InnoDB: Doing recovery: scanned up to log sequence number 2206220286464
InnoDB: Doing recovery: scanned up to log sequence number 2206220752384

InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 1 row operations to undo
InnoDB: Trx id counter is 21145843968
2017-10-31 14:46:59 4359 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 14:46:59 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=0
max_threads=502
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 232495 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x8d444b]
/usr/sbin/mysqld(handle_fatal_signal+0x49a)[0x649b0a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f74b90bf390]
/usr/sbin/mysqld[0x99fcae]
/usr/sbin/mysqld[0x9a17ed]
/usr/sbin/mysqld[0x9881ea]
/usr/sbin/mysqld[0x989fc7]
/usr/sbin/mysqld[0xa6dd87]
/usr/sbin/mysqld[0xab8cd8]
/usr/sbin/mysqld[0xa08300]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f74b90b56ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f74b854a3dd]



Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-10-31 14:18   ` Tomasz Chmielewski
  2017-10-31 14:51     ` Tomasz Chmielewski
@ 2017-11-07  5:13     ` Tomasz Chmielewski
       [not found]       ` <CAJtFHUQ34uyt-iAQKuQ-WqXMrCqxsPeqFc5LvYmZHrz+Rxs66A@mail.gmail.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-11-07  5:13 UTC (permalink / raw)
  To: linux-btrfs

On 2017-10-31 23:18, Tomasz Chmielewski wrote:
> On 2017-09-18 17:20, Tomasz Chmielewski wrote:
>>>> # df -h /var/lib/lxd
>>>> 
>>>> FWIW, standard (aka util-linux) df is effectively useless in a 
>>>> situation
>>>> such as this, as it really doesn't give you the information you need 
>>>> (it
>>>> can say you have lots of space available, but if btrfs has all of it
>>>> allocated into chunks, even if the chunks have space in them still, 
>>>> there
>>>> can be problems).
>> 
>> I see here on RAID-1, "df -h" it shows pretty much the same amount of
>> free space as "btrfs fi show":
>> 
>> - "df -h" shows 105G free
>> - "btrfs fi show" says: Free (estimated):            104.28GiB
>> (min: 104.28GiB)
>> 
>> 
>> 
>>> But chances are pretty good that once you get that patch integrated,
>>> whether by integrating it yourself to what you have currently, or by
>>> trying 4.14-rc1 or waiting until it hits release or stable, that bug 
>>> will
>>> have been squashed! =:^)
>> 
>> OK, will wait for 4.14.
> 
> So I've tried to run balance with 4.14-rc6.

I've also tried with 4.14-rc7 on a server which was failing with "no 
space left" - unfortunately, it's still failing:


# time btrfs balance start /srv
WARNING:

         Full balance without filters requested. This operation is very
         intense and takes potentially very long. It is recommended to
         use the balance filters to narrow down the scope of balance.
         Use 'btrfs balance start --full-balance' option to skip this
         warning. The operation will start in 10 seconds.
         Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/srv': No space left on device
There may be more info in syslog - try dmesg | tail

real    8731m13.424s
user    0m0.000s
sys     560m36.363s



# dmesg -c
(...)
[546228.496902] BTRFS info (device sda4): relocating block group 
297455845376 flags data|raid1
[546251.393541] BTRFS info (device sda4): found 107799 extents
[546512.346360] BTRFS info (device sda4): found 107799 extents
[546529.407077] BTRFS info (device sda4): relocating block group 
296382103552 flags metadata|raid1
[546692.465746] BTRFS info (device sda4): found 35202 extents
[546733.294172] BTRFS info (device sda4): found 2586 extents
[546738.487556] BTRFS info (device sda4): relocating block group 
295308361728 flags data|raid1
[546770.474409] BTRFS info (device sda4): found 140906 extents
[547037.744023] BTRFS info (device sda4): found 140906 extents
[547065.840993] BTRFS info (device sda4): 117 enospc errors during 
balance
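
For anyone wanting to pull the relevant facts out of a long balance log like the one above, here is a rough Python sketch. It assumes the kernel message wording shown in this thread ("relocating block group ... flags ..." and "N enospc errors during balance"); other kernel versions may word these lines differently, and the function name `summarize_balance_log` is hypothetical. The sample lines are copied from the output above with the mail wrapping undone.

```python
import re

# Sample lines from the dmesg output above, rejoined to one message per line.
SAMPLE = """\
[546529.407077] BTRFS info (device sda4): relocating block group 296382103552 flags metadata|raid1
[546692.465746] BTRFS info (device sda4): found 35202 extents
[546733.294172] BTRFS info (device sda4): found 2586 extents
[546738.487556] BTRFS info (device sda4): relocating block group 295308361728 flags data|raid1
[546770.474409] BTRFS info (device sda4): found 140906 extents
[547037.744023] BTRFS info (device sda4): found 140906 extents
[547065.840993] BTRFS info (device sda4): 117 enospc errors during balance
"""

# Patterns are an assumption based on the message format seen in this thread.
RELOC_RE = re.compile(r"relocating block group (\d+) flags (\S+)")
ENOSPC_RE = re.compile(r"(\d+) enospc errors during balance")


def summarize_balance_log(text):
    """Return ([(block_group_start, flags), ...], total_enospc_errors)."""
    groups = []
    enospc = 0
    for line in text.splitlines():
        match = RELOC_RE.search(line)
        if match:
            groups.append((int(match.group(1)), match.group(2)))
            continue
        match = ENOSPC_RE.search(line)
        if match:
            enospc += int(match.group(1))
    return groups, enospc


groups, enospc = summarize_balance_log(SAMPLE)
print(len(groups), "block groups seen,", enospc, "enospc errors")
```

The summary line only gives a total; the log does not say which block groups failed to relocate, so the list is a starting point for investigation rather than a diagnosis.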


# btrfs fi df /srv
Data, RAID1: total=2.46TiB, used=2.35TiB
System, RAID1: total=32.00MiB, used=416.00KiB
Metadata, RAID1: total=19.00GiB, used=12.92GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


# btrfs fi show /srv
Label: 'btrfs'  uuid: 105b2e0c-8af2-45ee-b4c8-14ff0a3ca899
         Total devices 2 FS bytes used 2.36TiB
         devid    1 size 2.63TiB used 2.48TiB path /dev/sda4
         devid    2 size 2.63TiB used 2.48TiB path /dev/sdb4


# btrfs fi usage /srv
Overall:
     Device size:                   5.25TiB
     Device allocated:              4.96TiB
     Device unallocated:          302.00GiB
     Device missing:                  0.00B
     Used:                          4.72TiB
     Free (estimated):            268.66GiB      (min: 268.66GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:2.46TiB, Used:2.35TiB
    /dev/sda4       2.46TiB
    /dev/sdb4       2.46TiB

Metadata,RAID1: Size:19.00GiB, Used:12.92GiB
    /dev/sda4      19.00GiB
    /dev/sdb4      19.00GiB

System,RAID1: Size:32.00MiB, Used:416.00KiB
    /dev/sda4      32.00MiB
    /dev/sdb4      32.00MiB

Unallocated:
    /dev/sda4     151.00GiB
    /dev/sdb4     151.00GiB


Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
       [not found]       ` <CAJtFHUQ34uyt-iAQKuQ-WqXMrCqxsPeqFc5LvYmZHrz+Rxs66A@mail.gmail.com>
@ 2017-11-10  7:42         ` Tomasz Chmielewski
  2017-11-10 21:51           ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Tomasz Chmielewski @ 2017-11-10  7:42 UTC (permalink / raw)
  To: E V; +Cc: linux-btrfs

On 2017-11-07 23:49, E V wrote:

> Hmm, I used to see these phantom no space issues quite a bit on older
> 4.x kernels, and haven't seen them since switching to space_cache=v2.
> So it could be space cache corruption. You might try either clearing
> your space cache, or mounting with nospace_cache, or try converting to
> space_cache=v2 after reading up on its caveats.

We have space_cache=v2.

Unfortunately yet one more system running 4.14-rc8 with "No space left" 
during balance:


[68443.535664] BTRFS info (device sdb3): relocating block group 
591771009024 flags data|raid1
[68463.203330] BTRFS info (device sdb3): found 8578 extents
[68492.238676] BTRFS info (device sdb3): found 8559 extents
[68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance


# btrfs balance start /var/lib/lxd
WARNING:

         Full balance without filters requested. This operation is very
         intense and takes potentially very long. It is recommended to
         use the balance filters to narrow down the balanced data.
         Use 'btrfs balance start --full-balance' option to skip this
         warning. The operation will start in 10 seconds.
         Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/var/lib/lxd': No space left on device
There may be more info in syslog - try dmesg | tail


# btrfs fi usage /var/lib/lxd
Overall:
     Device size:                 846.26GiB
     Device allocated:            622.27GiB
     Device unallocated:          223.99GiB
     Device missing:                  0.00B
     Used:                        606.40GiB
     Free (estimated):            116.68GiB      (min: 116.68GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:306.00GiB, Used:301.31GiB
    /dev/sda3     306.00GiB
    /dev/sdb3     306.00GiB

Metadata,RAID1: Size:5.10GiB, Used:1.89GiB
    /dev/sda3       5.10GiB
    /dev/sdb3       5.10GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
    /dev/sda3      32.00MiB
    /dev/sdb3      32.00MiB

Unallocated:
    /dev/sda3     112.00GiB
    /dev/sdb3     112.00GiB


# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: 6340f5de-f635-4d09-bbb2-1e03b1e1b160
         Total devices 2 FS bytes used 303.20GiB
         devid    1 size 423.13GiB used 311.13GiB path /dev/sda3
         devid    2 size 423.13GiB used 311.13GiB path /dev/sdb3


# btrfs fi df /var/lib/lxd
Data, RAID1: total=306.00GiB, used=301.32GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=5.10GiB, used=1.89GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
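
The "Free (estimated)" figure in the usage output above can be roughly reproduced from the other numbers. A minimal Python sketch, under the assumption that the estimate is the unallocated space divided by the data ratio plus the unused space inside already-allocated data chunks (this is an inference from the output, not a quote of the tool's source; the function name is hypothetical):

```python
def free_estimated_gib(unallocated_gib, data_ratio, data_size_gib, data_used_gib):
    """Approximate 'Free (estimated)' from `btrfs fi usage` style numbers."""
    # Unallocated raw space only yields 1/ratio usable space (RAID1 keeps 2 copies).
    from_unallocated = unallocated_gib / data_ratio
    # Space still free inside data chunks that are already allocated.
    inside_data_chunks = data_size_gib - data_used_gib
    return from_unallocated + inside_data_chunks


# Values taken from the /var/lib/lxd output above.
estimate = free_estimated_gib(223.99, 2.00, 306.00, 301.31)
print(f"{estimate:.2f} GiB")
```

With these inputs the result lands within rounding of the reported 116.68GiB, and it shows that nearly all of the free space is unallocated: only about 4.7GiB sits inside existing data chunks.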



So far out of all systems which were giving us "No space left on device" 
with 4.13.x, all but one are still giving us "No space left on device" 
during balance with 4.14-rc7 and later.
We've seen it on a mix of servers with SSD or HDD disks, with 
filesystems ranging from 0.5 TB to 20 TB, and use % from 30% to 90%.

Combined with evidence that "No space left on device" during balance can 
lead to file corruption (we've witnessed it with MySQL), I'd say 
btrfs balance is a dangerous operation and the decision to use it should 
be considered very carefully.


Shouldn't "Balance" be marked as "mostly OK" or "Unstable" here? Giving 
it "OK" status is misleading.

https://btrfs.wiki.kernel.org/index.php/Status


Tomasz Chmielewski
https://lxadm.com


* Re: how to run balance successfully (No space left on device)?
  2017-11-10  7:42         ` Tomasz Chmielewski
@ 2017-11-10 21:51           ` Chris Murphy
  2017-11-10 22:18             ` Martin Raiber
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2017-11-10 21:51 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: E V, Btrfs BTRFS

On Fri, Nov 10, 2017 at 12:42 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> On 2017-11-07 23:49, E V wrote:
>
>> Hmm, I used to see these phantom no space issues quite a bit on older
>> 4.x kernels, and haven't seen them since switching to space_cache=v2.
>> So it could be space cache corruption. You might try either clearing
>> your space cache, or mounting with nospace_cache, or try converting to
>> space_cache=v2 after reading up on its caveats.
>
>
> We have space_cache=v2.


I have no idea if it's related or not, as this isn't a default mount
option and is still under testing.



>
> Unfortunately yet one more system running 4.14-rc8 with "No space left"
> during balance:
>
>
> [68443.535664] BTRFS info (device sdb3): relocating block group 591771009024
> flags data|raid1
> [68463.203330] BTRFS info (device sdb3): found 8578 extents
> [68492.238676] BTRFS info (device sdb3): found 8559 extents
> [68500.751792] BTRFS info (device sdb3): 1 enospc errors during balance
>
>
> # btrfs balance start /var/lib/lxd
> WARNING:
>
>         Full balance without filters requested. This operation is very
>         intense and takes potentially very long. It is recommended to
>         use the balance filters to narrow down the balanced data.
>         Use 'btrfs balance start --full-balance' option to skip this
>         warning. The operation will start in 10 seconds.
>         Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/var/lib/lxd': No space left on device
> There may be more info in syslog - try dmesg | tail


OK, I wonder if this is a bug in the user space tool's error handling,
because what you have in the kernel messages is BTRFS info. It is not a
warning or an error. I interpret this to mean an enospc error happened
but was recovered from, so it was not an unhandled error condition, and
definitely non-fatal. But the user space tool is reporting a bogus "No
space left on device". It's plainly bogus because you have a lot of
space on the device, including unallocated space. So the user space tool
needs to either ignore this type of informational enospc or use a
different message to make it clear this is not a fatal error and was
properly handled.

Do you get any additional information if you mount with the enospc_debug
option and reproduce this problem?


> Unallocated:
>    /dev/sda3     112.00GiB
>    /dev/sdb3     112.00GiB

Metric shittons of space. The error is certainly bogus.



> Combined with evidence that "No space left on device" during balance can
> lead to various file corruption (we've witnessed it with MySQL), I'd say
> btrfs balance is a dangerous operation and decision to use it should be
> considered very thoroughly.

I've never heard of this. Balance is COW at the chunk level. The old
chunk is not dereferenced until it's written in the new location
correctly. Corruption during balance shouldn't be possible so if you
have a reproducer, the devs need to know about it.





-- 
Chris Murphy


* Re: how to run balance successfully (No space left on device)?
  2017-11-10 21:51           ` Chris Murphy
@ 2017-11-10 22:18             ` Martin Raiber
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Raiber @ 2017-11-10 22:18 UTC (permalink / raw)
  To: Chris Murphy, Tomasz Chmielewski; +Cc: E V, Btrfs BTRFS

On 10.11.2017 22:51 Chris Murphy wrote:
>> Combined with evidence that "No space left on device" during balance can
>> lead to various file corruption (we've witnessed it with MySQL), I'd say
>> btrfs balance is a dangerous operation and decision to use it should be
>> considered very thoroughly.
> I've never heard of this. Balance is COW at the chunk level. The old
> chunk is not dereferenced until it's written in the new location
> correctly. Corruption during balance shouldn't be possible so if you
> have a reproducer, the devs need to know about it.

I didn't say anything before, because I could not reproduce the problem.
I had (I guess) a corruption caused by balance as well. The filesystem hit
ENOSPC in spite of enough free space (4.9.x), which made me balance it
regularly to keep unallocated space around. The corruption probably
occurred after, or shortly before, a power reset during a balance. No
skip_balance was specified, so the balance continued directly after mount,
and data was moved relatively fast after the mount operation (copy file,
then delete old file). I think space_cache=v2 was active at the time. I'm
of course not completely sure it was btrfs's fault, and as usual not all
of these conditions may be relevant. It could instead be an upper-layer
error (Hyper-V storage), a memory issue, or an application error.

Regards,
Martin Raiber



Thread overview: 14+ messages
2017-09-17 15:02 how to run balance successfully (No space left on device)? Tomasz Chmielewski
2017-09-18  1:50 ` Duncan
2017-09-18  8:20 ` Tomasz Chmielewski
2017-09-18  8:29   ` Andrei Borzenkov
2017-09-18  9:27     ` Tomasz Chmielewski
2017-09-18 13:44       ` Peter Becker
2017-09-18 13:50         ` Tomasz Chmielewski
2017-09-19  2:59       ` Duncan
2017-10-31 14:18   ` Tomasz Chmielewski
2017-10-31 14:51     ` Tomasz Chmielewski
2017-11-07  5:13     ` Tomasz Chmielewski
     [not found]       ` <CAJtFHUQ34uyt-iAQKuQ-WqXMrCqxsPeqFc5LvYmZHrz+Rxs66A@mail.gmail.com>
2017-11-10  7:42         ` Tomasz Chmielewski
2017-11-10 21:51           ` Chris Murphy
2017-11-10 22:18             ` Martin Raiber
