* "No space left on device" and balance doesn't work
@ 2016-06-01 18:30 MegaBrutal
  2016-06-01 20:30 ` Peter Becker
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: MegaBrutal @ 2016-06-01 18:30 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I have a 20 GB file system and df says I have about 2.6 GB of free
space, yet I can't do anything on the file system because I get "No
space left on device" errors. I read that balance may help remedy the
situation, but it doesn't.


Some data about the FS:


root@ReThinkCentre:~# df -h /
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/centrevg-rootlv   20G   18G  2,6G  88% /

root@ReThinkCentre:~# btrfs fi show /
Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
        Total devices 1 FS bytes used 15.42GiB
        devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv

root@ReThinkCentre:~# btrfs fi df /
Data, single: total=16.69GiB, used=14.14GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.62GiB, used=1.28GiB
GlobalReserve, single: total=352.00MiB, used=0.00B

root@ReThinkCentre:~# btrfs version
btrfs-progs v4.4


This happens when I try to balance:

root@ReThinkCentre:~# btrfs fi balance start -dusage=66 /
Done, had to relocate 0 out of 33 chunks
root@ReThinkCentre:~# btrfs fi balance start -dusage=67 /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail


"dmesg | tail" does not show anything related to this.

It is important to note that the file system currently has 32
snapshots of /, and snapshots taking up all the free space is a
plausible explanation. Maybe deleting some of the oldest snapshots or
just enlarging the file system would help. However, I'm still
interested: if the file system is full, why does df show free space,
and how could I resolve the situation without resorting to those
options? I actually have an alert set up which triggers when FS usage
reaches 90%, so I know when I have to delete some old snapshots. It
has worked so far: I cleaned up the snapshots at 90%, FS usage fell
back, everyone was happy. But now the alert didn't even trigger,
because the FS is at 88% usage, so it shouldn't be full yet.


Best regards and kecske,
MegaBrutal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-01 18:30 "No space left on device" and balance doesn't work MegaBrutal
@ 2016-06-01 20:30 ` Peter Becker
       [not found] ` <CAEtw4r2hsCd1+bFNcXn_s5jj-8x6qer+x0gLx8wTgNi1=nAXhw@mail.gmail.com>
  2016-06-02 12:56 ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Becker @ 2016-06-01 20:30 UTC (permalink / raw)
  To: MegaBrutal; +Cc: linux-btrfs

try this:

btrfs fi balance start -musage=0 /
btrfs fi balance start -dusage=0 /

btrfs fi balance start -musage=1 /
btrfs fi balance start -dusage=1 /

btrfs fi balance start -musage=5 /
btrfs fi balance start -musage=10 /
btrfs fi balance start -musage=20 /


btrfs fi balance start -dusage=5 /
btrfs fi balance start -dusage=10 /
btrfs fi balance start -dusage=20 /


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
       [not found] ` <CAEtw4r2hsCd1+bFNcXn_s5jj-8x6qer+x0gLx8wTgNi1=nAXhw@mail.gmail.com>
@ 2016-06-01 21:06   ` MegaBrutal
  2016-06-01 22:22     ` Henk Slager
  0 siblings, 1 reply; 14+ messages in thread
From: MegaBrutal @ 2016-06-01 21:06 UTC (permalink / raw)
  To: Peter Becker, linux-btrfs

Hi Peter,

I tried. I either get "Done, had to relocate 0 out of 33 chunks" or
"ERROR: error during balancing '/': No space left on device", and
nothing changes.


2016-06-01 22:29 GMT+02:00 Peter Becker <floyd.net@gmail.com>:
> try this:
>
> btrfs fi balance start -musage=0 /
> btrfs fi balance start -dusage=0 /
>
> btrfs fi balance start -musage=1 /
> btrfs fi balance start -dusage=1 /
>
> btrfs fi balance start -musage=5 /
> btrfs fi balance start -musage=10 /
> btrfs fi balance start -musage=20 /
>
>
> btrfs fi balance start -dusage=5 /
> btrfs fi balance start -dusage=10 /
> btrfs fi balance start -dusage=20 /
> ....

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-01 21:06   ` MegaBrutal
@ 2016-06-01 22:22     ` Henk Slager
  2016-06-02 13:55       ` MegaBrutal
  0 siblings, 1 reply; 14+ messages in thread
From: Henk Slager @ 2016-06-01 22:22 UTC (permalink / raw)
  To: MegaBrutal; +Cc: Peter Becker, linux-btrfs

On Wed, Jun 1, 2016 at 11:06 PM, MegaBrutal <megabrutal@gmail.com> wrote:
> Hi Peter,
>
> I tried. I either get "Done, had to relocate 0 out of 33 chunks" or
> "ERROR: error during balancing '/': No space left on device", and
> nothing changes.
>
>
>>> root@ReThinkCentre:~# btrfs fi show /
>>> Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
>>>         Total devices 1 FS bytes used 15.42GiB
>>>         devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv

The device is completely filled with chunks (size and used are the
same) and none of the chunks is empty so a balance won't work at all.
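
To make that "size and used are the same" check concrete, here is a
rough Python sketch that reads the devid line of "btrfs fi show" and
reports how much of each device is not yet allocated to chunks. The
helper name and the regular expression are assumptions made for
illustration (they are not part of btrfs-progs), and the sample text is
simply the output posted earlier in this thread.

```python
import re

# Multipliers for the binary unit suffixes that appear in the output.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches lines such as:
#   devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)(\w+)\s+used\s+([\d.]+)(\w+)\s+path\s+(\S+)"
)


def unallocated_per_device(fi_show_output):
    """Return {device path: bytes not allocated to chunks} (hypothetical helper)."""
    result = {}
    for m in DEVID_RE.finditer(fi_show_output):
        size = float(m.group(2)) * UNITS[m.group(3)]
        used = float(m.group(4)) * UNITS[m.group(5)]
        result[m.group(6)] = int(size - used)
    return result


# Output copied from the original report in this thread.
SAMPLE = """\
Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
        Total devices 1 FS bytes used 15.42GiB
        devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv
"""

print(unallocated_per_device(SAMPLE))
```

A result of zero bytes for the device means there is no room left to
allocate a new chunk, which is the situation described above.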

A way to get out of this situation is to add a temporary extra device
(e.g. an 8GB USB stick or a loop device on some larger USB disk) to the
fs and then do the various balance operations. Removing as many
snapshots as possible will make the balance easier, depending on how
old the snapshots are.
Once you see that the total amount of space used by chunks is (a few
GiB) less than 19GiB, you can remove the temporary extra device from
the fs again.
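
The order of operations could be sketched roughly like this in Python.
The sketch only builds and prints the commands; it does not run them.
The device name /dev/sdX is a placeholder, the usage steps are an
arbitrary choice, and the helper name is made up for illustration.
"btrfs device add", "btrfs balance start -dusage=N" and "btrfs device
delete" are the btrfs-progs subcommands assumed here.

```python
def temporary_device_plan(mountpoint, temp_device, usage_steps=(5, 10, 20, 50)):
    """Build (but do not execute) the command sequence for the workaround.

    Hypothetical helper: the step values are an assumption, not a
    recommendation from the thread.
    """
    # 1. Give the filesystem room to allocate new chunks.
    plan = [["btrfs", "device", "add", temp_device, mountpoint]]
    # 2. Balance data chunks, starting with the emptiest ones.
    for pct in usage_steps:
        plan.append(["btrfs", "balance", "start", f"-dusage={pct}", mountpoint])
    # 3. Check that the per-device 'used' value has dropped enough.
    plan.append(["btrfs", "fi", "show", mountpoint])
    # 4. Only then take the temporary device out again.
    plan.append(["btrfs", "device", "delete", temp_device, mountpoint])
    return plan


for argv in temporary_device_plan("/", "/dev/sdX"):
    print(" ".join(argv))
```

Review the printed commands before running any of them by hand; the
device delete step should only follow once "btrfs fi show" confirms
there is enough unallocated space on the original device.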

It is then still possible that you run into the same situation again;
this is a long-term bug/problem. A brand-new kernel with ENOSPC
patches included might help.

What is the kernel version used?
Is the fs on a mechanical disk or SSD?
What are the mount options?
How old is the fs?

You might want to run this Python script to get an idea of the
chunks' fill levels:
https://github.com/knorrie/btrfs-heatmap/blob/master/show_usage.py

You could also mount the fs with enospc_debug and see what is reported
in dmesg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-01 18:30 "No space left on device" and balance doesn't work MegaBrutal
  2016-06-01 20:30 ` Peter Becker
       [not found] ` <CAEtw4r2hsCd1+bFNcXn_s5jj-8x6qer+x0gLx8wTgNi1=nAXhw@mail.gmail.com>
@ 2016-06-02 12:56 ` Austin S. Hemmelgarn
  2016-06-04  6:27   ` Andrei Borzenkov
  2 siblings, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-02 12:56 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs

On 2016-06-01 14:30, MegaBrutal wrote:
> However, I'm still interested, if the file system is full, why does df
> show there is free space, and how could I show the situation without
> having the mentioned options? I actually have an alert set up which
> triggers when the FS usage reaches 90%, so then I know I have to
> delete some old snapshots. It worked so far, I cleaned the snapshots
> at 90%, FS usage fell back, everyone was happy. But now the alert
> didn't even trigger because the FS is at 88% usage, so it shouldn't be
> full yet.
The first thing that needs to be understood is that df has been pretty 
much unchanged since it was introduced in the 70's (IIRC, it was in at 
least SVR4, possibly earlier UNIX versions too).  Back then, it was 
pretty easy to say what percentage of space was used and how much was 
left.  Back then, a filesystem only allocated one set of blocks for a 
file, it didn't need extra space for updates, and the file took up 
exactly as much space as its size on disk (usually; it can get kind of 
complicated based on a number of factors).  In addition, traditional UFS 
had a fixed size metadata area for the inodes, which simplified 
computations even more.

In BTRFS though, almost none of the assumptions which the original 
interface made are guaranteed.

Now, the biggest difference though is in how BTRFS allocates space. 
BTRFS uses a two tier allocation system.  First, you have high-level 
allocations of what are usually referred to as chunks, and then it 
allocates blocks within those chunks.  The balance operation operates at 
the chunk level, whereas things like defragmentation operate at the 
block level.  For performance reasons, BTRFS usually has separate chunks 
for metadata and data.  Data chunks are usually 1GB, and metadata chunks 
are usually 256MB, although both can vary in size based on the size of 
the filesystem.  Figuring out the exact size gets tricky on a live 
filesystem, but if your filesystem is between 16G and 64G, you're pretty 
much guaranteed to have chunks which are the default size.

Now, because of the segregation of data and metadata, and how chunk 
allocation works, it's possible to end up in a situation where you 
technically have free space, but you can't actually do anything with it. 
This is because most file operations on BTRFS require at least a few 
blocks of metadata space so that the COW updates can happen.  Luckily, 
you don't appear to be quite at that point.

For compatibility reasons, we have to report _something_ through df.  We 
can't however report many of the situational things about the state of 
the FS itself (for example, if you have all the possible chunks 
allocated, no space in data chunks, but free space in metadata chunks, 
it's possible to create a lot of very small files, but creating a big 
one will fail).  As a result of this, what we report through df is 
technically absolutely correct (in your case, you _do_ technically have 
2.6G of free space), but is also absolutely useless for any kind of 
management decision.

In your particular situation, what's happened is that you have all the 
space allocated to chunks, but have free space within those chunks. 
Balance never puts data in existing chunks, and you can't allocate any 
new chunks, so you can't run a balance.  However, because of that free 
space in the chunks, you can still use the filesystem itself for 
'regular' filesystem operations.
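
As a rough worked example, the numbers from the "btrfs fi df /" output
earlier in the thread can be added up in a few lines of Python.  The
sketch assumes that fi df reports logical sizes and that DUP chunks take
twice their logical size on the device; the variable names are just for
illustration.

```python
# Figures copied from the `btrfs fi df /` output in this thread, in GiB.
data_total, data_used = 16.69, 14.14
meta_total, meta_used = 1.62, 1.28
sys_total = 32.0 / 1024  # 32.00MiB expressed in GiB

# Assumption: "single" data is stored once, DUP metadata and system
# chunks are stored twice on the device.
raw_allocated = data_total + 2 * meta_total + 2 * sys_total

# Free space that exists only *inside* already allocated chunks.
free_in_data = data_total - data_used
free_in_meta = meta_total - meta_used

print(f"allocated to chunks: {raw_allocated:.2f} GiB of a 20.00 GiB device")
print(f"free inside data chunks: {free_in_data:.2f} GiB")
print(f"free inside metadata chunks: {free_in_meta:.2f} GiB")
```

Under those assumptions practically the whole device is allocated to
chunks, while the free space inside the data chunks is close to the
2,6G that df shows.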

In this situation, Henk's suggestion of adding another device is one of 
three options for dealing with this.  The other two options (which are 
usually less practical for most people) are to resize the filesystem to 
have more space, or recreate it from scratch.

As far as avoiding this in the future, the best option is to keep an eye 
on the output of fi show, and keep the per-device 'used' value at least 
a few GB below the device size.  I usually go for about 2GB or 0.2% of 
the device size, whichever is bigger.  This will give you enough 
headroom for at least a few chunks to be allocated so that balance can 
proceed.
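
A minimal sketch of such a check in Python, assuming the size and used
values of the devid line have already been obtained in bytes (the
function names are made up for illustration, and the threshold is just
the rule of thumb above):

```python
GIB = 2**30


def required_headroom(device_size):
    """Rule of thumb from above: 2GB or 0.2% of the device, whichever is bigger."""
    return max(2 * GIB, device_size * 0.002)


def needs_attention(device_size, device_used):
    """True when too little of the device is left unallocated.

    device_used is the per-device 'used' value from fi show, i.e. the
    space allocated to chunks, not the space df reports as used.
    """
    return device_size - device_used < required_headroom(device_size)


# The filesystem from this thread: 20.00GiB size, 20.00GiB allocated.
print(needs_attention(20 * GIB, 20 * GIB))
# The same device with 3GiB still unallocated.
print(needs_attention(20 * GIB, 17 * GIB))
```

An alert built on this would fire on the chunk-level 'used' value
rather than on the percentage shown by df.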

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-01 22:22     ` Henk Slager
@ 2016-06-02 13:55       ` MegaBrutal
  2016-06-02 22:45         ` Henk Slager
  0 siblings, 1 reply; 14+ messages in thread
From: MegaBrutal @ 2016-06-02 13:55 UTC (permalink / raw)
  To: Henk Slager; +Cc: Peter Becker, linux-btrfs

2016-06-02 0:22 GMT+02:00 Henk Slager <eye1tm@gmail.com>:
> What is the kernel version used?
> Is the fs on a mechanical disk or SSD?
> What are the mount options?
> How old is the fs?

Linux 4.4.0-22-generic (Ubuntu 16.04).
Mechanical disks in LVM.
Mount: /dev/mapper/centrevg-rootlv on / type btrfs
(rw,relatime,space_cache,subvolid=257,subvol=/@)
I don't know how to retrieve the exact FS age, but it was created in
August 2014.

Snapshots (their names encode their creation dates):

ID 908 gen 487349 top level 5 path @-snapshot-20160503000001
ID 909 gen 488849 top level 5 path @-snapshot-20160504000001
ID 910 gen 490313 top level 5 path @-snapshot-20160505000001
ID 911 gen 491763 top level 5 path @-snapshot-20160506000001
ID 912 gen 493399 top level 5 path @-snapshot-20160507000002
ID 913 gen 494996 top level 5 path @-snapshot-20160508000002
ID 914 gen 496495 top level 5 path @-snapshot-20160509000002
ID 915 gen 498094 top level 5 path @-snapshot-20160510000005
ID 916 gen 499688 top level 5 path @-snapshot-20160511000002
ID 917 gen 501308 top level 5 path @-snapshot-20160512000001
ID 918 gen 503375 top level 5 path @-snapshot-20160514000002
ID 919 gen 504356 top level 5 path @-snapshot-20160515000001
ID 920 gen 505890 top level 5 path @-snapshot-20160516000001
ID 921 gen 506901 top level 5 path @-snapshot-20160517000001
ID 922 gen 507313 top level 5 path @-snapshot-20160518000002
ID 923 gen 507712 top level 5 path @-snapshot-20160519000001
ID 924 gen 508057 top level 5 path @-snapshot-20160520000001
ID 925 gen 508882 top level 5 path @-snapshot-20160521000001
ID 926 gen 509241 top level 5 path @-snapshot-20160522000001
ID 927 gen 509618 top level 5 path @-snapshot-20160523000001
ID 928 gen 510277 top level 5 path @-snapshot-20160524000002
ID 929 gen 511357 top level 5 path @-snapshot-20160525000002
ID 930 gen 512125 top level 5 path @-snapshot-20160526000002
ID 931 gen 513292 top level 5 path @-snapshot-20160527000001
ID 932 gen 515766 top level 5 path @-snapshot-20160528000002
ID 933 gen 517349 top level 5 path @-snapshot-20160529000004
ID 934 gen 519004 top level 5 path @-snapshot-20160530000002
ID 935 gen 519500 top level 5 path @-snapshot-20160531000002
ID 936 gen 519847 top level 5 path @-snapshot-20160601000001
ID 937 gen 521829 top level 5 path @-snapshot-20160602000001

Removing old snapshots is the most feasible solution, but I can also
increase the FS size. It's easy since it's in LVM, and there is plenty
of space in the volume group.

Probably I should rewrite my alert script to check btrfs fi show
instead of plain df.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-02 13:55       ` MegaBrutal
@ 2016-06-02 22:45         ` Henk Slager
  2016-06-03  5:51           ` Marc Haber
  2016-06-03 12:43           ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 14+ messages in thread
From: Henk Slager @ 2016-06-02 22:45 UTC (permalink / raw)
  To: MegaBrutal; +Cc: Peter Becker, linux-btrfs

On Thu, Jun 2, 2016 at 3:55 PM, MegaBrutal <megabrutal@gmail.com> wrote:
> 2016-06-02 0:22 GMT+02:00 Henk Slager <eye1tm@gmail.com>:
>> What is the kernel version used?
>> Is the fs on a mechanical disk or SSD?
>> What are the mount options?
>> How old is the fs?
>
> Linux 4.4.0-22-generic (Ubuntu 16.04).
> Mechanical disks in LVM.
> Mount: /dev/mapper/centrevg-rootlv on / type btrfs
> (rw,relatime,space_cache,subvolid=257,subvol=/@)
> I don't know how to retrieve the exact FS age, but it was created in
> 2014 August.
>
> Snapshots (their names encode their creation dates):
>
> ID 908 gen 487349 top level 5 path @-snapshot-20160503000001
...<snip>
> ID 937 gen 521829 top level 5 path @-snapshot-20160602000001
>
> Removing old snapshots is the most feasible solution, but I can also
> increase the FS size. It's easy since it's in LVM, and there is plenty
> of space in the volume group.
>
> Probably I should rewrite my alert script to check btrfs fi show
> instead of plain df.

Yes, I think that makes sense: decide at the chunk level. You can see
how big the chunks are with the linked show_usage.py program; most of
the 33 should be 1GiB, as already very well explained by Austin.

The setup all looks pretty normal and btrfs should be able to handle
it, but unfortunately your fs is a typical example showing that one
currently needs to monitor/tune a btrfs fs for its 'health' in order
to keep it running long-term. You might want to change the mount
option relatime to noatime, so that you have fewer writes to metadata
chunks. It should lower the scattering inside the metadata chunks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-02 22:45         ` Henk Slager
@ 2016-06-03  5:51           ` Marc Haber
  2016-06-03 12:43           ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 14+ messages in thread
From: Marc Haber @ 2016-06-03  5:51 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Jun 03, 2016 at 12:45:51AM +0200, Henk Slager wrote:
> The setup looks all pretty normal and btrfs should be able to handle
> it, but unfortunately your fs is a typical example that one currently
> needs to monitor/tune a btrfs fs for its 'health' in order to keep it
> running longterm.

What kind of work is being done to address this major usability issue?
What is the timeframe for a fix?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-02 22:45         ` Henk Slager
  2016-06-03  5:51           ` Marc Haber
@ 2016-06-03 12:43           ` Austin S. Hemmelgarn
  2016-08-09  9:50             ` MegaBrutal
  1 sibling, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-03 12:43 UTC (permalink / raw)
  To: Henk Slager, MegaBrutal; +Cc: Peter Becker, linux-btrfs

On 2016-06-02 18:45, Henk Slager wrote:
> Yes I think that makes sense, to decide on chunk-level. You can see
> how big the chunks are with the linked show_usage.py program, most of
> 33 should be 1GiB as already very well explained by Austin.
>
> The setup looks all pretty normal and btrfs should be able to handle
> it, but unfortunately your fs is a typical example that one currently
> needs to monitor/tune a btrfs fs for its 'health' in order to keep it
> running longterm. You might want to change mount option relatime to
> noatime, so that you have less writes to metadata chunks. It should
> lower the scattering inside the metadata chunks.
Also, since you're on a new enough kernel, try 'lazytime' in the mount 
options as well.  This defers all on-disk timestamp updates for up to 24 
hours or until the inode gets written out anyway, but keeps the updated 
info in memory.  The only downside is that mtimes might not be 
correct after an unclean shutdown, but most software will have no issues 
with this.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-02 12:56 ` Austin S. Hemmelgarn
@ 2016-06-04  6:27   ` Andrei Borzenkov
  2016-06-04  7:57     ` Hugo Mills
  0 siblings, 1 reply; 14+ messages in thread
From: Andrei Borzenkov @ 2016-06-04  6:27 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, MegaBrutal, linux-btrfs

02.06.2016 15:56, Austin S. Hemmelgarn wrote:
> 
> In your particular situation, what's happened is that you have all the
> space allocated to chunks, but have free space within those chunks.
> Balance never puts data in existing chunks, and you can't allocate any
> new chunks, so you can't run a balance.  However, because of that free
> space in the chunks, you can still use the filesystem itself for
> 'regular' filesystem operations.
> 

How does balance decide where to put data from the chunks it frees?
I.e. let's say I have one free data chunk and 10 chunks filled to 10%.
Will "btrfs ba start -dusage=10" pack data from all 10 chunks into a
single one, thus freeing 10 chunks for further processing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-04  6:27   ` Andrei Borzenkov
@ 2016-06-04  7:57     ` Hugo Mills
  0 siblings, 0 replies; 14+ messages in thread
From: Hugo Mills @ 2016-06-04  7:57 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Austin S. Hemmelgarn, MegaBrutal, linux-btrfs

On Sat, Jun 04, 2016 at 09:27:13AM +0300, Andrei Borzenkov wrote:
> 02.06.2016 15:56, Austin S. Hemmelgarn wrote:
> > 
> > In your particular situation, what's happened is that you have all the
> > space allocated to chunks, but have free space within those chunks.
> > Balance never puts data in existing chunks, and you can't allocate any
> > new chunks, so you can't run a balance.  However, because of that free
> > space in the chunks, you can still use the filesystem itself for
> > 'regular' filesystem operations.
> > 
> 
> How balance decides where to put data from chunks it frees? I.e. let's
> say I have one free data chunk and 10 chunks filled to 10%. Will "btrfs
> ba start -dusage=10" pack data from all 10 chunks into single one, this
> freeing 10 chunks for further processing?

   Yes, it will. Austin's assertion is, I'm afraid, incorrect.
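
   As a back-of-the-envelope illustration of the example in the
question, here is an idealised model in Python. It assumes relocated
data packs densely into fresh chunks and that all chunks have the same
size; the real allocator is more involved, and the function name is
made up.

```python
import math


def chunks_after_balance(fill_percentages):
    """Idealised model: how many full-size chunks the relocated data needs.

    Assumes equal-sized chunks and perfectly dense packing, which is a
    simplification of what balance really does.
    """
    total = sum(fill_percentages)  # total data, in percent of one chunk
    return math.ceil(total / 100)


fills = [10] * 10  # ten data chunks, each filled to 10%
needed = chunks_after_balance(fills)
print(f"{len(fills)} chunks at 10% pack into {needed} chunk(s)")
```

   In this model the one free chunk is enough to absorb the contents of
all ten, after which the ten original chunks become unallocated space.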

   Hugo.

-- 
Hugo Mills             | There's many a slip 'twixt wicket-keeper and gully.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: "No space left on device" and balance doesn't work
  2016-06-03 12:43           ` Austin S. Hemmelgarn
@ 2016-08-09  9:50             ` MegaBrutal
  2016-08-09 11:16               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 14+ messages in thread
From: MegaBrutal @ 2016-08-09  9:50 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Henk Slager, Peter Becker, linux-btrfs

2016-06-03 14:43 GMT+02:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
>
> Also, since you're on a new enough kernel, try 'lazytime' in the mount options as well, this defers all on-disk timestamp updates for up to 24 hours or until the inode gets written out anyway, but keeps the updated info in memory.  The only downside to this is that mtimes might not be correct after an unclean shutdown, but most software will have no issues with this.
>

Hi all,

Sorry for reviving this old thread, and probably it's not the best
place to ask about this... but I added the "noatime" option in fstab,
restarted the system, and now I think I should try "lazytime" too (as
I like the idea to have proper atimes with delayed writing to disk).
So now I'd just like to test the "lazytime" mount option without
restart.

I remounted the file system like this:

mount -o remount,lazytime /

But now the FS still has the "noatime" mount option, which I guess
renders "lazytime" ineffective. I thought they are supposed to be
mutually exclusive, so I'm actually surprised that I can have both
mount options at the same time.

Now my mount looks like this:

/dev/mapper/centrevg-rootlv on / type btrfs
(rw,noatime,lazytime,space_cache,subvolid=257,subvol=/@)

I also tried to explicitly add "atime" to negate "noatime" (man mount
says "atime" is the option to disable "noatime"), like this:

mount -o remount,atime,lazytime /

But the "noatime" option still stays. Why? Is it a BTRFS specific
issue, or does it reside in another layer?

By the way, is it valid to mount BTRFS subvolumes with different atime
policies? Then how do child subvolumes behave?


* Re: "No space left on device" and balance doesn't work
  2016-08-09  9:50             ` MegaBrutal
@ 2016-08-09 11:16               ` Austin S. Hemmelgarn
  2016-08-09 12:56                 ` Noah Massey
  0 siblings, 1 reply; 14+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-09 11:16 UTC (permalink / raw)
  To: MegaBrutal; +Cc: Henk Slager, Peter Becker, linux-btrfs

On 2016-08-09 05:50, MegaBrutal wrote:
> 2016-06-03 14:43 GMT+02:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
>>
>> Also, since you're on a new enough kernel, try 'lazytime' in the mount options as well, this defers all on-disk timestamp updates for up to 24 hours or until the inode gets written out anyway, but keeps the updated info in memory.  The only downside to this is that mtimes might not be correct after an unclean shutdown, but most software will have no issues with this.
>>
>
> Hi all,
>
> Sorry for reviving this old thread, and probably it's not the best
> place to ask about this... but I added the "noatime" option in fstab,
> restarted the system, and now I think I should try "lazytime" too (as
> I like the idea to have proper atimes with delayed writing to disk).
> So now I'd just like to test the "lazytime" mount option without
> restart.
>
> I remounted the file system like this:
>
> mount -o remount,lazytime /
>
> But now the FS still has the "noatime" mount option, which I guess
> renders "lazytime" ineffective. I thought they are supposed to be
> mutually exclusive, so I'm actually surprised that I can have both
> mount options at the same time.
No, lazytime also affects mtime handling, not just atime, so they aren't 
mutually exclusive, and it does make sense to have both.
>
> Now my mount looks like this:
>
> /dev/mapper/centrevg-rootlv on / type btrfs
> (rw,noatime,lazytime,space_cache,subvolid=257,subvol=/@)
>
> I also tried to explicitly add "atime" to negate "noatime" (man mount
> says "atime" is the option to disable "noatime"), like this:
>
> mount -o remount,atime,lazytime /
>
> But the "noatime" option still stays. Why? Is it a BTRFS specific
> issue, or does it reside in another layer?
>
> By the way, is it valid to mount BTRFS subvolumes with different atime
> policies? Then how do child subvolumes behave?
I'm not entirely certain (I have my kernel patched so noatime is the 
default, and rarely if ever use anything else, so I don't have much in 
the way of experience with this), but my guess would be that it can't be 
done, and that changing atime handling on remount isn't really handled. 
I do know that adding lazytime on remount doesn't always work, but that 
kind of makes sense, since it causes significant changes in how mtimes 
and atimes are handled internally.
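
Since a remount may not apply every requested option, one way to check is to
read the options back from /proc/mounts and test for the one that was asked
for. A minimal sketch; the function names mount_opts and has_opt are made up
for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helpers for verifying that a remount took effect.

# Print the option string of the mount at mount point $1. The last match
# wins, since later mounts shadow earlier ones on the same mount point.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' /proc/mounts
}

# Succeed if the comma-separated option list $1 contains option $2.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Option string taken from the mount output quoted above.
opts='rw,noatime,lazytime,space_cache,subvolid=257,subvol=/@'
has_opt "$opts" lazytime && echo "lazytime is active"
has_opt "$opts" noatime  && echo "noatime is still set"
```

On a live system the same check could be run as
has_opt "$(mount_opts /)" lazytime after the remount.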



* Re: "No space left on device" and balance doesn't work
  2016-08-09 11:16               ` Austin S. Hemmelgarn
@ 2016-08-09 12:56                 ` Noah Massey
  0 siblings, 0 replies; 14+ messages in thread
From: Noah Massey @ 2016-08-09 12:56 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: MegaBrutal, Henk Slager, Peter Becker, linux-btrfs

On Tue, Aug 9, 2016 at 7:16 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-08-09 05:50, MegaBrutal wrote:
>>
>> 2016-06-03 14:43 GMT+02:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
>>>
>>>
>>> Also, since you're on a new enough kernel, try 'lazytime' in the mount
>>> options as well, this defers all on-disk timestamp updates for up to 24
>>> hours or until the inode gets written out anyway, but keeps the updated info
>>> in memory.  The only downside to this is that mtimes might not be correct
>>> after an unclean shutdown, but most software will have no issues with this.
>>>
>>
>> Hi all,
>>
>> Sorry for reviving this old thread, and probably it's not the best
>> place to ask about this... but I added the "noatime" option in fstab,
>> restarted the system, and now I think I should try "lazytime" too (as
>> I like the idea to have proper atimes with delayed writing to disk).
>> So now I'd just like to test the "lazytime" mount option without
>> restart.
>>
>> I remounted the file system like this:
>>
>> mount -o remount,lazytime /
>>
>> But now the FS still has the "noatime" mount option, which I guess
>> renders "lazytime" ineffective. I thought they are supposed to be
>> mutually exclusive, so I'm actually surprised that I can have both
>> mount options at the same time.
>
> No, lazytime also affects mtime handling, not just atime, so they aren't
> mutually exclusive, and it does make sense to have both.
>>
>>
>> Now my mount looks like this:
>>
>> /dev/mapper/centrevg-rootlv on / type btrfs
>> (rw,noatime,lazytime,space_cache,subvolid=257,subvol=/@)
>>
>> I also tried to explicitly add "atime" to negate "noatime" (man mount
>> says "atime" is the option to disable "noatime"), like this:
>>
>> mount -o remount,atime,lazytime /
>>
>> But the "noatime" option still stays. Why? Is it a BTRFS specific
>> issue, or does it reside in another layer?
>>
>> By the way, is it valid to mount BTRFS subvolumes with different atime
>> policies? Then how do child subvolumes behave?
>
> I'm not entirely certain (I have my kernel patched so noatime is the
> default, and rarely if ever use anything else, so I don't have much in the
> way of experience with this), but my guess would be that it can't be done,
> and that changing atime handling on remount isn't really handled. I do know
> that adding lazytime on remount doesn't always work, but that kind of makes
> sense, since it causes significant changes in how mtimes and atimes are
> handled internally.
>

TLDR:

try 'mount -o remount,strictatime / && mount -o remount,relatime /'

This seems strange, but on my unpatched system:

$ uname -r
4.7.0

$ mount -o loop,noatime,lazytime /var/tmp/test.dd /mnt
$ grep ^/dev/loop0 /proc/mounts
/dev/loop0 /mnt btrfs rw,lazytime,noatime,space_cache,subvolid=5,subvol=/ 0 0

$ mount -o remount,relatime /mnt
$ grep ^/dev/loop0 /proc/mounts
/dev/loop0 /mnt btrfs rw,lazytime,noatime,space_cache,subvolid=5,subvol=/ 0 0

^^^ No change to noatime option

$ mount -o remount,strictatime /mnt
$ grep ^/dev/loop0 /proc/mounts
/dev/loop0 /mnt btrfs rw,lazytime,space_cache,subvolid=5,subvol=/ 0 0

^^^ that updated it...

$ mount -o remount,relatime /mnt
$ grep ^/dev/loop0 /proc/mounts
/dev/loop0 /mnt btrfs rw,lazytime,relatime,space_cache,subvolid=5,subvol=/ 0 0

^^^ now it changes

~ Noah
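
The three states seen in the transcript above can be told apart with a small
helper. This is only a sketch based on the behaviour shown on the 4.7.0 system
in this message; atime_mode is a hypothetical name, and the helper assumes that
an option string with neither noatime nor relatime means strictatime, as the
/proc/mounts output above suggests.

```shell
#!/bin/sh
# Hypothetical helper: classify the atime policy from a mount option string
# as it appears in /proc/mounts. Assumes that the absence of both noatime and
# relatime means strictatime.
atime_mode() {
    case ",$1," in
        *,noatime,*)  echo noatime ;;
        *,relatime,*) echo relatime ;;
        *)            echo strictatime ;;
    esac
}

# The three option strings observed in the transcript above.
atime_mode 'rw,lazytime,noatime,space_cache,subvolid=5,subvol=/'
atime_mode 'rw,lazytime,space_cache,subvolid=5,subvol=/'
atime_mode 'rw,lazytime,relatime,space_cache,subvolid=5,subvol=/'

# The workaround itself needs root, so it is left commented out:
# mount -o remount,strictatime /mnt && mount -o remount,relatime /mnt
```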


Thread overview: 14+ messages
2016-06-01 18:30 "No space left on device" and balance doesn't work MegaBrutal
2016-06-01 20:30 ` Peter Becker
     [not found] ` <CAEtw4r2hsCd1+bFNcXn_s5jj-8x6qer+x0gLx8wTgNi1=nAXhw@mail.gmail.com>
2016-06-01 21:06   ` MegaBrutal
2016-06-01 22:22     ` Henk Slager
2016-06-02 13:55       ` MegaBrutal
2016-06-02 22:45         ` Henk Slager
2016-06-03  5:51           ` Marc Haber
2016-06-03 12:43           ` Austin S. Hemmelgarn
2016-08-09  9:50             ` MegaBrutal
2016-08-09 11:16               ` Austin S. Hemmelgarn
2016-08-09 12:56                 ` Noah Massey
2016-06-02 12:56 ` Austin S. Hemmelgarn
2016-06-04  6:27   ` Andrei Borzenkov
2016-06-04  7:57     ` Hugo Mills
