* Again, no space left on device while rebalancing and recipe  doesnt work
@ 2016-02-27 21:14 Marc Haber
  2016-02-27 23:15 ` Martin Steigerwald
                   ` (4 more replies)
  0 siblings, 5 replies; 81+ messages in thread
From: Marc Haber @ 2016-02-27 21:14 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have again the issue of no space left on device while rebalancing
(with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):

mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
mh@fan:~$ sudo btrfs fi show -m
Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
        Total devices 1 FS bytes used 116.49GiB
        devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr
mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
Data, single: total=113.00GiB, used=112.77GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=32.00GiB, used=3.72GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
mh@fan:~$

The filesystem was recently resized from 300 GB to 420 GB.

Why does btrfs fi show /mnt/fanbtr not give any output? Why does btrfs
fi df /mnt/fanbtr say that my data space is only 113 GiB large?

btrfs balance start -dusage=5 works up to -dusage=100:

mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 111 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 111 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 110 out of 179 chunks
mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
Done, had to relocate 109 out of 179 chunks
mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
mh@fan:~$

What is going on here? How do I get away from here?

Greetings
Marc


-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe  doesnt work
  2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
@ 2016-02-27 23:15 ` Martin Steigerwald
  2016-02-28  0:08   ` Marc Haber
  2016-02-29  1:56 ` Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Martin Steigerwald @ 2016-02-27 23:15 UTC (permalink / raw)
  To: Marc Haber; +Cc: linux-btrfs

On Samstag, 27. Februar 2016 22:14:50 CET Marc Haber wrote:
> Hi,

Hi Marc.

> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> 
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device
> mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> mh@fan:~$ sudo btrfs fi show -m
> Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
>         Total devices 1 FS bytes used 116.49GiB
>         devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr

Hmmm, that's still a ton of space to allocate chunks from.

> mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> Data, single: total=113.00GiB, used=112.77GiB
> System, DUP: total=32.00MiB, used=48.00KiB
> Metadata, DUP: total=32.00GiB, used=3.72GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> mh@fan:~$
> 
> The filesystem was recently resized from 300 GB to 420 GB.
> 
> Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs
> fi df /mnt/fanbtr say that my data space is only 113 GiB large?

Because it is.

The "used" value in the "devid 1" line of btrfs fi show is data + 2x metadata
+ 2x system = 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. the amount of the
device that is allocated to chunks.

The value one line above ("FS bytes used") is what is actually used inside
those chunks.

I.e. the "devid 1" line is the "total" column of btrfs fi df summed up (with
DUP counted twice), and the line above is the "used" column of btrfs fi df
summed up. And… with more devices you have more fun.
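
As a rough cross-check of the explanation above, the two figures from
btrfs fi show can be reproduced from the btrfs fi df numbers in Marc's
report. This is only a sketch in Python: the variable names are made up for
illustration, and it assumes that DUP chunks count twice towards the device
allocation but only once towards "FS bytes used", which is what the reported
values suggest.

```python
# Values copied from Marc's "btrfs fi df /mnt/fanbtr" output, in GiB.
GIB_PER_MIB = 1 / 1024
GIB_PER_KIB = 1 / (1024 * 1024)

data_total, data_used = 113.00, 112.77   # Data, single
meta_total, meta_used = 32.00, 3.72      # Metadata, DUP
sys_total = 32.00 * GIB_PER_MIB          # System, DUP (32 MiB)
sys_used = 48.00 * GIB_PER_KIB           # System, DUP (48 KiB)

# Assumption: DUP stores two copies, so its chunks occupy twice their
# logical size on the device; "single" chunks occupy it once.
device_allocated = data_total + 2 * meta_total + 2 * sys_total

# Assumption: "FS bytes used" is the plain sum of the "used" columns.
fs_bytes_used = data_used + meta_used + sys_used

print(f"devid 1 used:  {device_allocated:.2f}GiB")
print(f"FS bytes used: {fs_bytes_used:.2f}GiB")
```

Both results agree with the 177.06GiB and 116.49GiB shown by btrfs fi show -m.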

I suggest:

merkaba:~> btrfs fi usage -T /daten
Overall:
    Device size:                 235.00GiB
    Device allocated:            227.04GiB
    Device unallocated:            7.96GiB
    Device missing:                  0.00B
    Used:                        225.84GiB
    Free (estimated):              8.48GiB      (min: 8.48GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              128.00MiB      (used: 0.00B)

             Data      Metadata  System              
Id Path      single    single    single   Unallocated
-- --------- --------- --------- -------- -----------
 1 /dev/dm-1 226.00GiB   1.01GiB 32.00MiB     7.96GiB
-- --------- --------- --------- -------- -----------
   Total     226.00GiB   1.01GiB 32.00MiB     7.96GiB
   Used      225.48GiB 371.83MiB 48.00KiB 

as that is much clearer to read IMHO.

and

merkaba:~> btrfs device usage /daten   
/dev/dm-1, ID: 1
   Device size:           235.00GiB
   Data,single:           226.00GiB
   Metadata,single:         1.01GiB
   System,single:          32.00MiB
   Unallocated:             7.96GiB

(although that's included in the filesystem usage output)


Or for a BTRFS RAID 1:

merkaba:~> btrfs fi usage -T /home  
Overall:
    Device size:                 340.00GiB
    Device allocated:            340.00GiB
    Device unallocated:            2.00MiB
    Device missing:                  0.00B
    Used:                        306.47GiB
    Free (estimated):             14.58GiB      (min: 14.58GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

             Data      Metadata System              
Id Path      RAID1     RAID1    RAID1    Unallocated
-- --------- --------- -------- -------- -----------
 1 /dev/dm-0 163.94GiB  6.03GiB 32.00MiB     1.00MiB
 2 /dev/dm-3 163.94GiB  6.03GiB 32.00MiB     1.00MiB
-- --------- --------- -------- -------- -----------
   Total     163.94GiB  6.03GiB 32.00MiB     2.00MiB
   Used      149.36GiB  3.88GiB 48.00KiB            


merkaba:~> btrfs device usage /home
/dev/dm-0, ID: 1
   Device size:           170.00GiB
   Data,RAID1:            163.94GiB
   Metadata,RAID1:          6.03GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.00MiB

/dev/dm-3, ID: 2
   Device size:           170.00GiB
   Data,RAID1:            163.94GiB
   Metadata,RAID1:          6.03GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.00MiB


(This is actually the situation that asks for hung-task trouble, with kworker
threads searching for free space inside chunks because no new chunks can be
allocated. Let's hope kernel 4.4 finally has real fixes for this.)

> btrfs balance start -dusage=5 works up to -dusage=100:
> 
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 110 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 109 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device
> mh@fan:~$
> 
> What is going on here? How do I get away from here?

Others may have better tips, but what can always work is:

Adding a new device temporarily, doing the balance and then removing it.

Before that I'd try to balance the metadata chunks, because

> Metadata, DUP: total=32.00GiB, used=3.72GiB

32 GiB of chunks allocated, only 3.72 GiB used.

Maybe that way you can gain more free space to have a full balance run.
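
To put numbers on that suggestion, here is a small Python sketch that uses
only the metadata figures quoted above. The names are illustrative, and the
"reclaimable" figure is an upper bound that assumes a balance could pack the
metadata perfectly, which in practice it generally will not.

```python
# Figures from "Metadata, DUP: total=32.00GiB, used=3.72GiB".
meta_total_gib = 32.00
meta_used_gib = 3.72
dup_copies = 2  # DUP keeps two copies of every metadata chunk

# Fraction of the allocated metadata space that actually holds metadata.
utilisation = meta_used_gib / meta_total_gib

# Upper bound on the device space a metadata balance could hand back,
# assuming perfect packing (an idealisation, not a prediction).
max_reclaimable_gib = dup_copies * (meta_total_gib - meta_used_gib)

print(f"metadata utilisation: {utilisation:.1%}")
print(f"at most {max_reclaimable_gib:.2f}GiB of device space reclaimable")
```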

Also note that it is not necessary to do a full balance in case everything 
works okayish.

Thanks,
-- 
Martin


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-27 23:15 ` Martin Steigerwald
@ 2016-02-28  0:08   ` Marc Haber
  2016-02-28  0:22     ` Hugo Mills
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-02-28  0:08 UTC (permalink / raw)
  To: linux-btrfs

On Sun, Feb 28, 2016 at 12:15:21AM +0100, Martin Steigerwald wrote:
> On Samstag, 27. Februar 2016 22:14:50 CET Marc Haber wrote:
> > I have again the issue of no space left on device while rebalancing
> > (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> > 
> > mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> > ERROR: error during balancing '/mnt/fanbtr': No space left on device
> > mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> > mh@fan:~$ sudo btrfs fi show -m
> > Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
> >         Total devices 1 FS bytes used 116.49GiB
> >         devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr
> 
> Hmmm, thats still a ton of space to allocate chunks from.
> 
> > mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> > Data, single: total=113.00GiB, used=112.77GiB
> > System, DUP: total=32.00MiB, used=48.00KiB
> > Metadata, DUP: total=32.00GiB, used=3.72GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> > mh@fan:~$
> > 
> > The filesystem was recently resized from 300 GB to 420 GB.
> > 
> > Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs
> > fi df /mnt/fanbtr say that my data space is only 113 GiB large?
> 
> Cause it is.
> 
> The "used" in "devid 1" line is btrfs fi sh is "data + 2x system + 2x metadata 
> = 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. what amount of the size of the 
> device is allocated for chunks.
> 
> The value one line above is what is allocated inside the chunks.
> 
> I.e. the line in "devid 1" is "total" of btrfs fi df summed up, and the line 
> above is "used" in btrfs fi df summed up. And… with more devices you have more 
> fun.

Why wouldn't btrfs allocate more data chunks from the ample free space?

> I suggest:
> 
> merkaba:~> btrfs fi usage -T /daten


[2/498]mh@fan:~$ sudo btrfs fi usage /mnt/fanbtr
Overall:
    Device size:                 417.19GiB
    Device allocated:            177.06GiB
    Device unallocated:          240.12GiB
    Device missing:                  0.00B
    Used:                        120.23GiB
    Free (estimated):            240.33GiB      (min: 120.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:113.00GiB, Used:112.79GiB
   /dev/mapper/fanbtr    113.00GiB

Metadata,DUP: Size:32.00GiB, Used:3.72GiB
   /dev/mapper/fanbtr     64.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/fanbtr     64.00MiB

[3/498]mh@fan:~$ sudo btrfs fi usage -T /mnt/fanbtr
Overall:
    Device size:                 417.19GiB
    Device allocated:            177.06GiB
    Device unallocated:          240.12GiB
    Device missing:                  0.00B
    Used:                        120.23GiB
    Free (estimated):            240.33GiB      (min: 120.27GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

                      Data      Metadata System
Id Path               single    DUP      DUP      Unallocated
-- ------------------ --------- -------- -------- -----------
 1 /dev/mapper/fanbtr 113.00GiB 64.00GiB 64.00MiB   240.12GiB
-- ------------------ --------- -------- -------- -----------
   Total              113.00GiB 32.00GiB 32.00MiB   240.12GiB
   Used               112.79GiB  3.72GiB 48.00KiB
[4/499]mh@fan:~$

> (this is actually the situation asking for hung task trouble with kworker 
> threads seeking for free space inside chunks, as no new chunks can be 
> allocated, lets hope kernel 4.4 finally really has fixes for this)

I am running a 4.4.2 kernel on the system in question.

> Adding a new device temporarily, doing the balance and then removing it.

I currently refuse to do this on a 400 GiB device that has more than
half of its capacity free. I do expect a modern filesystem to get out
of that situation without a manual intervention this invasive.

> Before that I´d try to balance the metadata chunks, cause 
> 
> > Metadata, DUP: total=32.00GiB, used=3.72GiB
> 
> 32 GiB chunks allocated, only 3,72 GiB used.

Why would I rebalance metadata if there is less than 20 % used?

[21/504]mh@fan:~$ sudo btrfs balance start -musage=5 /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
There may be more info in syslog - try dmesg | tail
[22/505]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[23/506]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[24/506]mh@fan:~$ sudo btrfs balance start -musage=1 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[25/506]mh@fan:~$ sudo btrfs balance start -musage=5 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[26/507]mh@fan:~$ sudo btrfs balance start -musage=50 /mnt/fanbtr
Done, had to relocate 56 out of 179 chunks
[27/508]mh@fan:~$ sudo btrfs balance start -musage=90 /mnt/fanbtr
Done, had to relocate 61 out of 179 chunks
[29/510]mh@fan:~$ sudo btrfs balance start -musage=90 /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
There may be more info in syslog - try dmesg | tail
[30/511]mh@fan:~$ sudo btrfs balance start -musage=90 /mnt/fanbtr
ERROR: error during balancing '/mnt/fanbtr': No space left on device
There may be more info in syslog - try dmesg | tail
[31/511]mh@fan:~$ sudo btrfs fi usage -T /mnt/fanbtr
Overall:
    Device size:                 417.19GiB
    Device allocated:            177.06GiB
    Device unallocated:          240.12GiB
    Device missing:                  0.00B
    Used:                        120.24GiB
    Free (estimated):            240.32GiB      (min: 120.26GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

                      Data      Metadata System
Id Path               single    DUP      DUP      Unallocated
-- ------------------ --------- -------- -------- -----------
 1 /dev/mapper/fanbtr 113.00GiB 64.00GiB 64.00MiB   240.12GiB
-- ------------------ --------- -------- -------- -----------
   Total              113.00GiB 32.00GiB 32.00MiB   240.12GiB
   Used               112.80GiB  3.72GiB 48.00KiB
[32/512]mh@fan:~$

Why does the number of reallocated chunks not decrease, why does
-musage=90 finish the first time and throw out of space errors on
subsequent invocations, and why doesn't the balance change anything in
btrfs fi usage?

> Maybe that way you can gain more free space to have a full balance run.

Why would I need more free space? There are 240 GiB free!

> Also note that it is not necessary to do a full balance in case everything 
> works okayish.

The filesystem does not look healthy to me.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-28  0:08   ` Marc Haber
@ 2016-02-28  0:22     ` Hugo Mills
  2016-02-28  8:40       ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Hugo Mills @ 2016-02-28  0:22 UTC (permalink / raw)
  To: Marc Haber; +Cc: linux-btrfs


On Sun, Feb 28, 2016 at 01:08:29AM +0100, Marc Haber wrote:
> On Sun, Feb 28, 2016 at 12:15:21AM +0100, Martin Steigerwald wrote:
> > On Samstag, 27. Februar 2016 22:14:50 CET Marc Haber wrote:
> > > I have again the issue of no space left on device while rebalancing
> > > (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> > > 
> > > mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> > > ERROR: error during balancing '/mnt/fanbtr': No space left on device
> > > mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> > > mh@fan:~$ sudo btrfs fi show -m
> > > Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
> > >         Total devices 1 FS bytes used 116.49GiB
> > >         devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr
> > 
> > Hmmm, thats still a ton of space to allocate chunks from.
> > 
> > > mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> > > Data, single: total=113.00GiB, used=112.77GiB
> > > System, DUP: total=32.00MiB, used=48.00KiB
> > > Metadata, DUP: total=32.00GiB, used=3.72GiB
> > > GlobalReserve, single: total=512.00MiB, used=0.00B
> > > mh@fan:~$
> > > 
> > > The filesystem was recently resized from 300 GB to 420 GB.
> > > 
> > > Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs
> > > fi df /mnt/fanbtr say that my data space is only 113 GiB large?
> > 
> > Cause it is.
> > 
> > The "used" in "devid 1" line is btrfs fi sh is "data + 2x system + 2x metadata 
> > = 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. what amount of the size of the 
> > device is allocated for chunks.
> > 
> > The value one line above is what is allocated inside the chunks.
> > 
> > I.e. the line in "devid 1" is "total" of btrfs fi df summed up, and the line 
> > above is "used" in btrfs fi df summed up. And… with more devices you have more 
> > fun.
> 
> Why wouldn't btrfs allocate more data chunks from the ample free space?

   It's a bug. It's been around for years (literally), but nobody's
tracked it down and fixed it yet.

   Hugo.

-- 
Hugo Mills             | We believe in free will because we have no choice.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-28  0:22     ` Hugo Mills
@ 2016-02-28  8:40       ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-02-28  8:40 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs

On Sun, Feb 28, 2016 at 12:22:45AM +0000, Hugo Mills wrote:
> On Sun, Feb 28, 2016 at 01:08:29AM +0100, Marc Haber wrote:
> > Why wouldn't btrfs allocate more data chunks from the ample free space?
> 
>    It's a bug. It's been around for years (literally), but nobody's
> tracked it down and fixed it yet.

Is there a fix/workaround?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
  2016-02-27 23:15 ` Martin Steigerwald
@ 2016-02-29  1:56 ` Qu Wenruo
  2016-02-29 15:33   ` Marc Haber
  2016-03-03  0:28 ` Dāvis Mosāns
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2016-02-29  1:56 UTC (permalink / raw)
  To: Marc Haber, linux-btrfs



Marc Haber wrote on 2016/02/27 22:14 +0100:
> Hi,
>
> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
>
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device

It seems the ENOSPC error only happens when balancing all chunks.

Did you run any other heavy I/O in the background?

BTW, is there any kernel log output when the ENOSPC happens?

> mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> mh@fan:~$ sudo btrfs fi show -m
> Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
>          Total devices 1 FS bytes used 116.49GiB
>          devid    1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr
> mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> Data, single: total=113.00GiB, used=112.77GiB
> System, DUP: total=32.00MiB, used=48.00KiB
> Metadata, DUP: total=32.00GiB, used=3.72GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> mh@fan:~$
>
> The filesystem was recently resized from 300 GB to 420 GB.
>
> Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs
> fi df /mnt/fanbtr say that my data space is only 113 GiB large?
>
> btrfs balance start -dusage=5 works up to -dusage=100:
>
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 110 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 109 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device
> mh@fan:~$

Would you please try the following commands to see which one causes the
problem?
And would you please provide the dmesg output for each of them?

# btrfs balance start -dprofiles=single /mnt/fanbtr
# btrfs balance start -mprofile=dup /mnt/fanbtr
# btrfs balance start -sprofile=dup /mnt/fanbtr

Together, the three commands above should be equivalent to your plain
"btrfs balance start" command, but they run separately, so they should tell
you which chunk type causes the ENOSPC error.

Judging from your -dusage=100 runs above, I think data should be OK, and
since the metadata chunks are larger than 1G and there is a lot of free
metadata space, I assume it's the single system chunk causing the problem.

If my assumption is right, you have nothing to worry about, as data and
metadata should still be able to allocate new chunks, and a 32M system chunk
is definitely large enough.
In that case we would also have a clearer clue to trace and fix the bug.

Thanks,
Qu

>
> What is going on here? How do I get away from here?
>
> Greetings
> Marc
>
>




* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-29  1:56 ` Qu Wenruo
@ 2016-02-29 15:33   ` Marc Haber
  2016-03-01  0:45     ` Qu Wenruo
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-02-29 15:33 UTC (permalink / raw)
  To: linux-btrfs

Hi,

On Mon, Feb 29, 2016 at 09:56:58AM +0800, Qu Wenruo wrote:
> Marc Haber wrote on 2016/02/27 22:14 +0100:
> >I have again the issue of no space left on device while rebalancing
> >(with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> >
> >mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> >ERROR: error during balancing '/mnt/fanbtr': No space left on device
> 
> It seems that, only when balancing all chunks, ENOSPC error happens.
> 
> And did you run any other heavy IO at background?

Not when running those last commands for the mailing list post.

> BTW, is there any kernel log when the ENOSPC happens?

> Would you please try the following commands to see which one caused the
> problem?
> And would you please provide the dmesg of them?
> 
> # btrfs balance start -dprofiles=single /mnt/fanbtr
> # btrfs balance start -mprofile=dup /mnt/fanbtr
> # btrfs balance start -sprofile=dup /mnt/fanbtr

I have attached the logs. I used logger(1) to have in syslog which
command I executed, and I have piped the userspace's output to logger
so that the syslog entries match the userspace output.

-mprofile gave an error message, I therefore tried -mprofiles, and
-sprofiles wanted me to use the --force, so I did that as well.
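
The corrected invocations and the logging approach described above can be
sketched roughly as follows. This is an illustration in Python, not what was
actually run (Marc used logger(1) from the shell): the helper names
balance_commands and run_and_log are hypothetical, and the only btrfs options
used are the ones named in this message.

```python
import subprocess


def balance_commands(mountpoint):
    """Return the three per-type balance invocations as argv lists."""
    base = ["btrfs", "balance", "start"]
    return [
        base + ["-dprofiles=single", mountpoint],
        base + ["-mprofiles=dup", mountpoint],
        # Balancing system chunks on their own has to be forced.
        base + ["-sprofiles=dup", "--force", mountpoint],
    ]


def run_and_log(argv, log=print):
    """Log the command line, run it, then log every line it printed."""
    log("running: " + " ".join(argv))
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages in the same stream
        text=True,
    )
    for line in result.stdout.splitlines():
        log(line)
    return result.returncode
```

Passing syslog.syslog from the Python standard library as the log argument
would send both the command line and its output to syslog, so that they
appear next to the kernel messages; running the balance commands themselves
requires root.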

The three balance commands above all finished without running into
ENOSPC, while running a plain balance (which is also part of the log)
errors out every time.

Also, the -dprofiles=single run caused a number of INFO messages about
btrfs-cleaner and btrfs-balance processes being stuck for more than
120 seconds.

I now have a kworker and a btrfs-transact kernel process taking most of
one CPU core each, even after the userspace programs have terminated.
Is there a way to find out what these threads are actually doing?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-29 15:33   ` Marc Haber
@ 2016-03-01  0:45     ` Qu Wenruo
       [not found]       ` <20160301065448.GJ2334@torres.zugschlus.de>
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2016-03-01  0:45 UTC (permalink / raw)
  To: Marc Haber, linux-btrfs



Marc Haber wrote on 2016/02/29 16:33 +0100:
> Hi,
>
> On Mon, Feb 29, 2016 at 09:56:58AM +0800, Qu Wenruo wrote:
>> Marc Haber wrote on 2016/02/27 22:14 +0100:
>>> I have again the issue of no space left on device while rebalancing
>>> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
>>>
>>> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
>>> ERROR: error during balancing '/mnt/fanbtr': No space left on device
>>
>> It seems that, only when balancing all chunks, ENOSPC error happens.
>>
>> And did you run any other heavy IO at background?
>
> Not when running those last commands for the mailing list post.
>
>> BTW, is there any kernel log when the ENOSPC happens?
>
>> Would you please try the following commands to see which one caused the
>> problem?
>> And would you please provide the dmesg of them?
>>
>> # btrfs balance start -dprofiles=single /mnt/fanbtr
>> # btrfs balance start -mprofile=dup /mnt/fanbtr
>> # btrfs balance start -sprofile=dup /mnt/fanbtr
>
> I have attached the logs. I used logger(1) to have in syslog which
> command I executed, and I have piped the userspace's output to logger
> so that the syslog entries match the userspace output.

I didn't see the attachment, though; it seems to have been filtered by the
mailing list police.

>
> -mprofile gave an error message, I therefore tried -mprofiles, and
> -sprofiles wanted me to use the --force, so I did that as well.
>
> The three balance commands above all three finshed alright without
> running into ENOSPC, while running a plain balance (which is also part
> of the log) errors out every time.

Strange, but at least some small clue to chase.

>
> And, the -dprofiles=single log caused a number of INFOs regarding
> btrfs-cleaner and btrfa-balance processes gotten stuck for more than
> 120 seconds during the run.

That's not normal and it would be a bug.

>
> I now have a kworker and a btfs-transact kernel process taking most of
> one CPU core each, even after the userspace programs have terminated.
> Is there a way to find out what these threads are actually doing?

Does btrfs balance status give any hint?

Thanks,
Qu

>
> Greetings
> Marc
>




* Re: Again, no space left on device while rebalancing and recipe doesnt work
       [not found]       ` <20160301065448.GJ2334@torres.zugschlus.de>
@ 2016-03-01  7:24         ` Qu Wenruo
  2016-03-01  8:13           ` Qu Wenruo
  2016-03-01 20:51           ` Duncan
  0 siblings, 2 replies; 81+ messages in thread
From: Qu Wenruo @ 2016-03-01  7:24 UTC (permalink / raw)
  To: Marc Haber; +Cc: linux-btrfs




Marc Haber wrote on 2016/03/01 07:54 +0100:
> On Tue, Mar 01, 2016 at 08:45:21AM +0800, Qu Wenruo wrote:
>> Didn't see the attachment though, seems to be filtered by maillist police.
>
> Trying again.

OK, I got the attachment.

And, surprisingly, btrfs balance on the data chunks works without problems,
but the plain btrfs balance command fails.

>
>>> I now have a kworker and a btfs-transact kernel process taking most of
>>> one CPU core each, even after the userspace programs have terminated.
>>> Is there a way to find out what these threads are actually doing?
>>
>> Did btrfs balance status gives any hint?
>
> It says 'No balance found on /mnt/fanbtr'. I do have a second btrfs on
> the box, which is acting up as well (it has a five digit number of
> snapshots, and deleting a single snapshot takes about five to ten
> minutes. I was planning to write another mailing list article once
> this balance issue is through).

I assume the large number of snapshots is related to the high CPU usage.
That many snapshots make btrfs spend a lot of time calculating backrefs,
and the backtrace seems to confirm that.

As a workaround, I'd suggest removing unused snapshots and keeping their
number to four digits.

But I'm still not sure whether it's related to the ENOSPC problem.

It would be a great help if you could modify your kernel and add the
following debug patch (same as the attachment):

------
 From f2cc7af0aea659a522b97d3776b719f14532bce9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: Tue, 1 Mar 2016 15:21:18 +0800
Subject: [PATCH] btrfs: debug patch

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
  fs/btrfs/extent-tree.c | 15 +++++++++++++--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 083783b..70b284b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9393,8 +9393,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
  	block_group = btrfs_lookup_block_group(root->fs_info, bytenr);

  	/* odd, couldn't find the block group, leave it alone */
-	if (!block_group)
+	if (!block_group) {
+		pr_info("no such chunk: %llu\n", bytenr);
  		return -1;
+	}

  	min_free = btrfs_block_group_used(&block_group->item);

@@ -9419,6 +9421,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
  	     space_info->bytes_pinned + space_info->bytes_readonly +
  	     min_free < space_info->total_bytes)) {
  		spin_unlock(&space_info->lock);
+		pr_info("no space: total:%llu, bg_len:%llu, used:%llu, reseved:%llu, pinned:%llu, ro:%llu, min_free:%llu\n",
+			space_info->total_bytes, block_group->key.offset,
+			space_info->bytes_used, space_info->bytes_reserved,
+			space_info->bytes_pinned, space_info->bytes_readonly,
+			min_free);
  		goto out;
  	}
  	spin_unlock(&space_info->lock);
@@ -9448,8 +9455,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
  		 * this is just a balance, so if we were marked as full
  		 * we know there is no space for a new chunk
  		 */
-		if (full)
+		if (full) {
+			pr_info("space full\n");
  			goto out;
+		}

  		index = get_block_group_index(block_group);
  	}
@@ -9496,6 +9505,8 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
  			ret = -1;
  		}
  	}
+	if (ret == -1)
+		pr_info("no new chunk allocatable\n");
  	mutex_unlock(&root->fs_info->chunk_mutex);
  	btrfs_end_transaction(trans, root);
  out:
-- 
2.7.2

------

Thanks,
Qu

>
> So I do not even know which filesystem is making two processes run in
> circles. I have noticed that the "btrfs-transact" process is still the
> same that started 24 hours ago, while the "kworker/u16:10" process
> occasionally gets replaced by a new one which runs in circles as well.
>
> Greetings
> Marc
>



[-- Attachment #2: 0001-btrfs-debug-patch.patch --]
[-- Type: text/x-patch, Size: 2026 bytes --]

>From f2cc7af0aea659a522b97d3776b719f14532bce9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: Tue, 1 Mar 2016 15:21:18 +0800
Subject: [PATCH] btrfs: debug patch

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 083783b..70b284b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9393,8 +9393,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	block_group = btrfs_lookup_block_group(root->fs_info, bytenr);
 
 	/* odd, couldn't find the block group, leave it alone */
-	if (!block_group)
+	if (!block_group) {
+		pr_info("no such chunk: %llu\n", bytenr);
 		return -1;
+	}
 
 	min_free = btrfs_block_group_used(&block_group->item);
 
@@ -9419,6 +9421,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	     space_info->bytes_pinned + space_info->bytes_readonly +
 	     min_free < space_info->total_bytes)) {
 		spin_unlock(&space_info->lock);
+		pr_info("no space: total:%llu, bg_len:%llu, used:%llu, reserved:%llu, pinned:%llu, ro:%llu, min_free:%llu\n",
+			space_info->total_bytes, block_group->key.offset,
+			space_info->bytes_used, space_info->bytes_reserved,
+			space_info->bytes_pinned, space_info->bytes_readonly,
+			min_free);
 		goto out;
 	}
 	spin_unlock(&space_info->lock);
@@ -9448,8 +9455,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 		 * this is just a balance, so if we were marked as full
 		 * we know there is no space for a new chunk
 		 */
-		if (full)
+		if (full) {
+			pr_info("space full\n");
 			goto out;
+		}
 
 		index = get_block_group_index(block_group);
 	}
@@ -9496,6 +9505,8 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 			ret = -1;
 		}
 	}
+	if (ret == -1)
+		pr_info("no new chunk allocatable\n");
 	mutex_unlock(&root->fs_info->chunk_mutex);
 	btrfs_end_transaction(trans, root);
 out:
-- 
2.7.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-01  7:24         ` Qu Wenruo
@ 2016-03-01  8:13           ` Qu Wenruo
       [not found]             ` <20160301161659.GR2334@torres.zugschlus.de>
  2016-03-01 20:51           ` Duncan
  1 sibling, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2016-03-01  8:13 UTC (permalink / raw)
  To: Marc Haber; +Cc: linux-btrfs



Qu Wenruo wrote on 2016/03/01 15:24 +0800:
>
>
> Marc Haber wrote on 2016/03/01 07:54 +0100:
>> On Tue, Mar 01, 2016 at 08:45:21AM +0800, Qu Wenruo wrote:
>>> Didn't see the attachment though, seems to be filtered by maillist
>>> police.
>>
>> Trying again.
>
> OK, I got the attachment.
>
> And, surprisingly, btrfs balance on data chunk works without problem,
> but it fails on plain btrfs balance command.
>
>>
>>>> I now have a kworker and a btrfs-transact kernel process taking most of
>>>> one CPU core each, even after the userspace programs have terminated.
>>>> Is there a way to find out what these threads are actually doing?
>>>
>>> Did btrfs balance status give any hint?
>>
>> It says 'No balance found on /mnt/fanbtr'. I do have a second btrfs on
>> the box, which is acting up as well (it has a five digit number of
>> snapshots, and deleting a single snapshot takes about five to ten
>> minutes. I was planning to write another mailing list article once
>> this balance issue is through).
>
> I assume the large number of snapshots is related to the high CPU usage.
> With so many snapshots, btrfs takes a long time to calculate its
> backrefs, and the backtrace seems to confirm that.
>
> As a workaround, I'd suggest removing unused snapshots and keeping their
> number within 4 digits.
>
> But I'm still not sure whether it's related to the ENOSPC problem.
>
> It would be a great help if you could patch your kernel to add the
> following debug output (same as the attachment):
>
> ------
>  From f2cc7af0aea659a522b97d3776b719f14532bce9 Mon Sep 17 00:00:00 2001
> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: Tue, 1 Mar 2016 15:21:18 +0800
> Subject: [PATCH] btrfs: debug patch
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   fs/btrfs/extent-tree.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 083783b..70b284b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9393,8 +9393,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
>       block_group = btrfs_lookup_block_group(root->fs_info, bytenr);
>
>       /* odd, couldn't find the block group, leave it alone */
> -    if (!block_group)
> +    if (!block_group) {
> +        pr_info("no such chunk: %llu\n", bytenr);
>           return -1;
> +    }
>
>       min_free = btrfs_block_group_used(&block_group->item);
>
> @@ -9419,6 +9421,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
>            space_info->bytes_pinned + space_info->bytes_readonly +
>            min_free < space_info->total_bytes)) {
>           spin_unlock(&space_info->lock);
> +        pr_info("no space: total:%llu, bg_len:%llu, used:%llu, reserved:%llu, pinned:%llu, ro:%llu, min_free:%llu\n",
> +            space_info->total_bytes, block_group->key.offset,
> +            space_info->bytes_used, space_info->bytes_reserved,
> +            space_info->bytes_pinned, space_info->bytes_readonly,
> +            min_free);
Oh, I'm sorry, that output is not necessary. It's better to use the 
newer patch:
https://patchwork.kernel.org/patch/8462881/

With the newer patch, you will need to use the enospc_debug mount option 
to get the debug information.

Sorry for the inconvenience.

Thanks,
Qu

>           goto out;
>       }
>       spin_unlock(&space_info->lock);
> @@ -9448,8 +9455,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
>            * this is just a balance, so if we were marked as full
>            * we know there is no space for a new chunk
>            */
> -        if (full)
> +        if (full) {
> +            pr_info("space full\n");
>               goto out;
> +        }
>
>           index = get_block_group_index(block_group);
>       }
> @@ -9496,6 +9505,8 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
>               ret = -1;
>           }
>       }
> +    if (ret == -1)
> +        pr_info("no new chunk allocatable\n");
>       mutex_unlock(&root->fs_info->chunk_mutex);
>       btrfs_end_transaction(trans, root);
>   out:



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-01  7:24         ` Qu Wenruo
  2016-03-01  8:13           ` Qu Wenruo
@ 2016-03-01 20:51           ` Duncan
  2016-03-05 14:28             ` Marc Haber
  1 sibling, 1 reply; 81+ messages in thread
From: Duncan @ 2016-03-01 20:51 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Tue, 01 Mar 2016 15:24:03 +0800 as excerpted:


> 
> Marc Haber wrote on 2016/03/01 07:54 +0100:
>> On Tue, Mar 01, 2016 at 08:45:21AM +0800, Qu Wenruo wrote:
>>> Didn't see the attachment though, seems to be filtered by maillist
>>> police.
>>
>> Trying again.
> 
> OK, I got the attachment.
> 
> And, surprisingly, btrfs balance on data chunk works without problem,
> but it fails on plain btrfs balance command.

There has been something bothering me about this thread that I wasn't 
quite pinning down, but here it is.

If you look at the btrfs fi df/usage numbers, data chunk total vs. used 
are very close to one another (113 GiB total, 112.77 GiB used, single 
profile; assuming 1 GiB data chunks, that's only a fraction of a single 
data chunk unused), so balance would seem to be getting thru them just 
fine.

But there's a /huge/ spread between total vs. used metadata (32 GiB 
total, under 4 GiB used, clearly _many_ empty or nearly empty chunks), 
implying that has not been successfully balanced in quite some time, if 
ever.  So I'd surmise the problem is in metadata, not in data.

Which would explain why balancing data works fine, but a whole-filesystem 
balance doesn't, because it's getting stuck on the metadata, not the data.

Now the balance metadata filters include system as well, by default, and 
the -mprofiles=dup and -sprofiles=dup balances finished, apparently 
without error, which throws a wrench into my theory.

But while we have the btrfs fi df from before the attempt with the 
profiles filters, we don't have the same output from after.

If btrfs fi df still shows more than a GiB spread between metadata total 
and used /after/ the supposedly successful profiles filter runs, then 
obviously they're not balancing what they should be balancing, which is a 
bug right there.  My educated guess is that once that bug is fixed, the 
metadata and possibly system balances will fail as well, due to whatever 
problem on the filesystem is keeping the full balance from completing.

Of course, if the post-filtered-balance btrfs fi df shows a metadata 
spread of under a gig, then the problem is elsewhere.  (Given 256 MiB 
metadata chunks, but dup, nearly a half-gig may be free, and the 512 MiB 
global reserve counts as unused metadata as well, adding another half-gig 
that's going to be reported free but is actually accounted for, yielding 
a spread of up to a gig even after a successful balance.)  But I'm 
guessing it's still going to be well over a gig and may still be the 
full 28+ gig spread, 32 gig total, under 4 gig used, indicating the 
metadata filtered balance didn't actually work at all.
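
The rule of thumb in the paragraph above can be written down as a small check. This is only a sketch of that heuristic, not btrfs code: the helper names are hypothetical, sizes are in MiB, and the 256 MiB chunk size, dup factor and 512 MiB global reserve are the figures quoted in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* All sizes in MiB; the constants are the figures quoted in the thread. */
#define METADATA_CHUNK_MIB 256u   /* nominal metadata chunk size */
#define DUP_COPIES         2u     /* dup profile keeps two copies */
#define GLOBAL_RESERVE_MIB 512u   /* reported as unused metadata */

/* Largest total-vs-used gap the heuristic still treats as normal. */
static uint64_t expected_max_spread_mib(void)
{
	return (uint64_t)METADATA_CHUNK_MIB * DUP_COPIES + GLOBAL_RESERVE_MIB;
}

/* Hypothetical helper: true when the gap exceeds the heuristic bound. */
static bool metadata_spread_is_suspicious(uint64_t total_mib, uint64_t used_mib)
{
	uint64_t spread = total_mib > used_mib ? total_mib - used_mib : 0;

	return spread > expected_max_spread_mib();
}
```

With the numbers from the original report (32 GiB total, under 4 GiB used, so at least 28 GiB of spread) the check flags the metadata as not having been compacted.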

Meanwhile, the metadata filters also include system, so while it's 
possible to balance system specifically, without (other) metadata, to my 
knowledge it's impossible to balance (other) metadata exclusively, 
without balancing system.

Which, now assuming we still have that huge metadata spread, means if the 
on-filesystem bug is in the system chunks, both system and metadata 
filtered balances *should* fail, while if it's in non-system metadata, a 
system filtered balance *should* succeed, while a metadata filtered 
balance *should* fail.
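
The reasoning above amounts to a small decision table. A sketch only, assuming the huge metadata spread is still present; the enum and function names are hypothetical, and the result is a hint about where to look, not a diagnosis.

```c
#include <stdbool.h>

enum suspect {
	SUSPECT_NONE,      /* both filtered balances succeed */
	SUSPECT_SYSTEM,    /* consistent with a problem in system chunks */
	SUSPECT_METADATA,  /* consistent with a problem in non-system metadata */
	SUSPECT_UNCLEAR    /* combination the reasoning above does not cover */
};

/* Map the outcome of the two filtered balances to the suspected area. */
static enum suspect locate_problem(bool system_balance_ok,
				   bool metadata_balance_ok)
{
	if (system_balance_ok && metadata_balance_ok)
		return SUSPECT_NONE;
	if (!system_balance_ok && !metadata_balance_ok)
		return SUSPECT_SYSTEM;
	if (system_balance_ok)
		return SUSPECT_METADATA;
	/* system fails while metadata (which includes system) succeeds */
	return SUSPECT_UNCLEAR;
}
```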

>>>> I now have a kworker and a btrfs-transact kernel process taking most
>>>> of one CPU core each, even after the userspace programs have
>>>> terminated. Is there a way to find out what these threads are
>>>> actually doing?
>>>
>>> Did btrfs balance status give any hint?
>>
>> It says 'No balance found on /mnt/fanbtr'. I do have a second btrfs on
>> the box, which is acting up as well (it has a five digit number of
>> snapshots, and deleting a single snapshot takes about five to ten
>> minutes. I was planning to write another mailing list article once this
>> balance issue is through).
> 
> I assume the large number of snapshots is related to the high CPU usage.
> With so many snapshots, btrfs takes a long time to calculate its
> backrefs, and the backtrace seems to confirm that.
> 
> As a workaround, I'd suggest removing unused snapshots and keeping their
> number within 4 digits.


I'll strongly second that recommendation.  Btrfs is known to have 
snapshot scaling issues at 10K snapshots and above.  My strong 
recommendation is to limit snapshots per filesystem to 3000 or less, with 
a target of 2000 per filesystem or less if possible, and an ideal of 1000 
per filesystem or less if it's practical to keep it to that, which it 
should be with thinning, if you're only snapshotting 1-2 subvolumes, but 
may not be if you're snapshotting more.

You can actually do scheduled snapshotting on a pretty tight schedule, 
say twice or 3X per hour (every 20-30 minutes), provided you have a good 
snapshot thinning program in place as well, thinning to say a snapshot an 
hour after 2-12 hours, every other hour after say 25 hours (giving you a 
bit over a day of at least hourly), every six hours after 8 days (so you 
have over a week of every other hour), twice a day after a couple weeks, 
daily after four weeks, weekly after 90 days, by which time you should 
have an off-system backup available to fall back on as well if you're so 
concerned about snapshots, such that after six months or a year you can 
delete all snapshots and finally free the space taken by the old 
snapshots.

Having posted the same suggestion and done the math multiple times, 
that's 250-500 snapshots per subvolume depending primarily on how fast 
you thin down in the early stages, which means 2-4 subvolumes of 
snapshotting per thousand snapshots total per filesystem, which means 
with a strict enough thinning program, you can snapshot up to 8 subvolumes 
per filesystem and stay under the 2000 total snapshots target.
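
The per-subvolume arithmetic can be reproduced with a few lines of C. This is a sketch under stated assumptions: the tier boundaries below are one possible reading of the schedule described above (30-minute snapshots for the first 12 hours, everything deleted after one year), and the struct and function names are hypothetical.

```c
#include <stddef.h>

#define HOUR 60UL
#define DAY  (24UL * HOUR)

/* One retention tier: snapshots aged [start_min, end_min) are kept
 * once per interval_min. All values are in minutes. */
struct thin_tier {
	unsigned long start_min;
	unsigned long end_min;
	unsigned long interval_min;
};

/* Assumed reading of the thinning schedule; adjust to taste. */
static const struct thin_tier example_schedule[] = {
	{ 0,         12 * HOUR, 30        }, /* every 30 min for 12 hours */
	{ 12 * HOUR, 25 * HOUR, HOUR      }, /* hourly up to 25 hours */
	{ 25 * HOUR, 8 * DAY,   2 * HOUR  }, /* every other hour up to 8 days */
	{ 8 * DAY,   14 * DAY,  6 * HOUR  }, /* every six hours up to 2 weeks */
	{ 14 * DAY,  28 * DAY,  12 * HOUR }, /* twice a day up to 4 weeks */
	{ 28 * DAY,  90 * DAY,  DAY       }, /* daily up to 90 days */
	{ 90 * DAY,  365 * DAY, 7 * DAY   }, /* weekly up to one year */
};

/* Number of snapshots retained at steady state for one subvolume. */
static unsigned long retained_snapshots(const struct thin_tier *tiers, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += (tiers[i].end_min - tiers[i].start_min) /
			 tiers[i].interval_min;
	return total;
}

/* How many subvolumes fit under a per-filesystem snapshot cap. */
static unsigned long max_subvolumes(unsigned long cap, unsigned long per_subvol)
{
	return per_subvol ? cap / per_subvol : 0;
}
```

With these assumed tiers, retained_snapshots(example_schedule, 7) gives 273 snapshots per subvolume, inside the 250-500 range, and max_subvolumes(2000, 273) gives 7 subvolumes under the 2000-snapshot target.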

By 3000 snapshots per filesystem, you'll be beginning to notice slowdowns 
in some btrfs maintenance commands if you're sensitive to it, tho it's 
still at least practical to work with, and by 10K, it's generally 
noticeable by all, at least once they thin down to 2K or so, as it's 
suddenly faster again!  Above 100K, some btrfs maintenance commands slow 
to a crawl and doing that sort of maintenance really becomes impractical 
enough that it's generally easier to backup what you need to and blow 
away the filesystem to start again with a new one, than it is to try to 
recover the existing filesystem to a workable state, given that 
maintenance can at that point take days to weeks.

So 5-digits of snapshots on a filesystem is definitely well outside of 
the recommended range, to the point that in some cases, particularly 
approaching 6-digits of snapshots, it'll be more practical to simply 
ditch the filesystem and start over, than to try to work with it any 
longer.  Just don't do it; set up your thinning schedule so your peak is 
3000 snapshots per filesystem or under, and you won't have that problem 
to worry about. =:^)

Oh, and btrfs quota management exacerbates the scaling issues 
dramatically.  If you're using btrfs quotas, either halve the max 
snapshots per filesystem recommendations, or reconsider whether you need 
quota functionality and turn it off, eliminating the existing quota data, 
if you don't really need that functionality. =:^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
  2016-02-27 23:15 ` Martin Steigerwald
  2016-02-29  1:56 ` Qu Wenruo
@ 2016-03-03  0:28 ` Dāvis Mosāns
  2016-03-03  3:42   ` Qu Wenruo
                     ` (2 more replies)
  2016-03-27  8:41 ` Current state of old filesystem " Marc Haber
  2016-04-01 13:59 ` Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
  4 siblings, 3 replies; 81+ messages in thread
From: Dāvis Mosāns @ 2016-03-03  0:28 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

2016-02-27 23:14 GMT+02:00 Marc Haber <mh+linux-btrfs@zugschlus.de>:
> Hi,
>
> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
>

I have the same issue, with a 4.4.3 kernel on Arch Linux

$ sudo btrfs fi show /mnt/fs/
Label: 'fs'  uuid: a3c66d25-2c25-40e5-a827-5f7e5208e235
        Total devices 1 FS bytes used 396.94GiB
        devid    1 size 435.76GiB used 435.76GiB path /dev/sdi2

$ sudo btrfs fi df /mnt/fs/
Data, single: total=416.70GiB, used=390.62GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=9.50GiB, used=6.32GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ sudo btrfs fi usage /mnt/fs/
Overall:
    Device size:                 435.76GiB
    Device allocated:            435.76GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Used:                        403.26GiB
    Free (estimated):             26.07GiB      (min: 26.07GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:416.70GiB, Used:390.62GiB
   /dev/sdi2     416.70GiB

Metadata,DUP: Size:9.50GiB, Used:6.32GiB
   /dev/sdi2      19.00GiB

System,DUP: Size:32.00MiB, Used:96.00KiB
   /dev/sdi2      64.00MiB

Unallocated:
   /dev/sdi2       1.00MiB


$ sudo btrfs balance start -v /mnt/fs/
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
ERROR: error during balancing '/mnt/fs/': No space left on device
There may be more info in syslog - try dmesg | tail

$ sudo btrfs balance start -dusage=34 /mnt/fs/
Done, had to relocate 0 out of 438 chunks
$ sudo btrfs balance start -dusage=35 /mnt/fs/
ERROR: error during balancing '/mnt/fs/': No space left on device


$ sudo btrfs balance start -musage=0 /mnt/fs/
Done, had to relocate 0 out of 438 chunks
$ sudo btrfs balance start -musage=1 /mnt/fs/
ERROR: error during balancing '/mnt/fs/': No space left on device


$ sudo btrfs balance start -dprofiles=single /mnt/fs
ERROR: error during balancing '/mnt/fs': No space left on device

$ sudo btrfs balance start -mprofiles=dup /mnt/fs
Done, had to relocate 20 out of 438 chunks

$ sudo btrfs balance start -sprofiles=dup --force /mnt/fs
Done, had to relocate 1 out of 433 chunks


$ sudo btrfs fi df /mnt/fs/
Data, single: total=416.70GiB, used=390.62GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=7.00GiB, used=6.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$  sudo btrfs fi usage /mnt/fs/
Overall:
    Device size:                 435.76GiB
    Device allocated:            430.76GiB
    Device unallocated:            5.00GiB
    Device missing:                  0.00B
    Used:                        403.25GiB
    Free (estimated):             31.07GiB      (min: 28.57GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:416.70GiB, Used:390.62GiB
   /dev/sdi2     416.70GiB

Metadata,DUP: Size:7.00GiB, Used:6.31GiB
   /dev/sdi2      14.00GiB

System,DUP: Size:32.00MiB, Used:96.00KiB
   /dev/sdi2      64.00MiB

Unallocated:
   /dev/sdi2       5.00GiB

now balance works

$ sudo btrfs balance start -m /mnt/fs/
Done, had to relocate 15 out of 433 chunks

$ sudo btrfs balance start /mnt/fs/

it's still going, but most likely it will finish

$ sudo btrfs balance status /mnt/fs/
Balance on '/mnt/fs/' is running
26 out of about 431 chunks balanced (27 considered),  94% left

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
       [not found]             ` <20160301161659.GR2334@torres.zugschlus.de>
@ 2016-03-03  2:02               ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2016-03-03  2:02 UTC (permalink / raw)
  To: Marc Haber; +Cc: linux-btrfs

Thanks for the output.

At least for the mprofiles ENOSPC error, the problem itself is very 
straightforward: btrfs is just unable to allocate a tree block.

I'll check the code to see if we can improve it, by finding out why we 
can't allocate a new chunk to resolve the problem.

But I'm still a little concerned about the dprofiles case, as this time 
dprofiles didn't trigger an ENOSPC and my debug info was not triggered.

Thanks,
Qu

Marc Haber wrote on 2016/03/01 17:16 +0100:
> Hi,
>
> On Tue, Mar 01, 2016 at 04:13:35PM +0800, Qu Wenruo wrote:
>> Oh, I'm sorry, that output is not necessary. It's better to use the newer
>> patch:
>> https://patchwork.kernel.org/patch/8462881/
>>
>> With the newer patch, you will need to use the enospc_debug mount option to
>> get the debug information.
>
> I'll copy the log this time in the body of the message so that it gets
> through to the list. Let me know if you'd prefer an attachment.
>
> This time, I didn't see the busy kernel threads, and I now see ENOSPC
> during the mprofiles part of the manual balance.
>
> Hope this helps.
>
> Greetings
> Marc
>



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-03  0:28 ` Dāvis Mosāns
@ 2016-03-03  3:42   ` Qu Wenruo
  2016-03-03  4:57   ` Duncan
  2016-03-05 14:39   ` Marc Haber
  2 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2016-03-03  3:42 UTC (permalink / raw)
  To: Dāvis Mosāns, Marc Haber; +Cc: Btrfs BTRFS



Dāvis Mosāns wrote on 2016/03/03 02:28 +0200:
> 2016-02-27 23:14 GMT+02:00 Marc Haber <mh+linux-btrfs@zugschlus.de>:
>> Hi,
>>
>> I have again the issue of no space left on device while rebalancing
>> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
>>
>
> I have the same issue, with a 4.4.3 kernel on Arch Linux
>
> $ sudo btrfs fi show /mnt/fs/
> Label: 'fs'  uuid: a3c66d25-2c25-40e5-a827-5f7e5208e235
>          Total devices 1 FS bytes used 396.94GiB
>          devid    1 size 435.76GiB used 435.76GiB path /dev/sdi2
>
> $ sudo btrfs fi df /mnt/fs/
> Data, single: total=416.70GiB, used=390.62GiB
> System, DUP: total=32.00MiB, used=96.00KiB
> Metadata, DUP: total=9.50GiB, used=6.32GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> $ sudo btrfs fi usage /mnt/fs/
> Overall:
>      Device size:                 435.76GiB
>      Device allocated:            435.76GiB
>      Device unallocated:            1.00MiB

Not quite the same as the OP's problem.
In your case, only 1M is unallocated, which is below the minimal chunk 
size of 16M.

So, you're really out of unallocated space.
>      Device missing:                  0.00B
>      Used:                        403.26GiB
>      Free (estimated):             26.07GiB      (min: 26.07GiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   2.00
>      Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:416.70GiB, Used:390.62GiB
>     /dev/sdi2     416.70GiB
>
> Metadata,DUP: Size:9.50GiB, Used:6.32GiB
>     /dev/sdi2      19.00GiB

But considering your space usage, I don't see any need to balance.

You have over 3G of free metadata space and about 26G of free data space.
No obvious metadata/data imbalance.
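
For reference, the free space inside already-allocated chunks is simply total minus used per chunk type, taken from the quoted btrfs fi df output. A minimal sketch, assuming sizes are expressed in hundredths of a GiB to match the two decimals btrfs prints; the struct and function names are hypothetical and not part of btrfs.

```c
#include <stdint.h>

/* Sizes in hundredths of a GiB (e.g. 416.70 GiB is 41670). */
struct chunk_usage {
	uint64_t total_cgib;
	uint64_t used_cgib;
};

/* Allocated-but-unused space inside chunks of one type. */
static uint64_t free_in_chunks_cgib(struct chunk_usage u)
{
	return u.total_cgib > u.used_cgib ? u.total_cgib - u.used_cgib : 0;
}
```

For the output quoted here, data (416.70 total, 390.62 used) leaves 26.08 GiB free and metadata (9.50 total, 6.32 used) leaves 3.18 GiB free. That space is inside existing chunks and is distinct from the 1.00MiB that is unallocated.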

>
> System,DUP: Size:32.00MiB, Used:96.00KiB
>     /dev/sdi2      64.00MiB
>
> Unallocated:
>     /dev/sdi2       1.00MiB
>
>
> $ sudo btrfs balance start -v /mnt/fs/
> Dumping filters: flags 0x7, state 0x0, force is off
>    DATA (flags 0x0): balancing
>    METADATA (flags 0x0): balancing
>    SYSTEM (flags 0x0): balancing
> ERROR: error during balancing '/mnt/fs/': No space left on device
> There may be more info in syslog - try dmesg | tail
>
> $ sudo btrfs balance start -dusage=34 /mnt/fs/
> Done, had to relocate 0 out of 438 chunks
> $ sudo btrfs balance start -dusage=35 /mnt/fs/
> ERROR: error during balancing '/mnt/fs/': No space left on device
>
>
> $ sudo btrfs balance start -musage=0 /mnt/fs/
> Done, had to relocate 0 out of 438 chunks
> $ sudo btrfs balance start -musage=1 /mnt/fs/
> ERROR: error during balancing '/mnt/fs/': No space left on device
>
>
> $ sudo btrfs balance start -dprofiles=single /mnt/fs
> ERROR: error during balancing '/mnt/fs': No space left on device
>
> $ sudo btrfs balance start -mprofiles=dup /mnt/fs
> Done, had to relocate 20 out of 438 chunks
>
> $ sudo btrfs balance start -sprofiles=dup --force /mnt/fs
> Done, had to relocate 1 out of 433 chunks
>
>
> $ sudo btrfs fi df /mnt/fs/
> Data, single: total=416.70GiB, used=390.62GiB
> System, DUP: total=32.00MiB, used=96.00KiB
> Metadata, DUP: total=7.00GiB, used=6.31GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> $  sudo btrfs fi usage /mnt/fs/
> Overall:
>      Device size:                 435.76GiB
>      Device allocated:            430.76GiB
>      Device unallocated:            5.00GiB

After the filtered metadata and system balances, you have space for at 
least 5 chunks.
So you should be OK to continue the balance.

>      Device missing:                  0.00B
>      Used:                        403.25GiB
>      Free (estimated):             31.07GiB      (min: 28.57GiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   2.00
>      Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:416.70GiB, Used:390.62GiB
>     /dev/sdi2     416.70GiB
>
> Metadata,DUP: Size:7.00GiB, Used:6.31GiB
>     /dev/sdi2      14.00GiB
>
> System,DUP: Size:32.00MiB, Used:96.00KiB
>     /dev/sdi2      64.00MiB
>
> Unallocated:
>     /dev/sdi2       5.00GiB
>
> now balance works
>
> $ sudo btrfs balance start -m /mnt/fs/
> Done, had to relocate 15 out of 433 chunks
>
> $ sudo btrfs balance start /mnt/fs/
>
> it's still going, but most likely it will finish
>
> $ sudo btrfs balance status /mnt/fs/
> Balance on '/mnt/fs/' is running
> 26 out of about 431 chunks balanced (27 considered),  94% left



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-03  0:28 ` Dāvis Mosāns
  2016-03-03  3:42   ` Qu Wenruo
@ 2016-03-03  4:57   ` Duncan
  2016-03-03 15:39     ` Dāvis Mosāns
  2016-03-05 14:39   ` Marc Haber
  2 siblings, 1 reply; 81+ messages in thread
From: Duncan @ 2016-03-03  4:57 UTC (permalink / raw)
  To: linux-btrfs

Dāvis Mosāns posted on Thu, 03 Mar 2016 02:28:36 +0200 as excerpted:

> 2016-02-27 23:14 GMT+02:00 Marc Haber <mh+linux-btrfs@zugschlus.de>:
>> Hi,
>>
>> I have again the issue of no space left on device while rebalancing
>> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
>>
>>
> I have the same issue, with a 4.4.3 kernel on Arch Linux
> 
> $ sudo btrfs fi show /mnt/fs/
> Label: 'fs'  uuid: a3c66d25-2c25-40e5-a827-5f7e5208e235
>         Total devices 1 FS bytes used 396.94GiB
>         devid    1 size 435.76GiB used 435.76GiB path /dev/sdi2
> 

> $ sudo btrfs fi usage /mnt/fs/
> Overall:
>     Device size:                 435.76GiB
>     Device allocated:            435.76GiB
>     Device unallocated:            1.00MiB

[Snipped the longer story, but the summary is that at first a full 
balance would fail, but after jumping thru some balance filtering hoops, 
you cleared enough space to unallocated that you could then do a full 
balance and it was working, tho it was still running.]


Your issue isn't the same, because all your space was allocated, 
leaving only 1 MiB unallocated, which isn't normally enough to allocate a 
new chunk to rewrite the data or metadata from the old chunks into.

That's a known issue, with known workarounds as dealt with in the FAQ. 

Ideally, you catch it while you have at least a gig or two of unallocated 
space (apparently in some instances with huge filesystems, you may need 
up to 10 GiB free, as data chunk allocations can be larger than the 
nominal 1 GiB on really large filesystems, but yours obviously isn't 
/that/ large) thereby giving you enough room to allocate at least one new 
chunk in order to rewrite data from old chunks while consolidating.

You only had 1 MiB of unallocated space in that first report, so you're a 
bit lucky that you didn't have to temporarily add a second device of at 
least several gigs in size, at least long enough to rewrite a few chunks 
and clear some additional space.  (FWIW, a few GiB USB thumb drive can be 
used, or even a loopback file on tmpfs, if you have enough memory for it 
and your system and power is stable enough that you're willing to take 
the gamble of not having an unscheduled reboot in the middle and thus 
losing your loopback, before you're able to btrfs device delete it again 
after you've completed at least enough of the balance to have enough room 
to do so.)  That's what some folks end up having to do if all space is 
allocated and they don't even have enough left to allocate even one more 
new chunk in order to do the balance and consolidate the partially-free 
chunks, thereby freeing the space they were using.

The issue of this thread is quite different, as they've triggered a bug 
that the devs have been trying to track down for quite a while, where even 
with /lots/ of unallocated free space, btrfs will, due to the bug, refuse 
to allocate more, resulting in ENOSPC errors even when there's tens or 
hundreds of GiB of unallocated space, whereas you had only that 1 MiB.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-03  4:57   ` Duncan
@ 2016-03-03 15:39     ` Dāvis Mosāns
  2016-03-04 12:31       ` Duncan
  0 siblings, 1 reply; 81+ messages in thread
From: Dāvis Mosāns @ 2016-03-03 15:39 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.duncan@cox.net>:
>
> Your issue isn't the same, because all your space was allocated,
> leaving only 1 MiB unallocated, which isn't normally enough to allocate a
> new chunk to rewrite the data or metadata from the old chunks into.
>
> That's a known issue, with known workarounds as dealt with in the FAQ.
>

Ah, thanks, well it was surprising for me that balance failed with out of
space when both data and metadata had not all been used and I thought
it could just use space from those...

especially as from FAQ:
> If there is a lot of allocated but unused data or metadata chunks,
> a balance may reclaim some of that allocated space. This is the
> main reason for running a balance on a single-device filesystem.

so I think regular balance should be smart enough to solve this on
its own, without needing any options.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-03 15:39     ` Dāvis Mosāns
@ 2016-03-04 12:31       ` Duncan
  2016-03-04 12:35         ` Hugo Mills
  2016-03-27 12:10         ` Martin Steigerwald
  0 siblings, 2 replies; 81+ messages in thread
From: Duncan @ 2016-03-04 12:31 UTC (permalink / raw)
  To: linux-btrfs

Dāvis Mosāns posted on Thu, 03 Mar 2016 17:39:12 +0200 as excerpted:

> 2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.duncan@cox.net>:
>>
>> Your issue isn't the same, because all your space was allocated,
>> leaving only 1 MiB unallocated, which isn't normally enough to allocate
>> a new chunk to rewrite the data or metadata from the old chunks into.
>>
>> That's a known issue, with known workarounds as dealt with in the FAQ.
>>
>>
> Ah, thanks, well it was surprising for me that balance failed with out
> of space when both data and metadata had not all been used and I thought
> it could just use space from those...
> 
> especially as from FAQ:
>> If there is a lot of allocated but unused data or metadata chunks,
>> a balance may reclaim some of that allocated space. This is the main
>> reason for running a balance on a single-device filesystem.
> 
> so I think regular balance should be smart enough to solve this on
> its own, without needing any options.

Well it does solve the problem on its own... to the extent that it 
eliminates empty chunks (kernel 3.17+, it didn't before that).  But if 
there's even a single 4 KiB file block used in the (nominal 1 GiB sized 
data) chunk, it's no longer empty and thus not eliminated by the empty 
chunk cleanup routines.

Additionally, balance, which was originally called the restriper, by 
definition, must have enough space to create at least one empty chunk in 
order to copy the data from existing chunks into, such that it can 
consolidate that data into fewer new chunks when the old ones were partly 
empty.  If there's not enough unallocated space left to write even one 
chunk, balance can't do its thing, because there's nowhere to create the 
new chunk it needs to be able to copy over the data from the old chunk.

With nominal 1 GiB data chunk size (up to 10 GiB in some instances if the 
filesystem is large enough), that means you need at least 1 GiB of free 
unallocated space in order to have room to create that single chunk 
that starts the rewrite process off.  Without it, you're stuck, tho there 
are workarounds like trying to balance the smaller (256 MiB nominal) 
metadata chunks first, hoping that frees the minimum 1 GiB space needed 
for a data chunk, or temporarily adding another device a few GiB in size 
to the filesystem, to give it somewhere to write the new chunk to so it 
can start off the rewrite and shrinking process.
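
A rough sketch of the two workarounds described above, written as a dry run
that only prints the commands rather than executing them. The function name
plan_enospc_workaround, the mount point /mnt/fs and the spare device
/dev/sdX1 are placeholders made up for illustration, and starting with
-musage=0 is an assumption about a sensible first step, not something the
FAQ prescribes.

```shell
#!/bin/bash
# Dry-run sketch: prints the workaround commands instead of running them.
# The mount point and spare device passed in are hypothetical placeholders.

plan_enospc_workaround() {
    mnt="$1"
    spare="$2"
    # 1. Try the smaller metadata chunks first; usage=0 only drops empty ones.
    echo "btrfs balance start -musage=0 $mnt"
    # 2. If that frees nothing, lend the filesystem a temporary device.
    echo "btrfs device add $spare $mnt"
    # 3. Compact mostly-empty data chunks now that a new chunk can be created.
    echo "btrfs balance start -dusage=5 $mnt"
    # 4. Hand the temporary device back once space has been reclaimed.
    echo "btrfs device delete $spare $mnt"
}

plan_enospc_workaround /mnt/fs /dev/sdX1
```

Printing the plan first makes it easy to review the device names before
anything touches the filesystem.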

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-04 12:31       ` Duncan
@ 2016-03-04 12:35         ` Hugo Mills
  2016-03-27 12:10         ` Martin Steigerwald
  1 sibling, 0 replies; 81+ messages in thread
From: Hugo Mills @ 2016-03-04 12:35 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

On Fri, Mar 04, 2016 at 12:31:44PM +0000, Duncan wrote:
> Dāvis Mosāns posted on Thu, 03 Mar 2016 17:39:12 +0200 as excerpted:
> 
> > 2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.duncan@cox.net>:
> >>
> >> Your issue isn't the same, because all your space was allocated,
> >> leaving only 1 MiB unallocated, which isn't normally enough to allocate
> >> a new chunk to rewrite the data or metadata from the old chunks into.
> >>
> >> That's a known issue, with known workarounds as dealt with in the FAQ.
> >>
> >>
> > Ah, thanks, well it was surprising for me that balance failed with out
> > of space when both data and metadata had not all been used and I thought
> > it could just use space from those...
> > 
> > especially as from FAQ:
> >> If there is a lot of allocated but unused data or metadata chunks,
> >> a balance may reclaim some of that allocated space. This is the main
> >> reason for running a balance on a single-device filesystem.
> > 
> > so I think regular balance should be smart enough that it could solve
> > this on own and wouldn't need to specify any options.
> 
> Well it does solve the problem on its own... to the extent that it 
> eliminates empty chunks (kernel 3.17+, it didn't before that).  But if 
> there's even a single 4 KiB file block used in the (nominal 1 GiB sized 
> data) chunk, it's no longer empty and thus not eliminated by the empty 
> chunk cleanup routines.
> 
> Additionally, balance, which was originally called the restriper,

   Nope, balance was always called balance. The restriper was the
balance feature that's now called "convert". :)

   Hugo.

> by definition, must have enough space to create at least one empty chunk in
> order to copy the data from existing chunks into, such that it can
> consolidate that data into fewer new chunks when the old ones were partly 
> empty.  If there's not enough unallocated space left to write even one 
> chunk, balance can't do its thing, because there's nowhere to create the 
> new chunk it needs to be able to copy over the data from the old chunk.
> 
> With nominal 1 GiB data chunk size (up to 10 GiB in some instances if the 
> filesystem is large enough), that means you need at least 1 GiB of free 
> unallocated space in order to have room to create that single chunk 
> that starts the rewrite process off.  Without it, you're stuck, tho there 
> are workarounds like trying to balance the smaller (256 MiB nominal) 
> metadata chunks first, hoping that frees the minimum 1 GiB space needed 
> for a data chunk, or temporarily adding another device a few GiB in size 
> to the filesystem, to give it somewhere to write the new chunk to so it 
> can start off the rewrite and shrinking process.
> 

-- 
Hugo Mills             | Darkling's First Law of Filesystems:
hugo@... carfax.org.uk | The user hates their data
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-01 20:51           ` Duncan
@ 2016-03-05 14:28             ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-05 14:28 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have not seen this message coming back to the mailing list. Was it
again too long?

I have pastebinned the log at http://paste.debian.net/412118/

On Tue, Mar 01, 2016 at 08:51:32PM +0000, Duncan wrote:
> There has been something bothering me about this thread that I wasn't 
> quite pinning down, but here it is.
> 
> If you look at the btrfs fi df/usage numbers, data chunk total vs. used 
> are very close to one another (113 GiB total, 112.77 GiB used, single 
> profile, assuming GiB data chunks, that's only a fraction of a single 
> data chunk unused), so balance would seem to be getting thru them just 
> fine.

Where would you see those numbers? I have those, pre-balance:

Mar  2 20:28:01 fan root: Data, single: total=77.00GiB, used=76.35GiB
Mar  2 20:28:01 fan root: System, DUP: total=32.00MiB, used=48.00KiB
Mar  2 20:28:01 fan root: Metadata, DUP: total=86.50GiB, used=2.11GiB
Mar  2 20:28:01 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

> But there's a /huge/ spread between total vs. used metadata (32 GiB 
> total, under 4 GiB used, clearly _many_ empty or nearly empty chunks), 
> implying that has not been successfully balanced in quite some time, if 
> ever.

This is possible, yes.

>   So I'd surmise the problem is in metadata, not in data.
> 
> Which would explain why balancing data works fine, but a whole-filesystem 
> balance doesn't, because it's getting stuck on the metadata, not the data.
> 
> Now the balance metadata filters include system as well, by default, and 
> the -mprofiles=dup and -sprofiles=dup balances finished, apparently 
> without error, which throws a wrench into my theory.

Also finishes without changing things, post-balance:
Mar  2 21:55:37 fan root: Data, single: total=77.00GiB, used=76.36GiB
Mar  2 21:55:37 fan root: System, DUP: total=32.00MiB, used=80.00KiB
Mar  2 21:55:37 fan root: Metadata, DUP: total=99.00GiB, used=2.11GiB
Mar  2 21:55:37 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B

Wait, Metadata total actually _grew_???

> But while we have the btrfs fi df from before the attempt with the 
> profiles filters, we don't have the same output from after.

We now have everything. New log attached.

> > I'd like to remove unused snapshots and keep the number of them to 4
> > digits, as a workaround.
> 
> I'll strongly second that recommendation.  Btrfs is known to have 
> snapshot scaling issues at 10K snapshots and above.  My strong 
> recommendation is to limit snapshots per filesystem to 3000 or less, with 
> a target of 2000 per filesystem or less if possible, and an ideal of 1000 
> per filesystem or less if it's practical to keep it to that, which it 
> should be with thinning, if you're only snapshotting 1-2 subvolumes, but 
> may not be if you're snapshotting more.

I'm snapshotting /home every 10 minutes, the filesystem that I have
been posting logs from has about 400 snapshots, and snapshot cleanup
works fine. The slow snapshot removal is a different filesystem on the
same host which is on a rotating rust HDD, and is much bigger.

> By 3000 snapshots per filesystem, you'll be beginning to notice slowdowns 
> in some btrfs maintenance commands if you're sensitive to it, tho it's 
> still at least practical to work with, and by 10K, it's generally 
> noticeable by all, at least once they thin down to 2K or so, as it's 
> suddenly faster again!  Above 100K, some btrfs maintenance commands slow 
> to a crawl and doing that sort of maintenance really becomes impractical 
> enough that it's generally easier to backup what you need to and blow 
> away the filesystem to start again with a new one, than it is to try to 
> recover the existing filesystem to a workable state, given that 
> maintenance can at that point take days to weeks.

Ouch. This should not be the case, or btrfs subvolume snapshot should
at least emit a warning. It is not good that it is so easy to get a
filesystem into a state this bad.

> So 5-digits of snapshots on a filesystem is definitely well outside of 
> the recommended range, to the point that in some cases, particularly 
> approaching 6-digits of snapshots, it'll be more practical to simply 
> ditch the filesystem and start over, than to try to work with it any 
> longer.  Just don't do it; setup your thinning schedule so your peak is 
> 3000 snapshots per filesystem or under, and you won't have that problem 
> to worry about. =:^)

That needs to be documented prominently. The ZFS fanbois will love that.
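
For reference, the rough thresholds quoted above (1000 ideal, 2000 target,
3000 limit, trouble from 10K upwards) could be checked with something like
the following sketch. The function name classify_snapshots and the label
strings are invented for illustration; only the numbers come from the
quoted mail.

```shell
# Classify a snapshot count against the rough thresholds quoted above.
# classify_snapshots is a hypothetical helper, not a btrfs-progs command.
classify_snapshots() {
    count="$1"
    if [ "$count" -le 1000 ]; then
        echo "ideal"
    elif [ "$count" -le 2000 ]; then
        echo "target"
    elif [ "$count" -le 3000 ]; then
        echo "acceptable"
    elif [ "$count" -lt 10000 ]; then
        echo "over recommended limit"
    else
        echo "known scaling problems"
    fi
}

# On a live system the count could come from:
#   btrfs subvolume list -s /mnt/fanbtr | wc -l
classify_snapshots 400
```

With the roughly 400 snapshots mentioned above this prints "ideal".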

> Oh, and btrfs quota management exacerbates the scaling issues 
> dramatically.  If you're using btrfs quotas

Am not, thankfully.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-03  0:28 ` Dāvis Mosāns
  2016-03-03  3:42   ` Qu Wenruo
  2016-03-03  4:57   ` Duncan
@ 2016-03-05 14:39   ` Marc Haber
  2016-03-05 19:34     ` Chris Murphy
  2 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-05 14:39 UTC (permalink / raw)
  To: Btrfs BTRFS

On Thu, Mar 03, 2016 at 02:28:36AM +0200, Dāvis Mosāns wrote:
> I've same issue, 4.4.3 kernel on Arch Linux
> 
> $ sudo btrfs fi show /mnt/fs/
> Label: 'fs'  uuid: a3c66d25-2c25-40e5-a827-5f7e5208e235
>         Total devices 1 FS bytes used 396.94GiB
>         devid    1 size 435.76GiB used 435.76GiB path /dev/sdi2
> 
> $ sudo btrfs fi df /mnt/fs/
> Data, single: total=416.70GiB, used=390.62GiB
> System, DUP: total=32.00MiB, used=96.00KiB
> Metadata, DUP: total=9.50GiB, used=6.32GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> $ sudo btrfs fi usage /mnt/fs/
> Overall:
>     Device size:                 435.76GiB
>     Device allocated:            435.76GiB
>     Device unallocated:            1.00MiB
>     Device missing:                  0.00B
>     Used:                        403.26GiB
>     Free (estimated):             26.07GiB      (min: 26.07GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
> 
> Data,single: Size:416.70GiB, Used:390.62GiB
>    /dev/sdi2     416.70GiB
> 
> Metadata,DUP: Size:9.50GiB, Used:6.32GiB
>    /dev/sdi2      19.00GiB
> 
> System,DUP: Size:32.00MiB, Used:96.00KiB
>    /dev/sdi2      64.00MiB
> 
> Unallocated:
>    /dev/sdi2       1.00MiB

http://paste.ubuntu.com/15292589/ has another log of mine with btrfs
fi usage calls as well, just in case this helps.
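
The figure that matters in the quoted btrfs fi usage output is
"Device unallocated: 1.00MiB". A minimal sketch of how that line could be
checked against the nominal 1 GiB data chunk size follows; to_mib and
chunk_room are hypothetical helper names, and the 1024 MiB threshold is
only the nominal size mentioned earlier in the thread, not a value read
from the filesystem.

```shell
# Convert a size string such as "1.00MiB" or "26.07GiB" to whole MiB.
to_mib() {
    echo "$1" | awk '{
        n = $0 + 0                                 # leading numeric part
        if ($0 ~ /TiB$/)       n *= 1024 * 1024
        else if ($0 ~ /GiB$/)  n *= 1024
        else if ($0 ~ /KiB$/)  n /= 1024
        else if ($0 !~ /MiB$/) n /= 1024 * 1024    # plain bytes such as "0.00B"
        printf "%d\n", n
    }'
}

# Read btrfs fi usage output on stdin and say whether a nominal
# 1 GiB data chunk would fit into the unallocated space.
chunk_room() {
    size="$(awk '/Device unallocated:/ { print $3 }')"
    if [ "$(to_mib "$size")" -ge 1024 ]; then
        echo "room for a data chunk"
    else
        echo "no room for a data chunk"
    fi
}

printf '    Device unallocated:            1.00MiB\n' | chunk_room
```

On a live system the input would come from "btrfs fi usage /mnt/fs" instead
of the printf line.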

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 14:39   ` Marc Haber
@ 2016-03-05 19:34     ` Chris Murphy
  2016-03-05 20:09       ` Marc Haber
                         ` (3 more replies)
  0 siblings, 4 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-05 19:34 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

I can't tell what this btrfs-balance script is doing because not every
btrfs balance command is in the log. It may be doing something not
advisable or suboptimal or unexpected that along with some other bug
is causing this to happen.

Metadata,DUP: Size:107.00GiB, Used:2.11GiB

I'd try to use -musage filter alone, in whatever increments work. So
try 0. Then 5. If 5 fails, try 2. Increment until size is not much
more than 2-3x used.
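
As a sketch of that stepping, the following prints one metadata-only
balance command per usage threshold, so the sequence can be reviewed and
then run by hand one step at a time. The function name musage_steps and
the particular list of percentages are assumptions for illustration.

```shell
# Dry-run sketch: print one metadata-only balance per usage threshold.
# musage_steps is a hypothetical helper; it does not run btrfs itself.
musage_steps() {
    mnt="$1"
    shift
    for pct in "$@"; do
        echo "btrfs balance start -musage=$pct $mnt"
    done
}

musage_steps /mnt/fanbtr 0 2 5 10 20
```

If one step fails with ENOSPC, a smaller value can be slotted in before
retrying, and btrfs fi df shows after each step how far total has come
down towards used.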

Something is happening with the usage of this file system that's out
of the ordinary. This is the first time I've seen such a large amount
of unused metadata allocation. And then for it not only fail to
balance, but for the allocation amount to increase is a first. So
understanding the usage is important to figuring out what's happening.
I'd file a bug and include as much information on how the fs got into
this state as possible. And also if possible make a btrfs-image using
the proper flags to blot out the filenames for privacy. And what
btrfs-progs tools were used to create this file system. Etc.

The alternative if this can't be fixed, is to recreate the filesystem
because there's no practical way yet to migrate so many snapshots to a
new file system.


Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 19:34     ` Chris Murphy
@ 2016-03-05 20:09       ` Marc Haber
  2016-03-06  6:43         ` Duncan
  2016-03-07  8:56         ` Marc Haber
  2016-03-12 19:57       ` Marc Haber
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-05 20:09 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> I can't tell what this btrfs-balance script is doing because not every
> btrfs balance command is in the log.

It is. I wrote it to produce reproducible logs.

[1/499]mh@fan:~$ cat btrfs-balance
#!/bin/bash

FS="/mnt/fanbtr"

showdf() {
logger -- btrfs fi df $FS
btrfs fi df $FS 2>&1 | logger
logger -- btrfs fi show /
btrfs fi show / | logger
logger -- btrfs fi usage /
btrfs fi usage / | logger
}

logger -- BEGIN btrfs-balance script
showdf

btrfs balance start  $FS 2>&1 | logger
showdf

logger -- BEGIN btrfs balance start -dprofiles=single $FS
btrfs balance start -dprofiles=single $FS 2>&1 | logger
showdf

logger -- BEGIN btrfs balance start -mprofiles=dup $FS
btrfs balance start -mprofiles=dup $FS 2>&1 | logger
showdf

logger -- BEGIN btrfs balance start --force -sprofiles=dup $FS
btrfs balance start --force -sprofiles=dup $FS 2>&1 | logger
showdf

logger -- BEGIN btrfs balance start $FS
btrfs balance start  $FS 2>&1 | logger
showdf

logger -- END btrfs-balance script
[2/500]mh@fan:~$ 

I see. The logger -- BEGIN is missing for the very first command. My
bad.
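
One way to make a missing BEGIN line impossible would be to route every
command through a small wrapper that logs the command line before running
it. This is only a sketch: run_logged and the LOGGER variable are
hypothetical names, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# LOGGER can be overridden (for example with "cat") to see the output directly.
LOGGER="${LOGGER:-logger}"

run_logged() {
    # Log the command line first, then its combined stdout and stderr.
    echo "BEGIN $*" | $LOGGER
    "$@" 2>&1 | $LOGGER
}

# Usage, with the mount point from the script above:
#   run_logged btrfs balance start -mprofiles=dup /mnt/fanbtr
```

Because the BEGIN line is produced from the same arguments that get
executed, the log and the command cannot drift apart.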

> Something is happening with the usage of this file system that's out
> of the ordinary. This is the first time I've seen such a large amount
> of unused metadata allocation. And then for it not only fail to
> balance, but for the allocation amount to increase is a first.

It is just a root filesystem of a workstation running Debian Linux, in
daily use, with daily snapshots of the system, and
ten-minute-increment snapshots of /home, with no cleanup happening for
a few months.

>  So understanding the usage is important to figuring out what's
>  happening. I'd file a bug and include as much information on how the
>  fs got into this state as possible. And also if possible make a
>  btrfs-image using the proper flags to blot out the filenames for
>  privacy.

That would be btrfs-image -s?

> And what btrfs-progs tools were used to create this file system. Etc.

The file system is at least two years old; I do not remember which
version of btrfs-tools was in Debian unstable back then. Is this
information somewhere in the filesystem label? How do I obtain this one?

> The alternative if this can't be fixed, is to recreate the filesystem
> because there's no practical way yet to migrate so many snapshots to a
> new file system.

I am now back to a mid three-digit number of snapshots.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 20:09       ` Marc Haber
@ 2016-03-06  6:43         ` Duncan
  2016-03-06 20:27           ` Chris Murphy
  2016-03-07  8:30           ` Marc Haber
  2016-03-07  8:56         ` Marc Haber
  1 sibling, 2 replies; 81+ messages in thread
From: Duncan @ 2016-03-06  6:43 UTC (permalink / raw)
  To: linux-btrfs

Marc Haber posted on Sat, 05 Mar 2016 21:09:09 +0100 as excerpted:

> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:

>> Something is happening with the usage of this file system that's out of
>> the ordinary. This is the first time I've seen such a large amount of
>> unused metadata allocation. And then for it not only fail to balance,
>> but for the allocation amount to increase is a first.
> 
> It is just a root filesystem of a workstation running Debian Linux, in
> daily use, with daily snapshots of the system, and ten-minute-increment
> snapshots of /home, with no cleanup happening for a few months.
> 
>>  So understanding the usage is important to figuring out what's
>>  happening. I'd file a bug and include as much information on how the
>>  fs got into this state as possible. And also if possible make a
>>  btrfs-image using the proper flags to blot out the filenames for
>>  privacy.

Now you're homing in on what I picked up on.  There's something very 
funky about that metadata, 100+ GiB of metadata total, only just over 2 
GiB metadata used, and attempts to balance it don't help with the spread 
between the two at all, only increasing the total metadata, if anything, 
but still seem to complete without error.  There's gotta be some sort of 
bug going on there, and I'd /bet/ it's the same one that's keeping full 
balances from working, as well.


OK, this question's out of left field, but it's the only thing (well, 
/almost/ only, see below) I've seen do anything /remotely/ like that:

Was the filesystem originally created as a convert from ext*, using btrfs-
convert?  If so, was the ext2_saved or whatever subvolume removed, and a 
successful defrag and balance completed at that time?

Because as I said, problems due to faulty conversion from ext* have been 
the one thing repeatedly reported to trigger balance behavior and spreads 
between total and used that balance doesn't fix, like this.


Tho AFAIK there was in addition a very narrow timeframe in which a bug in 
mkfs.btrfs would create invalid btrfs'.  That was with btrfs-progs 4.1.1, 
released in July 2015, with an urgent bugfix release 4.1.2 in the same 
month to fix the problem, so the timeframe was days or weeks.  Btrfs 
created with that buggy mkfs.btrfs were known to have some pretty wild 
behavior as well, with the recommendation being to simply blow them up 
and recreate them with a mkfs.btrfs from a btrfs-progs without the bug, 
as the btrfs created by the bugged version were simply too bugged out to 
reliably fix, and might well appear to work fine for awhile, until BOOM!  
If there's a chance the filesystem in question was created by that bugged 
mkfs.btrfs, don't even try to fix it, just get what you can off it and 
recreate with a mkfs.btrfs without that bug, ASAP.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06  6:43         ` Duncan
@ 2016-03-06 20:27           ` Chris Murphy
  2016-03-06 20:37             ` Chris Murphy
                               ` (2 more replies)
  2016-03-07  8:30           ` Marc Haber
  1 sibling, 3 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-06 20:27 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Sat, Mar 5, 2016 at 11:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Marc Haber posted on Sat, 05 Mar 2016 21:09:09 +0100 as excerpted:
>
>> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>
>>> Something is happening with the usage of this file system that's out of
>>> the ordinary. This is the first time I've seen such a large amount of
>>> unused metadata allocation. And then for it not only fail to balance,
>>> but for the allocation amount to increase is a first.
>>
>> It is just a root filesystem of a workstation running Debian Linux, in
>> daily use, with daily snapshots of the system, and ten-minute-increment
>> snapshots of /home, with no cleanup happening for a few months.
>>
>>>  So understanding the usage is important to figuring out what's
>>>  happening. I'd file a bug and include as much information on how the
>>>  fs got into this state as possible. And also if possible make a
>>>  btrfs-image using the proper flags to blot out the filenames for
>>>  privacy.
>
> Now you're homing in on what I picked up on.  There's something very
> funky about that metadata, 100+ GiB of metadata total, only just over 2
> GiB metadata used, and attempts to balance it don't help with the spread
> between the two at all, only increasing the total metadata, if anything,
> but still seem to complete without error.  There's gotta be some sort of
> bug going on there, and I'd /bet/ it's the same one that's keeping full
> balances from working, as well.
>
>
> OK, this question's out of left field, but it's the only thing (well,
> /almost/ only, see below) I've seen do anything /remotely/ like that:
>
> Was the filesystem originally created as a convert from ext*, using btrfs-
> convert?  If so, was the ext2_saved or whatever subvolume removed, and a
> successful defrag and balance completed at that time?
>
> Because as I said, problems due to faulty conversion from ext* have been
> the one thing repeatedly reported to trigger balance behavior and spreads
> between total and used that balance doesn't fix, like this.
>
>
> Tho AFAIK there was in addition a very narrow timeframe in which a bug in
> mkfs.btrfs would create invalid btrfs'.  That was with btrfs-progs 4.1.1,
> released in July 2015, with an urgent bugfix release 4.1.2 in the same
> month to fix the problem, so the timeframe was days or weeks.  Btrfs
> created with that buggy mkfs.btrfs were known to have some pretty wild
> behavior as well, with the recommendation being to simply blow them up
> and recreate them with a mkfs.btrfs from a btrfs-progs without the bug,
> as the btrfs created by the bugged version were simply too bugged out to
> reliably fix, and might well appear to work fine for awhile, until BOOM!
> If there's a chance the filesystem in question was created by that bugged
> mkfs.btrfs, don't even try to fix it, just get what you can off it and
> recreate with a mkfs.btrfs without that bug, ASAP.

Marc said it was created maybe 2 years ago and doesn't remember what
version of the tools were used. Between it being two years ago and
also being Debian, for all we know it could've been 0.19. *shrug*

On the one hand, the practical advice is to just blow it away and use
everything current, go back to the same workload including thousands
of snapshots, and see if this balance problem is reproducible. That's
pretty clearly a bug.

On the other hand, we're approaching the state with Btrfs where the
problems we're seeing are at least as much about aging file systems,
because the stability is permitting file systems to get older. As they
get older though, the issues get more non-deterministic. So it's an
interesting bug from that perspective, the current kernel code ought
to be able to contend with this (as in, the user is right to expect
the code to deal with this scenario, and if it doesn't it's a bug; not
that I expect today's code to actually do this).

That's why I'm suggesting btrfs-image, because at some point the aging
question might get a dev's attention. But in the meantime the user
should get back to a reliable state and also find out sooner than
later if this is still a problem with all current tools and code.

So if it were me, I'd gather all possible data, including complete,
not trimmed, logs. And as for the btrfs-image, it could be huge. It
should be only 1GB based on the used metadata but the fact there's 50x
more allocated, I'm not sure how big it'll actually be or if
btrfs-image even has the granularity to capture this. It might not be
a bad idea to capture a complete btrfs-debug-tree also, and compress
that, add as attachment. For the image, stick it up in dropbox or
google drive, and post the URL in the bug and give it a year... I've
had devs come back and fix stuff that old before.

-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06 20:27           ` Chris Murphy
@ 2016-03-06 20:37             ` Chris Murphy
  2016-03-07  8:47               ` Marc Haber
  2016-03-07  8:42             ` Marc Haber
  2016-03-12 21:30             ` Marc Haber
  2 siblings, 1 reply; 81+ messages in thread
From: Chris Murphy @ 2016-03-06 20:37 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Sun, Mar 6, 2016 at 1:27 PM, Chris Murphy <lists@colorremedies.com> wrote:

> So if it were me, I'd gather all possible data, including complete,
> not trimmed, logs.

Also include in the bug, the balance script being used. It might be a
contributing factor.

I wonder if the ENOSPC is happening just prior to the point where
balance would free up the unused portion of allocated metadata chunks
and that's why this just keeps getting worse? The balance function is
COW, so I wonder if there are a bunch of failed chunk migrations that
are just accumulating due to the ENOSPC stopping the balance?


Anyway, after collecting all data and btrfs-image, I would blow away
this fs using current kernel and tools. And then go back to the
original workload. I would not pare down the number or frequency of
snapshots. If anything increase it. The idea is to reproduce the bug.
While ENOSPC is a certain indicator the bug is present, it's not the
only possibility. If the metadata chunk allocation substantially
starts to exceed the used amount by say 4x or more after a full
metadata balance, that suggests a problem even if there is no ENOSPC.
Right now total is more than 50x used.
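
The total-to-used ratio can be read straight off btrfs fi df output. A
minimal sketch, assuming both figures are reported in GiB as in the logs
in this thread; metadata_ratio is a hypothetical helper name and the
"suspicious" label simply applies the 4x rule of thumb above.

```shell
# Read btrfs fi df output on stdin and print the Metadata total/used ratio,
# flagged against the 4x rule of thumb. Assumes both values are in GiB.
metadata_ratio() {
    awk -F'[=,]' '/^Metadata/ {
        total = $3 + 0      # "107.00GiB" -> 107
        used  = $5 + 0      # "2.11GiB"   -> 2.11
        if (used > 0) {
            r = total / used
            printf "%.1f %s\n", r, (r >= 4 ? "suspicious" : "ok")
        }
    }'
}

echo "Metadata, DUP: total=107.00GiB, used=2.11GiB" | metadata_ratio
```

For the 107 GiB total and 2.11 GiB used seen on this filesystem it prints
"50.7 suspicious". Syslog-prefixed lines would need the prefix stripped
first, since the pattern is anchored at the start of the line.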


-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06  6:43         ` Duncan
  2016-03-06 20:27           ` Chris Murphy
@ 2016-03-07  8:30           ` Marc Haber
  2016-03-07 20:07             ` Duncan
  1 sibling, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-07  8:30 UTC (permalink / raw)
  To: linux-btrfs

On Sun, Mar 06, 2016 at 06:43:46AM +0000, Duncan wrote:
> Marc Haber posted on Sat, 05 Mar 2016 21:09:09 +0100 as excerpted:
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> Something is happening with the usage of this file system that's out of
> >> the ordinary. This is the first time I've seen such a large amount of
> >> unused metadata allocation. And then for it not only fail to balance,
> >> but for the allocation amount to increase is a first.
> > 
> > It is just a root filesystem of a workstation running Debian Linux, in
> > daily use, with daily snapshots of the system, and ten-minute-increment
> > snapshots of /home, with no cleanup happening for a few months.
> > 
> >>  So understanding the usage is important to figuring out what's
> >>  happening. I'd file a bug and include as much information on how the
> >>  fs got into this state as possible. And also if possible make a
> >>  btrfs-image using the proper flags to blot out the filenames for
> >>  privacy.
> 
> Now you're homing in on what I picked up on.  There's something very 
> funky about that metadata, 100+ GiB of metadata total, only just over 2 
> GiB metadata used, and attempts to balance it don't help with the spread 
> between the two at all, only increasing the total metadata, if anything, 
> but still seem to complete without error.  There's gotta be some sort of 
> bug going on there, and I'd /bet/ it's the same one that's keeping full 
> balances from working, as well.

I don't understand a single word of this, but you seem to understand
it. Good.

> 
> OK, this question's out of left field, but it's the only thing (well, 
> /almost/ only, see below) I've seen do anything /remotely/ like that:
> 
> Was the filesystem originally created as a convert from ext*, using btrfs-
> convert?  If so, was the ext2_saved or whatever subvolume removed, and a 
> successful defrag and balance completed at that time?

I have dug around in my auth.logs, and thanks to my not working in a
root shell but using sudo for every single command I can say that the
filesystem was created on September 1, 2015, so it is not _this_ old,
and snapshot.debian.net tells me that Debian unstable had btrfs-tools
4.1.2 uploaded on August 31, so I guess that the filesystem was either
created by the 4.0 version we had since May 2015 or by the brand new
4.1.2.

And it was a mkfs.btrfs with no special options. I suspected this
since I would probably not have made an ext4 filesystem of 300 GB in
size. Back in the ext4 days, I usually made /, /usr, /var, /home and
/boot their own filesystems.

> Tho AFAIK there was in addition a very narrow timeframe in which a bug in 
> mkfs.btrfs would create invalid btrfs'.  That was with btrfs-progs 4.1.1, 
> released in July 2015, with an urgent bugfix release 4.1.2 in the same 
> month to fix the problem, so the timeframe was days or weeks.

Debian is chastised for their allegedly quirky release schedules even
in this thread, I usually ignore that, but this time a smile comes to
my face when I say that btrfs-progs 4.1.1 was never packaged in
Debian, hence we're clear of this bug here. We went from 4.0 straight
to 4.1.2.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06 20:27           ` Chris Murphy
  2016-03-06 20:37             ` Chris Murphy
@ 2016-03-07  8:42             ` Marc Haber
  2016-03-07 18:39               ` Chris Murphy
  2016-03-07 19:44               ` Chris Murphy
  2016-03-12 21:30             ` Marc Haber
  2 siblings, 2 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-07  8:42 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 06, 2016 at 01:27:10PM -0700, Chris Murphy wrote:
> Marc said it was created maybe 2 years ago and doesn't remember what
> version of the tools were used. Between it being two years ago and
> also being Debian, for all we know it could've been 0.19. *shrug*

You are mixing up Debian unstable and Debian stable *snort*. You're
lucky that I'm not on RHEL 6[1].

> On the one hand, the practical advice is to just blow it away and use
> everything current, go back to the same workload including thousands
> of snapshots, and see if this balance problem is reproducible. That's
> pretty clearly a bug.

To have the same thing happen in half a year again? That's not why I
converted to a snapshottable file system.

> On the other hand, we're approaching the state with Btrfs where the
> problems we're seeing are at least as much about aging file systems,
> because the stability is permitting file systems to get older.

And this is really something to be proud of? I mean, this is a file
system that is part of the vanilla linux kernel, not marked as
experimental or something, and you're still concerned about file
systems that were made a year ago? This is a new experience for me.

>  As they get older though, the issues get more non-deterministic. So
>  it's an interesting bug from that perspective, the current kernel
>  code ought to be able to contend with this (as in, the user is right
>  to expect the code to deal with this scenario, and if it doesn't it's
>  a bug; not that I expect today's code to actually do this).

Kernel 4.4.4 as of the day before yesterday, thanks for considering.

> So if it were me, I'd gather all possible data, including complete,
> not trimmed, logs.

So you seriously want all messages like
Mar  7 09:25:23 fan systemd[1]: Started http per-connection Server, forwarding to 3142 ([2a01:238:4071:328d:5054:ff:fea9:6807]:41060).
Mar  7 09:25:23 fan named[3000]: client 2a01:238:4071:328d:5054:ff:fea9:6807#59920 (debian.debian.zugschlus.de): query: debian.debian.zugschlus.de IN AAAA + (fec0:0:0:ffff::1)
Mar  7 09:21:34 fan dhcpd[2468]: DHCPREQUEST for 192.168.182.29 from 54:04:a6:82:21:00 via eth0: unknown lease 192.168.182.29.
Mar  7 09:17:01 fan CRON[19474]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar  7 09:18:06 fan systemd[1]: Started Session c101 of user mh.
Mar  7 08:21:40 fan smartd[1956]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30

I _can_ swamp the bug report literally with gigabytes of logs, but is
that really what you want? If it is not, please state what you mean by
"not trimmed" as I only removed those clutter messages from the logs I
sent.

Greetings
Marc

[1] Does RHEL 6 have btrfs in the first place?


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06 20:37             ` Chris Murphy
@ 2016-03-07  8:47               ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-07  8:47 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 06, 2016 at 01:37:31PM -0700, Chris Murphy wrote:
> On Sun, Mar 6, 2016 at 1:27 PM, Chris Murphy <lists@colorremedies.com> wrote:
> > So if it were me, I'd gather all possible data, including complete,
> > not trimmed, logs.
> 
> Also include in the bug, the balance script being used. It might be a
> contributing factor.

The balance script was only written after Duncan asked me to do
filtered balances instead of a full balance. The issue showed itself
while the filesystem was still managed using the procedures from "the
book" ;-)

> I wonder if the ENOSPC is happening just prior to the point where
> balance would free up the unused portion of allocated metadata chunks
> and that's why this just keeps getting worse? The balance function is
> COW, so I wonder if there are a bunch of failed chunk migrations that
> are just accumulating due to the ENOSPC stopping the balance?

How do we find out?

> Anyway, after collecting all data and btrfs-image, I would blow away
> this fs using current kernel and tools. And then go back to the
> original workload. I would not pare down the number or frequency of
> snapshots. If anything increase it. The idea is to reproduce the bug.

... losing another pile of snapshots in the process? This is a
production machine[1].

Greetings
Marc

[1] yes, with off-line backups being made


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 20:09       ` Marc Haber
  2016-03-06  6:43         ` Duncan
@ 2016-03-07  8:56         ` Marc Haber
  1 sibling, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-07  8:56 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Mar 05, 2016 at 09:09:09PM +0100, Marc Haber wrote:
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >  So understanding the usage is important to figuring out what's
> >  happening. I'd file a bug and include as much information on how the
> >  fs got into this state as possible. And also if possible make a
> >  btrfs-image using the proper flags to blot out the filenames for
> >  privacy.
> 
> That would be btrfs-image -s?

btrfs-image -s -t 8 -s /dev/mapper/fanbtr

complains about a mounted filesystem. Will an image made from the
running system with the filesystem mounted help, or do I need to take
down the machine while the image is being made?

Also, threading does not seem to work: despite the -t 8, CPU usage
never rises above 100 % in atop.

Greetings
Marc


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07  8:42             ` Marc Haber
@ 2016-03-07 18:39               ` Chris Murphy
  2016-03-07 18:56                 ` Austin S. Hemmelgarn
  2016-03-12 21:36                 ` Marc Haber
  2016-03-07 19:44               ` Chris Murphy
  1 sibling, 2 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-07 18:39 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Mon, Mar 7, 2016 at 1:42 AM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Sun, Mar 06, 2016 at 01:27:10PM -0700, Chris Murphy wrote:


>> On the one hand, the practical advice is to just blow it away and use
>> everything current, go back to the same workload including thousands
>> of snapshots, and see if this balance problem is reproducible. That's
>> pretty clearly a bug.
>
> To have the same thing happen in half a year again? That's not why I
> converted to a snapshottable file system.

If you want to help make Btrfs better, yes. If you don't, that's fine,
use ZFS on Linux. Or if the ZFS Linux glue thing bugs you, use
FreeBSD. We're very lucky in free software to have so many
alternatives.


>> On the other hand, we're approaching the state with Btrfs where the
>> problems we're seeing are at least as much about aging file systems,
>> because the stability is permitting file systems to get older.
>
> And this is really something to be proud of? I mean, this is a file
> system that is part of the vanilla linux kernel, not marked as
> experimental or something, and you're still concerned about file
> systems that were made a year ago? This is a new experience for me.

I'm not a developer so you should take my opinions with a grain of
salt, but having come from something near to your perspective, it's
changed a lot by following the progress over ~5-6 years. Filesystems
are non-trivial. There's a great set of videos here
http://open-zfs.org/wiki/Main_Page about how non-trivial this was for
ZFS. It was a year to first kernel mount at which point it seemed like
90% of the work was done but far from it as it really took ~10 years
to get to a point where they were certain it could be deployed in an
enterprise environment.

Btrfs' first kernel mount was in, what, 2008? So we're not yet done with
year 8? Not that they're exactly comparable. But I don't know who was
saying Btrfs would be done and stable with no bugs in 5 years.

The reality is, you've lost no data and you've probably found a bug.
So, write it up and contribute if you want. It's been a new experience
for me also.

>
>> So if it were me, I'd gather all possible data, including complete,
>> not trimmed, logs.
>
> So you seriously want all messages like
> Mar  7 09:25:23 fan systemd[1]: Started http per-connection Server, forwarding to 3142 ([2a01:238:4071:328d:5054:ff:fea9:6807]:41060).
> Mar  7 09:25:23 fan named[3000]: client 2a01:238:4071:328d:5054:ff:fea9:6807#59920 (debian.debian.zugschlus.de): query: debian.debian.zugschlus.de IN AAAA + (fec0:0:0:ffff::1)
> Mar  7 09:21:34 fan dhcpd[2468]: DHCPREQUEST for 192.168.182.29 from 54:04:a6:82:21:00 via eth0: unknown lease 192.168.182.29.
> Mar  7 09:17:01 fan CRON[19474]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
> Mar  7 09:18:06 fan systemd[1]: Started Session c101 of user mh.
> Mar  7 08:21:40 fan smartd[1956]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
>
> I _can_ swamp the bug report literally with gigabytes of logs, but is
> that really what you want?

Since no hardware issue is suspected, you could filter for just btrfs.

journalctl -o short-iso | grep -i btrfs

When hardware problems are suspected it's better to include all the
SCSI and libata (and USB if it's a USB drive) messages also.

If you have any logs that include the filesystem mounted with
enospc_debug, that might be useful for a developer?
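
A minimal sketch of the same filtering in Python, for the case where the
logs are plain syslog files rather than the journal. The function name
filter_log and its keywords parameter are made up for illustration; the
match is a crude case-insensitive substring test, roughly what grep -i
does.

```python
from typing import Iterable, Iterator, Sequence


def filter_log(lines: Iterable[str],
               keywords: Sequence[str] = ("btrfs",)) -> Iterator[str]:
    """Yield only the log lines that mention one of the keywords.

    The comparison is case-insensitive, so "BTRFS" and "btrfs" both match.
    Short keywords can over-match (for example "ata" also matches "data").
    """
    wanted = [k.lower() for k in keywords]
    for line in lines:
        lowered = line.lower()
        if any(k in lowered for k in wanted):
            yield line


# Hypothetical usage on a syslog file (the path is an assumption):
#     with open("/var/log/syslog") as log:
#         for line in filter_log(log):
#             print(line, end="")
```

If hardware trouble is suspected, further keywords can be passed in
addition to "btrfs". Which strings the SCSI, libata or USB messages
actually contain is something to check against your own logs.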


> [1] Does RHEL 6 have btrfs in the first place?

They do, but you need a decoder ring to figure out what's been
backported to have some vague idea of what equivalent kernel.org
kernel it is.

-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 18:39               ` Chris Murphy
@ 2016-03-07 18:56                 ` Austin S. Hemmelgarn
  2016-03-07 19:07                   ` Chris Murphy
  2016-03-07 19:33                   ` Marc Haber
  2016-03-12 21:36                 ` Marc Haber
  1 sibling, 2 replies; 81+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-07 18:56 UTC (permalink / raw)
  To: Chris Murphy, Marc Haber; +Cc: Btrfs BTRFS

On 2016-03-07 13:39, Chris Murphy wrote:
> On Mon, Mar 7, 2016 at 1:42 AM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
>> [1] Does RHEL 6 have btrfs in the first place?
>
> They do, but you need a decoder ring to figure out what's been
> backported to have some vague idea of what equivalent kernel.org
> kernel it is.
>
Yeah, in general, if you want to get good upstream support for BTRFS 
(such as from the mailing lists), you still want to steer clear of 
'Enterprise' branded distros (RHEL (and by extension CentOS) is 
particularly bad about kernel versioning, OEL and SLES are 
marginally better but still not great).  People don't often think about 
it, but given the degree of code and version divergence due to patches, 
RHEL, SLES, and OEL kernels are strictly speaking, forks of Linux (most 
distro kernels are, but usually not to the extreme degree that 
enterprise kernels are (with the exception of some embedded systems, 
which can be even worse)).

In fact, this is part of the reason I switched from using Gentoo's 
official kernel sources (they're nowhere near as bad about kernel 
versioning as enterprise distros, but they do have back-ports and 
numerous local modifications I don't care about) to my own sources which 
(try to) directly track linux-stable (the other reason being that it's 
easier to maintain my own patches via git than trying to handle user 
patching for gentoo-sources).


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 18:56                 ` Austin S. Hemmelgarn
@ 2016-03-07 19:07                   ` Chris Murphy
  2016-03-07 19:33                   ` Marc Haber
  1 sibling, 0 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-07 19:07 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 7, 2016 at 11:56 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>  People don't often think about it, but given the degree of code and
> version divergence due to patches, RHEL, SLES, and OEL kernels are strictly
> speaking, forks of Linux (most distro kernels are, but usually not to the
> extreme degree that enterprise kernels are (with the exception of some
> embedded systems, which can be even worse)).

Yes. The exception is with Fedora kernels which carry very minimal
well documented patches already in or planned to go in the upstream
kernel anyway. bugzilla.kernel.org has a Tree pop-up menu with a
Fedora option in addition to Mainline and some others, but no other
distros are listed.


-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 18:56                 ` Austin S. Hemmelgarn
  2016-03-07 19:07                   ` Chris Murphy
@ 2016-03-07 19:33                   ` Marc Haber
  1 sibling, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-07 19:33 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 07, 2016 at 01:56:54PM -0500, Austin S. Hemmelgarn wrote:
> Yeah, in general, if you want to get good upstream support for BTRFS (such
> as from the mailing lists), you still want to steer clear of 'Enterprise'
> branded distros (RHEL (and by extension CentOS) is particularly bad about
> kernel versioning

Just to get back to this thread's subject, I am using Debian unstable,
with a vanilla kernel, 4.4.3 at the beginning of this thread, and
4.4.4 today.

Greetings
Marc


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07  8:42             ` Marc Haber
  2016-03-07 18:39               ` Chris Murphy
@ 2016-03-07 19:44               ` Chris Murphy
  2016-03-07 20:43                 ` Duncan
  1 sibling, 1 reply; 81+ messages in thread
From: Chris Murphy @ 2016-03-07 19:44 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Mon, Mar 7, 2016 at 1:42 AM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> And this is really something to be proud of? I mean, this is a file
> system that is part of the vanilla linux kernel, not marked as
> experimental or something, and you're still concerned about file
> systems that were made a year ago? This is a new experience for me.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/filesystems/btrfs.txt?id=refs/tags/v4.4.4

"Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized."

I thought the 2nd sentence was removed a long time ago but I'm seeing
it in the current branch and 4.1.y. Is this a bug?


-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07  8:30           ` Marc Haber
@ 2016-03-07 20:07             ` Duncan
  0 siblings, 0 replies; 81+ messages in thread
From: Duncan @ 2016-03-07 20:07 UTC (permalink / raw)
  To: linux-btrfs

Marc Haber posted on Mon, 07 Mar 2016 09:30:43 +0100 as excerpted:

> I have dug around in my auth.logs, and thanks to my not working in a root
> shell but using sudo for every single command I can say that the
> filesystem was created on September 1, 2015, so it is not _this_ old,
> and snapshot.debian.net tells me that Debian unstable had btrfs-tools
> 4.1.2 uploaded on August 31, so I guess that the filesystem was either
> created by the 4.0 version we had since May 2015 or by the brand new
> 4.1.2.
> 
> And it was a mkfs.btrfs with no special options.

I doubted that either a convert-from-ext* or a mkfs with the bad 4.1.1 
would turn out to be the cause, but those are the only things /I/ am 
aware of that can trigger wild behavior like we're seeing with this 
metadata, here.

So it's certainly a bug, and not accounted for by anything known at least 
to me, which makes it a new and interesting bug.  But not being a dev, 
that's about as far as I can take it.  As CMurphy said, file a bug with 
all the various information, and hope the devs can replicate and trace it 
down.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 19:44               ` Chris Murphy
@ 2016-03-07 20:43                 ` Duncan
  2016-03-07 22:44                   ` Chris Murphy
  0 siblings, 1 reply; 81+ messages in thread
From: Duncan @ 2016-03-07 20:43 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Mon, 07 Mar 2016 12:44:20 -0700 as excerpted:

> On Mon, Mar 7, 2016 at 1:42 AM, Marc Haber <mh+linux-btrfs@zugschlus.de>
> wrote:
>> And this is really something to be proud of? I mean, this is a file
>> system that is part of the vanilla linux kernel, not marked as
>> experimental or something, and you're still concerned about file
>> systems that were made a year ago? This is a new experience for me.
> 
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/filesystems/btrfs.txt?id=refs/tags/v4.4.4
> 
> "Btrfs is under heavy development, and is not suitable for any uses
> other than benchmarking and review. The Btrfs disk format is not yet
> finalized."
> 
> I thought the 2nd sentence was removed a long time ago but I'm seeing it
> in the current branch and 4.1.y. Is this a bug?

AFAIK, this is the still "semi-scary" wording that was left after the 
_really_ scary "eat your babies" level experimental warning was stripped, 
and it remains more or less literally the case, certainly the last 
sentence, tho I'd quibble that the "not suitable for any uses other than 
benchmarking and review" is a bit conservative.  The "disk format is 
not yet finalized" bit, while literally true, doesn't have the 
implications one might initially think, because there _is_ a commitment 
to support all disk formats available since btrfs was kernel-mainlined.  
So while the disk format can and does still occasionally have small 
changes (think features like skinny-metadata and the metadata changes 
necessary for raid56 support), these are all handled by incompatibility 
flags, such that a kernel that doesn't understand them will refuse to 
mount a filesystem with them enabled, while newer kernels will still 
mount filesystems with them not enabled.
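
The incompatibility-flag rule described above can be sketched as a
bitmask check. This is only an illustration of the idea, not the
kernel's code: the function name can_mount is made up, and the bit
values are placeholders chosen for the example.

```python
# Placeholder bit values for two of the features mentioned above.
# Treat the numbers as illustrative, not as the kernel's constants.
INCOMPAT_RAID56 = 1 << 7
INCOMPAT_SKINNY_METADATA = 1 << 8


def can_mount(fs_incompat_flags: int, kernel_supported_flags: int) -> bool:
    """Return True if the filesystem uses no incompat feature unknown to the kernel."""
    # Bits set on the filesystem but not understood by the kernel.
    unknown = fs_incompat_flags & ~kernel_supported_flags
    return unknown == 0


old_kernel = 0                                           # knows no extra features
new_kernel = INCOMPAT_RAID56 | INCOMPAT_SKINNY_METADATA  # knows both
old_fs = 0                                               # created without them
new_fs = INCOMPAT_SKINNY_METADATA                        # uses skinny-metadata

print(can_mount(new_fs, old_kernel))  # old kernel refuses the newer format
print(can_mount(old_fs, new_kernel))  # new kernel still mounts the old format
```

The point of the scheme is that a refusal happens at mount time, before
a kernel that does not understand a feature can touch the filesystem.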

And of course rather famously, early in the second kernel series after 
mainlining, some change made it in that apparently caused some btrfs 
Linus was using _not_ to mount, and he rightly blistered the btrfs devs 
for it, because it screwed up his ability to kernel bisect over that 
period with the system he was using btrfs on.  Linus said that's not the 
way things work in mainline and he was right.  It hasn't happened again.  
And that was when btrfs still had the much more severe experimental label 
applied, so particularly now that it's gone, yes, there may be disk 
format changes, but they are covered by incompatibility flags and all 
kernels going forward from mainlining are going to support older formats 
in addition to any newer formats, because that's the way it is.

But I'd probably word the first sentence somewhat differently, saying 
that you should have backups and be prepared to use them if you're using 
btrfs, and that it's not suitable for production systems yet, but 
omitting the only suitable for benchmarking and review wording.

Regardless, I believe we've definitely established that while it's in the 
mainline kernel and is no longer experimental, there's still quite some 
warning there, contrary to Marc Haber's claim otherwise.  And indeed, if 
following that warning literally, review and benchmarking is all he'd be 
doing with it, so beyond that is certainly on the user... or distro 
making claims to the contrary.


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 20:43                 ` Duncan
@ 2016-03-07 22:44                   ` Chris Murphy
  0 siblings, 0 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-07 22:44 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Mon, Mar 7, 2016 at 1:43 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Mon, 07 Mar 2016 12:44:20 -0700 as excerpted:
>
>> On Mon, Mar 7, 2016 at 1:42 AM, Marc Haber <mh+linux-btrfs@zugschlus.de>
>> wrote:
>>> And this is really something to be proud of? I mean, this is a file
>>> system that is part of the vanilla linux kernel, not marked as
>>> experimental or something, and you're still concerned about file
>>> systems that were made a year ago? This is a new experience for me.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/filesystems/btrfs.txt?id=refs/tags/v4.4.4
>>
>> "Btrfs is under heavy development, and is not suitable for any uses
>> other than benchmarking and review. The Btrfs disk format is not yet
>> finalized."
>>
>> I thought the 2nd sentence was removed a long time ago but I'm seeing it
>> in the current branch and 4.1.y. Is this a bug?
>
> AFAIK, this is the still "semi-scary" wording that was left after the
> _really_ scary "eat your babies" level experimental warning was stripped,
> and it remains more or less literally the case, certainly the last
> sentence,

The Btrfs wiki main page says the format is no longer unstable. So OK,
I guess those are compatible. Not yet finalized does not mean unstable
I guess?


> And that was when btrfs still had the much more severe experimental label
> applied, so particularly now that it's gone,

Right the experimental label is probably what I'm thinking of.
Nevertheless the above does sound a bit dire. I mean, for raid56 it's
totally reasonable, and maybe even any multiple device stuff just
because of the lack of faulty device handling, in particular
notification. But I'm not sure ZFS on Linux (or even FreeBSD) have any
of that either.


-- 
Chris Murphy


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 19:34     ` Chris Murphy
  2016-03-05 20:09       ` Marc Haber
@ 2016-03-12 19:57       ` Marc Haber
  2016-03-13 19:43         ` Chris Murphy
  2016-03-12 21:14       ` Marc Haber
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
  3 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-12 19:57 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> Something is happening with the usage of this file system that's out
> of the ordinary. This is the first time I've seen such a large amount
> of unused metadata allocation. And then for it not only fail to
> balance, but for the allocation amount to increase is a first. So
> understanding the usage is important to figuring out what's happening.
> I'd file a bug and include as much information on how the fs got into
> this state as possible. And also if possible make a btrfs-image using
> the proper flags to blot out the filenames for privacy. And what
> btrfs-progs tools were used to create this file system. Etc.

https://bugzilla.kernel.org/show_bug.cgi?id=114451

Please advise if there is something missing.

Greetings
Marc


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-05 19:34     ` Chris Murphy
  2016-03-05 20:09       ` Marc Haber
  2016-03-12 19:57       ` Marc Haber
@ 2016-03-12 21:14       ` Marc Haber
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
  3 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-12 21:14 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> I'd try to use -musage filter alone, in whatever increments work. So
> try 0. Then 5. If 5 fails, try 2. Increment until size is not much
> more than 2-3x used.

Here we go:

[7/506]mh@fan:~$ sudo btrfs balance -usage=0 /media/tempdisk
Done, had to relocate 32 out of 134 chunks
[8/507]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.33GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[9/508]mh@fan:~$ sudo btrfs balance start -musage=0 /media/tempdisk
Done, had to relocate 32 out of 134 chunks
[10/509]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.33GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[11/510]mh@fan:~$ sudo btrfs balance start -musage=0 /media/tempdisk
Done, had to relocate 36 out of 134 chunks
[12/511]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.33GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[13/512]mh@fan:~$ sudo btrfs balance start -musage=10 /media/tempdisk
Done, had to relocate 41 out of 134 chunks
[14/513]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[15/514]mh@fan:~$ sudo btrfs balance start -musage=20 /media/tempdisk
Done, had to relocate 43 out of 134 chunks
[16/515]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[17/516]mh@fan:~$ time sudo btrfs balance start -musage=30 /media/tempdisk
Done, had to relocate 48 out of 134 chunks

real    0m48.604s
user    0m0.008s
sys     0m15.756s
[18/517]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[19/518]mh@fan:~$ time sudo btrfs balance start -musage=40 /media/tempdisk
Done, had to relocate 49 out of 134 chunks

real    0m17.076s
user    0m0.000s
sys     0m5.188s
[20/519]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[21/520]mh@fan:~$ time sudo btrfs balance start -musage=50 /media/tempdisk
Done, had to relocate 49 out of 134 chunks

real    0m11.536s
user    0m0.004s
sys     0m3.512s
[22/521]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[23/522]mh@fan:~$ time sudo btrfs balance start -musage=60 /media/tempdisk
Done, had to relocate 49 out of 134 chunks

real    0m4.485s
user    0m0.008s
sys     0m0.664s
[24/523]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[25/524]mh@fan:~$ time sudo btrfs balance start -musage=70 /media/tempdisk
Done, had to relocate 50 out of 134 chunks

real    0m16.581s
user    0m0.012s
sys     0m5.456s
[26/525]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=26.50GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=3.33MiB
[27/526]mh@fan:~$ time sudo btrfs balance start -musage=80 /media/tempdisk
ERROR: error during balancing '/media/tempdisk': No space left on device
There may be more info in syslog - try dmesg | tail

real    0m16.775s
user    0m0.004s
sys     0m5.800s
[28/527]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=26.50GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=33.69MiB
[29/528]mh@fan:~$ time sudo btrfs balance start -musage=75 /media/tempdisk
ERROR: error during balancing '/media/tempdisk': No space left on device
There may be more info in syslog - try dmesg | tail

real    0m4.363s
user    0m0.004s
sys     0m0.600s
[30/529]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[31/530]mh@fan:~$ time sudo btrfs balance start -musage=74 /media/tempdisk
Done, had to relocate 50 out of 134 chunks

real    0m4.546s
user    0m0.000s
sys     0m0.620s
[32/531]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

So one does not see a decrease in total Metadata size until -musage
has gone up to 70, where it decreases by half a GiB. -musage=75 is the
lowest musage value that leads to the ENOSPC condition, with total
Metadata size going back up to 27 GiB, and -musage=74 is the
largest musage value that finishes without ENOSPC, but with no visible
decrease of total Metadata size.
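
To put a number on the gap between allocated and used space in the
"btrfs fi df" output above, a small helper along these lines could be
used. This is only a sketch: the function name btrfs_df_slack is made
up for illustration, and it assumes the plain "Type, PROFILE:
total=..., used=..." format shown in this thread (not the syslog-prefixed
variant).

```shell
# Hypothetical helper: read `btrfs filesystem df` output on stdin and print,
# per block group type, the allocated-but-unused space in GiB.
btrfs_df_slack() {
    awk '
    # Convert a size string such as "27.00GiB" or "16.00KiB" to GiB.
    function to_gib(v,    n) {
        n = v + 0                          # awk keeps the numeric prefix
        if (v ~ /TiB$/) return n * 1024
        if (v ~ /GiB$/) return n
        if (v ~ /MiB$/) return n / 1024
        if (v ~ /KiB$/) return n / 1048576
        return n / 1073741824              # plain bytes, e.g. "0.00B"
    }
    /total=.*used=/ {
        type = $1; sub(/,$/, "", type)     # "Metadata," -> "Metadata"
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used = $0; sub(/.*used=/, "", used); sub(/[ \t\r]+$/, "", used)
        printf "%s %.2f\n", type, to_gib(total) - to_gib(used)
    }'
}

# Usage (needs root and a mounted btrfs):
#   sudo btrfs fi df /media/tempdisk | btrfs_df_slack
```

For the values above (total=27.00GiB, used=2.30GiB) this reports 24.70
GiB of Metadata that is allocated but not used.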

Greetings
Marc


-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-06 20:27           ` Chris Murphy
  2016-03-06 20:37             ` Chris Murphy
  2016-03-07  8:42             ` Marc Haber
@ 2016-03-12 21:30             ` Marc Haber
  2 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-12 21:30 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 06, 2016 at 01:27:10PM -0700, Chris Murphy wrote:
> So if it were me, I'd gather all possible data, including complete,
> not trimmed, logs. And as for the btrfs-image, it could be huge.

[5/504]mh@q:~/.www/public_html/stuff$ unxz --list 20160307-fanbtr-image.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1      19    248.0 MiB   2385.2 MiB  0.104  CRC32   20160307-fanbtr-image.xz

>  It might not be
>  a bad idea to capture a complete btrfs-debug-tree also, and compress
>  that, add as attachment.

How do I do that?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-07 18:39               ` Chris Murphy
  2016-03-07 18:56                 ` Austin S. Hemmelgarn
@ 2016-03-12 21:36                 ` Marc Haber
  1 sibling, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-12 21:36 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 07, 2016 at 11:39:11AM -0700, Chris Murphy wrote:
> Since there's no hardware issue suspected, you could filter for just btrfs.
> 
> journalctl -o short-iso | grep -i btrfs

Which is exactly what I did. Why did you suspect that my logs were
"trimmed"? That's what got me kind of furious. I took great care to
not trim relevant information.

> When there's hardware stuff suspect it's better to include all the
> SCSI and  libata (and USB if it's a USB drive) messages also.

None there.

> If you have any logs that include the filesystem mounted with
> enospc_debug, that might be useful for a developer?

The later logs I posted were actually taken with enospc_debug, the
4.4.3 ones even with Duncan's patch. I think I didn't apply it before
building 4.4.4.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-05 19:34     ` Chris Murphy
                         ` (2 preceding siblings ...)
  2016-03-12 21:14       ` Marc Haber
@ 2016-03-13 11:58       ` Marc Haber
  2016-03-13 13:17         ` Andrew Vaughan
                           ` (3 more replies)
  3 siblings, 4 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-13 11:58 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,

On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> The alternative if this can't be fixed, is to recreate the filesystem
> because there's no practical way yet to migrate so many snapshots to a
> new file system.

I recreated the file system on March 7, with 200 GiB in size, using
btrfs-tools 4.4. The snapshot-taking process has been running since
then, but I also regularly cleaned up. The number of snapshots on the
new filesystem has never exceeded 1000, with the current count being
at 148.

And btrfs balance runs into the same ENOSPC issues as the old one:

[9/508]mh@fan:~$ grep -v 'device dm-15' 20160313-fanbtr-btrfs-syslog
Mar 13 11:05:45 fan mh: BEGIN btrfs-balance script
Mar 13 11:05:45 fan mh: btrfs fi df /
Mar 13 11:05:45 fan root: Data, single: total=80.00GiB, used=77.71GiB
Mar 13 11:05:45 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:05:45 fan root: Metadata, single: total=27.00GiB, used=2.38GiB
Mar 13 11:05:45 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:05:45 fan mh: btrfs fi show /
Mar 13 11:05:45 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:05:45 fan root: #011Total devices 1 FS bytes used 80.09GiB
Mar 13 11:05:45 fan root: #011devid    1 size 200.00GiB used 107.03GiB path /dev/mapper/fanbtr
Mar 13 11:05:45 fan root:
Mar 13 11:05:45 fan mh: btrfs fi usage /
Mar 13 11:05:45 fan root: Overall:
Mar 13 11:05:45 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:05:45 fan root:     Device allocated:#011#011 107.03GiB
Mar 13 11:05:45 fan root:     Device unallocated:#011#011  92.97GiB
Mar 13 11:05:45 fan root:     Device missing:#011#011     0.00B
Mar 13 11:05:45 fan root:     Used:#011#011#011  80.09GiB
Mar 13 11:05:45 fan root:     Free (estimated):#011#011  95.26GiB#011(min: 95.26GiB)
Mar 13 11:05:45 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:05:45 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:05:45 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:05:45 fan root:
Mar 13 11:05:45 fan root: Data,single: Size:80.00GiB, Used:77.71GiB
Mar 13 11:05:45 fan root:    /dev/mapper/fanbtr#011  80.00GiB
Mar 13 11:05:45 fan root:
Mar 13 11:05:45 fan root: Metadata,single: Size:27.00GiB, Used:2.38GiB
Mar 13 11:05:45 fan root:    /dev/mapper/fanbtr#011  27.00GiB
Mar 13 11:05:45 fan root:
Mar 13 11:05:45 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:05:45 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:05:45 fan root:
Mar 13 11:05:45 fan root: Unallocated:
Mar 13 11:05:45 fan root:    /dev/mapper/fanbtr#011  92.97GiB
Mar 13 11:05:45 fan mh: BEGIN btrfs balance start /
Mar 13 11:20:30 fan root: ERROR: error during balancing '/': No space left on device
Mar 13 11:20:30 fan root: There may be more info in syslog - try dmesg | tail
Mar 13 11:20:30 fan root: btrfs fi df /
Mar 13 11:20:30 fan root: Data, single: total=78.00GiB, used=77.70GiB
Mar 13 11:20:30 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:20:30 fan root: Metadata, single: total=15.00GiB, used=2.38GiB
Mar 13 11:20:30 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:20:30 fan root: btrfs fi show /
Mar 13 11:20:30 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:20:30 fan root: #011Total devices 1 FS bytes used 80.08GiB
Mar 13 11:20:30 fan root: #011devid    1 size 200.00GiB used 93.03GiB path /dev/mapper/fanbtr
Mar 13 11:20:30 fan root:
Mar 13 11:20:30 fan root: btrfs fi usage /
Mar 13 11:20:30 fan root: Overall:
Mar 13 11:20:30 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:20:30 fan root:     Device allocated:#011#011  93.03GiB
Mar 13 11:20:30 fan root:     Device unallocated:#011#011 106.97GiB
Mar 13 11:20:30 fan root:     Device missing:#011#011     0.00B
Mar 13 11:20:30 fan root:     Used:#011#011#011  80.08GiB
Mar 13 11:20:30 fan root:     Free (estimated):#011#011 107.27GiB#011(min: 107.27GiB)
Mar 13 11:20:30 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:20:30 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:20:30 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:20:30 fan root:
Mar 13 11:20:30 fan root: Data,single: Size:78.00GiB, Used:77.70GiB
Mar 13 11:20:30 fan root:    /dev/mapper/fanbtr#011  78.00GiB
Mar 13 11:20:30 fan root:
Mar 13 11:20:30 fan root: Metadata,single: Size:15.00GiB, Used:2.38GiB
Mar 13 11:20:30 fan root:    /dev/mapper/fanbtr#011  15.00GiB
Mar 13 11:20:30 fan root:
Mar 13 11:20:30 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:20:30 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:20:30 fan root:
Mar 13 11:20:30 fan root: Unallocated:
Mar 13 11:20:30 fan root:    /dev/mapper/fanbtr#011 106.97GiB
Mar 13 11:20:30 fan root: BEGIN btrfs balance start -dprofiles=single /
Mar 13 11:36:17 fan root: Done, had to relocate 78 out of 94 chunks
Mar 13 11:36:17 fan root: btrfs fi df /
Mar 13 11:36:17 fan root: Data, single: total=78.00GiB, used=77.71GiB
Mar 13 11:36:17 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:36:17 fan root: Metadata, single: total=15.00GiB, used=2.38GiB
Mar 13 11:36:17 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:36:17 fan root: btrfs fi show /
Mar 13 11:36:17 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:36:17 fan root: #011Total devices 1 FS bytes used 80.09GiB
Mar 13 11:36:17 fan root: #011devid    1 size 200.00GiB used 93.03GiB path /dev/mapper/fanbtr
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: btrfs fi usage /
Mar 13 11:36:17 fan root: Overall:
Mar 13 11:36:17 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:36:17 fan root:     Device allocated:#011#011  93.03GiB
Mar 13 11:36:17 fan root:     Device unallocated:#011#011 106.97GiB
Mar 13 11:36:17 fan root:     Device missing:#011#011     0.00B
Mar 13 11:36:17 fan root:     Used:#011#011#011  80.09GiB
Mar 13 11:36:17 fan root:     Free (estimated):#011#011 107.26GiB#011(min: 107.26GiB)
Mar 13 11:36:17 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:36:17 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:36:17 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Data,single: Size:78.00GiB, Used:77.71GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  78.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Metadata,single: Size:15.00GiB, Used:2.38GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  15.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Unallocated:
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011 106.97GiB
Mar 13 11:36:17 fan root: BEGIN btrfs balance start -mprofiles=dup /
Mar 13 11:36:17 fan root: Done, had to relocate 0 out of 94 chunks
Mar 13 11:36:17 fan root: btrfs fi df /
Mar 13 11:36:17 fan root: Data, single: total=78.00GiB, used=77.71GiB
Mar 13 11:36:17 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:36:17 fan root: Metadata, single: total=15.00GiB, used=2.38GiB
Mar 13 11:36:17 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:36:17 fan root: btrfs fi show /
Mar 13 11:36:17 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:36:17 fan root: #011Total devices 1 FS bytes used 80.09GiB
Mar 13 11:36:17 fan root: #011devid    1 size 200.00GiB used 93.03GiB path /dev/mapper/fanbtr
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: btrfs fi usage /
Mar 13 11:36:17 fan root: Overall:
Mar 13 11:36:17 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:36:17 fan root:     Device allocated:#011#011  93.03GiB
Mar 13 11:36:17 fan root:     Device unallocated:#011#011 106.97GiB
Mar 13 11:36:17 fan root:     Device missing:#011#011     0.00B
Mar 13 11:36:17 fan root:     Used:#011#011#011  80.09GiB
Mar 13 11:36:17 fan root:     Free (estimated):#011#011 107.26GiB#011(min: 107.26GiB)
Mar 13 11:36:17 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:36:17 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:36:17 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Data,single: Size:78.00GiB, Used:77.71GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  78.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Metadata,single: Size:15.00GiB, Used:2.38GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  15.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Unallocated:
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011 106.97GiB
Mar 13 11:36:17 fan root: BEGIN btrfs balance start --force -sprofiles=dup /
Mar 13 11:36:17 fan root: Done, had to relocate 0 out of 94 chunks
Mar 13 11:36:17 fan root: btrfs fi df /
Mar 13 11:36:17 fan root: Data, single: total=78.00GiB, used=77.71GiB
Mar 13 11:36:17 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:36:17 fan root: Metadata, single: total=15.00GiB, used=2.38GiB
Mar 13 11:36:17 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:36:17 fan root: btrfs fi show /
Mar 13 11:36:17 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:36:17 fan root: #011Total devices 1 FS bytes used 80.09GiB
Mar 13 11:36:17 fan root: #011devid    1 size 200.00GiB used 93.03GiB path /dev/mapper/fanbtr
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: btrfs fi usage /
Mar 13 11:36:17 fan root: Overall:
Mar 13 11:36:17 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:36:17 fan root:     Device allocated:#011#011  93.03GiB
Mar 13 11:36:17 fan root:     Device unallocated:#011#011 106.97GiB
Mar 13 11:36:17 fan root:     Device missing:#011#011     0.00B
Mar 13 11:36:17 fan root:     Used:#011#011#011  80.09GiB
Mar 13 11:36:17 fan root:     Free (estimated):#011#011 107.26GiB#011(min: 107.26GiB)
Mar 13 11:36:17 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:36:17 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:36:17 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Data,single: Size:78.00GiB, Used:77.71GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  78.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Metadata,single: Size:15.00GiB, Used:2.38GiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  15.00GiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:36:17 fan root:
Mar 13 11:36:17 fan root: Unallocated:
Mar 13 11:36:17 fan root:    /dev/mapper/fanbtr#011 106.97GiB
Mar 13 11:36:17 fan root: BEGIN btrfs balance start /
Mar 13 11:51:23 fan root: ERROR: error during balancing '/': No space left on device
Mar 13 11:51:23 fan root: There may be more info in syslog - try dmesg | tail
Mar 13 11:51:23 fan root: btrfs fi df /
Mar 13 11:51:23 fan root: Data, single: total=78.00GiB, used=77.70GiB
Mar 13 11:51:23 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 13 11:51:23 fan root: Metadata, single: total=23.00GiB, used=2.38GiB
Mar 13 11:51:23 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 13 11:51:23 fan root: btrfs fi show /
Mar 13 11:51:23 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 13 11:51:23 fan root: #011Total devices 1 FS bytes used 80.08GiB
Mar 13 11:51:23 fan root: #011devid    1 size 200.00GiB used 101.03GiB path /dev/mapper/fanbtr
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: btrfs fi usage /
Mar 13 11:51:23 fan root: Overall:
Mar 13 11:51:23 fan root:     Device size:#011#011 200.00GiB
Mar 13 11:51:23 fan root:     Device allocated:#011#011 101.03GiB
Mar 13 11:51:23 fan root:     Device unallocated:#011#011  98.97GiB
Mar 13 11:51:23 fan root:     Device missing:#011#011     0.00B
Mar 13 11:51:23 fan root:     Used:#011#011#011  80.08GiB
Mar 13 11:51:23 fan root:     Free (estimated):#011#011  99.26GiB#011(min: 99.26GiB)
Mar 13 11:51:23 fan root:     Data ratio:#011#011#011      1.00
Mar 13 11:51:23 fan root:     Metadata ratio:#011#011      1.00
Mar 13 11:51:23 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Data,single: Size:78.00GiB, Used:77.70GiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr#011  78.00GiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Metadata,single: Size:23.00GiB, Used:2.38GiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr#011  23.00GiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 13 11:51:23 fan root:
Mar 13 11:51:23 fan root: Unallocated:
Mar 13 11:51:23 fan root:    /dev/mapper/fanbtr#011  98.97GiB
Mar 13 11:51:23 fan root: END btrfs-balance script
[10/509]mh@fan:~$

I see the same metadata spread as with the old filesystem in btrfs fi
df, with total at 23 GiB and used at 2.38 GiB. What I find strange is
that this filesystem has Data, System and Metadata in the "single"
profile. Is this the new default for a 200 GiB file system?
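
To follow how the Metadata allocation moves over the course of the
script run (27 GiB, then 15 GiB, then 23 GiB in the excerpt above), the
relevant lines can be pulled out of the syslog file. A minimal sketch,
assuming the syslog line layout shown above; the function name
metadata_timeline is hypothetical:

```shell
# Hypothetical helper: print "<time> <metadata total> <metadata used>" for
# every `btrfs fi df` Metadata line in a syslog excerpt. Reads the files
# given as arguments, or stdin if none are given.
metadata_timeline() {
    awk '/ Metadata, [A-Za-z0-9]+: total=/ {
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used = $0; sub(/.*used=/, "", used)
        print $3, total, used          # $3 is the HH:MM:SS syslog field
    }' "$@"
}

# Usage:
#   metadata_timeline 20160313-fanbtr-btrfs-syslog
```

The pattern only matches the "btrfs fi df" lines; the "Metadata,single:
Size:" lines from "btrfs fi usage" are skipped on purpose so each
measurement shows up once.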

Full log is at http://q.bofh.de/~mh/stuff/20160313-fanbtr-btrfs-syslog

The log was taken with enospc_debug active on the file system and all
file system, block device and storage relevant log lines were left in.

Is there anything missing? Is this the same issue? Would the log help
as an addition to https://bugzilla.kernel.org/show_bug.cgi?id=114451?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
@ 2016-03-13 13:17         ` Andrew Vaughan
  2016-03-13 16:56           ` Marc Haber
  2016-03-13 17:12         ` Duncan
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Andrew Vaughan @ 2016-03-13 13:17 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

Hi Marc

On 13 March 2016 at 22:58, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> Hi,
>
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> The alternative if this can't be fixed, is to recreate the filesystem
>> because there's no practical way yet to migrate so many snapshots to a
>> new file system.
>
> I recreated the file system on March 7, with 200 GiB in size, using
> btrfs-tools 4.4. The snapshot-taking process has been running since
> then, but I also regularly cleaned up. The number of snapshots on the
> new filesystem has never exceeded 1000, with the current count being
> at 148.
>
<snip>


I'm not a dev, so I'll just throw out a random, and possibly naive idea.

How much i/o load is this filesystem under?
What type of access pattern(s), how frequent and large are the changes?

Are you still making snapshots every 10m?
How often do you delete old snapshots?  Also every 10m, or do you
delete them in batches every hour or so?

How long does "btrfs subvolume delete -c <one_snapshot>" take?
What does "time btrfs subvolume delete -C <one_snapshot> ; time btrfs
subvolume sync <mount_point>" print ?

The reason for asking is that even on a lightly loaded filesystem I
have seen btrfs subvolume delete take more than 30 seconds.  On a more
heavily loaded filesystem I have seen it take 5+ minutes before btrfs
subvolume delete had finished.

If you have a high enough i/o load, plus large enough changes per
snapshot, it might be possible to get btrfs into a situation where it
never actually finishes cleaning up deleted snapshots.  (I'm also not
sure what happens if you shut down or unmount whilst btrfs is still
cleaning up, but I expect the devs thought of that).

Andrew

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 13:17         ` Andrew Vaughan
@ 2016-03-13 16:56           ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-13 16:56 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 14, 2016 at 12:17:24AM +1100, Andrew Vaughan wrote:
> On 13 March 2016 at 22:58, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> > Hi,
> >
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> The alternative if this can't be fixed, is to recreate the filesystem
> >> because there's no practical way yet to migrate so many snapshots to a
> >> new file system.
> >
> > I recreated the file system on March 7, with 200 GiB in size, using
> > btrfs-tools 4.4. The snapshot-taking process has been running since
> > then, but I also regularly cleaned up. The number of snapshots on the
> > new filesystem has never exceeded 1000, with the current count being
> > at 148.
> >
> <snip>
> 
> 
> I'm not a dev, so I'll just throw out a random, and possibly naive idea.
> 
> How much i/o load is this filesystem under?
> What type of access pattern(s), how frequent and large are the changes?

Nearly none. It's a workstation which I have avoided using over the
last few days due to the filesystem trouble and to avoid any impact of
local work on the filesystem behavior. I even log out after working on
the box for a few minutes.

There is a Debian apt-cacher running on the box and writing its cache
to this btrfs, but /var is on its own subvolume that is only
snapshotted once a day. I'll move /var/cache to its own subvolume and
set this subvolume on a "no snapshots" schedule.

The box itself is running a couple of KVM VMs, but the virtual disks
of the VMs are on dedicated LVs.

> Are you still making snapshots every 10m?

I am snapshotting the subvolume /home/mh, with the obvious contents,
every ten minutes, yes. Most of the other subvolumes are snapshotted
once daily, with some of them not getting snapshotted at all.

> How often do you delete old snapshots?  Also every 10m, or do you
> delete them in batches every hour or so?

I delete them in batches about every other day.

> How long does "btrfs subvolume delete -c <one_snapshot>" take?
> What does "time btrfs subvolume delete -C <one_snapshot> ;

[4/504]mh@fan:~$ time sudo btrfs subvolume delete -c /mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/5001/-home-mh
Delete subvolume (commit): '/mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/5001/-home-mh'

real    0m0.100s
user    0m0.000s
sys     0m0.016s
[5/505]mh@fan:~$ time sudo btrfs subvolume delete -C /mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/4001/-home-mh
Delete subvolume (commit): '/mnt/snapshots/fanbtr/user/subdaily/2016/03/13/07/4001/-home-mh'

real    0m0.079s
user    0m0.012s
sys     0m0.000s
[6/506]mh@fan:~$

The difference between -c and -C only shows when there is more
than one snapshot to be deleted.

>  time btrfs subvolume sync <mount_point>" print ?

[8/508]mh@fan:~$ time sudo btrfs subvolume sync /

real    0m0.030s
user    0m0.004s
sys     0m0.008s
[9/509]mh@fan:~$

> The reason for asking is that even on a lightly loaded filesystem I
> have seen btrfs subvolume delete take more than 30 seconds.  On a more
> heavily loaded filesystem I have seen it take 5+ minutes before btrfs
> subvolume delete had finished.

In my experience, deleting snapshots in huge batches slows things down
quite a bit, but this btrfs does not suffer from this disease.

> If you have a high enough i/o load, plus large enough changes per
> snapshot, it might be possible to get btrfs into a situation where it
> never actually finishes cleaning up deleted snapshots.  (I'm also not
> sure what happens if you shut down or unmount whilst btrfs is still
> cleaning up, but I expect the devs thought of that).

It is a COW filesystem, I'd expect it to be consistent no matter what.
But that's the theory.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
  2016-03-13 13:17         ` Andrew Vaughan
@ 2016-03-13 17:12         ` Duncan
  2016-03-13 21:05           ` Marc Haber
  2016-03-13 19:14         ` Henk Slager
  2016-03-14 12:07         ` Marc Haber
  3 siblings, 1 reply; 81+ messages in thread
From: Duncan @ 2016-03-13 17:12 UTC (permalink / raw)
  To: linux-btrfs

Marc Haber posted on Sun, 13 Mar 2016 12:58:10 +0100 as excerpted:

> I see the same metadata spread as with the old filesystem in btrfs fi
> df, total at 23 and used at 2.38 GiB. What I find strange is that this
> filesystem has Data, System and Metadata in "single" profile, is this
> the new default for a 200 GiB file system?

Single is the default for data.  Metadata (and system) will normally
default to dup on a single device and raid1 on multi-device, EXCEPT on
detected SSDs, where it defaults to single as well, because the firmware
on some SSDs will dedup it in any case.  If you know your SSD isn't one
of the deduping ones (as I do, here), you can of course overrule that by
specifying modes at mkfs.btrfs time.
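
As a rough illustration of the defaults just described, the decision
can be written down as a tiny shell function. This is a sketch of the
behaviour as described in this mail for btrfs-progs of that time, not
code from mkfs.btrfs; the function name and its arguments are made up,
and treating multi-device as always raid1 is an assumption.

```shell
# Hypothetical model of the metadata default described above.
#   $1: number of devices in the filesystem
#   $2: content of /sys/block/<dev>/queue/rotational (0 = detected as SSD)
default_metadata_profile() {
    if [ "$1" -gt 1 ]; then
        echo raid1      # assumption: multi-device always gets raid1
    elif [ "$2" = "0" ]; then
        echo single     # single non-rotational device
    else
        echo dup        # single rotational device
    fi
}

# Overruling the default at creation time:
#   mkfs.btrfs -m dup -d single <device>
# Converting the metadata of an existing filesystem:
#   btrfs balance start -mconvert=dup <mountpoint>
```

Note that the "profiles=" balance filter only selects chunks that
already have the given profile; changing the profile is done with
"convert=".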

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
  2016-03-13 13:17         ` Andrew Vaughan
  2016-03-13 17:12         ` Duncan
@ 2016-03-13 19:14         ` Henk Slager
  2016-03-13 19:42           ` Henk Slager
  2016-03-13 20:56           ` Marc Haber
  2016-03-14 12:07         ` Marc Haber
  3 siblings, 2 replies; 81+ messages in thread
From: Henk Slager @ 2016-03-13 19:14 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Sun, Mar 13, 2016 at 12:58 PM, Marc Haber
<mh+linux-btrfs@zugschlus.de> wrote:
> Hi,
>
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> The alternative if this can't be fixed, is to recreate the filesystem
>> because there's no practical way yet to migrate so many snapshots to a
>> new file system.
>
> I recreated the file system on March 7, with 200 GiB in size, using
> btrfs-tools 4.4. The snapshot-taking process has been running since
> then, but I also regularly cleaned up. The number of snapshots on the
> new filesystem has never exceeded 1000, with the current count being
> at 148.

Is the snapshotting still read-write?
You mentioned earlier that you treated the snapshots as read-only, so
maybe also create them as read-only, in an attempt to mitigate the
problem of growing metadata and ENOSPC issues.
Also, if some part of the OS or tools scans through the snapshot dirs
every now and then with atime updates enabled, metadata grows without a
real need.

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 19:14         ` Henk Slager
@ 2016-03-13 19:42           ` Henk Slager
  2016-03-13 20:56           ` Marc Haber
  1 sibling, 0 replies; 81+ messages in thread
From: Henk Slager @ 2016-03-13 19:42 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Sun, Mar 13, 2016 at 8:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
> On Sun, Mar 13, 2016 at 12:58 PM, Marc Haber
> <mh+linux-btrfs@zugschlus.de> wrote:
>> Hi,
>>
>> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>>> The alternative if this can't be fixed, is to recreate the filesystem
>>> because there's no practical way yet to migrate so many snapshots to a
>>> new file system.
>>
>> I recreated the file system on March 7, with 200 GiB in size, using
>> btrfs-tools 4.4. The snapshot-taking process has been running since
>> then, but I also regularly cleaned up. The number of snapshots on the
>> new filesystem has never exceeded 1000, with the current count being
>> at 148.
>
> Is the snapshotting still read-write?
> You mentioned earlier that you treated the snapshots as read-only, so
> maybe create them also as read-only, in an attempt to mitigate the
> problem of growing metadata and enospc issues.

Sorry, I mixed it up with another thread; forget about the question.

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-12 19:57       ` Marc Haber
@ 2016-03-13 19:43         ` Chris Murphy
  2016-03-13 20:50           ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Chris Murphy @ 2016-03-13 19:43 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Sat, Mar 12, 2016 at 12:57 PM, Marc Haber
<mh+linux-btrfs@zugschlus.de> wrote:
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> Something is happening with the usage of this file system that's out
>> of the ordinary. This is the first time I've seen such a large amount
>> of unused metadata allocation. And for it not only to fail to
>> balance, but for the allocation amount to increase, is a first. So
>> understanding the usage is important to figuring out what's happening.
>> I'd file a bug and include as much information on how the fs got into
>> this state as possible. And also if possible make a btrfs-image using
>> the proper flags to blot out the filenames for privacy. And what
>> btrfs-progs tools were used to create this file system. Etc.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=114451
>
> Please advise if there is something missing.

No enospc_debug mount option used for kernel messages. And no
indication you applied Qu's patch mentioned on March 1 to get more
info with enospc_debug mount:

>Oh, I'm sorry that the output is not necessary, it's better to use the newer patch:
>https://patchwork.kernel.org/patch/8462881/
>With the newer patch, you will need to use enospc_debug mount option to get the debug information.



-- 
Chris Murphy

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-13 19:43         ` Chris Murphy
@ 2016-03-13 20:50           ` Marc Haber
  2016-03-13 21:31             ` Chris Murphy
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-13 20:50 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 13, 2016 at 01:43:50PM -0600, Chris Murphy wrote:
> On Sat, Mar 12, 2016 at 12:57 PM, Marc Haber
> <mh+linux-btrfs@zugschlus.de> wrote:
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> Something is happening with the usage of this file system that's out
> >> of the ordinary. This is the first time I've seen such a large amount
> >> of unused metadata allocation. And then for it not only fail to
> >> balance, but for the allocation amount to increase is a first. So
> >> understanding the usage is important to figuring out what's happening.
> >> I'd file a bug and include as much information on how the fs got into
> >> this state as possible. And also if possible make a btrfs-image using
> >> the proper flags to blot out the filenames for privacy. And what
> >> btrfs-progs tools were used to create this file system. Etc.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=114451
> >
> > Please advise if there is something missing.
> 
> No enospc_debug mount option used for kernel messages.

I apologize for not having mentioned this, but why do you think that
it wasn't active?

|[28/527]mh@fan:~$ grep enospc /proc/mounts
|/dev/mapper/fanbtr / btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=257,subvol=/fan-root 0 0
|/dev/mapper/fanbtr /mnt/snapshots/fanbtr btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=266,subvol=/snapshots 0 0
|[29/528]mh@fan:~$

>  And no indication you applied Qu's patch mentioned on March 1 to get
>  more info with enospc_debug mount:
> 
> >Oh, I'm sorry that the output is not necessary, it's better to use the newer patch:
> >https://patchwork.kernel.org/patch/8462881/
> >With the newer patch, you will need to use enospc_debug mount option to get the debug information.

That one hasn't made it into 4.4.5 yet?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 19:14         ` Henk Slager
  2016-03-13 19:42           ` Henk Slager
@ 2016-03-13 20:56           ` Marc Haber
  2016-03-14  0:00             ` Henk Slager
  1 sibling, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-13 20:56 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 13, 2016 at 08:14:45PM +0100, Henk Slager wrote:
> On Sun, Mar 13, 2016 at 12:58 PM, Marc Haber
> <mh+linux-btrfs@zugschlus.de> wrote:
> > Hi,
> >
> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> >> The alternative if this can't be fixed, is to recreate the filesystem
> >> because there's no practical way yet to migrate so many snapshots to a
> >> new file system.
> >
> > I recreated the file system on March 7, with 200 GiB in size, using
> > btrfs-tools 4.4. The snapshot-taking process has been running since
> > then, but I also regularly cleaned up. The number of snapshots on the
> > new filesystem has never exceeded 1000, with the current count being
> > at 148.
> 
> Is the snapshotting still read-write?

Yes, I want to keep the possibility to remove huge files from
snapshots that shouldn't have been on a snapshotted volume in the first
place without having to ditch the entire snapshot.

> Also, If some part of the OS or tools scans through the snapshot dirs
> every now and then with atime creation on, metadata grows without a
> real need.

I mount with noatime and nodiratime anyway, and the directory the
snapshots are mounted to (/mnt/snapshots) is excluded in
updatedb.conf. Any other idea which tool might scan filesystems and
go unnoticed while it is running across a five-digit number
of snapshots?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 17:12         ` Duncan
@ 2016-03-13 21:05           ` Marc Haber
  2016-03-14  1:05             ` Duncan
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-13 21:05 UTC (permalink / raw)
  To: linux-btrfs

On Sun, Mar 13, 2016 at 05:12:35PM +0000, Duncan wrote:
> Marc Haber posted on Sun, 13 Mar 2016 12:58:10 +0100 as excerpted:
> > I see the same metadata spread as with the old filesystem in btrfs fi
> > df,
> > total at 23 and used at 2.38 GiB. What I find strange is that this
> > filesystem has Data, System and Metadata in "single" profile, is this
> > the new default for a 200 GiB file system?
> 
> Single is default for data.  Metadata (and system) will normally default 
> to dup on a single device, raid1 on multi-device, EXCEPT on detected 
> SSDs, where it defaults to single as well, because the firmware on some 
> ssds will dedup it in any case.  If you know your ssd isn't one of the 
> deduping ones (as I do, here), you can of course overrule that by 
> specifying modes at mkfs.btrfs time.

It was the same Samsung 840 EVO both times. Has this SSD detection
been added recently, or did older versions of mkfs.btrfs not detect an
SSD through a crypto layer, maybe?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-13 20:50           ` Marc Haber
@ 2016-03-13 21:31             ` Chris Murphy
  0 siblings, 0 replies; 81+ messages in thread
From: Chris Murphy @ 2016-03-13 21:31 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS, Qu Wenruo

On Sun, Mar 13, 2016 at 2:50 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Sun, Mar 13, 2016 at 01:43:50PM -0600, Chris Murphy wrote:
>> On Sat, Mar 12, 2016 at 12:57 PM, Marc Haber
>> <mh+linux-btrfs@zugschlus.de> wrote:
>> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> >> Something is happening with the usage of this file system that's out
>> >> of the ordinary. This is the first time I've seen such a large amount
>> >> of unused metadata allocation. And then for it not only fail to
>> >> balance, but for the allocation amount to increase is a first. So
>> >> understanding the usage is important to figuring out what's happening.
>> >> I'd file a bug and include as much information on how the fs got into
>> >> this state as possible. And also if possible make a btrfs-image using
>> >> the proper flags to blot out the filenames for privacy. And what
>> >> btrfs-progs tools were used to create this file system. Etc.
>> >
>> > https://bugzilla.kernel.org/show_bug.cgi?id=114451
>> >
>> > Please advise if there is something missing.
>>
>> No enospc_debug mount option used for kernel messages.
>
> I apologize for not having this mentioned, but why do you think that
> it wasn't active?

There is no additional information in the log attached to the bug. So I
guess this particular problem doesn't trigger any of the existing
enospc_debug output, hence the patch.


>
> |[28/527]mh@fan:~$ grep enospc /proc/mounts
> |/dev/mapper/fanbtr / btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=257,subvol=/fan-root 0 0
> |/dev/mapper/fanbtr /mnt/snapshots/fanbtr btrfs rw,noatime,nodiratime,ssd,space_cache,enospc_debug,subvolid=266,subvol=/snapshots 0 0
> |[29/528]mh@fan:~$
>
>>  And no indication you applied Qu's patch mentioned on March 1 to get
>>  more info with enospc_debug mount:
>>
>> >Oh, I'm sorry that the output is not necessary, it's better to use the newer patch:
>> >https://patchwork.kernel.org/patch/8462881/
>> >With the newer patch, you will need to use enospc_debug mount option to get the debug information.
>
> That one didn't make it in 4.4.5 yet?

Actually, I got no indication it was even going to make it into
mainline; I was thinking it was a one-off patch. But if it does, it'll
need to be in 4.5.0 before it'd be backported to 4.4.x. Maybe Qu can
clarify?

-- 
Chris Murphy

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 20:56           ` Marc Haber
@ 2016-03-14  0:00             ` Henk Slager
  2016-03-15  7:20               ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Henk Slager @ 2016-03-14  0:00 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Sun, Mar 13, 2016 at 9:56 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Sun, Mar 13, 2016 at 08:14:45PM +0100, Henk Slager wrote:
>> On Sun, Mar 13, 2016 at 12:58 PM, Marc Haber
>> <mh+linux-btrfs@zugschlus.de> wrote:
>> > Hi,
>> >
>> > On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> >> The alternative if this can't be fixed, is to recreate the filesystem
>> >> because there's no practical way yet to migrate so many snapshots to a
>> >> new file system.
>> >
>> > I recreated the file system on March 7, with 200 GiB in size, using
>> > btrfs-tools 4.4. The snapshot-taking process has been running since
>> > then, but I also regularly cleaned up. The number of snapshots on the
>> > new filesystem has never exceeded 1000, with the current count being
>> > at 148.
>>
>> Is the snapshotting still read-write?
>
> Yes, I want to keep the possibility to remove huge files from
> snapshots that shouldn't have been on a snapshotted volume in the first
> place without having to ditch the entire snapshot.

You could do ro snapshotting and in case you want to modify something
inside a snapshot/subvolume:
# btrfs property set <subvolume> ro false
# rm <subvolume>/<somefile>
# btrfs property set <subvolume> ro true
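
The three steps above could be wrapped in a small shell function, shown
here only as a rough sketch. The function name remove_from_snapshot and
the BTRFS override variable are made up for illustration and are not part
of btrfs-progs; the only real command used is "btrfs property set", as in
the lines above.

```shell
#!/bin/sh
# Sketch: flip a read-only snapshot to writable, delete one file,
# then set the snapshot read-only again.
# BTRFS can be overridden (e.g. BTRFS=echo) for a dry run; this
# override is an assumption of this example, not a btrfs feature.
BTRFS="${BTRFS:-btrfs}"

# remove_from_snapshot <subvolume> <path-relative-to-subvolume>
# Hypothetical helper name.
remove_from_snapshot() {
    subvol=$1
    file=$2

    # Make the snapshot writable; give up if that fails.
    "$BTRFS" property set "$subvol" ro false || return 1

    # Remove the unwanted file and remember whether it worked.
    rm -f -- "$subvol/$file"
    status=$?

    # Always try to restore the read-only flag, even if rm failed.
    "$BTRFS" property set "$subvol" ro true || return 1

    return "$status"
}
```

Usage would be something like
"remove_from_snapshot /mnt/snapshots/fanbtr/<snapshot> path/to/hugefile",
run as root, with the snapshot path filled in by hand.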

>> Also, If some part of the OS or tools scans through the snapshot dirs
>> every now and then with atime creation on, metadata grows without a
>> real need.
>
> I mount with noatime and nodiratime anyway, and the directory the
> snapshots are mounted to (/mnt/snapshots) are excluded in
> updatedb.conf. Any other idea which tool might scan filesystems and
> that might not be noticed when it's running about a five digit number
> of snapshots?

Maybe baloo or something similar if you use KDE. Someone else reporting
on this mailing list was searching snapshot dirs for the latest one
before creating a new snapshot. That was more related to fs performance
than to the metadata issue you are experiencing.

The rw vs. ro snapshotting is the only thing I can think of in order
to stop having the issue. I have been using both snapper and my own
scripts+btrfs-progs for 2+ years on various small and big filesystems
with 100-1500 snapshots per fs without any metadata issues, but
it is all ro snapshots. And actually, if I want some modification
(rarely), I create another rw snapshot of the particular snapshot.

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 21:05           ` Marc Haber
@ 2016-03-14  1:05             ` Duncan
  2016-03-14 11:49               ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Duncan @ 2016-03-14  1:05 UTC (permalink / raw)
  To: linux-btrfs

Marc Haber posted on Sun, 13 Mar 2016 22:05:37 +0100 as excerpted:

> On Sun, Mar 13, 2016 at 05:12:35PM +0000, Duncan wrote:
>> Marc Haber posted on Sun, 13 Mar 2016 12:58:10 +0100 as excerpted:
>> > I see the same metadata spread as with the old filesystem in btrfs fi
>> > df,
>> > total at 23 and used at 2.38 GiB. What I find strange is that this
>> > filesystem has Data, System and Metadata in "single" profile, is this
>> > the new default for a 200 GiB file system?
>> 
>> Single is default for data.  Metadata (and system) will normally
>> default to dup on a single device, raid1 on multi-device, EXCEPT on
>> detected SSDs, where it defaults to single as well, because the
>> firmware on some ssds will dedup it in any case.  If you know your ssd
>> isn't one of the deduping ones (as I do, here), you can of course
>> overrule that by specifying modes at mkfs.btrfs time.
> 
> It was both times the same Samsung 840 EVO. Has this SSD detection been
> added recently, or did older versions of mkfs.btrfs not detect an SSD
> through a crypto layer, maybe?

Btrfs' ssd detection has been there for quite some time now (for 
userspace, since well before the releases synced to kernel version with 
3.12 or so, which effectively makes it ancient history in btrfs terms).

But according to the mkfs.btrfs manpage, the detection is based on 
/sys/block/DEV/queue/rotational (with DEV substituted appropriately), and 
various layers got support for correctly passing that thru at various 
times, some before btrfs, some after.  So that's very likely why btrfs 
didn't detect it originally, if it was on top of crypto and/or some other 
layer that might not have been passing that thru.
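
To see what a given layer reports, the flag can be read straight from
sysfs. The following is only a sketch: the function name is_rotational
and the SYSFS_ROOT override are assumptions made for this example, and it
merely reads the file the manpage names rather than reproducing what
mkfs.btrfs does internally.

```shell
#!/bin/sh
# Sketch: report the rotational flag of a block device as seen in sysfs.
# is_rotational <kernel-device-name>, e.g. "sda" or "dm-15".
# SYSFS_ROOT defaults to /sys and exists only so the function can be
# pointed at a test directory (an assumption of this example).
is_rotational() {
    dev=$1
    root=${SYSFS_ROOT:-/sys}
    flag_file="$root/block/$dev/queue/rotational"

    # No readable flag file: the layer does not expose the attribute.
    [ -r "$flag_file" ] || { echo "unknown"; return 2; }

    case $(cat "$flag_file") in
        0) echo "non-rotational" ;;   # treated as SSD
        1) echo "rotational" ;;       # treated as spinning disk
        *) echo "unknown"; return 2 ;;
    esac
}
```

For a dm-crypt mapping one would call it with the dm name, for example
"is_rotational dm-15", and compare the result with that of the
underlying physical device.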

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14  1:05             ` Duncan
@ 2016-03-14 11:49               ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-14 11:49 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Mar 14, 2016 at 01:05:39AM +0000, Duncan wrote:
> But according to the mkfs.btrfs manpage, the detection is based on 
> /sys/block/DEV/queue/rotational (with DEV substituted appropriately), and 
> various layers got support for correctly passing that thru at various 
> times, some before btrfs, some after.  So that's very likely why btrfs 
> didn't detect it originally, if it was on top of crypto and/or some other 
> layer that might not have been passing that thru.

That explains it, thanks.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
                           ` (2 preceding siblings ...)
  2016-03-13 19:14         ` Henk Slager
@ 2016-03-14 12:07         ` Marc Haber
  2016-03-14 12:48           ` New file system with same issue Holger Hoffstätte
  2016-03-14 13:46           ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Henk Slager
  3 siblings, 2 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-14 12:07 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sun, Mar 13, 2016 at 12:58:09PM +0100, Marc Haber wrote:
> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
> > The alternative if this can't be fixed, is to recreate the filesystem
> > because there's no practical way yet to migrate so many snapshots to a
> > new file system.
> 
> I recreated the file system on March 7, with 200 GiB in size, using
> btrfs-tools 4.4. The snapshot-taking process has been running since
> then, but I also regularly cleaned up. The number of snapshots on the
> new filesystem has never exceeded 1000, with the current count being
> at 148.
> 
> And btrfs balance runs into the same ENOSPC issues as the old one:

... with Qu's patch, I now get a reproducible kernel trace:

Mar 14 10:23:49 fan mh: BEGIN btrfs-balance script
Mar 14 10:23:49 fan mh: btrfs fi df /
Mar 14 10:23:49 fan root: Data, single: total=79.00GiB, used=78.42GiB
Mar 14 10:23:49 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:23:49 fan root: Metadata, single: total=10.00GiB, used=2.46GiB
Mar 14 10:23:49 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:23:49 fan mh: btrfs fi show /
Mar 14 10:23:49 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:23:49 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:23:49 fan root: #011devid    1 size 200.00GiB used 89.03GiB path /dev/mapper/fanbtr
Mar 14 10:23:49 fan root: 
Mar 14 10:23:49 fan mh: btrfs fi usage /
Mar 14 10:23:49 fan root: Overall:
Mar 14 10:23:49 fan root:     Device size:#011#011 200.00GiB
Mar 14 10:23:49 fan root:     Device allocated:#011#011  89.03GiB
Mar 14 10:23:49 fan root:     Device unallocated:#011#011 110.97GiB
Mar 14 10:23:49 fan root:     Device missing:#011#011     0.00B
Mar 14 10:23:49 fan root:     Used:#011#011#011  80.89GiB
Mar 14 10:23:49 fan root:     Free (estimated):#011#011 111.54GiB#011(min: 111.54GiB)
Mar 14 10:23:49 fan root:     Data ratio:#011#011#011      1.00
Mar 14 10:23:49 fan root:     Metadata ratio:#011#011      1.00
Mar 14 10:23:49 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:23:49 fan root: 
Mar 14 10:23:49 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 10:23:49 fan root: 
Mar 14 10:23:49 fan root: Metadata,single: Size:10.00GiB, Used:2.46GiB
Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  10.00GiB
Mar 14 10:23:49 fan root: 
Mar 14 10:23:49 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 10:23:49 fan root: 
Mar 14 10:23:49 fan root: Unallocated:
Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011 110.97GiB
Mar 14 10:23:49 fan mh: BEGIN btrfs balance start /
Mar 14 10:36:46 fan kernel: [  890.995815] BTRFS info (device dm-15): 6 enospc errors during balance
Mar 14 10:36:46 fan root: ERROR: error during balancing '/': No space left on device
Mar 14 10:36:46 fan root: There may be more info in syslog - try dmesg | tail
Mar 14 10:36:46 fan root: btrfs fi df /
Mar 14 10:36:46 fan root: Data, single: total=79.00GiB, used=78.42GiB
Mar 14 10:36:46 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:36:46 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
Mar 14 10:36:46 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:36:46 fan root: btrfs fi show /
Mar 14 10:36:46 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:36:46 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:36:46 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
Mar 14 10:36:46 fan root: 
Mar 14 10:36:46 fan root: btrfs fi usage /
Mar 14 10:36:46 fan root: Overall:
Mar 14 10:36:46 fan root:     Device size:#011#011 200.00GiB
Mar 14 10:36:46 fan root:     Device allocated:#011#011  91.03GiB
Mar 14 10:36:46 fan root:     Device unallocated:#011#011 108.97GiB
Mar 14 10:36:46 fan root:     Device missing:#011#011     0.00B
Mar 14 10:36:46 fan root:     Used:#011#011#011  80.89GiB
Mar 14 10:36:46 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
Mar 14 10:36:46 fan root:     Data ratio:#011#011#011      1.00
Mar 14 10:36:46 fan root:     Metadata ratio:#011#011      1.00
Mar 14 10:36:46 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:36:46 fan root: 
Mar 14 10:36:46 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 10:36:46 fan root: 
Mar 14 10:36:46 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  12.00GiB
Mar 14 10:36:46 fan root: 
Mar 14 10:36:46 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 10:36:46 fan root: 
Mar 14 10:36:46 fan root: Unallocated:
Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011 108.97GiB
Mar 14 10:36:46 fan root: BEGIN btrfs balance start -dprofiles=single /
Mar 14 10:48:16 fan kernel: [ 1581.228727] ------------[ cut here ]------------
Mar 14 10:48:16 fan kernel: [ 1581.228794] WARNING: CPU: 1 PID: 121 at fs/btrfs/extent-tree.c:7897 btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]()
Mar 14 10:48:16 fan kernel: [ 1581.228800] BTRFS: block rsv returned -28
Mar 14 10:48:16 fan kernel: [ 1581.228804] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi amd64_edac_mod edac_mce_amd kvm_amd input_leds pcspkr edac_core snd_hda_intel kvm snd_cmipci snd_hda_codec snd_mpu401_uart snd_opl3_lib snd_rawmidi snd_hda_core snd_seq_device irqbypass k10temp snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_timer evdev snd asus_atk0110 soundcore acpi_cpufreq i2c_piix4 tpm_tis sg tpm processor shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci ahci libahci amdkfd r8169 mii radeon xhci_pci i2c_algo_bit xhci_hcd ttm sym53c8xx ehci_pci scsi_transport_spi ohci_hcd drm_kms_helper libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
Mar 14 10:48:16 fan kernel: [ 1581.228957] CPU: 1 PID: 121 Comm: kworker/u16:2 Not tainted 4.4.5-zgws1 #2
Mar 14 10:48:16 fan kernel: [ 1581.228963] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 14 10:48:16 fan kernel: [ 1581.229012] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229018]  0000000000000000 ffffffff811dd418 ffff880613aafb58 0000000000000009
Mar 14 10:48:16 fan kernel: [ 1581.229026]  ffffffff81051e21 ffffffffa047d29f ffff8804350d6800 ffff880613aafbb0
Mar 14 10:48:16 fan kernel: [ 1581.229033]  ffff880610322000 ffff880531466700 ffffffff81051e79 ffffffffa04f065b
Mar 14 10:48:16 fan kernel: [ 1581.229039] Call Trace:
Mar 14 10:48:16 fan kernel: [ 1581.229052]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 14 10:48:16 fan kernel: [ 1581.229061]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 14 10:48:16 fan kernel: [ 1581.229104]  [<ffffffffa047d29f>] ? btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229112]  [<ffffffff81051e79>] ? warn_slowpath_fmt+0x43/0x4b
Mar 14 10:48:16 fan kernel: [ 1581.229155]  [<ffffffffa047d29f>] ? btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229194]  [<ffffffffa046b05b>] ? btrfs_copy_root+0x9e/0x1c1 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229239]  [<ffffffffa04c7844>] ? create_reloc_root+0x6c/0x196 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229285]  [<ffffffffa04cb3cd>] ? btrfs_init_reloc_root+0x71/0x9a [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229329]  [<ffffffffa0489d74>] ? record_root_in_trans+0xc8/0xd5 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229374]  [<ffffffffa048b192>] ? btrfs_record_root_in_trans+0x42/0x5a [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229418]  [<ffffffffa048c383>] ? start_transaction+0x2de/0x455 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229464]  [<ffffffffa0492c67>] ? btrfs_finish_ordered_io+0x1e2/0x4d7 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229474]  [<ffffffff81078953>] ? pick_next_task_fair+0x1f8/0x357
Mar 14 10:48:16 fan kernel: [ 1581.229482]  [<ffffffff810752af>] ? account_entity_dequeue+0x46/0x67
Mar 14 10:48:16 fan kernel: [ 1581.229529]  [<ffffffffa04b1ec4>] ? btrfs_scrubparity_helper+0xf4/0x233 [btrfs]
Mar 14 10:48:16 fan kernel: [ 1581.229538]  [<ffffffff81063b4f>] ? process_one_work+0x178/0x27b
Mar 14 10:48:16 fan kernel: [ 1581.229545]  [<ffffffff810640d3>] ? worker_thread+0x1da/0x280
Mar 14 10:48:16 fan kernel: [ 1581.229553]  [<ffffffff81063ef9>] ? rescuer_thread+0x284/0x284
Mar 14 10:48:16 fan kernel: [ 1581.229559]  [<ffffffff81067e59>] ? kthread+0x95/0x9d
Mar 14 10:48:16 fan kernel: [ 1581.229566]  [<ffffffff81067dc4>] ? kthread_parkme+0x16/0x16
Mar 14 10:48:16 fan kernel: [ 1581.229574]  [<ffffffff8140dfff>] ? ret_from_fork+0x3f/0x70
Mar 14 10:48:16 fan kernel: [ 1581.229580]  [<ffffffff81067dc4>] ? kthread_parkme+0x16/0x16
Mar 14 10:48:16 fan kernel: [ 1581.229586] ---[ end trace 69a5cc7238dc3665 ]---
Mar 14 10:51:06 fan root: Done, had to relocate 79 out of 92 chunks
Mar 14 10:51:06 fan root: btrfs fi df /
Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:51:06 fan root: btrfs fi show /
Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: btrfs fi usage /
Mar 14 10:51:06 fan root: Overall:
Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Unallocated:
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
Mar 14 10:51:06 fan root: BEGIN btrfs balance start -mprofiles=dup /
Mar 14 10:51:06 fan root: Done, had to relocate 0 out of 92 chunks
Mar 14 10:51:06 fan root: btrfs fi df /
Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:51:06 fan root: btrfs fi show /
Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: btrfs fi usage /
Mar 14 10:51:06 fan root: Overall:
Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Unallocated:
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
Mar 14 10:51:06 fan root: BEGIN btrfs balance start --force -sprofiles=dup /
Mar 14 10:51:06 fan root: Done, had to relocate 0 out of 92 chunks
Mar 14 10:51:06 fan root: btrfs fi df /
Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 10:51:06 fan root: btrfs fi show /
Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: btrfs fi usage /
Mar 14 10:51:06 fan root: Overall:
Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 10:51:06 fan root: 
Mar 14 10:51:06 fan root: Unallocated:
Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
Mar 14 10:51:06 fan root: BEGIN btrfs balance start /
Mar 14 11:08:36 fan kernel: [ 2800.565044] BTRFS info (device dm-15): 7 enospc errors during balance
Mar 14 11:08:36 fan root: ERROR: error during balancing '/': No space left on device
Mar 14 11:08:36 fan root: There may be more info in syslog - try dmesg | tail
Mar 14 11:08:36 fan mh: btrfs fi df /
Mar 14 11:08:36 fan root: Data, single: total=79.00GiB, used=78.43GiB
Mar 14 11:08:36 fan root: System, single: total=32.00MiB, used=16.00KiB
Mar 14 11:08:36 fan root: Metadata, single: total=19.00GiB, used=2.46GiB
Mar 14 11:08:36 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 14 11:08:36 fan mh: btrfs fi show /
Mar 14 11:08:36 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
Mar 14 11:08:36 fan root: #011Total devices 1 FS bytes used 80.89GiB
Mar 14 11:08:36 fan root: #011devid    1 size 200.00GiB used 98.03GiB path /dev/mapper/fanbtr
Mar 14 11:08:36 fan root: 
Mar 14 11:08:36 fan mh: btrfs fi usage /
Mar 14 11:08:36 fan root: Overall:
Mar 14 11:08:36 fan root:     Device size:#011#011 200.00GiB
Mar 14 11:08:36 fan root:     Device allocated:#011#011  98.03GiB
Mar 14 11:08:36 fan root:     Device unallocated:#011#011 101.97GiB
Mar 14 11:08:36 fan root:     Device missing:#011#011     0.00B
Mar 14 11:08:36 fan root:     Used:#011#011#011  80.89GiB
Mar 14 11:08:36 fan root:     Free (estimated):#011#011 102.54GiB#011(min: 102.54GiB)
Mar 14 11:08:36 fan root:     Data ratio:#011#011#011      1.00
Mar 14 11:08:36 fan root:     Metadata ratio:#011#011      1.00
Mar 14 11:08:36 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 14 11:08:36 fan root: 
Mar 14 11:08:36 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  79.00GiB
Mar 14 11:08:36 fan root: 
Mar 14 11:08:36 fan root: Metadata,single: Size:19.00GiB, Used:2.46GiB
Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  19.00GiB
Mar 14 11:08:36 fan root: 
Mar 14 11:08:36 fan root: System,single: Size:32.00MiB, Used:16.00KiB
Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  32.00MiB
Mar 14 11:08:36 fan root: 
Mar 14 11:08:36 fan root: Unallocated:
Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011 101.97GiB
Mar 14 11:08:36 fan mh: END btrfs-balance script

Full log is at http://q.bofh.de/~mh/stuff/20160314-fanbtr-btrfs-syslog

The log was taken with enospc_debug active on the file system and all
file system, block device and storage relevant log lines were left in.

What does the trace tell us here?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: New file system with same issue
  2016-03-14 12:07         ` Marc Haber
@ 2016-03-14 12:48           ` Holger Hoffstätte
  2016-03-14 20:13             ` Marc Haber
  2016-03-14 13:46           ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Henk Slager
  1 sibling, 1 reply; 81+ messages in thread
From: Holger Hoffstätte @ 2016-03-14 12:48 UTC (permalink / raw)
  To: Marc Haber, Btrfs BTRFS

On 03/14/16 13:07, Marc Haber wrote:
>> And btrfs balance runs into the same ENOSPC issues as the old one:
> 
> ... with Qu's patch, I now get a reproducible kernel trace:

<snip>

That is interesting and useful. Sorry if this was asked before, but
did you ever try to clear the free-space cache via -o clear_cache
on mount?

Give it a try, let it run for a while and then try balancing
again. There is definitely something wrong with the chunk
allocation on your system since you still have allegedly
~100G free, so it's not the notorious (and a lot less common these
days) edge case of sparsely populated chunks that would require
a 'compaction'.

IMHO you are also looking at two different, possibly unrelated
problems: failure to allocate more chunks vs. metadata bloat,
so let's not confuse the two.

Uncle Occam's razor also suggests that the involvement of dm
doesn't help. Why not just use the device/partition directly?
_Someone_ is lying to btrfs in terms of device size and/or allocated
chunks, otherwise you wouldn't get the ENOSPC.

-h



* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 12:07         ` Marc Haber
  2016-03-14 12:48           ` New file system with same issue Holger Hoffstätte
@ 2016-03-14 13:46           ` Henk Slager
  2016-03-14 20:05             ` Marc Haber
  1 sibling, 1 reply; 81+ messages in thread
From: Henk Slager @ 2016-03-14 13:46 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Mon, Mar 14, 2016 at 1:07 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Sun, Mar 13, 2016 at 12:58:09PM +0100, Marc Haber wrote:
>> On Sat, Mar 05, 2016 at 12:34:09PM -0700, Chris Murphy wrote:
>> > The alternative if this can't be fixed, is to recreate the filesystem
>> > because there's no practical way yet to migrate so many snapshots to a
>> > new file system.
>>
>> I recreated the file system on March 7, with 200 GiB in size, using
>> btrfs-tools 4.4. The snapshot-taking process has been running since
>> then, but I also regularly cleaned up. The number of snapshots on the
>> new filesystem has never exceeded 1000, with the current count being
>> at 148.
>>
>> And btrfs balance runs into the same ENOSPC issues as the old one:
>
> ... with Qu's patch, I now get a reproducible kernel trace:
>
> Mar 14 10:23:49 fan mh: BEGIN btrfs-balance script
> Mar 14 10:23:49 fan mh: btrfs fi df /
> Mar 14 10:23:49 fan root: Data, single: total=79.00GiB, used=78.42GiB
> Mar 14 10:23:49 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 10:23:49 fan root: Metadata, single: total=10.00GiB, used=2.46GiB
> Mar 14 10:23:49 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 10:23:49 fan mh: btrfs fi show /
> Mar 14 10:23:49 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 10:23:49 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 10:23:49 fan root: #011devid    1 size 200.00GiB used 89.03GiB path /dev/mapper/fanbtr
> Mar 14 10:23:49 fan root:
> Mar 14 10:23:49 fan mh: btrfs fi usage /
> Mar 14 10:23:49 fan root: Overall:
> Mar 14 10:23:49 fan root:     Device size:#011#011 200.00GiB
> Mar 14 10:23:49 fan root:     Device allocated:#011#011  89.03GiB
> Mar 14 10:23:49 fan root:     Device unallocated:#011#011 110.97GiB
> Mar 14 10:23:49 fan root:     Device missing:#011#011     0.00B
> Mar 14 10:23:49 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 10:23:49 fan root:     Free (estimated):#011#011 111.54GiB#011(min: 111.54GiB)
> Mar 14 10:23:49 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 10:23:49 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 10:23:49 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
It looks a bit strange to me that this is already 512MiB for an fs
of 200GiB. Just after creation (4.4 tools) it should be something like
16MiB, and it grows as the fs is used, but 512MiB... An fs created with
older tools had 512MiB from the start, AFAIK.

> Mar 14 10:23:49 fan root:
> Mar 14 10:23:49 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
> Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 10:23:49 fan root:
> Mar 14 10:23:49 fan root: Metadata,single: Size:10.00GiB, Used:2.46GiB
> Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  10.00GiB
> Mar 14 10:23:49 fan root:
> Mar 14 10:23:49 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 10:23:49 fan root:
> Mar 14 10:23:49 fan root: Unallocated:
> Mar 14 10:23:49 fan root:    /dev/mapper/fanbtr#011 110.97GiB
> Mar 14 10:23:49 fan mh: BEGIN btrfs balance start /
> Mar 14 10:36:46 fan kernel: [  890.995815] BTRFS info (device dm-15): 6 enospc errors during balance
> Mar 14 10:36:46 fan root: ERROR: error during balancing '/': No space left on device
> Mar 14 10:36:46 fan root: There may be more info in syslog - try dmesg | tail
> Mar 14 10:36:46 fan root: btrfs fi df /
> Mar 14 10:36:46 fan root: Data, single: total=79.00GiB, used=78.42GiB
> Mar 14 10:36:46 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 10:36:46 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
> Mar 14 10:36:46 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 10:36:46 fan root: btrfs fi show /
> Mar 14 10:36:46 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 10:36:46 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 10:36:46 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
> Mar 14 10:36:46 fan root:
> Mar 14 10:36:46 fan root: btrfs fi usage /
> Mar 14 10:36:46 fan root: Overall:
> Mar 14 10:36:46 fan root:     Device size:#011#011 200.00GiB
> Mar 14 10:36:46 fan root:     Device allocated:#011#011  91.03GiB
> Mar 14 10:36:46 fan root:     Device unallocated:#011#011 108.97GiB
> Mar 14 10:36:46 fan root:     Device missing:#011#011     0.00B
> Mar 14 10:36:46 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 10:36:46 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
> Mar 14 10:36:46 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 10:36:46 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 10:36:46 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> Mar 14 10:36:46 fan root:
> Mar 14 10:36:46 fan root: Data,single: Size:79.00GiB, Used:78.42GiB
> Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 10:36:46 fan root:
> Mar 14 10:36:46 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
> Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  12.00GiB
> Mar 14 10:36:46 fan root:
> Mar 14 10:36:46 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 10:36:46 fan root:
> Mar 14 10:36:46 fan root: Unallocated:
> Mar 14 10:36:46 fan root:    /dev/mapper/fanbtr#011 108.97GiB
> Mar 14 10:36:46 fan root: BEGIN btrfs balance start -dprofiles=single /
> Mar 14 10:48:16 fan kernel: [ 1581.228727] ------------[ cut here ]------------
> Mar 14 10:48:16 fan kernel: [ 1581.228794] WARNING: CPU: 1 PID: 121 at fs/btrfs/extent-tree.c:7897 btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]()
> Mar 14 10:48:16 fan kernel: [ 1581.228800] BTRFS: block rsv returned -28
> Mar 14 10:48:16 fan kernel: [ 1581.228804] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi amd64_edac_mod edac_mce_amd kvm_amd input_leds pcspkr edac_core snd_hda_intel kvm snd_cmipci snd_hda_codec snd_mpu401_uart snd_opl3_lib snd_rawmidi snd_hda_core snd_seq_device irqbypass k10temp snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_timer evdev snd asus_atk0110 soundcore acpi_cpufreq i2c_piix4 tpm_tis sg tpm processor shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci ahci libahci amdkfd r8169 mii radeon xhci_pci i2c_algo_bit xhci_hcd ttm sym53c8xx ehci_pci scsi_transport_spi ohci_hcd drm_kms_helper libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
> Mar 14 10:48:16 fan kernel: [ 1581.228957] CPU: 1 PID: 121 Comm: kworker/u16:2 Not tainted 4.4.5-zgws1 #2
> Mar 14 10:48:16 fan kernel: [ 1581.228963] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
> Mar 14 10:48:16 fan kernel: [ 1581.229012] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229018]  0000000000000000 ffffffff811dd418 ffff880613aafb58 0000000000000009
> Mar 14 10:48:16 fan kernel: [ 1581.229026]  ffffffff81051e21 ffffffffa047d29f ffff8804350d6800 ffff880613aafbb0
> Mar 14 10:48:16 fan kernel: [ 1581.229033]  ffff880610322000 ffff880531466700 ffffffff81051e79 ffffffffa04f065b
> Mar 14 10:48:16 fan kernel: [ 1581.229039] Call Trace:
> Mar 14 10:48:16 fan kernel: [ 1581.229052]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
> Mar 14 10:48:16 fan kernel: [ 1581.229061]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
> Mar 14 10:48:16 fan kernel: [ 1581.229104]  [<ffffffffa047d29f>] ? btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229112]  [<ffffffff81051e79>] ? warn_slowpath_fmt+0x43/0x4b
> Mar 14 10:48:16 fan kernel: [ 1581.229155]  [<ffffffffa047d29f>] ? btrfs_alloc_tree_block+0xeb/0x3d6 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229194]  [<ffffffffa046b05b>] ? btrfs_copy_root+0x9e/0x1c1 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229239]  [<ffffffffa04c7844>] ? create_reloc_root+0x6c/0x196 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229285]  [<ffffffffa04cb3cd>] ? btrfs_init_reloc_root+0x71/0x9a [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229329]  [<ffffffffa0489d74>] ? record_root_in_trans+0xc8/0xd5 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229374]  [<ffffffffa048b192>] ? btrfs_record_root_in_trans+0x42/0x5a [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229418]  [<ffffffffa048c383>] ? start_transaction+0x2de/0x455 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229464]  [<ffffffffa0492c67>] ? btrfs_finish_ordered_io+0x1e2/0x4d7 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229474]  [<ffffffff81078953>] ? pick_next_task_fair+0x1f8/0x357
> Mar 14 10:48:16 fan kernel: [ 1581.229482]  [<ffffffff810752af>] ? account_entity_dequeue+0x46/0x67
> Mar 14 10:48:16 fan kernel: [ 1581.229529]  [<ffffffffa04b1ec4>] ? btrfs_scrubparity_helper+0xf4/0x233 [btrfs]
> Mar 14 10:48:16 fan kernel: [ 1581.229538]  [<ffffffff81063b4f>] ? process_one_work+0x178/0x27b
> Mar 14 10:48:16 fan kernel: [ 1581.229545]  [<ffffffff810640d3>] ? worker_thread+0x1da/0x280
> Mar 14 10:48:16 fan kernel: [ 1581.229553]  [<ffffffff81063ef9>] ? rescuer_thread+0x284/0x284
> Mar 14 10:48:16 fan kernel: [ 1581.229559]  [<ffffffff81067e59>] ? kthread+0x95/0x9d
> Mar 14 10:48:16 fan kernel: [ 1581.229566]  [<ffffffff81067dc4>] ? kthread_parkme+0x16/0x16
> Mar 14 10:48:16 fan kernel: [ 1581.229574]  [<ffffffff8140dfff>] ? ret_from_fork+0x3f/0x70
> Mar 14 10:48:16 fan kernel: [ 1581.229580]  [<ffffffff81067dc4>] ? kthread_parkme+0x16/0x16
> Mar 14 10:48:16 fan kernel: [ 1581.229586] ---[ end trace 69a5cc7238dc3665 ]---
> Mar 14 10:51:06 fan root: Done, had to relocate 79 out of 92 chunks
> Mar 14 10:51:06 fan root: btrfs fi df /
> Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
> Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
> Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 10:51:06 fan root: btrfs fi show /
> Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: btrfs fi usage /
> Mar 14 10:51:06 fan root: Overall:
> Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
> Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
> Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
> Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
> Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
> Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Unallocated:
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
> Mar 14 10:51:06 fan root: BEGIN btrfs balance start -mprofiles=dup /

This probably should have been -mprofiles=single,
so that it becomes clearer where and when the enospc errors occur.

> Mar 14 10:51:06 fan root: Done, had to relocate 0 out of 92 chunks
> Mar 14 10:51:06 fan root: btrfs fi df /
> Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
> Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
> Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 10:51:06 fan root: btrfs fi show /
> Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: btrfs fi usage /
> Mar 14 10:51:06 fan root: Overall:
> Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
> Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
> Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
> Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
> Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
> Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Unallocated:
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
> Mar 14 10:51:06 fan root: BEGIN btrfs balance start --force -sprofiles=dup /
Same here; you have no dup profiles

> Mar 14 10:51:06 fan root: Done, had to relocate 0 out of 92 chunks
> Mar 14 10:51:06 fan root: btrfs fi df /
> Mar 14 10:51:06 fan root: Data, single: total=79.00GiB, used=78.43GiB
> Mar 14 10:51:06 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 10:51:06 fan root: Metadata, single: total=12.00GiB, used=2.46GiB
> Mar 14 10:51:06 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 10:51:06 fan root: btrfs fi show /
> Mar 14 10:51:06 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 10:51:06 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 10:51:06 fan root: #011devid    1 size 200.00GiB used 91.03GiB path /dev/mapper/fanbtr
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: btrfs fi usage /
> Mar 14 10:51:06 fan root: Overall:
> Mar 14 10:51:06 fan root:     Device size:#011#011 200.00GiB
> Mar 14 10:51:06 fan root:     Device allocated:#011#011  91.03GiB
> Mar 14 10:51:06 fan root:     Device unallocated:#011#011 108.97GiB
> Mar 14 10:51:06 fan root:     Device missing:#011#011     0.00B
> Mar 14 10:51:06 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 10:51:06 fan root:     Free (estimated):#011#011 109.54GiB#011(min: 109.54GiB)
> Mar 14 10:51:06 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 10:51:06 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 10:51:06 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Metadata,single: Size:12.00GiB, Used:2.46GiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  12.00GiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 10:51:06 fan root:
> Mar 14 10:51:06 fan root: Unallocated:
> Mar 14 10:51:06 fan root:    /dev/mapper/fanbtr#011 108.97GiB
> Mar 14 10:51:06 fan root: BEGIN btrfs balance start /
> Mar 14 11:08:36 fan kernel: [ 2800.565044] BTRFS info (device dm-15): 7 enospc errors during balance
> Mar 14 11:08:36 fan root: ERROR: error during balancing '/': No space left on device
> Mar 14 11:08:36 fan root: There may be more info in syslog - try dmesg | tail
> Mar 14 11:08:36 fan mh: btrfs fi df /
> Mar 14 11:08:36 fan root: Data, single: total=79.00GiB, used=78.43GiB
> Mar 14 11:08:36 fan root: System, single: total=32.00MiB, used=16.00KiB
> Mar 14 11:08:36 fan root: Metadata, single: total=19.00GiB, used=2.46GiB
> Mar 14 11:08:36 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> Mar 14 11:08:36 fan mh: btrfs fi show /
> Mar 14 11:08:36 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> Mar 14 11:08:36 fan root: #011Total devices 1 FS bytes used 80.89GiB
> Mar 14 11:08:36 fan root: #011devid    1 size 200.00GiB used 98.03GiB path /dev/mapper/fanbtr
> Mar 14 11:08:36 fan root:
> Mar 14 11:08:36 fan mh: btrfs fi usage /
> Mar 14 11:08:36 fan root: Overall:
> Mar 14 11:08:36 fan root:     Device size:#011#011 200.00GiB
> Mar 14 11:08:36 fan root:     Device allocated:#011#011  98.03GiB
> Mar 14 11:08:36 fan root:     Device unallocated:#011#011 101.97GiB
> Mar 14 11:08:36 fan root:     Device missing:#011#011     0.00B
> Mar 14 11:08:36 fan root:     Used:#011#011#011  80.89GiB
> Mar 14 11:08:36 fan root:     Free (estimated):#011#011 102.54GiB#011(min: 102.54GiB)
> Mar 14 11:08:36 fan root:     Data ratio:#011#011#011      1.00
> Mar 14 11:08:36 fan root:     Metadata ratio:#011#011      1.00
> Mar 14 11:08:36 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> Mar 14 11:08:36 fan root:
> Mar 14 11:08:36 fan root: Data,single: Size:79.00GiB, Used:78.43GiB
> Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  79.00GiB
> Mar 14 11:08:36 fan root:
> Mar 14 11:08:36 fan root: Metadata,single: Size:19.00GiB, Used:2.46GiB
> Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  19.00GiB
> Mar 14 11:08:36 fan root:
> Mar 14 11:08:36 fan root: System,single: Size:32.00MiB, Used:16.00KiB
> Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011  32.00MiB
> Mar 14 11:08:36 fan root:
> Mar 14 11:08:36 fan root: Unallocated:
> Mar 14 11:08:36 fan root:    /dev/mapper/fanbtr#011 101.97GiB
> Mar 14 11:08:36 fan mh: END btrfs-balance script
>
> Full log is at http://q.bofh.de/~mh/stuff/20160314-fanbtr-btrfs-syslog
>
> The log was taken with enospc_debug active on the file system and all
> file system, block device and storage relevant log lines were left in.
>
> What does the trace tell us here?

Although there is a warning, the data balance seems to work. The problem
still points to metadata.
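
To make the metadata pattern easier to see, here is a minimal sketch that redoes the space accounting from the three snapshots in the log above. The numbers are copied from the quoted `btrfs fi df` / `btrfs fi usage` output; the function name, the snapshot labels, and the "slack" column are illustrative assumptions, not anything btrfs itself reports.

```python
# Figures (GiB) copied from the syslog excerpts quoted above.
DEVICE_GIB = 200.00
SYSTEM_GIB = 32 / 1024          # System chunk: 32.00MiB
METADATA_USED_GIB = 2.46        # unchanged in all three snapshots

# (label, data allocated, metadata allocated); labels are descriptive only.
SNAPSHOTS = [
    ("10:23:49 before balancing", 79.00, 10.00),
    ("10:36:46 after first failed full balance", 79.00, 12.00),
    ("11:08:36 after second failed full balance", 79.00, 19.00),
]


def summarize(snapshots):
    """Recompute allocated/unallocated space and metadata slack per snapshot."""
    rows = []
    for label, data, metadata in snapshots:
        allocated = data + metadata + SYSTEM_GIB
        rows.append({
            "label": label,
            "allocated": round(allocated, 2),
            "unallocated": round(DEVICE_GIB - allocated, 2),
            # Allocated-but-unused metadata space (hypothetical helper column).
            "metadata_slack": round(metadata - METADATA_USED_GIB, 2),
        })
    return rows


for row in summarize(SNAPSHOTS):
    print(row)
```

The recomputed allocated and unallocated values match the logged "Device allocated" and "Device unallocated" lines, while metadata allocation grows from 10 to 19 GiB with metadata usage constant at 2.46 GiB.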

BTW, I restored and mounted your 20160307-fanbtr-image:

[266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 /dev/loop0
[266203.734804] BTRFS info (device loop0): disk space caching is enabled
[266203.734806] BTRFS: has skinny extents
[266204.022175] BTRFS: checking UUID tree
[266239.407249] attempt to access beyond end of device
[266239.407252] loop0: rw=1073, want=715202688, limit=705760000
[266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[266239.407272] attempt to access beyond end of device
.. and 16 more

As a quick fix/workaround, I truncated the image to 1T

After re-loop and mount and while doing a balance of the metadata I got this:
[266667.431704] BTRFS error (device loop0): bad tree block start 0 5827368812544

So something is/was wrong with the fs. Did you do a btrfs check before imaging?


* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 13:46           ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Henk Slager
@ 2016-03-14 20:05             ` Marc Haber
  2016-03-14 20:39               ` Henk Slager
  2016-03-15 13:29               ` Marc Haber
  0 siblings, 2 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-14 20:05 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi Henk,

On Mon, Mar 14, 2016 at 02:46:54PM +0100, Henk Slager wrote:
> On Mon, Mar 14, 2016 at 1:07 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> > Mar 14 10:23:49 fan mh: BEGIN btrfs-balance script
> > Mar 14 10:23:49 fan mh: btrfs fi df /
> > Mar 14 10:23:49 fan root: Data, single: total=79.00GiB, used=78.42GiB
> > Mar 14 10:23:49 fan root: System, single: total=32.00MiB, used=16.00KiB
> > Mar 14 10:23:49 fan root: Metadata, single: total=10.00GiB, used=2.46GiB
> > Mar 14 10:23:49 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
> > Mar 14 10:23:49 fan mh: btrfs fi show /
> > Mar 14 10:23:49 fan root: Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
> > Mar 14 10:23:49 fan root: #011Total devices 1 FS bytes used 80.89GiB
> > Mar 14 10:23:49 fan root: #011devid    1 size 200.00GiB used 89.03GiB path /dev/mapper/fanbtr
> > Mar 14 10:23:49 fan root:
> > Mar 14 10:23:49 fan mh: btrfs fi usage /
> > Mar 14 10:23:49 fan root: Overall:
> > Mar 14 10:23:49 fan root:     Device size:#011#011 200.00GiB
> > Mar 14 10:23:49 fan root:     Device allocated:#011#011  89.03GiB
> > Mar 14 10:23:49 fan root:     Device unallocated:#011#011 110.97GiB
> > Mar 14 10:23:49 fan root:     Device missing:#011#011     0.00B
> > Mar 14 10:23:49 fan root:     Used:#011#011#011  80.89GiB
> > Mar 14 10:23:49 fan root:     Free (estimated):#011#011 111.54GiB#011(min: 111.54GiB)
> > Mar 14 10:23:49 fan root:     Data ratio:#011#011#011      1.00
> > Mar 14 10:23:49 fan root:     Metadata ratio:#011#011      1.00
> > Mar 14 10:23:49 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
> It looks a bit strange to me that this is already 512MiB for an fs
> of 200GiB. Just after creation (4.4 tools) it should be something like
> 16MiB, and it grows as the fs is used, but 512MiB... An fs created with
> older tools had 512MiB from the start, AFAIK.

Confirmed, a new btrfs of 200 GB made on a rotating disk has 16 MiB of
global reserve. Unfortunately, I do not have history about how this
grew over time. The first btrfs fi usage I have on file was about half
a day into this fs' existence on Mar 7, after copying data on to it,
and Global reserve was already at 512 MiB.

> > Mar 14 10:51:06 fan root: BEGIN btrfs balance start -mprofiles=dup /
> 
> This probably should have been -mprofiles=single,
> so that it becomes clearer where and when the enospc errors occur.

Good catch. So I'd need to parse btrfs fi df's output to call the
right balance option. I blindly copied that over from the script I
wrote for the older btrfs which still has DUP metadata and system.
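
A minimal sketch of that parsing step, assuming the script is free to post-process the plain `btrfs fi df` output in Python. The function name and the returned dictionary layout are hypothetical; the `-dprofiles=`, `-mprofiles=` and `-sprofiles=` filters are the ones already used in the log above. Lines in a format the pattern does not recognise (and the GlobalReserve line) are simply skipped.

```python
import re

# Matches lines such as "Metadata, single: total=12.00GiB, used=2.46GiB".
_DF_LINE = re.compile(r"^(?P<kind>\w+),\s*(?P<profile>\w+):\s*total=")

# Balance filter prefix per block group type.
_FLAGS = {"Data": "-d", "Metadata": "-m", "System": "-s"}


def balance_filters(fi_df_output):
    """Map `btrfs fi df` output to one profiles= filter per block group type."""
    profiles = {}
    for line in fi_df_output.splitlines():
        match = _DF_LINE.match(line.strip())
        if not match or match.group("kind") not in _FLAGS:
            continue  # GlobalReserve and unrecognised lines are ignored
        seen = profiles.setdefault(match.group("kind"), [])
        profile = match.group("profile").lower()
        if profile not in seen:
            seen.append(profile)
    # Several profiles of one type are joined with "|".
    return {
        kind: f"{_FLAGS[kind]}profiles={'|'.join(names)}"
        for kind, names in profiles.items()
    }


SAMPLE = """\
Data, single: total=79.00GiB, used=78.43GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=12.00GiB, used=2.46GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

for kind, option in balance_filters(SAMPLE).items():
    print(kind, option)
```

With this, the same script could serve both the new filesystem (single metadata) and the older one (DUP metadata and system) without hard-coding the profile.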

> BTW, I restored and mounted your 20160307-fanbtr-image:
> 
> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 /dev/loop0
> [266203.734804] BTRFS info (device loop0): disk space caching is enabled
> [266203.734806] BTRFS: has skinny extents
> [266204.022175] BTRFS: checking UUID tree
> [266239.407249] attempt to access beyond end of device
> [266239.407252] loop0: rw=1073, want=715202688, limit=705760000
> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [266239.407272] attempt to access beyond end of device
> .. and 16 more
> 
> As a quick fix/workaround, I truncated the image to 1T

The original fs was 417 GiB in size. What size does the image claim?

> After re-loop and mount and while doing a balance of the metadata I got this:
> [266667.431704] BTRFS error (device loop0): bad tree block start 0 5827368812544
>
> So something is/was wrong with the fs. Did you do a btrfs check before imaging?

No, I didn't. And there is indeed something wrong:

[10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
Superblock bytenr is larger than device size
Couldn't open file system
[11/509]mh@fan:~$

Can this be fixed?

Greetings
Marc


* Re: New file system with same issue
  2016-03-14 12:48           ` New file system with same issue Holger Hoffstätte
@ 2016-03-14 20:13             ` Marc Haber
  2016-03-15 10:52               ` Holger Hoffstätte
  2016-03-17  1:17               ` A good "Boot Maintenance" scheme (WAS: New file system with same issue) Robert White
  0 siblings, 2 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-14 20:13 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 14, 2016 at 01:48:18PM +0100, Holger Hoffstätte wrote:
> On 03/14/16 13:07, Marc Haber wrote:
> >> And btrfs balance runs into the same ENOSPC issues as the old one:
> > 
> > ... with Qu's patch, I now get a reproducible kernel trace:
> 
> <snip>
> 
> That is interesting and useful. Sorry if this was asked before, but
> did you ever try to clear the free-space cache via -o clear_cache
> on mount?

This was not asked, and I didn't try. Since this is an encrypted root
filesystem, would it work to add clear_cache to /etc/fstab, rebuild
the initramfs and reboot? Or do you recommend using a rescue system?

> Give it a try, let it run for a while and then try balancing
> again.

Do I need to wait for clear_cache to finish, like until I see disk
usage dropping?

> Uncle Occam's razor also suggests that the involvement of dm
> doesn't help. Why not just use the device/partition directly?

I need the dm intermediate layer since I don't want to repartition the
expensive SSD and the entire system is encrypted.

> _Someone_ is lying to btrfs in terms of device size and/or allocated
> chunks, otherwise you wouldn't get the ENOSPC.

Which properties does a block device report other than size?

Greetings
Marc


* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 20:05             ` Marc Haber
@ 2016-03-14 20:39               ` Henk Slager
  2016-03-14 21:59                 ` Chris Murphy
  2016-03-15  7:07                 ` Marc Haber
  2016-03-15 13:29               ` Marc Haber
  1 sibling, 2 replies; 81+ messages in thread
From: Henk Slager @ 2016-03-14 20:39 UTC (permalink / raw)
  To: Btrfs BTRFS

>> BTW, I restored and mounted your 20160307-fanbtr-image:
>>
>> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 /dev/loop0
>> [266203.734804] BTRFS info (device loop0): disk space caching is enabled
>> [266203.734806] BTRFS: has skinny extents
>> [266204.022175] BTRFS: checking UUID tree
>> [266239.407249] attempt to access beyond end of device
>> [266239.407252] loop0: rw=1073, want=715202688, limit=705760000
>> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
>> [266239.407272] attempt to access beyond end of device
>> .. and 16 more
>>
>> As a quick fix/workaround, I truncated the image to 1T
>
> The original fs was 417 GiB in size. What size does the image claim?

ls -alFh  of the restored image showed 337G I remember.
btrfs fi us showed also a number over 400G, I don't have the
files/loopdev anymore.
It could be some side effect of btrfs-image; I have only used it for
multi-device, where dev ids are ignored, but total image size did not
lead to problems.
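
As a rough cross-check of the numbers above: the want/limit values in
the "attempt to access beyond end of device" message are counted in
512-byte sectors, so they can be converted to bytes and GiB. A minimal
sketch (the helper names are made up for illustration):

```shell
#!/bin/sh
# Convert the sector counts from the kernel message into bytes and GiB.
# Assumption: want/limit are in 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

bytes_to_gib() {
    # awk handles the floating-point division
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

limit_bytes=$(sectors_to_bytes 705760000)   # size of the loop device
want_bytes=$(sectors_to_bytes 715202688)    # end of the rejected write

echo "limit: $limit_bytes bytes = $(bytes_to_gib "$limit_bytes") GiB"
echo "want:  $want_bytes bytes = $(bytes_to_gib "$want_bytes") GiB"
```

The limit comes out at roughly 336.5 GiB, which agrees with the 337G
that ls reported for the image; the rejected write ends at about 341
GiB, i.e. past the end of the image file.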

> [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> Superblock bytenr is larger than device size
> Couldn't open file system
> [11/509]mh@fan:~$
>
> Can this be fixed?

What I would do in order to fix it, is resize the fs to let's say
190GiB. That should write correct values to the superblocks I /hope/.
And then resize back to max.
Maybe btrfs check --repair can also fix it, but before doing --repair
or other actions, I would see what else besides btrfs could be wrong,
see also suggestion of Holger. --repair can repair certain things, but
also destroy the fs.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 20:39               ` Henk Slager
@ 2016-03-14 21:59                 ` Chris Murphy
  2016-03-14 23:22                   ` Henk Slager
  2016-03-15  7:07                 ` Marc Haber
  1 sibling, 1 reply; 81+ messages in thread
From: Chris Murphy @ 2016-03-14 21:59 UTC (permalink / raw)
  To: Btrfs BTRFS

I'm a little mystified how btrfs check reports a problem with the
superblock, and yet this filesystem can still be mounted and used? If
it mounts rw then resize is possible but why would it be wrong in the
first place?


Chris Murphy

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 21:59                 ` Chris Murphy
@ 2016-03-14 23:22                   ` Henk Slager
  2016-03-15  7:16                     ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Henk Slager @ 2016-03-14 23:22 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 14, 2016 at 10:59 PM, Chris Murphy <lists@colorremedies.com> wrote:
> I'm a little mystified how btrfs check reports a problem with the
> superblock, and yet this filesystem can still be mounted and used? If
> it mounts rw then resize is possible but why would it be wrong in the
> first place?

Yes you are right. I just focussed on the 'Superblock bytenr is larger
than device size' part.
btrfs check  will actually output that text on a mounted filesystem anyhow.
The other question is: What is mounted on /media/tempdisk/  ?

At least I think a check of the current 200GiB fs is needed. As it is
a rootfs and encrypted, some work is needed to make that happen.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 20:39               ` Henk Slager
  2016-03-14 21:59                 ` Chris Murphy
@ 2016-03-15  7:07                 ` Marc Haber
  2016-03-27 12:15                   ` Martin Steigerwald
  1 sibling, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-15  7:07 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 14, 2016 at 09:39:51PM +0100, Henk Slager wrote:
> >> BTW, I restored and mounted your 20160307-fanbtr-image:
> >>
> >> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732 /dev/loop0
> >> [266203.734804] BTRFS info (device loop0): disk space caching is enabled
> >> [266203.734806] BTRFS: has skinny extents
> >> [266204.022175] BTRFS: checking UUID tree
> >> [266239.407249] attempt to access beyond end of device
> >> [266239.407252] loop0: rw=1073, want=715202688, limit=705760000
> >> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> >> [266239.407272] attempt to access beyond end of device
> >> .. and 16 more
> >>
> >> As a quick fix/workaround, I truncated the image to 1T
> >
> > The original fs was 417 GiB in size. What size does the image claim?
> 
> ls -alFh  of the restored image showed 337G I remember.
> btrfs fi us showed also a number over 400G, I don't have the
> files/loopdev anymore.

sounds legit.

> It could be some side effect of btrfs-image; I have only used it for
> multi-device, where dev ids are ignored, but total image size did not
> lead to problems.

The original "ofanbtr" seems to have a problem, since btrfs check
/media/tempdisk says:

> > [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> > Superblock bytenr is larger than device size
> > Couldn't open file system
> > [11/509]mh@fan:~$
> >
> > Can this be fixed?
> 
> What I would do in order to fix it, is resize the fs to let's say
> 190GiB. That should write correct values to the superblocks I /hope/.
> And then resize back to max.

It doesn't:
[20/518]mh@fan:~$ sudo btrfs filesystem resize 300G /media/tempdisk/
Resize '/media/tempdisk/' of '300G'
[22/520]mh@fan:~$ sudo btrfs check /media/tempdisk/
Superblock bytenr is larger than device size
Couldn't open file system
[23/521]mh@fan:~$ df -h

> Maybe btrfs check --repair can also fix it, but before doing --repair
> or other actions, I would see what else besides btrfs could be wrong,
> see also suggestion of Holger.

Like putting the filesystem on an unencrypted medium? Sorry, no,
private data, paranoia.

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 23:22                   ` Henk Slager
@ 2016-03-15  7:16                     ` Marc Haber
  2016-03-15 12:15                       ` Henk Slager
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-15  7:16 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 12:22:00AM +0100, Henk Slager wrote:
> The other question is: What is mounted on /media/tempdisk/  ?

The "old" btrfs filesystem "ofanbtr", formerly 417 GB in size, now
resized to 300 GB. Does it need to be umounted to be checked?

> At least I think a check of the current 200GiB fs is needed. As it is
> a rootfs and encrypted, some work is needed to make that happen.

You suggested a btrfs check after looking at the image of "ofanbtr".
Do you want me to check the new "fanbtr" also?

Too bad that we went back to looking at "ofanbtr" after I changed the
subject to avoid mixing up both instances.

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14  0:00             ` Henk Slager
@ 2016-03-15  7:20               ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-15  7:20 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On Mon, Mar 14, 2016 at 01:00:13AM +0100, Henk Slager wrote:
> On Sun, Mar 13, 2016 at 9:56 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> > Yes, I want to keep the possibility to remove huge files from
> > snapshots that shouldnt have been on a snapshotted volume in the first
> > place without having to ditch the entire snapshot.
> 
> You could do ro snapshotting and in case you want to modify something
> inside a snapshot/subvolume:
> # btrfs property set <subvolume> ro false
> # rm <subvolume>/<somefile>
> # btrfs property set <subvolume> ro true

I was not aware that it is possible to fiddle with the ro property of
an already existing snapshot. I am not yet sure whether I love or hate
this.

> >> Also, If some part of the OS or tools scans through the snapshot dirs
> >> every now and then with atime creation on, metadata grows without a
> >> real need.
> >
> > I mount with noatime and nodiratime anyway, and the directory the
> > snapshots are mounted to (/mnt/snapshots) are excluded in
> > updatedb.conf. Any other idea which tool might scan filesystems and
> > that might not be noticed when it's running about a five digit number
> > of snapshots?
> 
> Maybe baloo or so if you use KDE.

I usually do those tests via ssh without even being logged in to a
local desktop.

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue
  2016-03-14 20:13             ` Marc Haber
@ 2016-03-15 10:52               ` Holger Hoffstätte
  2016-03-15 13:46                 ` Marc Haber
  2016-03-17  1:17               ` A good "Boot Maintenance" scheme (WAS: New file system with same issue) Robert White
  1 sibling, 1 reply; 81+ messages in thread
From: Holger Hoffstätte @ 2016-03-15 10:52 UTC (permalink / raw)
  To: Marc Haber, Btrfs BTRFS

On 03/14/16 21:13, Marc Haber wrote:
> On Mon, Mar 14, 2016 at 01:48:18PM +0100, Holger Hoffstätte wrote:
>> did you ever try to clear the free-space cache via -o clear_cache
>> on mount?
> 
> This was not asked, and I didn't try. Since this is an encrypted root
> filesystem, is it a workable way to add clear_cache to /etc/fstab,
> rebuild initramfs and reboot? Or do you recommend using a rescue system?

If you can do it via a rescue system that might be easiest, but adding
it to fstab and rebooting once has the same effect. Whatever you know
how to do safely.
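
A sketch of the fstab variant, assuming the root filesystem is the
btrfs on /dev/mapper/fanbtr and that noatime is already among its
options (both taken from this thread; the remaining fields are
placeholders, not the real entry). clear_cache only needs to be present
for one mount, so it can be removed again after the reboot:

```text
# /etc/fstab (illustrative entry only, adjust to the real one)
/dev/mapper/fanbtr  /  btrfs  noatime,clear_cache  0  0
```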

>> Give it a try, let it run for a while and then try balancing
>> again.
> 
> Do I need to wait for clear_cache to finish, like until I see disk
> usage dropping?

The cache isn't that big, so you won't see a huge drop. Just use the
disk normally for a few minutes, after some time the cache will be
written out again.

>> _Someone_ is lying to btrfs in terms of device size and/or allocated
>> chunks, otherwise you wouldn't get the ENOSPC.
> 
> Which properties does a block device report other than size?

Well..at least all you can find in /sys/block/sdX/*. However, reading
the other subthread about the mismatching image size I'm now none the
wiser what else to suggest. :/
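
For reference, a small sketch of reading what the kernel reports for
each block device under /sys/block. The size attribute is expressed in
512-byte units regardless of the device's logical block size;
dev_size_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# Print the size the kernel reports for a block device, in bytes.
# $1 is a sysfs directory such as /sys/block/sda or /sys/block/dm-0.
dev_size_bytes() {
    sectors=$(cat "$1/size")
    echo $(( sectors * 512 ))
}

# List every block device with its size and logical block size.
for d in /sys/block/*; do
    [ -r "$d/size" ] || continue
    lbs=$(cat "$d/queue/logical_block_size" 2>/dev/null)
    echo "$(basename "$d"): $(dev_size_bytes "$d") bytes, logical block size ${lbs:-unknown}"
done
```

For a dm device the entry is dm-N rather than sdX. Comparing this value
with the devid size shown by btrfs fi show is one way to see whether
the two agree.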

-h

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-15  7:16                     ` Marc Haber
@ 2016-03-15 12:15                       ` Henk Slager
  2016-03-15 13:24                         ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Henk Slager @ 2016-03-15 12:15 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 8:16 AM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Tue, Mar 15, 2016 at 12:22:00AM +0100, Henk Slager wrote:
>> The other question is: What is mounted on /media/tempdisk/  ?
>
> The "old" btrfs filesystem "ofanbtr", formerly 417 GB in size, now
> resized to 300 GB. Does it need to be umounted to be checked?

Yes, that's the whole point

>> At least I think a check of the current 200GiB fs is needed. As it is
>> a rootfs and encrypted, some work is needed to make that happen.
>
> You suggested a btrfs check after looking at the image of "ofanbtr".
> Do you want me to check the new "fanbtr" also?

I was not sure if 'ofanbtr' is an image created by btrfs-image or an
extra dd-created image you might have locally. Both 'ofanbtr' and
'fanbtr' have the same balance issue, but 'fanbtr' is created with
newer and known kernel+tools version I assume, so that's why the
suggestion.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-15 12:15                       ` Henk Slager
@ 2016-03-15 13:24                         ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-15 13:24 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 01:15:33PM +0100, Henk Slager wrote:
> On Tue, Mar 15, 2016 at 8:16 AM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> > On Tue, Mar 15, 2016 at 12:22:00AM +0100, Henk Slager wrote:
> >> The other question is: What is mounted on /media/tempdisk/  ?
> >
> > The "old" btrfs filesystem "ofanbtr", formerly 417 GB in size, now
> > resized to 300 GB. Does it need to be umounted to be checked?
> 
> Yes, that's the whole point
> 
> >> At least I think a check of the current 200GiB fs is needed. As it is
> >> a rootfs and encrypted, some work is needed to make that happen.
> >
> > You suggested a btrfs check after looking at the image of "ofanbtr".
> > Do you want me to check the new "fanbtr" also?
> 
> I was not sure if 'ofanbtr' is an image created by btrfs-image or an
> extra dd-created image you might have locally. Both 'ofanbtr' and
> 'fanbtr' have the same balance issue, but 'fanbtr' is created with
> newer and known kernel+tools version I assume, so that's why the
> suggestion.

ofanbtr is the old btrfs, on /dev/mapper/ofanbtr:
Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
        Total devices 1 FS bytes used 80.63GiB
        devid    1 size 300.00GiB used 122.06GiB path /dev/mapper/ofanbtr
it was created as 'fanbtr' in September, 300 GiB in size, then - in
February, I think, resized to 417 GiB to make room for more data and
for balancing, used until March 7, and then renamed to ofanbtr with
lvrename and btrfs fi label. It was then imaged, and then resized back
to 300 GiB in the hope that this will fix the size issue.

fanbtr is the new btrfs, on /dev/mapper/fanbtr:
Label: 'fanbtr'  uuid: 90f8d728-6bae-4fca-8cda-b368ba2c008e
        Total devices 1 FS bytes used 82.45GiB
        devid    1 size 200.00GiB used 113.03GiB path /dev/mapper/fanbtr
it was created on March 7, had the data from ofanbtr cp'ed over, and
is being used as the active filesystem since then. It is smaller
because I don't have much more room on the SSD.

Both do have the same balance issue, yes.

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-14 20:05             ` Marc Haber
  2016-03-14 20:39               ` Henk Slager
@ 2016-03-15 13:29               ` Marc Haber
  2016-03-15 13:42                 ` Marc Haber
  1 sibling, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-15 13:29 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Mar 14, 2016 at 09:05:46PM +0100, Marc Haber wrote:
> [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> Superblock bytenr is larger than device size
> Couldn't open file system
> [11/509]mh@fan:~$

After umounting and running btrfs check on the block device, things
seem to be fine now:

[34/532]mh@fan:~$ sudo btrfs check /dev/mapper/ofanbtr
Checking filesystem on /dev/mapper/ofanbtr
UUID: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 86554574954 bytes used err is 0
total csum bytes: 81815012
total tree bytes: 2476670976
total fs tree bytes: 2246311936
total extent tree bytes: 133201920
btree space waste bytes: 452859567
file data blocks allocated: 292994375680
 referenced 132664688640
[35/533]mh@fan:~$ sudo btrfs check /dev/mapper/ofanbtr
Checking filesystem on /dev/mapper/ofanbtr
UUID: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 86554574954 bytes used err is 0
total csum bytes: 81815012
total tree bytes: 2476670976
total fs tree bytes: 2246311936
total extent tree bytes: 133201920
btree space waste bytes: 452859567
file data blocks allocated: 292994375680
 referenced 132664688640
[36/533]mh@fan:~$

This does not indicate an error, does it?

Greetings
Marc, who would like the tools to be a bit more explicit and consistent
in whether they want the fs mounted, umounted, the mountpoint or the
device on their command line

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-15 13:29               ` Marc Haber
@ 2016-03-15 13:42                 ` Marc Haber
  2016-03-15 16:54                   ` Henk Slager
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-15 13:42 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 02:29:32PM +0100, Marc Haber wrote:
> After umounting and running btrfs check on the block device, things
> seem to be fine now

But, umounting the btrfs seemed to trigger the following kernel traces:

Mar 15 14:21:30 fan kernel: [92308.377104] ------------[ cut here ]------------
Mar 15 14:21:30 fan kernel: [92308.377135] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5380 btrfs_free_block_groups+0x1bc/0x36f [btrfs]()
Mar 15 14:21:30 fan kernel: [92308.377137] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
Mar 15 14:21:30 fan kernel: [92308.377203] CPU: 5 PID: 28243 Comm: umount Not tainted 4.4.5-zgws1 #2
Mar 15 14:21:30 fan kernel: [92308.377205] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 15 14:21:30 fan kernel: [92308.377207]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
Mar 15 14:21:30 fan kernel: [92308.377210]  ffffffff81051e21 ffffffffa047a147 ffff880600a28000 0000000000000000
Mar 15 14:21:30 fan kernel: [92308.377212]  ffff880600a28080 ffff8805af7eea00 ffffffffa047a147 ffff880600a28000
Mar 15 14:21:30 fan kernel: [92308.377215] Call Trace:
Mar 15 14:21:30 fan kernel: [92308.377221]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 15 14:21:30 fan kernel: [92308.377224]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 15 14:21:30 fan kernel: [92308.377239]  [<ffffffffa047a147>] ? btrfs_free_block_groups+0x1bc/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377252]  [<ffffffffa047a147>] ? btrfs_free_block_groups+0x1bc/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377267]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377271]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
Mar 15 14:21:30 fan kernel: [92308.377273]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
Mar 15 14:21:30 fan kernel: [92308.377285]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377288]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
Mar 15 14:21:30 fan kernel: [92308.377291]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
Mar 15 14:21:30 fan kernel: [92308.377293]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
Mar 15 14:21:30 fan kernel: [92308.377296]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
Mar 15 14:21:30 fan kernel: [92308.377300]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
Mar 15 14:21:30 fan kernel: [92308.377302] ---[ end trace 18c6bb90b0c6c689 ]---

Mar 15 14:21:30 fan kernel: [92308.377303] ------------[ cut here ]------------
Mar 15 14:21:30 fan kernel: [92308.377318] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5381 btrfs_free_block_groups+0x1d7/0x36f [btrfs]()
Mar 15 14:21:30 fan kernel: [92308.377319] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
Mar 15 14:21:30 fan kernel: [92308.377362] CPU: 5 PID: 28243 Comm: umount Tainted: G        W       4.4.5-zgws1 #2
Mar 15 14:21:30 fan kernel: [92308.377364] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 15 14:21:30 fan kernel: [92308.377365]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
Mar 15 14:21:30 fan kernel: [92308.377367]  ffffffff81051e21 ffffffffa047a162 ffff880600a28000 0000000000000000
Mar 15 14:21:30 fan kernel: [92308.377369]  ffff880600a28080 ffff8805af7eea00 ffffffffa047a162 ffff880600a28000
Mar 15 14:21:30 fan kernel: [92308.377372] Call Trace:
Mar 15 14:21:30 fan kernel: [92308.377374]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 15 14:21:30 fan kernel: [92308.377377]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 15 14:21:30 fan kernel: [92308.377390]  [<ffffffffa047a162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377404]  [<ffffffffa047a162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377419]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377421]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
Mar 15 14:21:30 fan kernel: [92308.377423]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
Mar 15 14:21:30 fan kernel: [92308.377435]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377438]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
Mar 15 14:21:30 fan kernel: [92308.377440]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
Mar 15 14:21:30 fan kernel: [92308.377442]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
Mar 15 14:21:30 fan kernel: [92308.377444]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
Mar 15 14:21:30 fan kernel: [92308.377446]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
Mar 15 14:21:30 fan kernel: [92308.377448] ---[ end trace 18c6bb90b0c6c68a ]---

Mar 15 14:21:30 fan kernel: [92308.377455] ------------[ cut here ]------------
Mar 15 14:21:30 fan kernel: [92308.377469] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:9595 btrfs_free_block_groups+0x34f/0x36f [btrfs]()
Mar 15 14:21:30 fan kernel: [92308.377471] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
Mar 15 14:21:30 fan kernel: [92308.377514] CPU: 5 PID: 28243 Comm: umount Tainted: G        W       4.4.5-zgws1 #2
Mar 15 14:21:30 fan kernel: [92308.377515] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 15 14:21:30 fan kernel: [92308.377517]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
Mar 15 14:21:30 fan kernel: [92308.377519]  ffffffff81051e21 ffffffffa047a2da ffff880600a28000 ffff8806173f6e88
Mar 15 14:21:30 fan kernel: [92308.377521]  0000000000000038 0000000000000000 ffffffffa047a2da ffff880600a28000
Mar 15 14:21:30 fan kernel: [92308.377523] Call Trace:
Mar 15 14:21:30 fan kernel: [92308.377525]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 15 14:21:30 fan kernel: [92308.377528]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 15 14:21:30 fan kernel: [92308.377542]  [<ffffffffa047a2da>] ? btrfs_free_block_groups+0x34f/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377555]  [<ffffffffa047a2da>] ? btrfs_free_block_groups+0x34f/0x36f [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377570]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377573]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
Mar 15 14:21:30 fan kernel: [92308.377574]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
Mar 15 14:21:30 fan kernel: [92308.377587]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
Mar 15 14:21:30 fan kernel: [92308.377589]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
Mar 15 14:21:30 fan kernel: [92308.377591]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
Mar 15 14:21:30 fan kernel: [92308.377593]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
Mar 15 14:21:30 fan kernel: [92308.377596]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
Mar 15 14:21:30 fan kernel: [92308.377598]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
Mar 15 14:21:30 fan kernel: [92308.377600] ---[ end trace 18c6bb90b0c6c68b ]---
Mar 15 14:21:30 fan kernel: [92308.377601] BTRFS: space_info 4 has 20608794624 free, is not full
Mar 15 14:21:30 fan kernel: [92308.377604] BTRFS: space_info total=23085449216, used=2476654592, pinned=0, reserved=0, may_use=25769803776, readonly=0
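
A quick check of the figures in the last two lines, as a sketch only:
the "free" value equals total minus used (pinned, reserved and readonly
are all zero here), and may_use is larger than total. The variable
names simply mirror the log fields:

```shell
#!/bin/sh
# Values copied from the space_info lines above.
total=23085449216
used=2476654592
may_use=25769803776

free=$(( total - used ))        # should match the "free" value in the log
over=$(( may_use - total ))     # how far may_use exceeds total

echo "free = $free bytes"
echo "may_use exceeds total by $over bytes"
```

So may_use (24 GiB) exceeds the total of this space_info (21.5 GiB) by
2.5 GiB. The flag value 4 in "space_info 4" denotes metadata block
groups.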

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue
  2016-03-15 10:52               ` Holger Hoffstätte
@ 2016-03-15 13:46                 ` Marc Haber
  2016-03-15 13:54                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 81+ messages in thread
From: Marc Haber @ 2016-03-15 13:46 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 11:52:30AM +0100, Holger Hoffstätte wrote:
> On 03/14/16 21:13, Marc Haber wrote:
> > Do I need to wait for clear_cache to finish, like until I see disk
> > usage dropping?
> 
> The cache isn't that big, so you won't see a huge drop. Just use the
> disk normally for a few minutes, after some time the cache will be
> written out again.

Is it necessary to actually cause activity on the file system or is it
ok to just let it sit there for an hour or so?

Greetings
Marc

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue
  2016-03-15 13:46                 ` Marc Haber
@ 2016-03-15 13:54                   ` Austin S. Hemmelgarn
  2016-03-15 14:09                     ` Marc Haber
  0 siblings, 1 reply; 81+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-15 13:54 UTC (permalink / raw)
  To: Marc Haber, Btrfs BTRFS

On 2016-03-15 09:46, Marc Haber wrote:
> On Tue, Mar 15, 2016 at 11:52:30AM +0100, Holger Hoffstätte wrote:
>> On 03/14/16 21:13, Marc Haber wrote:
>>> Do I need to wait for clear_cache to finish, like until I see disk
>>> usage dropping?
>>
>> The cache isn't that big, so you won't see a huge drop. Just use the
>> disk normally for a few minutes, after some time the cache will be
>> written out again.
>
> Is it necessary to actually cause activity on the file system or is it
> ok to just let it sit there for an hour or so?
It should be OK to just let it sit there for ten or fifteen minutes. I'm 
pretty certain that the free space cache gets rebuilt relatively 
quickly, and I'm almost 100% certain that the old one gets dropped 
within seconds of the FS being mounted with -o clear_cache.  I've 
rebuilt the cache on the 64G root filesystem on my laptop a couple of 
times before, and it consistently appears to take about 2-3 minutes to 
do so at most (based on disk usage from the kernel itself).


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue
  2016-03-15 13:54                   ` Austin S. Hemmelgarn
@ 2016-03-15 14:09                     ` Marc Haber
  0 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-15 14:09 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Mar 15, 2016 at 09:54:06AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-03-15 09:46, Marc Haber wrote:
> >On Tue, Mar 15, 2016 at 11:52:30AM +0100, Holger Hoffstätte wrote:
> >>On 03/14/16 21:13, Marc Haber wrote:
> >>>Do I need to wait for clear_cache to finish, like until I see disk
> >>>usage dropping?
> >>
> >>The cache isn't that big, so you won't see a huge drop. Just use the
> >>disk normally for a few minutes, after some time the cache will be
> >>written out again.
> >
> >Is it necessary to actually cause activity on the file system or is it
> >ok to just let it sit there for an hour or so?
> It should be OK to just let it sit there for ten or fifteen minutes. I'm
> pretty certain that the free space cache gets rebuilt relatively quickly,
> and I'm almost 100% certain that the old one gets dropped within seconds of
> the FS being mounted with -o clear_cache.  I've rebuilt the cache on the 64G
> root filesystem on my laptop a couple of times before, and it consistently
> appears to take about 2-3 minutes to do so at most (based on disk usage from
> the kernel itself).

In my case, atop has not seen any notable disk activity after mounting
with -o clear_cache.
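
For anyone who wants a number rather than watching atop, the write counters in /proc/diskstats can be sampled before and after a waiting period. This is only a rough sketch: the device name in the example is a placeholder, and the helper names are made up for illustration.

```shell
# Sketch: count sectors written to one block device during an interval
# by sampling the kernel's diskstats counters twice.
# In /proc/diskstats, column 3 is the device name and column 10 is the
# number of sectors written since boot.

# sectors_written DEVICE [STATS_FILE]
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' "${2:-/proc/diskstats}"
}

# written_during DEVICE SECONDS [STATS_FILE]
written_during() {
    before=$(sectors_written "$1" "${3:-/proc/diskstats}")
    sleep "$2"
    after=$(sectors_written "$1" "${3:-/proc/diskstats}")
    # An unknown device yields empty values, which are treated as 0 here.
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example (dm-0 is a placeholder, use the dm device backing the btrfs):
#   written_during dm-0 60
```

A result of 0 after a minute or more would match the "no notable disk activity" observation; a non-zero value only says that something wrote to the device, not that it was the free space cache.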

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-15 13:42                 ` Marc Haber
@ 2016-03-15 16:54                   ` Henk Slager
  0 siblings, 0 replies; 81+ messages in thread
From: Henk Slager @ 2016-03-15 16:54 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Tue, Mar 15, 2016 at 2:42 PM, Marc Haber <mh+linux-btrfs@zugschlus.de> wrote:
> On Tue, Mar 15, 2016 at 02:29:32PM +0100, Marc Haber wrote:
>> After umounting and btrfs check the block device, things seem to be
>> fine now
>
> But, umounting the btrfs seemed to trigger the following kernel traces:
>
> Mar 15 14:21:30 fan kernel: [92308.377104] ------------[ cut here ]------------
> Mar 15 14:21:30 fan kernel: [92308.377135] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5380 btrfs_free_block_groups+0x1bc/0x36f [btrfs]()
> Mar 15 14:21:30 fan kernel: [92308.377137] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
> Mar 15 14:21:30 fan kernel: [92308.377203] CPU: 5 PID: 28243 Comm: umount Not tainted 4.4.5-zgws1 #2
> Mar 15 14:21:30 fan kernel: [92308.377205] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
> Mar 15 14:21:30 fan kernel: [92308.377207]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
> Mar 15 14:21:30 fan kernel: [92308.377210]  ffffffff81051e21 ffffffffa047a147 ffff880600a28000 0000000000000000
> Mar 15 14:21:30 fan kernel: [92308.377212]  ffff880600a28080 ffff8805af7eea00 ffffffffa047a147 ffff880600a28000
> Mar 15 14:21:30 fan kernel: [92308.377215] Call Trace:
> Mar 15 14:21:30 fan kernel: [92308.377221]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
> Mar 15 14:21:30 fan kernel: [92308.377224]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
> Mar 15 14:21:30 fan kernel: [92308.377239]  [<ffffffffa047a147>] ? btrfs_free_block_groups+0x1bc/0x36f[btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377252]  [<ffffffffa047a147>] ? btrfs_free_block_groups+0x1bc/0x36f[btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377267]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377271]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
> Mar 15 14:21:30 fan kernel: [92308.377273]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
> Mar 15 14:21:30 fan kernel: [92308.377285]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377288]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
> Mar 15 14:21:30 fan kernel: [92308.377291]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
> Mar 15 14:21:30 fan kernel: [92308.377293]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
> Mar 15 14:21:30 fan kernel: [92308.377296]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
> Mar 15 14:21:30 fan kernel: [92308.377300]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
> Mar 15 14:21:30 fan kernel: [92308.377302] ---[ end trace 18c6bb90b0c6c689 ]---
>
> Mar 15 14:21:30 fan kernel: [92308.377303] ------------[ cut here ]------------
> Mar 15 14:21:30 fan kernel: [92308.377318] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:5381 btrfs_free_block_groups+0x1d7/0x36f [btrfs]()
> Mar 15 14:21:30 fan kernel: [92308.377319] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
> Mar 15 14:21:30 fan kernel: [92308.377362] CPU: 5 PID: 28243 Comm: umount Tainted: G        W       4.4.5-zgws1 #2
> Mar 15 14:21:30 fan kernel: [92308.377364] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
> Mar 15 14:21:30 fan kernel: [92308.377365]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
> Mar 15 14:21:30 fan kernel: [92308.377367]  ffffffff81051e21 ffffffffa047a162 ffff880600a28000 0000000000000000
> Mar 15 14:21:30 fan kernel: [92308.377369]  ffff880600a28080 ffff8805af7eea00 ffffffffa047a162 ffff880600a28000
> Mar 15 14:21:30 fan kernel: [92308.377372] Call Trace:
> Mar 15 14:21:30 fan kernel: [92308.377374]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
> Mar 15 14:21:30 fan kernel: [92308.377377]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
> Mar 15 14:21:30 fan kernel: [92308.377390]  [<ffffffffa047a162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377404]  [<ffffffffa047a162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377419]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377421]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
> Mar 15 14:21:30 fan kernel: [92308.377423]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
> Mar 15 14:21:30 fan kernel: [92308.377435]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377438]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
> Mar 15 14:21:30 fan kernel: [92308.377440]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
> Mar 15 14:21:30 fan kernel: [92308.377442]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
> Mar 15 14:21:30 fan kernel: [92308.377444]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
> Mar 15 14:21:30 fan kernel: [92308.377446]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
> Mar 15 14:21:30 fan kernel: [92308.377448] ---[ end trace 18c6bb90b0c6c68a ]---

OK, this corresponds with the symptoms from the balances.

You hit the first two warnings in here:

static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
{
        block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
                                                (u64)-1);
        WARN_ON(fs_info->delalloc_block_rsv.size > 0);
        WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
        WARN_ON(fs_info->trans_block_rsv.size > 0);
        WARN_ON(fs_info->trans_block_rsv.reserved > 0);
        WARN_ON(fs_info->chunk_block_rsv.size > 0);
        WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
        WARN_ON(fs_info->delayed_block_rsv.size > 0);
        WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
}

One would need to trace back through the code further to figure out
why these checks are triggering. That is one track.

Another track, assuming something outside btrfs is causing the
trouble, would be to copy the fs device (so on top of dm) with dd to
some large, secure local storage (maybe on another PC) so that you can
mount it there via a loop device. (At least make sure that no kernel
can see both the original and the copy, since they carry the same UUID.)

Then you can run the balance commands etc. in a more isolated
environment, at least one not affected by possible
dm/root-crypt/bootloader etc. issues. Hopefully that gives hints to the
root cause.
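
As a rough sketch of that second track, the helper below only prints the commands so they can be reviewed before anything is run. The device, image path and mountpoint are placeholders chosen for illustration, and /dev/loopN stands for whatever loop device losetup reports.

```shell
# Sketch: print (not run) the commands for taking an offline copy of the
# filesystem and mounting that copy via a loop device.
# All three arguments are illustrative placeholders.

# plan_offline_copy SOURCE_DEVICE IMAGE_FILE MOUNTPOINT
plan_offline_copy() {
    src=$1 img=$2 mnt=$3
    # 1. Copy the unmounted device (on top of dm, so already decrypted)
    #    into an image file, skipping runs of zeros.
    echo "dd if=$src of=$img bs=4M conv=sparse status=progress"
    # 2. On the test machine: attach the image to the first free loop
    #    device; losetup prints the name of the device it picked.
    echo "losetup --find --show $img"
    # 3. Mount the loop device reported by step 2.
    echo "mount /dev/loopN $mnt"
}

plan_offline_copy /dev/mapper/fanbtr /srv/images/fanbtr.img /mnt/fanbtr-copy
```

Steps 2 and 3 are meant for a machine where the original device is not attached, so that only one filesystem with this UUID is visible to the kernel.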


> Mar 15 14:21:30 fan kernel: [92308.377455] ------------[ cut here ]------------
> Mar 15 14:21:30 fan kernel: [92308.377469] WARNING: CPU: 5 PID: 28243 at fs/btrfs/extent-tree.c:9595 btrfs_free_block_groups+0x34f/0x36f [btrfs]()
> Mar 15 14:21:30 fan kernel: [92308.377471] Modules linked in: vhost_net vhost macvtap macvlan tun iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc snd_cmipci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd snd_mpu401_uart snd_opl3_lib snd_rawmidi kvm snd_hda_intel snd_seq_device snd_hda_codec snd_hda_core snd_hwdep amd64_edac_mod snd_pcm_oss edac_mce_amd irqbypass input_leds snd_mixer_oss pcspkr k10temp edac_core snd_pcm snd_timer snd i2c_piix4 asus_atk0110 soundcore acpi_cpufreq tpm_tis tpm sg processor evdev shpchp hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci r8169 mii amdkfd radeon i2c_algo_bit ahci ttm sym53c8xx libahci xhci_pci scsi_transport_spi drm_kms_helper ohci_hcd ehci_pci xhci_hcd libata ehci_hcd drm usbcore scsi_mod usb_common i2c_core button
> Mar 15 14:21:30 fan kernel: [92308.377514] CPU: 5 PID: 28243 Comm: umount Tainted: G        W       4.4.5-zgws1 #2
> Mar 15 14:21:30 fan kernel: [92308.377515] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
> Mar 15 14:21:30 fan kernel: [92308.377517]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
> Mar 15 14:21:30 fan kernel: [92308.377519]  ffffffff81051e21 ffffffffa047a2da ffff880600a28000 ffff8806173f6e88
> Mar 15 14:21:30 fan kernel: [92308.377521]  0000000000000038 0000000000000000 ffffffffa047a2da ffff880600a28000
> Mar 15 14:21:30 fan kernel: [92308.377523] Call Trace:
> Mar 15 14:21:30 fan kernel: [92308.377525]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
> Mar 15 14:21:30 fan kernel: [92308.377528]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
> Mar 15 14:21:30 fan kernel: [92308.377542]  [<ffffffffa047a2da>] ? btrfs_free_block_groups+0x34f/0x36f [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377555]  [<ffffffffa047a2da>] ? btrfs_free_block_groups+0x34f/0x36f [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377570]  [<ffffffffa0487c72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377573]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
> Mar 15 14:21:30 fan kernel: [92308.377574]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
> Mar 15 14:21:30 fan kernel: [92308.377587]  [<ffffffffa04660db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
> Mar 15 14:21:30 fan kernel: [92308.377589]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
> Mar 15 14:21:30 fan kernel: [92308.377591]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
> Mar 15 14:21:30 fan kernel: [92308.377593]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
> Mar 15 14:21:30 fan kernel: [92308.377596]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
> Mar 15 14:21:30 fan kernel: [92308.377598]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
> Mar 15 14:21:30 fan kernel: [92308.377600] ---[ end trace 18c6bb90b0c6c68b ]---
> Mar 15 14:21:30 fan kernel: [92308.377601] BTRFS: space_info 4 has 20608794624 free, is not full
> Mar 15 14:21:30 fan kernel: [92308.377604] BTRFS: space_info total=23085449216, used=2476654592, pinned=0, reserved=0, may_use=25769803776, readonly=0
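
One thing that stands out in the space_info line above is that may_use is larger than total. Whether that is the root cause is not established here; the sketch below just makes the comparison mechanical for other kernel logs. The function name is made up for illustration.

```shell
# Sketch: read kernel log lines on stdin and, for each
# "BTRFS: space_info total=..." line, report whether may_use exceeds total.
space_info_check() {
    sed -n 's/.*space_info total=\([0-9]*\),.*may_use=\([0-9]*\),.*/\1 \2/p' |
    while read -r total may_use; do
        if [ "$may_use" -gt "$total" ]; then
            echo "may_use ($may_use) exceeds total ($total) by $((may_use - total)) bytes"
        else
            echo "may_use ($may_use) fits within total ($total)"
        fi
    done
}

# Example:
#   dmesg | space_info_check
```

For the line quoted above this reports an excess of 2684354560 bytes, i.e. 2.5 GiB.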
>
> Greetings
> Marc
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* A good "Boot Maintenance" scheme (WAS: New file system with same issue)
  2016-03-14 20:13             ` Marc Haber
  2016-03-15 10:52               ` Holger Hoffstätte
@ 2016-03-17  1:17               ` Robert White
  1 sibling, 0 replies; 81+ messages in thread
From: Robert White @ 2016-03-17  1:17 UTC (permalink / raw)
  To: Marc Haber, Btrfs BTRFS

On 03/14/2016 01:13 PM, Marc Haber wrote:
> This was not asked, and I didn't try. Since this is an encrypted root
> filesystem, is it a workable way to add clear_cache to /etc/fstab,
> rebuild initramfs and reboot? Or do you recommend using a rescue system?


You should be able to boot to single user mode and man-handle the 
filesystem from there.

but if that's a problem...

I have a small (woefully incomplete) pre-alpha project on Sourceforge 
called "underdog".

I currently use it to build the initramfs to boot all my systems.

So even if you don't use it for production you can use it to make the 
initramfs you'll need to have full BTRFS maintenance before any of your 
filesystems are mounted.

You want this tool if you want to do pre-boot surgery without hassles. 
(or you want a nice way to convert from extN to btrfs and back without 
having a bunch of co-variant CDs.)

There's a README.txt, it's long... But basically:


(1) use git to download project

(2) go to your kernel build directory

(3) run /path/utility/make_initramfs_description.bash > .initramfs

where "path" is the path to the download location

(4) set CONFIG_INITRAMFS_SOURCE=".initramfs" in your kernel config

(5) build new kernel.
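
Steps (2) to (5) might look roughly like the sketch below. The helper name, the UNDERDOG variable and the kernel directory are placeholders for illustration; the README.txt remains the authority on the actual procedure.

```shell
# Sketch: set CONFIG_INITRAMFS_SOURCE in a kernel .config file.
# set_initramfs_source CONFIG_FILE DESCRIPTION_FILE
set_initramfs_source() {
    cfg=$1 desc=$2
    # Drop any existing setting, then append the new one.
    # grep -v exits non-zero when nothing is left to print, hence "|| true".
    grep -v '^CONFIG_INITRAMFS_SOURCE=' "$cfg" > "$cfg.new" || true
    printf 'CONFIG_INITRAMFS_SOURCE="%s"\n' "$desc" >> "$cfg.new"
    mv "$cfg.new" "$cfg"
}

# As it might be typed (UNDERDOG is a placeholder for the download
# location, /usr/src/linux for your kernel build directory):
#   cd /usr/src/linux
#   "$UNDERDOG"/utility/make_initramfs_description.bash > .initramfs
#   set_initramfs_source .config .initramfs
#   make olddefconfig && make
```

Editing .config by hand or through menuconfig works just as well; the helper is only there to make the step scriptable.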

This creates a kernel which:

(A) Booted with a normal initramfs will run as if you didn't do the above.

(B1) Booted _without_ another initramfs will _probably_ boot normally as 
long as none of your disk-related drivers are kernel modules. (It 
handles most layouts I've tried but some hosts have "the UUID problem" 
described below.)

(B2) Booted _without_ another initramfs but _with_ "bash" on the kernel 
command line results in several opportunities to execute arbitrary 
commands during the boot sequence, and provides the commands you need.

You are after mode B2.

So during a normal boot without another initramfs the system will 
recursively examine all the disk block devices, identifying and 
attaching cryptsetup, mdadm, lvm2, and btrfs devices as it finds them.

Some of the prompting will get lost in the kernel boot messages because, 
you know, multi-threading and all that.

Prompts:

"pre-loop#"; you've used "bash" option and it's about to start the disk 
search.

A password prompt for a device; it's found a cryptsetup device it wants 
to open. It'll give you three tries before moving on.

"post-loop#"; After the recursive search. _hopefully_ your root device 
is mounted as "/root". The FS is (hopefully) mounted but the system is 
completely independent of the mount.

So either use "pre-loop#" to do all your maintenance by hand, or go 
through to "post-loop#" and unmount /root and then do as you must.

These prompts are from the uber-privileged PID1 and you can exec into 
other things (like an alternate init) as well as just running programs.

(control-d from "pre-loop#" or "post-loop#" will continue the script.)

If you drop out to a bash shell prompt in B1 or B2 at the end then the 
script could not deduce your layout. You can then do fixups and 
whatever, and then mount anything as /root and then do the "exec busybox 
switch_root /root /sbin/init" (or /init or whatever) to continue the 
boot. That emergency bash prompt is still PID #1 and all the initramfs 
conditions for init are still in force.

Anyway, having used "bash" you can use the pre- or post- loop prompts to 
do any maintenance you wish. Up-arrow/history is _not_ available. But 
you have btrfs, btrfsck, fsck, mount, blkid, mdadm, and lvm available to 
you. (You can also issue "busybox --install" to get all of the commands 
like cp and vi and such.)

Other simple emergency commands can be added to the foundational list by 
creating/editing a utility/something.mod file or the script itself.

The magic of the script is that it will scour the system for dependent 
interpreters and libraries for whatever commands you add to the 
initramfs so you don't have to have "statically linked" or otherwise 
special utilities built.

Notes:

I've been using the kernel-inbuilt initramfs this creates in all my 
development systems for years, it's quite stable... except where it's 
not. 8-)

DO NOT USE "utility/make_initramfs.bash", it's deprecated. If you want 
to make a not-inbuilt initramfs image then follow the README.txt 
instructions from item 3b (which is also easier).

Full Disclosure: It's a little wonky about using UUIDs to find 
filesystems right now because BASH likes to try to do math near the 
minuses in the UUIDs when looking up arrays. /sigh
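
For readers who have not run into this: the subscript of an indexed bash array is evaluated as arithmetic, so the dashes in a filesystem UUID are read as subtraction. One common way around it is to normalise the key first. The sketch below is an assumption about a possible workaround, not what underdog actually does, and the function name is made up.

```shell
# Sketch: turn a filesystem UUID into a key that contains no character
# arithmetic evaluation would treat as an operator.
# uuid_key UUID
uuid_key() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Example with the UUID from this thread:
#   uuid_key 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
```

In bash 4 and later an associative array (declare -A) avoids the problem as well, since its keys are plain strings.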

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Current state of old filesystem (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
                   ` (2 preceding siblings ...)
  2016-03-03  0:28 ` Dāvis Mosāns
@ 2016-03-27  8:41 ` Marc Haber
  2016-04-01 13:59 ` Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
  4 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-03-27  8:41 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have worked on my script and have interesting new findings about the
"old" filesystem, the one that used to have the huge spread between
Size and Used for Metadata.

I have uploaded the script to
http://q.bofh.de/~mh/stuff/20160327-btrfs-balance
and a current full syslog to
http://q.bofh.de/~mh/stuff/20160326-linux-btrfs-ofanbtr

Current state is:

(1)
Filesystem has been resized to 300 GB. This has solved the "Superblock
bytenr is larger than device size" message on btrfs check. btrfs check
now runs through fine, output below.

(2)
Umounting the filesystem results in a kernel trace, given below as well.

(3)
Metadata spread is down to 27.00GiB/2.31GiB, decreases to
10.50GiB/2.30GiB, and increases again during the run of my btrfs-balance
script:
[7/507]mh@fan:~$ grep "Metadata, DUP" 20160326-linux-btrfs-ofanbtr
Mar 26 09:49:40 fan root: Metadata, DUP: total=27.00GiB, used=2.31GiB
Mar 26 10:26:15 fan root: Metadata, DUP: total=10.50GiB, used=2.30GiB
Mar 26 11:01:52 fan root: Metadata, DUP: total=10.50GiB, used=2.31GiB
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:38:40 fan root: Metadata, DUP: total=19.00GiB, used=2.31GiB
[8/508]mh@fan:~$
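
To put a number on the spread, the gap between allocated and used metadata can be computed straight from those syslog lines. A small sketch, assuming every value is reported in GiB as in the log above; the function name is made up for illustration:

```shell
# Sketch: for each "Metadata, DUP: total=..., used=..." syslog line,
# print the timestamp, both values and their difference.
# Assumes both values carry the GiB suffix.
# metadata_slack LOGFILE
metadata_slack() {
    awk '/Metadata, DUP: total=/ {
        total = $0; sub(/.*total=/, "", total); sub(/GiB.*/, "", total)
        used = $0;  sub(/.*used=/, "", used);   sub(/GiB.*/, "", used)
        printf "%s %s %s total=%.2f used=%.2f slack=%.2f\n",
               $1, $2, $3, total, used, total - used
    }' "$1"
}

# Example:
#   metadata_slack 20160326-linux-btrfs-ofanbtr
```

For the first line of the grep output this gives a slack of 24.69 GiB, for the second 8.20 GiB.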

(4)
The attached log gives the following indications:
Mar 26 09:49:40 fan root: Metadata,DUP: Size:27.00GiB, Used:2.31GiB
Mar 26 09:49:40 fan mh: BEGIN btrfs balance start /mnt/ofanbtr
Mar 26 10:26:15 fan root: Done, had to relocate 134 out of 134 chunks
Mar 26 10:26:15 fan root: Metadata, DUP: total=10.50GiB, used=2.30GiB
Mar 26 10:26:15 fan mh: BEGIN btrfs balance start -dprofiles=single /mnt/ofanbtr
Mar 26 11:01:52 fan root: Done, had to relocate 79 out of 101 chunks
Mar 26 11:01:52 fan root: Metadata, DUP: total=10.50GiB, used=2.31GiB
Mar 26 11:01:52 fan mh: BEGIN btrfs balance start -mprofiles=dup
Mar 26 11:03:36 fan kernel: [ 4693.269593] BTRFS info (device dm-31): 9 enospc errors during balance
Mar 26 11:03:36 fan root: ERROR: error during balancing '/mnt/ofanbtr': No space left on device
Mar 26 11:03:36 fan root: There may be more info in syslog - try dmesg | tail
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:03:36 fan mh: BEGIN btrfs balance start --force -sprofiles=dup /mnt/ofanbtr
Mar 26 11:03:36 fan root: Done, had to relocate 1 out of 109 chunks
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:03:36 fan mh: BEGIN btrfs balance start /mnt/ofanbtr
Mar 26 11:38:40 fan kernel: [ 6796.907286] BTRFS info (device dm-31): 9 enospc errors during balance
Mar 26 11:38:40 fan root: ERROR: error during balancing '/mnt/ofanbtr': No space left on device
Mar 26 11:38:40 fan root: There may be more info in syslog - try dmesg | tail
Mar 26 11:38:40 fan root: Metadata, DUP: total=19.00GiB, used=2.31GiB

Note that the initial btrfs balance start runs without ENOSPC, the
btrfs balance start -mprofiles=dup errors out, and the final btrfs
balance start errors out again. We thus have a case where a script
run has actually worsened the state of the filesystem.
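
The same summary can be pulled out of the syslog mechanically. The sketch below pairs each "BEGIN btrfs balance" marker with the next "Done" or "ERROR" line; it relies on the marker format used in the log above, and the function name is made up for illustration.

```shell
# Sketch: list every balance invocation from the log with its outcome.
# balance_outcomes LOGFILE
balance_outcomes() {
    awk '
        # Remember the command that was just started.
        / BEGIN btrfs balance / { cmd = $0; sub(/.* BEGIN /, "", cmd); next }
        # The first result line after it decides the outcome.
        /Done, had to relocate/ && cmd != "" { print "ok     " cmd; cmd = "" }
        /ERROR: error during balancing/ && cmd != "" { print "failed " cmd; cmd = "" }
    ' "$1"
}

# Example:
#   balance_outcomes 20160326-linux-btrfs-ofanbtr
```

Run against the excerpt under (4), this should list the first two balances and the -sprofiles=dup one as ok, and the -mprofiles=dup and final full balance as failed.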

Is this worth adding to the bug report
(https://bugzilla.kernel.org/show_bug.cgi?id=114451)? Should I include
a new btrfs image? What else should I include?

Any ideas?

Greetings
Marc



btrfs check:
Checking filesystem on /dev/mapper/ofanbtr
UUID: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 86554099816 bytes used err is 0
total csum bytes: 81815012
total tree bytes: 2476228608
total fs tree bytes: 2246311936
total extent tree bytes: 133005312
btree space waste bytes: 452682835
file data blocks allocated: 292993851392
 referenced 132664164352

Kernel trace on umount:
Mar 27 10:06:32 fan kernel: [84067.902831] ------------[ cut here ]------------
Mar 27 10:06:32 fan kernel: [84067.902861] WARNING: CPU: 5 PID: 13409 at fs/btrfs/extent-tree.c:5380 btrfs_free_block_groups+0x1bc/0x36f [btrfs]()
Mar 27 10:06:32 fan kernel: [84067.902862] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc kvm_amd kvm irqbypass amd64_edac_mod edac_mce_amd pcspkr edac_core k10temp snd_cmipci snd_mpu401_uart snd_opl3_lib snd_rawmidi snd_hda_codec_realtek snd_seq_device input_leds snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel asus_atk0110 snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm sg evdev snd_timer acpi_cpufreq snd tpm_tis tpm soundcore processor shpchp i2c_piix4 hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci amdkfd radeon r8169 i2c_algo_bit mii sym53c8xx ahci ttm xhci_pci scsi_transport_spi libahci ehci_pci ohci_hcd xhci_hcd drm_kms_helper ehci_hcd libata drm usbcore scsi_mod usb_common i2c_core button
Mar 27 10:06:32 fan kernel: [84067.902909] CPU: 5 PID: 13409 Comm: umount Tainted: G        W    L  4.4.5-zgws1 #2
Mar 27 10:06:32 fan kernel: [84067.902911] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 27 10:06:32 fan kernel: [84067.902913]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
Mar 27 10:06:32 fan kernel: [84067.902915]  ffffffff81051e21 ffffffffa0481147 ffff8805f5b00000 0000000000000000
Mar 27 10:06:32 fan kernel: [84067.902917]  ffff8805f5b00080 ffff880608b5c800 ffffffffa0481147 ffff8805f5b00000
Mar 27 10:06:32 fan kernel: [84067.902919] Call Trace:
Mar 27 10:06:32 fan kernel: [84067.902923]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 27 10:06:32 fan kernel: [84067.902926]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 27 10:06:32 fan kernel: [84067.902937]  [<ffffffffa0481147>] ? btrfs_free_block_groups+0x1bc/0x36f [btrfs]
Mar 27 10:06:32 fan kernel: [84067.902947]  [<ffffffffa0481147>] ? btrfs_free_block_groups+0x1bc/0x36f [btrfs]
Mar 27 10:06:32 fan kernel: [84067.902959]  [<ffffffffa048ec72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
Mar 27 10:06:32 fan kernel: [84067.902961]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
Mar 27 10:06:32 fan kernel: [84067.902963]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
Mar 27 10:06:32 fan kernel: [84067.902972]  [<ffffffffa046d0db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
Mar 27 10:06:32 fan kernel: [84067.902974]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
Mar 27 10:06:32 fan kernel: [84067.902976]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
Mar 27 10:06:32 fan kernel: [84067.902978]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
Mar 27 10:06:32 fan kernel: [84067.902981]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
Mar 27 10:06:32 fan kernel: [84067.902983]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
Mar 27 10:06:32 fan kernel: [84067.902985] ---[ end trace 095a538bb634d33d ]---
Mar 27 10:06:32 fan kernel: [84067.902986] ------------[ cut here ]------------
Mar 27 10:06:32 fan kernel: [84067.902997] WARNING: CPU: 5 PID: 13409 at fs/btrfs/extent-tree.c:5381 btrfs_free_block_groups+0x1d7/0x36f [btrfs]()
Mar 27 10:06:32 fan kernel: [84067.902998] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp dummy ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative bridge stp llc kvm_amd kvm irqbypass amd64_edac_mod edac_mce_amd pcspkr edac_core k10temp snd_cmipci snd_mpu401_uart snd_opl3_lib snd_rawmidi snd_hda_codec_realtek snd_seq_device input_leds snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel asus_atk0110 snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm sg evdev snd_timer acpi_cpufreq snd tpm_tis tpm soundcore processor shpchp i2c_piix4 hwmon_vid autofs4 crc32c_generic btrfs xor raid6_pq ext4 crc16 mbcache jbd2 hmac sha256_ssse3 sha256_generic drbg ansi_cprng xts gf128mul algif_skcipher af_alg dm_crypt dm_mod hid_generic usbhid hid usb_storage sr_mod sd_mod cdrom ohci_pci amdkfd radeon r8169 i2c_algo_bit mii sym53c8xx ahci ttm xhci_pci scsi_transport_spi libahci ehci_pci ohci_hcd xhci_hcd drm_kms_helper ehci_hcd libata drm usbcore scsi_mod usb_common i2c_core button
Mar 27 10:06:32 fan kernel: [84067.903030] CPU: 5 PID: 13409 Comm: umount Tainted: G        W    L  4.4.5-zgws1 #2
Mar 27 10:06:32 fan kernel: [84067.903031] Hardware name: System manufacturer System Product Name/M5A88-V EVO, BIOS 1603    10/12/2012
Mar 27 10:06:32 fan kernel: [84067.903032]  000000000000005b ffffffff811dd418 0000000000000000 0000000000000009
Mar 27 10:06:32 fan kernel: [84067.903034]  ffffffff81051e21 ffffffffa0481162 ffff8805f5b00000 0000000000000000
Mar 27 10:06:32 fan kernel: [84067.903035]  ffff8805f5b00080 ffff880608b5c800 ffffffffa0481162 ffff8805f5b00000
Mar 27 10:06:32 fan kernel: [84067.903037] Call Trace:
Mar 27 10:06:32 fan kernel: [84067.903039]  [<ffffffff811dd418>] ? dump_stack+0x5a/0x6f
Mar 27 10:06:32 fan kernel: [84067.903041]  [<ffffffff81051e21>] ? warn_slowpath_common+0x8e/0xa3
Mar 27 10:06:32 fan kernel: [84067.903055]  [<ffffffffa0481162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
Mar 27 10:06:32 fan kernel: [84067.903064]  [<ffffffffa0481162>] ? btrfs_free_block_groups+0x1d7/0x36f [btrfs]
Mar 27 10:06:32 fan kernel: [84067.903074]  [<ffffffffa048ec72>] ? close_ctree+0x1e6/0x2f2 [btrfs]
Mar 27 10:06:32 fan kernel: [84067.903076]  [<ffffffff8113dfd9>] ? generic_shutdown_super+0x64/0xdf
Mar 27 10:06:32 fan kernel: [84067.903077]  [<ffffffff8113e181>] ? kill_anon_super+0x9/0xe
Mar 27 10:06:32 fan kernel: [84067.903085]  [<ffffffffa046d0db>] ? btrfs_kill_super+0xd/0x16 [btrfs]
Mar 27 10:06:32 fan kernel: [84067.903086]  [<ffffffff8113e286>] ? deactivate_locked_super+0x2f/0x56
Mar 27 10:06:32 fan kernel: [84067.903088]  [<ffffffff81152aff>] ? cleanup_mnt+0x4f/0x6b
Mar 27 10:06:32 fan kernel: [84067.903089]  [<ffffffff81066ab7>] ? task_work_run+0x5d/0x71
Mar 27 10:06:32 fan kernel: [84067.903091]  [<ffffffff810036dc>] ? prepare_exit_to_usermode+0x70/0x99
Mar 27 10:06:32 fan kernel: [84067.903092]  [<ffffffff8140de08>] ? int_ret_from_sys_call+0x25/0x8f
Mar 27 10:06:32 fan kernel: [84067.903093] ---[ end trace 095a538bb634d33e ]---

Full log sans boring BTRFS info lines (full log including them on
http://q.bofh.de/~mh/stuff/20160326-linux-btrfs-ofanbtr):
[15/514]mh@fan:~$ grep -v -E '(relocating block group|found [[:digit:]]+ extents)' 20160326-linux-btrfs-ofanbtr
Mar 26 09:49:26 fan kernel: [  242.951247] BTRFS: device label ofanbtr devid 1 transid 22237711 /dev/dm-31
Mar 26 09:49:28 fan kernel: [  245.232833] BTRFS info (device dm-31): force clearing of disk cache
Mar 26 09:49:28 fan kernel: [  245.232846] BTRFS info (device dm-31): disk space caching is enabled
Mar 26 09:49:28 fan kernel: [  245.232851] BTRFS: has skinny extents
Mar 26 09:49:28 fan kernel: [  245.333392] BTRFS: detected SSD devices, enabling SSD mode
Mar 26 09:49:40 fan mh: BEGIN btrfs-balance script
Mar 26 09:49:40 fan root: Linux fan 4.4.5-zgws1 #2 SMP Sun Mar 13 21:00:02 UTC 2016 x86_64 GNU/Linux
Mar 26 09:49:40 fan root: /dev/mapper/ofanbtr /mnt/ofanbtr btrfs rw,relatime,ssd,space_cache,clear_cache,subvolid=5,subvol=/ 0 0
Mar 26 09:49:40 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 09:49:40 fan root: Data, single: total=79.00GiB, used=78.33GiB
Mar 26 09:49:40 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 09:49:40 fan root: Metadata, DUP: total=27.00GiB, used=2.31GiB
Mar 26 09:49:40 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 26 09:49:40 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 09:49:40 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 09:49:40 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 09:49:40 fan root: #011devid    1 size 300.00GiB used 133.06GiB path /dev/mapper/ofanbtr
Mar 26 09:49:40 fan root:
Mar 26 09:49:40 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 09:49:40 fan root: Overall:
Mar 26 09:49:40 fan root:     Device size:#011#011 300.00GiB
Mar 26 09:49:40 fan root:     Device allocated:#011#011 133.06GiB
Mar 26 09:49:40 fan root:     Device unallocated:#011#011 166.94GiB
Mar 26 09:49:40 fan root:     Device missing:#011#011     0.00B
Mar 26 09:49:40 fan root:     Used:#011#011#011  82.94GiB
Mar 26 09:49:40 fan root:     Free (estimated):#011#011 167.61GiB#011(min: 84.14GiB)
Mar 26 09:49:40 fan root:     Data ratio:#011#011#011      1.00
Mar 26 09:49:40 fan root:     Metadata ratio:#011#011      2.00
Mar 26 09:49:40 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 26 09:49:40 fan root:
Mar 26 09:49:40 fan root: Data,single: Size:79.00GiB, Used:78.33GiB
Mar 26 09:49:40 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 09:49:40 fan root:
Mar 26 09:49:40 fan root: Metadata,DUP: Size:27.00GiB, Used:2.31GiB
Mar 26 09:49:40 fan root:    /dev/mapper/ofanbtr#011  54.00GiB
Mar 26 09:49:40 fan root:
Mar 26 09:49:40 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 09:49:40 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 09:49:40 fan root:
Mar 26 09:49:40 fan root: Unallocated:
Mar 26 09:49:40 fan root:    /dev/mapper/ofanbtr#011 166.94GiB
Mar 26 09:49:40 fan mh: BEGIN btrfs balance start /mnt/ofanbtr
Mar 26 10:26:15 fan root: Done, had to relocate 134 out of 134 chunks
Mar 26 10:26:15 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 10:26:15 fan root: Data, single: total=79.00GiB, used=78.32GiB
Mar 26 10:26:15 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 10:26:15 fan root: Metadata, DUP: total=10.50GiB, used=2.30GiB
Mar 26 10:26:15 fan root: GlobalReserve, single: total=512.00MiB, used=2.70MiB
Mar 26 10:26:15 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 10:26:15 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 10:26:15 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 10:26:15 fan root: #011devid    1 size 300.00GiB used 100.06GiB path /dev/mapper/ofanbtr
Mar 26 10:26:15 fan root:
Mar 26 10:26:15 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 10:26:15 fan root: Overall:
Mar 26 10:26:15 fan root:     Device size:#011#011 300.00GiB
Mar 26 10:26:15 fan root:     Device allocated:#011#011 100.06GiB
Mar 26 10:26:15 fan root:     Device unallocated:#011#011 199.94GiB
Mar 26 10:26:15 fan root:     Device missing:#011#011     0.00B
Mar 26 10:26:15 fan root:     Used:#011#011#011  82.93GiB
Mar 26 10:26:15 fan root:     Free (estimated):#011#011 200.61GiB#011(min: 100.64GiB)
Mar 26 10:26:15 fan root:     Data ratio:#011#011#011      1.00
Mar 26 10:26:15 fan root:     Metadata ratio:#011#011      2.00
Mar 26 10:26:15 fan root:     Global reserve:#011#011 512.00MiB#011(used: 2.70MiB)
Mar 26 10:26:15 fan root:
Mar 26 10:26:15 fan root: Data,single: Size:79.00GiB, Used:78.32GiB
Mar 26 10:26:15 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 10:26:15 fan root:
Mar 26 10:26:15 fan root: Metadata,DUP: Size:10.50GiB, Used:2.30GiB
Mar 26 10:26:15 fan root:    /dev/mapper/ofanbtr#011  21.00GiB
Mar 26 10:26:15 fan root:
Mar 26 10:26:15 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 10:26:15 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 10:26:15 fan root:
Mar 26 10:26:15 fan root: Unallocated:
Mar 26 10:26:15 fan root:    /dev/mapper/ofanbtr#011 199.94GiB
Mar 26 10:26:15 fan mh: BEGIN btrfs balance start -dprofiles=single /mnt/ofanbtr
Mar 26 11:01:52 fan root: Done, had to relocate 79 out of 101 chunks
Mar 26 11:01:52 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 11:01:52 fan root: Data, single: total=79.00GiB, used=78.33GiB
Mar 26 11:01:52 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 11:01:52 fan root: Metadata, DUP: total=10.50GiB, used=2.31GiB
Mar 26 11:01:52 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 26 11:01:52 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 11:01:52 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 11:01:52 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 11:01:52 fan root: #011devid    1 size 300.00GiB used 100.06GiB path /dev/mapper/ofanbtr
Mar 26 11:01:52 fan root:
Mar 26 11:01:52 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 11:01:52 fan root: Overall:
Mar 26 11:01:52 fan root:     Device size:#011#011 300.00GiB
Mar 26 11:01:52 fan root:     Device allocated:#011#011 100.06GiB
Mar 26 11:01:52 fan root:     Device unallocated:#011#011 199.94GiB
Mar 26 11:01:52 fan root:     Device missing:#011#011     0.00B
Mar 26 11:01:52 fan root:     Used:#011#011#011  82.94GiB
Mar 26 11:01:52 fan root:     Free (estimated):#011#011 200.61GiB#011(min: 100.64GiB)
Mar 26 11:01:52 fan root:     Data ratio:#011#011#011      1.00
Mar 26 11:01:52 fan root:     Metadata ratio:#011#011      2.00
Mar 26 11:01:52 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 26 11:01:52 fan root:
Mar 26 11:01:52 fan root: Data,single: Size:79.00GiB, Used:78.33GiB
Mar 26 11:01:52 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 11:01:52 fan root:
Mar 26 11:01:52 fan root: Metadata,DUP: Size:10.50GiB, Used:2.31GiB
Mar 26 11:01:52 fan root:    /dev/mapper/ofanbtr#011  21.00GiB
Mar 26 11:01:52 fan root:
Mar 26 11:01:52 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 11:01:52 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 11:01:52 fan root:
Mar 26 11:01:52 fan root: Unallocated:
Mar 26 11:01:52 fan root:    /dev/mapper/ofanbtr#011 199.94GiB
Mar 26 11:01:52 fan mh: BEGIN btrfs balance start -mprofiles=dup /mnt/ofanbtr
Mar 26 11:03:36 fan kernel: [ 4693.269593] BTRFS info (device dm-31): 9 enospc errors during balance
Mar 26 11:03:36 fan root: ERROR: error during balancing '/mnt/ofanbtr': No space left on device
Mar 26 11:03:36 fan root: There may be more info in syslog - try dmesg | tail
Mar 26 11:03:36 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 11:03:36 fan root: Data, single: total=79.00GiB, used=78.32GiB
Mar 26 11:03:36 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:03:36 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 26 11:03:36 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 11:03:36 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 11:03:36 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 11:03:36 fan root: #011devid    1 size 300.00GiB used 108.06GiB path /dev/mapper/ofanbtr
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 11:03:36 fan root: Overall:
Mar 26 11:03:36 fan root:     Device size:#011#011 300.00GiB
Mar 26 11:03:36 fan root:     Device allocated:#011#011 108.06GiB
Mar 26 11:03:36 fan root:     Device unallocated:#011#011 191.94GiB
Mar 26 11:03:36 fan root:     Device missing:#011#011     0.00B
Mar 26 11:03:36 fan root:     Used:#011#011#011  82.93GiB
Mar 26 11:03:36 fan root:     Free (estimated):#011#011 192.61GiB#011(min: 96.64GiB)
Mar 26 11:03:36 fan root:     Data ratio:#011#011#011      1.00
Mar 26 11:03:36 fan root:     Metadata ratio:#011#011      2.00
Mar 26 11:03:36 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Data,single: Size:79.00GiB, Used:78.32GiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Metadata,DUP: Size:14.50GiB, Used:2.30GiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  29.00GiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Unallocated:
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011 191.94GiB
Mar 26 11:03:36 fan mh: BEGIN btrfs balance start --force -sprofiles=dup /mnt/ofanbtr
Mar 26 11:03:36 fan root: Done, had to relocate 1 out of 109 chunks
Mar 26 11:03:36 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 11:03:36 fan root: Data, single: total=79.00GiB, used=78.32GiB
Mar 26 11:03:36 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 11:03:36 fan root: Metadata, DUP: total=14.50GiB, used=2.30GiB
Mar 26 11:03:36 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 26 11:03:36 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 11:03:36 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 11:03:36 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 11:03:36 fan root: #011devid    1 size 300.00GiB used 108.06GiB path /dev/mapper/ofanbtr
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 11:03:36 fan root: Overall:
Mar 26 11:03:36 fan root:     Device size:#011#011 300.00GiB
Mar 26 11:03:36 fan root:     Device allocated:#011#011 108.06GiB
Mar 26 11:03:36 fan root:     Device unallocated:#011#011 191.94GiB
Mar 26 11:03:36 fan root:     Device missing:#011#011     0.00B
Mar 26 11:03:36 fan root:     Used:#011#011#011  82.93GiB
Mar 26 11:03:36 fan root:     Free (estimated):#011#011 192.61GiB#011(min: 96.64GiB)
Mar 26 11:03:36 fan root:     Data ratio:#011#011#011      1.00
Mar 26 11:03:36 fan root:     Metadata ratio:#011#011      2.00
Mar 26 11:03:36 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Data,single: Size:79.00GiB, Used:78.32GiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Metadata,DUP: Size:14.50GiB, Used:2.30GiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  29.00GiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 11:03:36 fan root:
Mar 26 11:03:36 fan root: Unallocated:
Mar 26 11:03:36 fan root:    /dev/mapper/ofanbtr#011 191.94GiB
Mar 26 11:03:36 fan mh: BEGIN btrfs balance start /mnt/ofanbtr
Mar 26 11:38:40 fan kernel: [ 6796.907286] BTRFS info (device dm-31): 9 enospc errors during balance
Mar 26 11:38:40 fan root: ERROR: error during balancing '/mnt/ofanbtr': No space left on device
Mar 26 11:38:40 fan root: There may be more info in syslog - try dmesg | tail
Mar 26 11:38:40 fan mh: btrfs fi df /mnt/ofanbtr
Mar 26 11:38:40 fan root: Data, single: total=79.00GiB, used=78.33GiB
Mar 26 11:38:40 fan root: System, DUP: total=32.00MiB, used=16.00KiB
Mar 26 11:38:40 fan root: Metadata, DUP: total=19.00GiB, used=2.31GiB
Mar 26 11:38:40 fan root: GlobalReserve, single: total=512.00MiB, used=0.00B
Mar 26 11:38:40 fan mh: btrfs fi show /mnt/ofanbtr
Mar 26 11:38:40 fan root: Label: 'ofanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
Mar 26 11:38:40 fan root: #011Total devices 1 FS bytes used 80.63GiB
Mar 26 11:38:40 fan root: #011devid    1 size 300.00GiB used 117.06GiB path /dev/mapper/ofanbtr
Mar 26 11:38:40 fan root:
Mar 26 11:38:40 fan mh: btrfs fi usage /mnt/ofanbtr
Mar 26 11:38:40 fan root: Overall:
Mar 26 11:38:40 fan root:     Device size:#011#011 300.00GiB
Mar 26 11:38:40 fan root:     Device allocated:#011#011 117.06GiB
Mar 26 11:38:40 fan root:     Device unallocated:#011#011 182.94GiB
Mar 26 11:38:40 fan root:     Device missing:#011#011     0.00B
Mar 26 11:38:40 fan root:     Used:#011#011#011  82.94GiB
Mar 26 11:38:40 fan root:     Free (estimated):#011#011 183.61GiB#011(min: 92.14GiB)
Mar 26 11:38:40 fan root:     Data ratio:#011#011#011      1.00
Mar 26 11:38:40 fan root:     Metadata ratio:#011#011      2.00
Mar 26 11:38:40 fan root:     Global reserve:#011#011 512.00MiB#011(used: 0.00B)
Mar 26 11:38:40 fan root:
Mar 26 11:38:40 fan root: Data,single: Size:79.00GiB, Used:78.33GiB
Mar 26 11:38:40 fan root:    /dev/mapper/ofanbtr#011  79.00GiB
Mar 26 11:38:40 fan root:
Mar 26 11:38:40 fan root: Metadata,DUP: Size:19.00GiB, Used:2.31GiB
Mar 26 11:38:40 fan root:    /dev/mapper/ofanbtr#011  38.00GiB
Mar 26 11:38:40 fan root:
Mar 26 11:38:40 fan root: System,DUP: Size:32.00MiB, Used:16.00KiB
Mar 26 11:38:40 fan root:    /dev/mapper/ofanbtr#011  64.00MiB
Mar 26 11:38:40 fan root:
Mar 26 11:38:40 fan root: Unallocated:
Mar 26 11:38:40 fan root:    /dev/mapper/ofanbtr#011 182.94GiB
Mar 26 11:38:40 fan mh: END btrfs-balance script
[16/515]mh@fan:~$
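
As a cross-check of the numbers in this log, the "Device allocated" figure can be reproduced from the `btrfs fi df` totals: single data counts once, while DUP metadata and DUP system count twice. The Python sketch below is only an illustration; the snapshot values are copied from the log above, and the function and variable names are made up for this example.

```python
# (label, data total in GiB, metadata total in GiB), copied from the log above.
SNAPSHOTS = [
    ("09:49:40 before first balance", 79.00, 27.00),
    ("10:26:15 after full balance", 79.00, 10.50),
    ("11:03:36 after -mprofiles=dup (ENOSPC)", 79.00, 14.50),
    ("11:38:40 after second full balance (ENOSPC)", 79.00, 19.00),
]
SYSTEM_GIB = 32 / 1024  # "System, DUP: total=32.00MiB"


def device_allocated(data_gib, metadata_gib, system_gib=SYSTEM_GIB):
    """Raw device allocation: single data once, DUP metadata and system twice."""
    return data_gib + 2 * metadata_gib + 2 * system_gib


for label, data, meta in SNAPSHOTS:
    print(f"{label}: {device_allocated(data, meta):.2f} GiB allocated")
```

The computed values match the 133.06, 100.06, 108.06 and 117.06 GiB reported by `btrfs fi show`. Data allocation stays at 79 GiB and used metadata stays near 2.3 GiB, while allocated metadata grows from 10.50 GiB to 19.00 GiB across the two failed balances.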


-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-04 12:31       ` Duncan
  2016-03-04 12:35         ` Hugo Mills
@ 2016-03-27 12:10         ` Martin Steigerwald
  2016-03-27 23:12           ` Duncan
  1 sibling, 1 reply; 81+ messages in thread
From: Martin Steigerwald @ 2016-03-27 12:10 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Friday, 4 March 2016 12:31:44 CEST Duncan wrote:
> Dāvis Mosāns posted on Thu, 03 Mar 2016 17:39:12 +0200 as excerpted:
> > 2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.duncan@cox.net>:
> >> You're issue isn't the same, because all your space was allocated,
> >> leaving only 1 MiB unallocated, which isn't normally enough to allocate
> >> a new chunk to rewrite the data or metadata from the old chunks into.
> >> 
> >> That's a known issue, with known workarounds as dealt with in the FAQ.
> > 
> > Ah, thanks, well it was surprising for me that balance failed with out
> > of space when both data and metadata had not all been used and I thought
> > it could just use space from those...
> > 
> > especially as from FAQ:
> >> If there is a lot of allocated but unused data or metadata chunks,
> >> a balance may reclaim some of that allocated space. This is the main
> >> reason for running a balance on a single-device filesystem.
> > 
> > so I think regular balance should be smart enough that it could solve
> > this on own and wouldn't need to specify any options.
> 
> Well it does solve the problem on its own... to the extent that it 
> eliminates empty chunks (kernel 3.17+, it didn't before that).  But if 
> there's even a single 4 KiB file block used in the (nominal 1 GiB sized 
> data) chunk, it's no longer empty and thus not eliminated by the empty 
> chunk cleanup routines.

It could theoretically copy part of one almost empty chunk into another chunk 
to free it up, couldn't it? This way it can free some chunks completely and 
then start the regular balance?

In either case, it's unintuitive for the user to fail this. The filesystem 
tools should allow a balance in *any* case without needing special treatment 
by the user.

Thanks,
-- 
Martin


* Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)
  2016-03-15  7:07                 ` Marc Haber
@ 2016-03-27 12:15                   ` Martin Steigerwald
  0 siblings, 0 replies; 81+ messages in thread
From: Martin Steigerwald @ 2016-03-27 12:15 UTC (permalink / raw)
  To: Marc Haber; +Cc: Btrfs BTRFS

On Tuesday, 15 March 2016 08:07:22 CEST Marc Haber wrote:
> On Mon, Mar 14, 2016 at 09:39:51PM +0100, Henk Slager wrote:
> > >> BTW, I restored and mounted your 20160307-fanbtr-image:
> > >> 
> > >> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732
> > >> /dev/loop0 [266203.734804] BTRFS info (device loop0): disk space
> > >> caching is enabled [266203.734806] BTRFS: has skinny extents
> > >> [266204.022175] BTRFS: checking UUID tree
> > >> [266239.407249] attempt to access beyond end of device
> > >> [266239.407252] loop0: rw=1073, want=715202688, limit=705760000
> > >> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr
> > >> 1, rd 0, flush 0, corrupt 0, gen 0
> > >> [266239.407272] attempt to access beyond end of device
> > >> .. and 16 more
> > >> 
> > >> As a quick fix/workaround, I truncated the image to 1T
> > > 
> > > The original fs was 417 GiB in size. What size does the image claim?
> > 
> > ls -alFh  of the restored image showed 337G I remember.
> > btrfs fi us showed also a number over 400G, I don't have the
> > files/loopdev anymore.
> 
> sounds legit.
> 
> > It could some side effect of btrfs-image, I only have used it for
> > multi-device, where dev id's are ignore, but total image size did not
> > lead to problems.
> 
> The original "ofanbtr" seems to have a problem, since btrfs check
> 
> /media/tempdisk says:
> > > [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> > > Superblock bytenr is larger than device size
> > > Couldn't open file system
> > > [11/509]mh@fan:~$
> > > 
> > > Can this be fixed?
> > 
> > What I would do in order to fix it, is resize the fs to let's say
> > 190GiB. That should write correct values to the superblocks I /hope/.
> > And then resize back to max.
> 
> It doesn't:
> [20/518]mh@fan:~$ sudo btrfs filesystem resize 300G /media/tempdisk/
> Resize '/media/tempdisk/' of '300G'
> [22/520]mh@fan:~$ sudo btrfs check /media/tempdisk/
> Superblock bytenr is larger than device size
> Couldn't open file system
> [23/521]mh@fan:~$ df -h

Are you running the check on the *mounted* filesystem? "/media/tempdisk" appears 
to be a mount point, not a device file.

Unmount it and run the check against the device file (or one of the device 
files) of the filesystem.
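
A small pre-flight test can catch this mistake before an offline check is started. The Python sketch below is only an illustration: the function name is hypothetical, and it checks only the two conditions mentioned above (the path is a mount point, or it is a directory rather than a device file).

```python
import os


def check_target_problem(path):
    """Return a reason why `path` is a poor target for an offline check, or None."""
    if os.path.ismount(path):
        return f"{path} is a mount point; unmount it and pass the device file"
    if os.path.isdir(path):
        return f"{path} is a directory, not a device file"
    return None


# "/" is always a mount point, so it is reported as unsuitable.
print(check_target_problem("/"))
```

Called with "/media/tempdisk/" while the filesystem is mounted there, the function would report the mount point instead of letting the check run against it.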

Thanks,
-- 
Martin


* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-03-27 12:10         ` Martin Steigerwald
@ 2016-03-27 23:12           ` Duncan
  0 siblings, 0 replies; 81+ messages in thread
From: Duncan @ 2016-03-27 23:12 UTC (permalink / raw)
  To: linux-btrfs

Martin Steigerwald posted on Sun, 27 Mar 2016 14:10:07 +0200 as excerpted:

> On Friday, 4 March 2016 12:31:44 CEST Duncan wrote:
>> Dāvis Mosāns posted on Thu, 03 Mar 2016 17:39:12 +0200 as excerpted:
>> > 2016-03-03 6:57 GMT+02:00 Duncan <1i5t5.duncan@cox.net>:
>> >> You're issue isn't the same, because all your space was allocated,
>> >> leaving only 1 MiB unallocated, which isn't normally enough to
>> >> allocate a new chunk to rewrite the data or metadata from the old
>> >> chunks into.
>> >> 
>> >> That's a known issue, with known workarounds as dealt with in the
>> >> FAQ.
>> > 
>> > Ah, thanks, well it was surprising for me that balance failed with
>> > out of space when both data and metadata had not all been used and I
>> > thought it could just use space from those...
>> > 
>> > especially as from FAQ:
>> >> If there is a lot of allocated but unused data or metadata chunks,
>> >> a balance may reclaim some of that allocated space. This is the main
>> >> reason for running a balance on a single-device filesystem.
>> > 
>> > so I think regular balance should be smart enough that it could solve
>> > this on own and wouldn't need to specify any options.
>> 
>> Well it does solve the problem on its own... to the extent that it
>> eliminates empty chunks (kernel 3.17+, it didn't before that).  But if
>> there's even a single 4 KiB file block used in the (nominal 1 GiB sized
>> data) chunk, it's no longer empty and thus not eliminated by the empty
>> chunk cleanup routines.
> 
> It could theoretically copy part of one almost empty chunk into another
> chunk to free it up, couldn't it? This way it can free some chunks
> completely and then start the regular balance?

To be clear here, as unfortunately I wasn't in the previous reply, "it" 
in this case refers to the kernel's general btrfs handling -- IOW, the 
kernel, since 3.17, routinely deletes entirely empty chunks.

(Tho apparently there are cases when it misses some, as we've had a few 
reports lately of a balance with usage=0 cleaning up more than the 
trivial one or two chunks that could arguably have been "in transit" at 
the time the balance was run... but that would be a bug.)
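
The difference between the automatic empty-chunk cleanup and a filtered balance can be illustrated with a simplified model of the usage filter. This sketch assumes that `usage=N` selects chunks whose used bytes are below N percent of the chunk size, and that `usage=0` selects only completely empty chunks. The function name and the sample chunks are hypothetical, and the real kernel logic handles more cases than this.

```python
GIB = 1024 ** 3


def passes_usage_filter(used_bytes, chunk_size, usage_percent):
    """Simplified model of whether a chunk is picked up by a usage=N balance."""
    if usage_percent == 0:
        return used_bytes == 0  # only entirely empty chunks
    return used_bytes < chunk_size * usage_percent // 100


# Used bytes in four hypothetical 1 GiB data chunks.
chunks = [0, 4096, GIB // 2, GIB]
for percent in (0, 5, 100):
    selected = [used for used in chunks if passes_usage_filter(used, GIB, percent)]
    print(f"usage={percent}: {len(selected)} of {len(chunks)} chunks selected")
```

In this model a chunk holding a single 4 KiB block is skipped by `usage=0` but picked up by `usage=5`, which is the case the empty-chunk cleanup does not cover.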

For the kernel to routinely and automatically move content from one 
partially filled chunk to another in order to free the one is a *MUCH* 
higher level of complexity and thus a *MUCH* higher chance of serious 
show-stopping bugs; certainly nothing /I/'d wish to touch, were I a btrfs 
dev.  

It should be noted that btrfs is in general a COW (copy-on-write) 
filesystem, so simply moving content from one chunk into another isn't 
the way it works.  At the individual node level if not at the chunk 
level, the COW nature of btrfs means that modification of the existing 
data in both chunks would require copying the node elsewhere in order 
to rewrite it to include the new/modified information, and this must be 
handled atomically such that in the event of a crash, either the old 
version or the new version survives, not a mix of half of one and half of 
the other.  While btrfs is already designed from the ground up with that 
in mind, normal file and metadata updates would handle that within single 
chunks, and coordinating that atomicity across chunks really does add in 
geometric proportion to the complexity of the situation.

Which means there's much more wisdom than might be first appreciated in 
having balance simply stick to the chunk level COW that is its designed 
scope, instead of having it try to do cross-chunk node-level COW, which 
is what you're effectively proposing.  (Of course the complexity is in 
fact rather higher than I'm explaining here, but the fact remains, to the 
extent possible, keeping node level atomic operations to the node level, 
and chunk level atomic operations to the chunk level, **GREATLY** 
simplifies things, and deliberately crossing that level barrier where 
it's not absolutely required is an invitation to bugs so complex and 
severe that they could ultimately collapse the entire filesystem!)

> In either case, it's unintuitive for the user to fail this. The
> filesystem tools should allow a balance in *any* case without needing
> special treatment by the user.

In fairness, there's a reason btrfs isn't claiming full stability and 
maturity just yet -- it's stabilizing, but exactly this sort of problem 
needs to be worked out before it can really be called fully stable.  
Meanwhile, as the (borrowed from Latin) saying goes "caveat emptor", "let 
the buyer beware."[1]  It remains the user's responsibility to ensure 
that btrfs is an appropriate filesystem for their use-case, and if so and 
once installed, that it remains within healthy operating parameters, 
enough unallocated space is kept available to complete balances, backups 
are kept in case some bug kills the filesystem, etc.

I think what ultimately needs to and probably will happen, is they'll 
create a new kind of global reserve that will come from unallocated space 
(instead of already allocated metadata chunks, which is where the current 
global reserve comes from, providing the same sort of reserve-COW-space 
functionality to more ordinary metadata functions), reserving enough of it 
to allocate at least one more full-size data chunk and one more full size 
metadata chunk, with only balance allowed to actually use that new global 
reserve space.  That way, balance will always have enough space to do 
what it needs to do.
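
The idea can be pictured with a toy allocator. Nothing like this exists in btrfs as far as this thread describes; the class, its methods and the chunk counts are all hypothetical and serve only to show the intended behaviour: ordinary allocations fail once unallocated space is gone, while balance may still draw on the reserve.

```python
class ToyDevice:
    """Toy model of chunk allocation with a balance-only reserve (hypothetical)."""

    def __init__(self, unallocated_chunks, balance_reserve_chunks=0):
        self.unallocated = unallocated_chunks
        self.reserve_size = balance_reserve_chunks
        self.reserve = balance_reserve_chunks

    def allocate_chunk(self, for_balance=False):
        """Return True on success, False for the ENOSPC case."""
        if self.unallocated > 0:
            self.unallocated -= 1
            return True
        if for_balance and self.reserve > 0:
            self.reserve -= 1
            return True
        return False

    def free_chunk(self):
        """Return a chunk to the device, refilling the reserve first."""
        if self.reserve < self.reserve_size:
            self.reserve += 1
        else:
            self.unallocated += 1


full = ToyDevice(unallocated_chunks=0, balance_reserve_chunks=2)
print(full.allocate_chunk())                  # ordinary allocation
print(full.allocate_chunk(for_balance=True))  # allocation on behalf of balance
full.free_chunk()                             # balance releases the old chunk
print(full.reserve)
```

Refilling the reserve before the general pool is a design choice of this toy model, so that one completed relocation leaves the next one possible.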

Of course, it may well be necessary to let users tweak this reserve 
space, say at mkfs.btrfs time, so users creating for instance smaller 
mixed-data/metadata-chunk mode filesystems (like the 256 MiB /boot I have 
on one device, with a parallel 256 MiB backup /boot on a second device) 
can use all the space if it's more convenient for them to back up and do a 
new mkfs.btrfs than it is to reserve additional otherwise unusable space 
on tiny filesystems for balances they don't intend to do anyway.  
Similarly, users at the TB scale might want to reserve say 100 GiB 
instead of the default 1.5 GiB or so, and people doing large multi-device 
filesystems might want to do say 20 or 50 GiB per device.  Etc.  But the 
default reserve from unallocated would be enough for at least 1 chunk 
each of data and metadata, two chunks for dup mode on a single device, on 
each device.
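
The "1.5 GiB or so" default can be reproduced with simple arithmetic. The sketch below assumes a nominal 1 GiB data chunk (as mentioned earlier in this thread) and a nominal 256 MiB metadata chunk; the latter is an assumption made for this example, and actual chunk sizes depend on the filesystem size. The function name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

DATA_CHUNK = 1 * GIB        # nominal data chunk size, as mentioned in the thread
METADATA_CHUNK = 256 * MIB  # assumed nominal metadata chunk size


def balance_reserve(data_copies=1, metadata_copies=2):
    """Space needed per device for one more data and one more metadata chunk."""
    return data_copies * DATA_CHUNK + metadata_copies * METADATA_CHUNK


# Single data plus DUP metadata on one device.
print(balance_reserve() / GIB, "GiB")
```

With single data and DUP metadata this gives 1 GiB + 2 x 256 MiB = 1.5 GiB, in line with the default figure above.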

---
[1] Caveat Emptor:  https://en.wikipedia.org/wiki/Caveat_emptor

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Again, no space left on device while rebalancing and recipe doesnt work
  2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
                   ` (3 preceding siblings ...)
  2016-03-27  8:41 ` Current state of old filesystem " Marc Haber
@ 2016-04-01 13:59 ` Marc Haber
  4 siblings, 0 replies; 81+ messages in thread
From: Marc Haber @ 2016-04-01 13:59 UTC (permalink / raw)
  To: linux-btrfs

On Sat, Feb 27, 2016 at 10:14:50PM +0100, Marc Haber wrote:
> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):

just for the record: The host started acting up in more and more
interesting ways, and after a call of rm during kernel build resulted
in SIGSEGV, I did the backup-format-restore routine for this system
back to ext4 just to find out whether I have bad hardware or a bad
filesystem.

And, since going back to ext4, the system is just fine again. So it's
not bad hardware.

This system's root drive is going to stay on ext4 for a loooooong
time. If the btrfs phenomena I experience on other hosts get solved
at some point in the future, I might migrate /home back to
btrfs, but that's not going to happen in the next six months.

This is a really bad experience which has made me lose a lot of faith
in the new filesystem. I really feel sad about that.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


end of thread, other threads:[~2016-04-01 13:59 UTC | newest]

Thread overview: 81+ messages
2016-02-27 21:14 Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
2016-02-27 23:15 ` Martin Steigerwald
2016-02-28  0:08   ` Marc Haber
2016-02-28  0:22     ` Hugo Mills
2016-02-28  8:40       ` Marc Haber
2016-02-29  1:56 ` Qu Wenruo
2016-02-29 15:33   ` Marc Haber
2016-03-01  0:45     ` Qu Wenruo
     [not found]       ` <20160301065448.GJ2334@torres.zugschlus.de>
2016-03-01  7:24         ` Qu Wenruo
2016-03-01  8:13           ` Qu Wenruo
     [not found]             ` <20160301161659.GR2334@torres.zugschlus.de>
2016-03-03  2:02               ` Qu Wenruo
2016-03-01 20:51           ` Duncan
2016-03-05 14:28             ` Marc Haber
2016-03-03  0:28 ` Dāvis Mosāns
2016-03-03  3:42   ` Qu Wenruo
2016-03-03  4:57   ` Duncan
2016-03-03 15:39     ` Dāvis Mosāns
2016-03-04 12:31       ` Duncan
2016-03-04 12:35         ` Hugo Mills
2016-03-27 12:10         ` Martin Steigerwald
2016-03-27 23:12           ` Duncan
2016-03-05 14:39   ` Marc Haber
2016-03-05 19:34     ` Chris Murphy
2016-03-05 20:09       ` Marc Haber
2016-03-06  6:43         ` Duncan
2016-03-06 20:27           ` Chris Murphy
2016-03-06 20:37             ` Chris Murphy
2016-03-07  8:47               ` Marc Haber
2016-03-07  8:42             ` Marc Haber
2016-03-07 18:39               ` Chris Murphy
2016-03-07 18:56                 ` Austin S. Hemmelgarn
2016-03-07 19:07                   ` Chris Murphy
2016-03-07 19:33                   ` Marc Haber
2016-03-12 21:36                 ` Marc Haber
2016-03-07 19:44               ` Chris Murphy
2016-03-07 20:43                 ` Duncan
2016-03-07 22:44                   ` Chris Murphy
2016-03-12 21:30             ` Marc Haber
2016-03-07  8:30           ` Marc Haber
2016-03-07 20:07             ` Duncan
2016-03-07  8:56         ` Marc Haber
2016-03-12 19:57       ` Marc Haber
2016-03-13 19:43         ` Chris Murphy
2016-03-13 20:50           ` Marc Haber
2016-03-13 21:31             ` Chris Murphy
2016-03-12 21:14       ` Marc Haber
2016-03-13 11:58       ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Marc Haber
2016-03-13 13:17         ` Andrew Vaughan
2016-03-13 16:56           ` Marc Haber
2016-03-13 17:12         ` Duncan
2016-03-13 21:05           ` Marc Haber
2016-03-14  1:05             ` Duncan
2016-03-14 11:49               ` Marc Haber
2016-03-13 19:14         ` Henk Slager
2016-03-13 19:42           ` Henk Slager
2016-03-13 20:56           ` Marc Haber
2016-03-14  0:00             ` Henk Slager
2016-03-15  7:20               ` Marc Haber
2016-03-14 12:07         ` Marc Haber
2016-03-14 12:48           ` New file system with same issue Holger Hoffstätte
2016-03-14 20:13             ` Marc Haber
2016-03-15 10:52               ` Holger Hoffstätte
2016-03-15 13:46                 ` Marc Haber
2016-03-15 13:54                   ` Austin S. Hemmelgarn
2016-03-15 14:09                     ` Marc Haber
2016-03-17  1:17               ` A good "Boot Maintenance" scheme (WAS: New file system with same issue) Robert White
2016-03-14 13:46           ` New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work) Henk Slager
2016-03-14 20:05             ` Marc Haber
2016-03-14 20:39               ` Henk Slager
2016-03-14 21:59                 ` Chris Murphy
2016-03-14 23:22                   ` Henk Slager
2016-03-15  7:16                     ` Marc Haber
2016-03-15 12:15                       ` Henk Slager
2016-03-15 13:24                         ` Marc Haber
2016-03-15  7:07                 ` Marc Haber
2016-03-27 12:15                   ` Martin Steigerwald
2016-03-15 13:29               ` Marc Haber
2016-03-15 13:42                 ` Marc Haber
2016-03-15 16:54                   ` Henk Slager
2016-03-27  8:41 ` Current state of old filesystem " Marc Haber
2016-04-01 13:59 ` Again, no space left on device while rebalancing and recipe doesnt work Marc Haber
