BTRFS constantly reports "No space left on device" even with a huge unallocated space

* BTRFS constantly reports "No space left on device" even with a huge unallocated space
@ 2016-08-12 17:36 Ronan Arraes Jardim Chagas
  2016-08-12 18:02 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-12 17:36 UTC (permalink / raw)
  To: linux-btrfs

Hi guys,

I'm facing a daily problem with BTRFS. Almost every day, I get the
message "No space left on device". Sometimes I can recover by balancing
the file system, but sometimes even balancing fails for lack of space.
In that case, unless I can delete some files, only a hard reset works.
The problem is that I have a huge amount of unallocated space, as you
can see here:

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		 119.07GiB
    Device unallocated:		   1.14TiB
    Device missing:		     0.00B
    Used:			 115.08GiB
    Free (estimated):		   1.14TiB	(min: 586.21GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,single: Size:113.01GiB, Used:111.19GiB
   /dev/sda6	 113.01GiB

Metadata,DUP: Size:3.00GiB, Used:1.94GiB
   /dev/sda6	   6.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.14TiB

It is not easy to trigger the problem, but I have noticed that it
correlates with two things:

1) Since I started creating jails to build openSUSE packages locally,
the problem has happened more often. In these jails, some directories
such as /dev, /dev/pts and /proc are mounted inside the jail.

2) When I run my KVM virtual machine, I also see this problem more
often. Note, however, that the KVM disk is stored on a separate ext4
partition.

I would be glad if anyone could help me fix it. Below, I'm providing
more information about my system:

# uname -a
Linux ronanarraes-osd 4.7.0-1-default #1 SMP PREEMPT Mon Jul 25
08:42:47 UTC 2016 (89a2ada) x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.6.1+20160714

# btrfs fi show
Label: none  uuid: 80381f7f-8cef-4bd8-bdbc-3487253ee566
	Total devices 1 FS bytes used 113.13GiB
	devid    1 size 1.26TiB used 119.07GiB path /dev/sda6

# btrfs fi df /
Data, single: total=113.01GiB, used=111.19GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=3.00GiB, used=1.94GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Regards,
Ronan Arraes

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 17:36 BTRFS constantly reports "No space left on device" even with a huge unallocated space Ronan Arraes Jardim Chagas
@ 2016-08-12 18:02 ` Chris Murphy
  2016-08-12 19:00   ` Ronan Arraes Jardim Chagas
  2016-08-29 12:12 ` Wang Xiaoguang
  2016-09-13  3:17 ` Wang Xiaoguang
  2 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-12 18:02 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Btrfs BTRFS

On Fri, Aug 12, 2016 at 11:36 AM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi guys,
>
> I'm facing a daily problem with BTRFS. Almost everyday, I get the
> message "No space left on device". Sometimes I can recover by balancing
> the system but sometimes even balancing does not work due to the lack
> of space. In this case, only a hard reset works if I can't delete some
> files. The problem is that I have a huge unallocated space as you can
> see here:
>
> # btrfs fi usage /
> Overall:
>     Device size:                   1.26TiB
>     Device allocated:            119.07GiB
>     Device unallocated:            1.14TiB

Tons of unallocated space. What kernel messages do you get for the
enospc? It sounds like this will be one of the mystery -28 error file
systems. So far as I recall the only work around is recreating the
file system. There are two additional things you can try: mount with
enospc_debug mount option and see if you can gather more information
about the problem. Or try a 4.8rc1 kernel which has a large number of
enospc changes.
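
A minimal sketch of the enospc_debug suggestion, assuming the option can
be enabled with a remount of the root file system. The helper name
has_enospc_debug and its optional second argument are illustrative only
and not part of any btrfs tool; the helper just checks that the option
shows up in the mount table.

```shell
#!/bin/sh
# Assumed invocation (needs root; "/" is the mount point used in this thread):
#   mount -o remount,enospc_debug /

# has_enospc_debug MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if MOUNTPOINT appears in MOUNTS_FILE
# (default /proc/mounts) as a btrfs mount with enospc_debug among its options.
has_enospc_debug() {
    awk -v mp="$1" '
        $2 == mp && $3 == "btrfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "enospc_debug") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}

# Example:
#   has_enospc_debug / && echo "enospc_debug is active on /"
```

Once the option is active, the extra detail should appear in the kernel
log the next time the error occurs.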


-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 18:02 ` Chris Murphy
@ 2016-08-12 19:00   ` Ronan Arraes Jardim Chagas
  2016-08-12 19:37     ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-12 19:00 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Fri, 2016-08-12 at 12:02 -0600, Chris Murphy wrote:
> Tons of unallocated space. What kernel messages do you get for the
> enospc? It sounds like this will be one of the mystery -28 error file
> systems. So far as I recall the only work around is recreating the
> file system. There are two additional things you can try: mount with
> enospc_debug mount option and see if you can gather more information
> about the problem. Or try a 4.8rc1 kernel which has a large number of
> enospc changes.
> 
> 

Unfortunately no log was written due to the lack of space :)
Next time it happens, I will take a screenshot of the message. Do you
think that if I reinstall my openSUSE it will be fixed?

Regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 19:00   ` Ronan Arraes Jardim Chagas
@ 2016-08-12 19:37     ` Chris Murphy
  2016-08-12 20:34       ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-12 19:37 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Chris Murphy, Btrfs BTRFS

On Fri, Aug 12, 2016 at 1:00 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> On Fri, 2016-08-12 at 12:02 -0600, Chris Murphy wrote:
>> Tons of unallocated space. What kernel messages do you get for the
>> enospc? It sounds like this will be one of the mystery -28 error file
>> systems. So far as I recall the only work around is recreating the
>> file system. There are two additional things you can try: mount with
>> enospc_debug mount option and see if you can gather more information
>> about the problem. Or try a 4.8rc1 kernel which has a large number of
>> enospc changes.
>>
>>
>
> Unfortunately no log was written due to the lack of space :)

a. journalctl -f in a Terminal window or tab should still record
everything. So long as the OS isn't totally face planting when the
enospc happens, you may still be able to copy paste it into a file
that you can save on another file system volume. It might have some
noisy messages from systemd-journald being unable to flush to disk but
the enospc itself should all be in the window even though they don't
get committed to disk.

b. Modify /etc/systemd/journald.conf so that Storage=volatile and now
the journal is only in memory, and you can flush it to another file
system yourself with something like 'journalctl -b -o short-monotonic
> journal.log'

c. create a ~1GiB separate file system and mount it at /var/log/

d. Run journalctl -f from a 2nd computer.
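
Options (b) and (d) could look roughly like this. It is a sketch under
assumptions: it uses a drop-in file under /etc/systemd/journald.conf.d/
instead of editing journald.conf directly, and the helper name, the file
name volatile.conf, the host name and the output paths are all made up
for illustration.

```shell
#!/bin/sh
# write_volatile_journal_dropin [DIR]
# Hypothetical helper: writes a journald drop-in that keeps the journal
# in memory only. DIR defaults to the standard drop-in directory.
write_volatile_journal_dropin() {
    dir=${1:-/etc/systemd/journald.conf.d}
    mkdir -p "$dir" &&
        printf '[Journal]\nStorage=volatile\n' > "$dir/volatile.conf"
}

# Assumed usage, as root on the affected machine:
#   write_volatile_journal_dropin
#   systemctl restart systemd-journald
#   journalctl -b -o short-monotonic > /mnt/other-fs/journal.log
#
# Option (d), from a second machine (host name is a placeholder):
#   ssh root@affected-host journalctl -k -f | tee enospc.log
```

With the ssh variant the log is stored on the second machine, so it does
not depend on the affected file system being writable.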

> Next time it happens, I will take a screenshot of the message.

Maybe. enospc_debug tends to spit out more than the usual amount of
stuff that'll fit on a single screen.

> Do you
> think that if I reinstall my openSUSE it will be fixed?

Probably, but the nature of this problem isn't well understood as far as
I know. It's not that common, or it'd be easy for a dev to reproduce
and then figure out what's going on.

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 19:37     ` Chris Murphy
@ 2016-08-12 20:34       ` Chris Murphy
       [not found]         ` <CAKdnfRJeOXHmrumDkfxLTf-nU=KwZ0f7ybET-3o7kwwJDOZ2aw@mail.gmail.com>
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-12 20:34 UTC (permalink / raw)
  Cc: Ronan Arraes Jardim Chagas, Btrfs BTRFS

On Fri, Aug 12, 2016 at 1:37 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Fri, Aug 12, 2016 at 1:00 PM, Ronan Arraes Jardim Chagas
> <ronisbr@gmail.com> wrote:

>
> d. Run journalctl -f from a 2nd computer.

Hopefully it's obvious I mean run journalctl -f on the affected
computer remotely via ssh.

>
>> Do you
>> think that if I reinstall my openSUSE it will be fixed?
>
> Probably but the nature of this problem isn't well understood as far as
> I know. It's not that common or it'd be easy for a dev to reproduce
> and then figure out what's going on.

Since this file system has relatively small metadata size, just under
2GiB, it might be useful to take a btrfs-image of it and put it up
somewhere like a google drive, or wherever it can remain for a while.
Options -t 4 -c9 -s are fairly standard and sanitize file names. Data
itself is not included in the image. From this I think a dev might be
able to figure out what's unique about this file system that results
in the bogus enospc. If you do this, I recommend filing a
bugzilla.kernel.org bug and include URL to the image and URL to this
thread, and then the bugzilla URL in a post on this thread that way
everything is cross referenced.
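
As a sketch of the btrfs-image step with the options mentioned above:
the device /dev/sda6 is the one from this thread, while the helper name,
the DRY_RUN switch and the output path are assumptions for illustration.
The image should be written to a different file system, and it is
commonly advised to take it while the source file system is not mounted.

```shell
#!/bin/sh
# capture_metadata_image DEVICE OUTPUT
# Hypothetical wrapper around btrfs-image:
#   -t 4  use four threads
#   -c 9  highest compression level
#   -s    sanitize file names
# With DRY_RUN=1 the command is only printed, not executed.
capture_metadata_image() {
    dev=$1
    out=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "btrfs-image -t 4 -c 9 -s $dev $out"
    else
        btrfs-image -t 4 -c 9 -s "$dev" "$out"
    fi
}

# Example (as root; the output path is a placeholder on another file system):
#   capture_metadata_image /dev/sda6 /mnt/ext4/sda6-metadata.img
```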

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
       [not found]         ` <CAKdnfRJeOXHmrumDkfxLTf-nU=KwZ0f7ybET-3o7kwwJDOZ2aw@mail.gmail.com>
@ 2016-08-15 23:24           ` Chris Murphy
  2016-08-16 17:49             ` Ronan Arraes Jardim Chagas
                               ` (3 more replies)
  0 siblings, 4 replies; 82+ messages in thread
From: Chris Murphy @ 2016-08-15 23:24 UTC (permalink / raw)
  To: Ronan Chagas; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Aug 15, 2016 at 5:12 PM, Ronan Chagas <ronisbr@gmail.com> wrote:
> Hi guys!
>
> It happened again. The computer was completely unusable. The only useful
> message I saw was this one:
>
> http://img.ctrlv.in/img/16/08/16/57b24b0bb2243.jpg
>
> Does it help?
>
> I decided to format and reinstall tomorrow. This is a production machine and
> I have to fix this ASAP.

Looks similar to this:
https://lkml.org/lkml/2016/3/28/230

Can you describe the workload happening at the time?


-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-15 23:24           ` Chris Murphy
@ 2016-08-16 17:49             ` Ronan Arraes Jardim Chagas
  2016-08-22 19:11             ` Ronan Arraes Jardim Chagas
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-16 17:49 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, 2016-08-15 at 17:24 -0600, Chris Murphy wrote:
> On Mon, Aug 15, 2016 at 5:12 PM, Ronan Chagas <ronisbr@gmail.com>
> wrote:
> > 
> > Hi guys!
> > 
> > It happened again. The computer was completely unusable. The only
> > useful
> > message I saw was this one:
> > 
> > http://img.ctrlv.in/img/16/08/16/57b24b0bb2243.jpg
> > 
> > Does it help?
> > 
> > I decided to format and reinstall tomorrow. This is a production
> > machine and
> > I have to fix this ASAP.
> 
> Looks similar to this:
> https://lkml.org/lkml/2016/3/28/230
> 
> Can you describe the workload happening at the time?

I was copying my /home using rsync when this happened. Unfortunately, I
had to format this machine because it is a production system. If I
see any problems related to this, I will report them to this mailing list.

Regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-15 23:24           ` Chris Murphy
  2016-08-16 17:49             ` Ronan Arraes Jardim Chagas
@ 2016-08-22 19:11             ` Ronan Arraes Jardim Chagas
  2016-08-22 20:39             ` Ronan Arraes Jardim Chagas
  2016-08-25 15:58             ` Lutz Vieweg
  3 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-22 19:11 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

New information guys! I formatted using the latest Tumbleweed snapshot
(btrfs-progs v4.7+20160729) and I still have the same problem.

I noticed two things. First, when I see "No space left on device", the
error goes away once the allocated Metadata space increases **a lot**.
For example, when the error first occurred, I had:

Metadata, DUP: total=2.00GiB, used=811.52MiB

After waiting a while (I could not run balance), it was fixed
automatically, and then I had:

Metadata, DUP: total=9.50GiB, used=811.52MiB

Second, when I ran the balance command during the error, I saw these
messages in `dmesg`:

Ago 22 16:00:03 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 9323937792 flags 34
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:04 ronanarraes-osd kernel: BTRFS info (device sda6): 1 enospc errors during balance
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36201037824 flags 34
Ago 22 16:00:24 ronanarraes-osd kernel: BTRFS info (device sda6): 2 enospc errors during balance
Ago 22 16:00:45 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 36234592256 flags 34
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:00:46 ronanarraes-osd kernel: BTRFS info (device sda6): 4 enospc errors during balance
Ago 22 16:01:20 ronanarraes-osd kernel: BTRFS info (device sda6): relocating block group 38415630336 flags 34
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): found 1 extents
Ago 22 16:01:21 ronanarraes-osd kernel: BTRFS info (device sda6): 8 enospc errors during balance

Does it add anything relevant to the problem?
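
The "flags 34" value in the balance messages is a bitmask. The sketch
below decodes it, assuming the kernel's BTRFS_BLOCK_GROUP_* bit values
(data=1, system=2, metadata=4, dup=32, and so on); the function name is
made up for illustration. Under that assumption 34 decodes to system|dup,
which would mean the block groups being relocated here were System
chunks rather than Metadata ones.

```shell
#!/bin/sh
# decode_bg_flags FLAGS
# Hypothetical helper: prints the names of the block group type and
# profile bits set in FLAGS, joined with "|". The "single" profile has
# no bit of its own, so it does not appear in the output.
decode_bg_flags() {
    flags=$1
    out=""
    for pair in 1:data 2:system 4:metadata 8:raid0 16:raid1 \
                32:dup 64:raid10 128:raid5 256:raid6; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out|}$name"
        fi
    done
    echo "${out:-none}"
}

# Example:
#   decode_bg_flags 34    # value from the balance messages
```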

Regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-15 23:24           ` Chris Murphy
  2016-08-16 17:49             ` Ronan Arraes Jardim Chagas
  2016-08-22 19:11             ` Ronan Arraes Jardim Chagas
@ 2016-08-22 20:39             ` Ronan Arraes Jardim Chagas
  2016-08-22 20:49               ` Chris Murphy
  2016-08-25 15:58             ` Lutz Vieweg
  3 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-22 20:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

The same thing just happened again! And now it was also fixed
automatically, but now I have:

Metadata,DUP: Size:33.50GiB, Used:812.78MiB

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-22 20:39             ` Ronan Arraes Jardim Chagas
@ 2016-08-22 20:49               ` Chris Murphy
  2016-08-22 21:04                 ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-22 20:49 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Aug 22, 2016 at 2:39 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> The same thing just happened again! And now it was also fixed
> automatically, but now I have:
>
> Metadata,DUP: Size:33.50GiB, Used:812.78MiB

This is really weird. I'm running 4.7.0 (Fedora) and I'm not
experiencing problems, let alone this. What is this kernel's
provenance? Is it a plain mainline 4.7.0 that you built? I'm not
really sure what to recommend except maybe going back to 4.5.7 or
4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
this regard.

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-22 20:49               ` Chris Murphy
@ 2016-08-22 21:04                 ` Ronan Arraes Jardim Chagas
  2016-08-24  0:40                   ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-22 21:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, 2016-08-22 at 14:49 -0600, Chris Murphy wrote:
> This is really weird. I'm running 4.7.0 (Fedora) and I'm not
> experiencing problems, let alone this. What is this kernel's
> provenance? Is it a plain mainline 4.7.0 that you built? I'm not
> really sure what to recommend except maybe going back to 4.5.7 or
> 4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
> this regard.
> 

Well, I'm using the default openSUSE kernel here, and I have been seeing
these errors for some time. When I reported it, I was using v4.6.1.
Hence, I think the version of btrfs-progs is not the problem.

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-22 21:04                 ` Ronan Arraes Jardim Chagas
@ 2016-08-24  0:40                   ` Jeff Mahoney
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-08-24  0:40 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy; +Cc: Btrfs BTRFS

On 8/22/16 5:04 PM, Ronan Arraes Jardim Chagas wrote:
> On Mon, 2016-08-22 at 14:49 -0600, Chris Murphy wrote:
>> This is really weird. I'm running 4.7.0 (Fedora) and I'm not
>> experiencing problems, let alone this. What is this kernel's
>> provenance? Is it a plain mainline 4.7.0 that you built? I'm not
>> really sure what to recommend except maybe going back to 4.5.7 or
>> 4.6.7 as it's a production machine. Heck even 4.4.19 is OK for me in
>> this regard.
>>
> 
> Well, I'm using the default openSUSE kernel here, and I have been seeing
> these errors for some time. When I reported it, I was using v4.6.1.
> Hence, I think the version of btrfs-progs is not the problem.

The openSUSE Tumbleweed kernel is effectively vanilla for btrfs.  The
4.6-based kernel had two btrfs patches, the crc32c implementation
publishing patch that landed in 4.7, and the super_operation callback
that lets us publish the per-root anon dev via stat().  The problem
you're encountering isn't due to our patches.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-15 23:24           ` Chris Murphy
                               ` (2 preceding siblings ...)
  2016-08-22 20:39             ` Ronan Arraes Jardim Chagas
@ 2016-08-25 15:58             ` Lutz Vieweg
  2016-08-25 23:56               ` Chris Murphy
  3 siblings, 1 reply; 82+ messages in thread
From: Lutz Vieweg @ 2016-08-25 15:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Btrfs BTRFS

On 08/16/2016 01:24 AM, Chris Murphy wrote:
> On Mon, Aug 15, 2016 at 5:12 PM, Ronan Chagas <ronisbr@gmail.com> wrote:
>> It happened again. The computer was completely unusable. The only useful
>> message I saw was this one:
>>
>> http://img.ctrlv.in/img/16/08/16/57b24b0bb2243.jpg
>
> Looks similar to this:
> https://lkml.org/lkml/2016/3/28/230

It also looks similar to the subject of the lengthy thread titled
"6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem"
that started with:
> http://www.spinics.net/lists/linux-btrfs/msg50599.html

Regards,

Lutz Vieweg


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-25 15:58             ` Lutz Vieweg
@ 2016-08-25 23:56               ` Chris Murphy
  2016-08-26  5:59                 ` Marc Haber
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-25 23:56 UTC (permalink / raw)
  To: Lutz Vieweg; +Cc: Chris Murphy, Ronan Chagas, Btrfs BTRFS

On Thu, Aug 25, 2016 at 9:58 AM, Lutz Vieweg <lvml@5t9.de> wrote:
> On 08/16/2016 01:24 AM, Chris Murphy wrote:
>>
>> On Mon, Aug 15, 2016 at 5:12 PM, Ronan Chagas <ronisbr@gmail.com> wrote:
>>>
>>> It happened again. The computer was completely unusable. The only useful
>>> message I saw was this one:
>>>
>>> http://img.ctrlv.in/img/16/08/16/57b24b0bb2243.jpg
>>
>>
>> Looks similar to this:
>> https://lkml.org/lkml/2016/3/28/230
>
>
> Looks also similar to the subject of the lengthy thread titled
> "6TB partition, Data only 2TB - aka When you haven't hit the "usual"
> problem"
> that started with:
>>
>> http://www.spinics.net/lists/linux-btrfs/msg50599.html

I'm thinking it might be a conflict between the OP doing builds, which
implies heavy writes (maybe especially heavy metadata writes, judging by
the large increase in metadata allocation), and the openSUSE default of
using snapper to make read-only snapshots. That it can't be triggered
on demand makes sense if the problem only appears when a build overlaps
with snapper making a snapshot of /home.

http://www.spinics.net/lists/linux-btrfs/msg52670.html

Anyway, it's a known problem, and I don't think it's fixed yet. There's a
lot of enospc work in 4.8, so eventually it'll make sense to give it a
shot with that kernel.

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-25 23:56               ` Chris Murphy
@ 2016-08-26  5:59                 ` Marc Haber
  0 siblings, 0 replies; 82+ messages in thread
From: Marc Haber @ 2016-08-26  5:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Lutz Vieweg, Ronan Chagas, Btrfs BTRFS

hi,

On Thu, Aug 25, 2016 at 05:56:18PM -0600, Chris Murphy wrote:
> Anyway it's a known problem, I don't think it's fixed still. There's a
> lot of enospc work in 4.8 so eventually it'll make sense to give it a
> shot with that kernel.

assuming that I'm willing to try that, will a successful rebalance
with 4.8 fix a filesystem, or is the recommended way still "backup,
format, restore, lose all snapshots"?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 17:36 BTRFS constantly reports "No space left on device" even with a huge unallocated space Ronan Arraes Jardim Chagas
  2016-08-12 18:02 ` Chris Murphy
@ 2016-08-29 12:12 ` Wang Xiaoguang
  2016-08-29 13:20   ` Ronan Arraes Jardim Chagas
                     ` (2 more replies)
  2016-09-13  3:17 ` Wang Xiaoguang
  2 siblings, 3 replies; 82+ messages in thread
From: Wang Xiaoguang @ 2016-08-29 12:12 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, linux-btrfs

hello,

On 08/13/2016 01:36 AM, Ronan Arraes Jardim Chagas wrote:
> Hi guys,
>
> I'm facing a daily problem with BTRFS. Almost everyday, I get the
> message "No space left on device". Sometimes I can recover by balancing
> the system but sometimes even balancing does not work due to the lack
> of space. In this case, only a hard reset works if I can't delete some
> files. The problem is that I have a huge unallocated space as you can
> see here:
>
> # btrfs fi usage /
> Overall:
>      Device size:		   1.26TiB
>      Device allocated:		 119.07GiB
>      Device unallocated:		   1.14TiB
>      Device missing:		     0.00B
>      Used:			 115.08GiB
>      Free (estimated):		   1.14TiB	(min: 586.21GiB)
>      Data ratio:			      1.00
>      Metadata ratio:		      2.00
>      Global reserve:		 512.00MiB	(used: 0.00B)
>
> Data,single: Size:113.01GiB, Used:111.19GiB
>     /dev/sda6	 113.01GiB
>
> Metadata,DUP: Size:3.00GiB, Used:1.94GiB
>     /dev/sda6	   6.00GiB
>
> System,DUP: Size:32.00MiB, Used:16.00KiB
>     /dev/sda6	  64.00MiB
>
> Unallocated:
>     /dev/sda6	   1.14TiB
>
> It is not easy to trigger the problem. But I do find some correlation
> between two things:
>
> 1) When I started to create jails to build openSUSE packages locally,
> then the problem happens more often. In these jails, some directories
> like /dev/, /dev/pts, /proc, are mounted inside the jail.
>
> 2) When I open my KVM, I also see this problem more often. Notice,
> however, that the KVM disk is stored in another EXT4 partition.
>
> I would be glad if anyone can help me to fix it. In the following, I'm
> providing more information about my system:
>
> # uname -a
> Linux ronanarraes-osd 4.7.0-1-default #1 SMP PREEMPT Mon Jul 25
> 08:42:47 UTC 2016 (89a2ada) x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.6.1+20160714
>
> # btrfs fi show
> Label: none  uuid: 80381f7f-8cef-4bd8-bdbc-3487253ee566
> 	Total devices 1 FS bytes used 113.13GiB
> 	devid    1 size 1.26TiB used 119.07GiB path /dev/sda6
>
> # btrfs fi df /
> Data, single: total=113.01GiB, used=111.19GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=3.00GiB, used=1.94GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

When strange ENOSPC errors occur, I think "btrfs fi usage"
or "btrfs fi df" do not help much. Their output does not
reflect the current state of btrfs inside the kernel :)

Would you please provide the values of the attribute files in
/sys/fs/btrfs/$UUID/allocation/data
and /sys/fs/btrfs/$UUID/allocation/metadata when the ENOSPC error occurs?
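
One way to collect those values in a single step is sketched below. The
function name dump_allocation is made up for illustration; $UUID stands
for the file system UUID reported by "btrfs fi show".

```shell
#!/bin/sh
# dump_allocation BASE
# Hypothetical helper: for the data and metadata directories under BASE,
# prints each attribute file name followed by its contents.
dump_allocation() {
    base=$1
    for kind in data metadata; do
        echo "$base/$kind"
        (
            cd "$base/$kind" &&
                find . -type f | LC_ALL=C sort | while read -r f; do
                    echo "$f"
                    cat "$f"
                done
        )
    done
}

# Assumed usage while the ENOSPC error is occurring; redirect the output
# to a file system that still accepts writes (the path is a placeholder):
#   dump_allocation "/sys/fs/btrfs/$UUID/allocation" > /mnt/other-fs/alloc.txt
```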

Regards,
Xiaoguang Wang

>
> Regards,
> Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-29 12:12 ` Wang Xiaoguang
@ 2016-08-29 13:20   ` Ronan Arraes Jardim Chagas
  2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
  2016-09-14 20:15   ` Ronan Arraes Jardim Chagas
  2 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-29 13:20 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi!

On Mon, 2016-08-29 at 20:12 +0800, Wang Xiaoguang wrote:
> When strange ENOSPC errors occur, I think "btrfs fi usage"
> or "btrfs fi df" do not help too much. Their output does not
> reflect btrfs kernel current status :)
> 
> Would you please provide attribute files' values in 
> /sys/fs/btrfs/$UUID/allocation/data
> and /sys/fs/btrfs/$UUID/allocation/metadata when ENOSPC error occurs.
> 

Sure! As soon as I see the error again, I will send these results. So
far, it seems that if I move my jail directory to an ext4 partition, I
do not see the problem anymore, but I need more tests to validate this
assumption.

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-29 12:12 ` Wang Xiaoguang
  2016-08-29 13:20   ` Ronan Arraes Jardim Chagas
@ 2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
  2016-08-29 22:25     ` Jeff Mahoney
  2016-08-30  2:12     ` Wang Xiaoguang
  2016-09-14 20:15   ` Ronan Arraes Jardim Chagas
  2 siblings, 2 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-29 15:52 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi guys,

I just had the problem again. This time it happened during lunch time,
while the machine was idle and only the system processes were running.
It is not the first time I have seen this problem just after lunch, when
the machine has been idle for a long period (about 1 h).

Here is the information requested:

/sys/fs/btrfs/$UUID/allocation/data

./bytes_may_use
0
./bytes_pinned
0
./bytes_reserved
0
./bytes_used
36128374784
./disk_total
37589352448
./disk_used
36128374784
./flags
1
./total_bytes
37589352448
./total_bytes_pinned
20339560448
./single/total_bytes
37589352448
./single/used_bytes
36128374784

/sys/fs/btrfs/$UUID/allocation/metadata

./bytes_may_use
84974452736
./bytes_pinned
0
./bytes_reserved
0
./bytes_used
977354752
./disk_total
4294967296
./disk_used
1954709504
./flags
4
./total_bytes
2147483648
./total_bytes_pinned
-57851904
./dup/total_bytes
2147483648
./dup/used_bytes
977354752
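
To put the metadata numbers above in perspective, the sketch below
converts them to GiB and checks whether used plus reserved space exceeds
the allocated total. The helper names are made up for illustration.
Reading bytes_may_use as outstanding reservations is an interpretation,
not something confirmed in this thread.

```shell
#!/bin/sh
# to_gib BYTES: print BYTES as GiB with two decimals.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# reservations_exceed_total USED MAY_USE TOTAL
# Hypothetical helper: succeeds when USED + MAY_USE is larger than TOTAL.
reservations_exceed_total() {
    [ $(( $1 + $2 )) -gt "$3" ]
}

# Metadata values reported above:
#   to_gib 84974452736    # bytes_may_use
#   to_gib 2147483648     # total_bytes
#   reservations_exceed_total 977354752 84974452736 2147483648 &&
#       echo "metadata reservations exceed the allocated total"
```

For metadata, bytes_may_use comes to roughly 79 GiB against a total of
2 GiB, while the same check on the data values does not trip. Together
with the negative total_bytes_pinned, this may point at the space
accounting rather than at a real shortage of space.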

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		  39.07GiB
    Device unallocated:		   1.22TiB
    Device missing:		     0.00B
    Used:			  35.29GiB
    Free (estimated):		   1.22TiB	(min: 625.93GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 320.00MiB	(used: 0.00B)

Data,single: Size:35.01GiB, Used:33.47GiB
   /dev/sda6	  35.01GiB

Metadata,DUP: Size:2.00GiB, Used:932.00MiB
   /dev/sda6	   4.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.22TiB

# btrfs fi df /
Data, single: total=35.01GiB, used=33.47GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=2.00GiB, used=932.09MiB
GlobalReserve, single: total=320.00MiB, used=0.0

I also saw the following information in `journalctl`:

Ago 29 10:25:33 ronanarraes-osd kernel: ------------[ cut here ]-------
-----
Ago 29 10:25:33 ronanarraes-osd kernel: WARNING: CPU: 4 PID: 30424 at
../fs/btrfs/extent-tree.c:4303
btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel: Modules linked in: fuse
nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
xt_tcpudp nf_
Ago 29 10:25:33 ronanarraes-osd kernel:  mei_wdt sysimgblt
iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw sparse_keymap
Ago 29 10:25:33 ronanarraes-osd kernel: CPU: 4 PID: 30424 Comm:
kworker/u65:1 Tainted: P           O    4.7.1-1-default #1
Ago 29 10:25:33 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
Ago 29 10:25:33 ronanarraes-osd kernel: Workqueue: writeback wb_workfn
(flush-btrfs-1)
Ago 29 10:25:33 ronanarraes-osd kernel:  0000000000000000
ffffffff81393104 0000000000000000 0000000000000000
Ago 29 10:25:33 ronanarraes-osd kernel:  ffffffff8107ca1e
ffff88100027c800 0000000000001000 ffff88082ff06400
Ago 29 10:25:33 ronanarraes-osd kernel:  ffff88100c7af784
0000000000001000 ffff8805bd60f6cc ffffffffa025098e
Ago 29 10:25:33 ronanarraes-osd kernel: Call Trace:
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
dump_trace+0x5e/0x320
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
show_stack_log_lvl+0x10c/0x180
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
show_stack+0x21/0x40
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff81393104>]
dump_stack+0x5c/0x78
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
__warn+0xbe/0xe0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa025098e>]
btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa026d036>]
btrfs_clear_bit_hook+0x296/0x380 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028a755>]
clear_state_bit+0x55/0x1d0 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028aa0d>]
__clear_extent_bit+0x13d/0x3f0 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028b8d2>]
extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa0273722>]
run_delalloc_nocow+0x962/0xba0 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa0273cbf>]
run_delalloc_range+0x35f/0x3b0 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028c090>]
writepage_delalloc.isra.40+0x100/0x170 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028e9d3>]
__extent_writepage+0xc3/0x340 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028ee8b>]
extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028f4fe>]
extent_writepages+0x4e/0x60 [btrfs]
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123c64d>]
__writeback_single_inode+0x3d/0x3b0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123ce8a>]
writeback_sb_inodes+0x20a/0x440
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123d147>]
__writeback_inodes_wb+0x87/0xb0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123d49d>]
wb_writeback+0x28d/0x330
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123dbe2>]
wb_workfn+0x222/0x3f0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff810950ed>]
process_one_work+0x1ed/0x4e0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff81095427>]
worker_thread+0x47/0x4c0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8109affd>]
kthread+0xbd/0xe0
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff816bb71f>]
ret_from_fork+0x1f/0x40
Ago 29 10:25:33 ronanarraes-osd kernel: DWARF2 unwinder stuck at
ret_from_fork+0x1f/0x40
Ago 29 10:25:33 ronanarraes-osd kernel: 
Ago 29 10:25:33 ronanarraes-osd kernel: Leftover inexact backtrace:
Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8109af40>] ?
kthread_worker_fn+0x170/0x170

Ago 29 10:34:51 ronanarraes-osd kernel: ------------[ cut here ]-------
-----
Ago 29 10:34:51 ronanarraes-osd kernel: WARNING: CPU: 6 PID: 27335 at
../fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
Ago 29 10:34:51 ronanarraes-osd kernel: Modules linked in: fuse
nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
xt_tcpudp nf_
Ago 29 10:34:51 ronanarraes-osd kernel:  mei_wdt sysimgblt
iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw sparse_keymap
Ago 29 10:34:51 ronanarraes-osd kernel: CPU: 6 PID: 27335 Comm: Cache2
I/O Tainted: P        W  O    4.7.1-1-default #1
Ago 29 10:34:51 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
Ago 29 10:34:51 ronanarraes-osd kernel:  0000000000000000
ffffffff81393104 0000000000000000 0000000000000000
Ago 29 10:34:51 ronanarraes-osd kernel:  ffffffff8107ca1e
0000000000000000 ffff88071b592a80 ffff881000221800
Ago 29 10:34:51 ronanarraes-osd kernel:  0000000000000000
ffff88071b592a80 00000000ffffff9c ffffffffa027dabf
Ago 29 10:34:51 ronanarraes-osd kernel: Call Trace:
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
dump_trace+0x5e/0x320
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
show_stack_log_lvl+0x10c/0x180
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
show_stack+0x21/0x40
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff81393104>]
dump_stack+0x5c/0x78
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
__warn+0xbe/0xe0
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffffa027dabf>]
btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8121f6d1>]
do_unlinkat+0x131/0x310
Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff816bb4f6>]
entry_SYSCALL_64_fastpath+0x1e/0xa8
Ago 29 10:34:51 ronanarraes-osd kernel: DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x1e/0xa8
Ago 29 10:34:51 ronanarraes-osd kernel: 
Ago 29 10:34:51 ronanarraes-osd kernel: Leftover inexact backtrace:
Ago 29 10:34:51 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a61
]---

Ago 29 11:21:19 ronanarraes-osd kernel: ------------[ cut here ]-------
-----
Ago 29 11:21:19 ronanarraes-osd kernel: WARNING: CPU: 18 PID: 16759 at
../fs/btrfs/extent-tree.c:4303
btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel: Modules linked in: fuse
nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
xt_tcpudp nf_
Ago 29 11:21:19 ronanarraes-osd kernel:  mei_wdt sysimgblt
iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw sparse_keymap
Ago 29 11:21:19 ronanarraes-osd kernel: CPU: 18 PID: 16759 Comm:
kworker/u65:2 Tainted: P        W  O    4.7.1-1-default #1
Ago 29 11:21:19 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
Ago 29 11:21:19 ronanarraes-osd kernel: Workqueue: writeback wb_workfn
(flush-btrfs-1)
Ago 29 11:21:19 ronanarraes-osd kernel:  0000000000000000
ffffffff81393104 0000000000000000 0000000000000000
Ago 29 11:21:19 ronanarraes-osd kernel:  ffffffff8107ca1e
ffff881000221800 0000000000001000 ffff88082ff06400
Ago 29 11:21:19 ronanarraes-osd kernel:  ffff8807b11b6784
0000000000001000 ffff8806acb1f73c ffffffffa025098e
Ago 29 11:21:19 ronanarraes-osd kernel: Call Trace:
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
dump_trace+0x5e/0x320
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
show_stack_log_lvl+0x10c/0x180
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
show_stack+0x21/0x40
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff81393104>]
dump_stack+0x5c/0x78
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
__warn+0xbe/0xe0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa025098e>]
btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa026d036>]
btrfs_clear_bit_hook+0x296/0x380 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028a755>]
clear_state_bit+0x55/0x1d0 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028aa0d>]
__clear_extent_bit+0x13d/0x3f0 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028b8d2>]
extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa0272c19>]
cow_file_range+0x299/0x440 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa0273cf2>]
run_delalloc_range+0x392/0x3b0 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028c090>]
writepage_delalloc.isra.40+0x100/0x170 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028e9d3>]
__extent_writepage+0xc3/0x340 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028ee8b>]
extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028f4fe>]
extent_writepages+0x4e/0x60 [btrfs]
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123c64d>]
__writeback_single_inode+0x3d/0x3b0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123ce8a>]
writeback_sb_inodes+0x20a/0x440
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123d147>]
__writeback_inodes_wb+0x87/0xb0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123d49d>]
wb_writeback+0x28d/0x330
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123dbe2>]
wb_workfn+0x222/0x3f0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff810950ed>]
process_one_work+0x1ed/0x4e0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff81095427>]
worker_thread+0x47/0x4c0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8109affd>]
kthread+0xbd/0xe0
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff816bb71f>]
ret_from_fork+0x1f/0x40
Ago 29 11:21:19 ronanarraes-osd kernel: DWARF2 unwinder stuck at
ret_from_fork+0x1f/0x40
Ago 29 11:21:19 ronanarraes-osd kernel: 
Ago 29 11:21:19 ronanarraes-osd kernel: Leftover inexact backtrace:
Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8109af40>] ?
kthread_worker_fn+0x170/0x170
Ago 29 11:21:19 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a62
]---

Ago 29 12:06:07 ronanarraes-osd kernel: ------------[ cut here ]-------
-----
Ago 29 12:06:07 ronanarraes-osd kernel: WARNING: CPU: 3 PID: 27335 at
../fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
Ago 29 12:06:07 ronanarraes-osd kernel: Modules linked in: fuse
nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
xt_tcpudp nf_
Ago 29 12:06:07 ronanarraes-osd kernel:  mei_wdt sysimgblt
iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw sparse_keymap
Ago 29 12:06:07 ronanarraes-osd kernel: CPU: 3 PID: 27335 Comm: Cache2
I/O Tainted: P        W  O    4.7.1-1-default #1
Ago 29 12:06:07 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
Ago 29 12:06:07 ronanarraes-osd kernel:  0000000000000000
ffffffff81393104 0000000000000000 0000000000000000
Ago 29 12:06:07 ronanarraes-osd kernel:  ffffffff8107ca1e
0000000000000000 ffff88071b5eeb00 ffff881000221800
Ago 29 12:06:07 ronanarraes-osd kernel:  0000000000000000
ffff88071b5eeb00 00000000ffffff9c ffffffffa027dabf
Ago 29 12:06:07 ronanarraes-osd kernel: Call Trace:
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
dump_trace+0x5e/0x320
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
show_stack_log_lvl+0x10c/0x180
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
show_stack+0x21/0x40
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff81393104>]
dump_stack+0x5c/0x78
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
__warn+0xbe/0xe0
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffffa027dabf>]
btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8121f6d1>]
do_unlinkat+0x131/0x310
Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff816bb4f6>]
entry_SYSCALL_64_fastpath+0x1e/0xa8
Ago 29 12:06:07 ronanarraes-osd kernel: DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x1e/0xa8
Ago 29 12:06:07 ronanarraes-osd kernel: 
Ago 29 12:06:07 ronanarraes-osd kernel: Leftover inexact backtrace:
Ago 29 12:06:07 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a63
]---

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
@ 2016-08-29 22:25     ` Jeff Mahoney
  2016-08-30  2:12     ` Wang Xiaoguang
  1 sibling, 0 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-08-29 22:25 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Wang Xiaoguang, linux-btrfs



On 8/29/16 11:52 AM, Ronan Arraes Jardim Chagas wrote:
> Hi guys,
> 
> I just have the problem again. Now, it happens during the lunch time
> when the machine was idle. Only the system processes were running. It
> was not the first time that I saw this problem just after lunch when
> the machine stayed idle for a long period (+- 1h). 

I was going to suggest that this was due to the fsync speedup patch that
we were carrying but has since landed upstream in v4.8, but the
Tumbleweed kernel doesn't contain that patch.

It looks like we have some digging to do.

-Jeff

> Here is the information requested:
> 
> /sys/fs/btrfs/$UUID/allocation/data
> 
> ./bytes_may_use
> 0
> ./bytes_pinned
> 0
> ./bytes_reserved
> 0
> ./bytes_used
> 36128374784
> ./disk_total
> 37589352448
> ./disk_used
> 36128374784
> ./flags
> 1
> ./total_bytes
> 37589352448
> ./total_bytes_pinned
> 20339560448
> ./single/total_bytes
> 37589352448
> ./single/used_bytes
> 36128374784
> 
> /sys/fs/btrfs/$UUID/allocation/metadata
> 
> ./bytes_may_use
> 84974452736
> ./bytes_pinned
> 0
> ./bytes_reserved
> 0
> ./bytes_used
> 977354752
> ./disk_total
> 4294967296
> ./disk_used
> 1954709504
> ./flags
> 4
> ./total_bytes
> 2147483648
> ./total_bytes_pinned
> -57851904
> ./dup/total_bytes
> 2147483648
> ./dup/used_bytes
> 977354752
> 
> # btrfs fi usage /
> Overall:
>     Device size:		   1.26TiB
>     Device allocated:		  39.07GiB
>     Device unallocated:		   1.22TiB
>     Device missing:		     0.00B
>     Used:			  35.29GiB
>     Free (estimated):		   1.22TiB	(min: 625.93GiB)
>     Data ratio:			      1.00
>     Metadata ratio:		      2.00
>     Global reserve:		 320.00MiB	(used: 0.00B)
> 
> Data,single: Size:35.01GiB, Used:33.47GiB
>    /dev/sda6	  35.01GiB
> 
> Metadata,DUP: Size:2.00GiB, Used:932.00MiB
>    /dev/sda6	   4.00GiB
> 
> System,DUP: Size:32.00MiB, Used:16.00KiB
>    /dev/sda6	  64.00MiB
> 
> Unallocated:
>    /dev/sda6	   1.22TiB
> 
> # btrfs fi df /
> Data, single: total=35.01GiB, used=33.47GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=2.00GiB, used=932.09MiB
> GlobalReserve, single: total=320.00MiB, used=0.0
> 
> I also saw the following information in `journalctl`:
> 
> Ago 29 10:25:33 ronanarraes-osd kernel: ------------[ cut here ]-------
> -----
> Ago 29 10:25:33 ronanarraes-osd kernel: WARNING: CPU: 4 PID: 30424 at
> ../fs/btrfs/extent-tree.c:4303
> btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel: Modules linked in: fuse
> nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
> af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
> xt_tcpudp nf_
> Ago 29 10:25:33 ronanarraes-osd kernel:  mei_wdt sysimgblt
> iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw sparse_keymap
> Ago 29 10:25:33 ronanarraes-osd kernel: CPU: 4 PID: 30424 Comm:
> kworker/u65:1 Tainted: P           O    4.7.1-1-default #1
> Ago 29 10:25:33 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
> HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
> Ago 29 10:25:33 ronanarraes-osd kernel: Workqueue: writeback wb_workfn
> (flush-btrfs-1)
> Ago 29 10:25:33 ronanarraes-osd kernel:  0000000000000000
> ffffffff81393104 0000000000000000 0000000000000000
> Ago 29 10:25:33 ronanarraes-osd kernel:  ffffffff8107ca1e
> ffff88100027c800 0000000000001000 ffff88082ff06400
> Ago 29 10:25:33 ronanarraes-osd kernel:  ffff88100c7af784
> 0000000000001000 ffff8805bd60f6cc ffffffffa025098e
> Ago 29 10:25:33 ronanarraes-osd kernel: Call Trace:
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
> dump_trace+0x5e/0x320
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
> show_stack_log_lvl+0x10c/0x180
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
> show_stack+0x21/0x40
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff81393104>]
> dump_stack+0x5c/0x78
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
> __warn+0xbe/0xe0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa025098e>]
> btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa026d036>]
> btrfs_clear_bit_hook+0x296/0x380 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028a755>]
> clear_state_bit+0x55/0x1d0 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028aa0d>]
> __clear_extent_bit+0x13d/0x3f0 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028b8d2>]
> extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa0273722>]
> run_delalloc_nocow+0x962/0xba0 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa0273cbf>]
> run_delalloc_range+0x35f/0x3b0 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028c090>]
> writepage_delalloc.isra.40+0x100/0x170 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028e9d3>]
> __extent_writepage+0xc3/0x340 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028ee8b>]
> extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffffa028f4fe>]
> extent_writepages+0x4e/0x60 [btrfs]
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123c64d>]
> __writeback_single_inode+0x3d/0x3b0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123ce8a>]
> writeback_sb_inodes+0x20a/0x440
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123d147>]
> __writeback_inodes_wb+0x87/0xb0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123d49d>]
> wb_writeback+0x28d/0x330
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8123dbe2>]
> wb_workfn+0x222/0x3f0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff810950ed>]
> process_one_work+0x1ed/0x4e0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff81095427>]
> worker_thread+0x47/0x4c0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8109affd>]
> kthread+0xbd/0xe0
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff816bb71f>]
> ret_from_fork+0x1f/0x40
> Ago 29 10:25:33 ronanarraes-osd kernel: DWARF2 unwinder stuck at
> ret_from_fork+0x1f/0x40
> Ago 29 10:25:33 ronanarraes-osd kernel: 
> Ago 29 10:25:33 ronanarraes-osd kernel: Leftover inexact backtrace:
> Ago 29 10:25:33 ronanarraes-osd kernel:  [<ffffffff8109af40>] ?
> kthread_worker_fn+0x170/0x170
> 
> Ago 29 10:34:51 ronanarraes-osd kernel: ------------[ cut here ]-------
> -----
> Ago 29 10:34:51 ronanarraes-osd kernel: WARNING: CPU: 6 PID: 27335 at
> ../fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
> Ago 29 10:34:51 ronanarraes-osd kernel: Modules linked in: fuse
> nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
> af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
> xt_tcpudp nf_
> Ago 29 10:34:51 ronanarraes-osd kernel:  mei_wdt sysimgblt
> iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw sparse_keymap
> Ago 29 10:34:51 ronanarraes-osd kernel: CPU: 6 PID: 27335 Comm: Cache2
> I/O Tainted: P        W  O    4.7.1-1-default #1
> Ago 29 10:34:51 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
> HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
> Ago 29 10:34:51 ronanarraes-osd kernel:  0000000000000000
> ffffffff81393104 0000000000000000 0000000000000000
> Ago 29 10:34:51 ronanarraes-osd kernel:  ffffffff8107ca1e
> 0000000000000000 ffff88071b592a80 ffff881000221800
> Ago 29 10:34:51 ronanarraes-osd kernel:  0000000000000000
> ffff88071b592a80 00000000ffffff9c ffffffffa027dabf
> Ago 29 10:34:51 ronanarraes-osd kernel: Call Trace:
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
> dump_trace+0x5e/0x320
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
> show_stack_log_lvl+0x10c/0x180
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
> show_stack+0x21/0x40
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff81393104>]
> dump_stack+0x5c/0x78
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
> __warn+0xbe/0xe0
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffffa027dabf>]
> btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff8121f6d1>]
> do_unlinkat+0x131/0x310
> Ago 29 10:34:51 ronanarraes-osd kernel:  [<ffffffff816bb4f6>]
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 10:34:51 ronanarraes-osd kernel: DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 10:34:51 ronanarraes-osd kernel: 
> Ago 29 10:34:51 ronanarraes-osd kernel: Leftover inexact backtrace:
> Ago 29 10:34:51 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a61
> ]---
> 
> Ago 29 11:21:19 ronanarraes-osd kernel: ------------[ cut here ]-------
> -----
> Ago 29 11:21:19 ronanarraes-osd kernel: WARNING: CPU: 18 PID: 16759 at
> ../fs/btrfs/extent-tree.c:4303
> btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel: Modules linked in: fuse
> nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
> af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
> xt_tcpudp nf_
> Ago 29 11:21:19 ronanarraes-osd kernel:  mei_wdt sysimgblt
> iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw sparse_keymap
> Ago 29 11:21:19 ronanarraes-osd kernel: CPU: 18 PID: 16759 Comm:
> kworker/u65:2 Tainted: P        W  O    4.7.1-1-default #1
> Ago 29 11:21:19 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
> HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
> Ago 29 11:21:19 ronanarraes-osd kernel: Workqueue: writeback wb_workfn
> (flush-btrfs-1)
> Ago 29 11:21:19 ronanarraes-osd kernel:  0000000000000000
> ffffffff81393104 0000000000000000 0000000000000000
> Ago 29 11:21:19 ronanarraes-osd kernel:  ffffffff8107ca1e
> ffff881000221800 0000000000001000 ffff88082ff06400
> Ago 29 11:21:19 ronanarraes-osd kernel:  ffff8807b11b6784
> 0000000000001000 ffff8806acb1f73c ffffffffa025098e
> Ago 29 11:21:19 ronanarraes-osd kernel: Call Trace:
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
> dump_trace+0x5e/0x320
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
> show_stack_log_lvl+0x10c/0x180
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
> show_stack+0x21/0x40
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff81393104>]
> dump_stack+0x5c/0x78
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
> __warn+0xbe/0xe0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa025098e>]
> btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa026d036>]
> btrfs_clear_bit_hook+0x296/0x380 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028a755>]
> clear_state_bit+0x55/0x1d0 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028aa0d>]
> __clear_extent_bit+0x13d/0x3f0 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028b8d2>]
> extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa0272c19>]
> cow_file_range+0x299/0x440 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa0273cf2>]
> run_delalloc_range+0x392/0x3b0 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028c090>]
> writepage_delalloc.isra.40+0x100/0x170 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028e9d3>]
> __extent_writepage+0xc3/0x340 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028ee8b>]
> extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffffa028f4fe>]
> extent_writepages+0x4e/0x60 [btrfs]
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123c64d>]
> __writeback_single_inode+0x3d/0x3b0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123ce8a>]
> writeback_sb_inodes+0x20a/0x440
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123d147>]
> __writeback_inodes_wb+0x87/0xb0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123d49d>]
> wb_writeback+0x28d/0x330
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8123dbe2>]
> wb_workfn+0x222/0x3f0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff810950ed>]
> process_one_work+0x1ed/0x4e0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff81095427>]
> worker_thread+0x47/0x4c0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8109affd>]
> kthread+0xbd/0xe0
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff816bb71f>]
> ret_from_fork+0x1f/0x40
> Ago 29 11:21:19 ronanarraes-osd kernel: DWARF2 unwinder stuck at
> ret_from_fork+0x1f/0x40
> Ago 29 11:21:19 ronanarraes-osd kernel: 
> Ago 29 11:21:19 ronanarraes-osd kernel: Leftover inexact backtrace:
> Ago 29 11:21:19 ronanarraes-osd kernel:  [<ffffffff8109af40>] ?
> kthread_worker_fn+0x170/0x170
> Ago 29 11:21:19 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a62
> ]---
> 
> Ago 29 12:06:07 ronanarraes-osd kernel: ------------[ cut here ]-------
> -----
> Ago 29 12:06:07 ronanarraes-osd kernel: WARNING: CPU: 3 PID: 27335 at
> ../fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
> Ago 29 12:06:07 ronanarraes-osd kernel: Modules linked in: fuse
> nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit
> af_packet iscsi_ibft iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6
> xt_tcpudp nf_
> Ago 29 12:06:07 ronanarraes-osd kernel:  mei_wdt sysimgblt
> iTCO_vendor_support i2c_i801 tpm_infineon tpm_tis tpm ioatdma
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw sparse_keymap
> Ago 29 12:06:07 ronanarraes-osd kernel: CPU: 3 PID: 27335 Comm: Cache2
> I/O Tainted: P        W  O    4.7.1-1-default #1
> Ago 29 12:06:07 ronanarraes-osd kernel: Hardware name: Hewlett-Packard
> HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
> Ago 29 12:06:07 ronanarraes-osd kernel:  0000000000000000
> ffffffff81393104 0000000000000000 0000000000000000
> Ago 29 12:06:07 ronanarraes-osd kernel:  ffffffff8107ca1e
> 0000000000000000 ffff88071b5eeb00 ffff881000221800
> Ago 29 12:06:07 ronanarraes-osd kernel:  0000000000000000
> ffff88071b5eeb00 00000000ffffff9c ffffffffa027dabf
> Ago 29 12:06:07 ronanarraes-osd kernel: Call Trace:
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102ed5e>]
> dump_trace+0x5e/0x320
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102f12c>]
> show_stack_log_lvl+0x10c/0x180
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8102fe41>]
> show_stack+0x21/0x40
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff81393104>]
> dump_stack+0x5c/0x78
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8107ca1e>]
> __warn+0xbe/0xe0
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffffa027dabf>]
> btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8121f6d1>]
> do_unlinkat+0x131/0x310
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff816bb4f6>]
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 12:06:07 ronanarraes-osd kernel: DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 12:06:07 ronanarraes-osd kernel: 
> Ago 29 12:06:07 ronanarraes-osd kernel: Leftover inexact backtrace:
> Ago 29 12:06:07 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a63
> ]---
> 
> Best regards,
> Ronan Arraes
> 


-- 
Jeff Mahoney
SUSE Labs




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
  2016-08-29 22:25     ` Jeff Mahoney
@ 2016-08-30  2:12     ` Wang Xiaoguang
  2016-08-30 12:50       ` Ronan Arraes Jardim Chagas
  1 sibling, 1 reply; 82+ messages in thread
From: Wang Xiaoguang @ 2016-08-30  2:12 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, linux-btrfs

hello,

On 08/29/2016 11:52 PM, Ronan Arraes Jardim Chagas wrote:
> Hi guys,
>
> I just have the problem again. Now, it happens during the lunch time
> when the machine was idle. Only the system processes were running. It
> was not the first time that I saw this problem just after lunch when
> the machine stayed idle for a long period (+- 1h).
>
> Here is the information requested:
>
> /sys/fs/btrfs/$UUID/allocation/data
>
> ./bytes_may_use
> 0
> ./bytes_pinned
> 0
> ./bytes_reserved
> 0
> ./bytes_used
> 36128374784
> ./disk_total
> 37589352448
> ./disk_used
> 36128374784
> ./flags
> 1
> ./total_bytes
> 37589352448
> ./total_bytes_pinned
> 20339560448
> ./single/total_bytes
> 37589352448
> ./single/used_bytes
> 36128374784
>
> /sys/fs/btrfs/$UUID/allocation/metadata
>
> ./bytes_may_use
> 84974452736
> ./bytes_pinned
> 0
> ./bytes_reserved
> 0
> ./bytes_used
> 977354752
> ./disk_total
> 4294967296
> ./disk_used
> 1954709504
> ./flags
> 4
> ./total_bytes
> 2147483648
> ./total_bytes_pinned
> -57851904
> ./dup/total_bytes
> 2147483648
> ./dup/used_bytes
> 977354752

For metadata, "bytes_may_use" is about 80GB, which is very large.
I think this value is abnormal.
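
To put the reported metadata counters in perspective, here is a minimal Python sketch (not part of the original report) that converts them to GiB and flags the values that look wrong. The numbers are copied from the sysfs dump quoted above; the helper name `gib` and the two checks are illustrative assumptions, not btrfs tooling.

```python
# Metadata counters copied from /sys/fs/btrfs/$UUID/allocation/metadata
# as quoted in this thread. The checks below are illustrative only.
metadata = {
    "bytes_may_use": 84974452736,
    "bytes_pinned": 0,
    "bytes_reserved": 0,
    "bytes_used": 977354752,
    "total_bytes": 2147483648,
    "total_bytes_pinned": -57851904,
}

GIB = 1024 ** 3


def gib(n):
    """Convert a byte count to GiB (hypothetical helper)."""
    return n / GIB


# How many times larger the outstanding reservation is than the
# metadata space that has actually been allocated.
ratio = metadata["bytes_may_use"] / metadata["total_bytes"]

# A byte counter should not be negative; this hints at an accounting bug.
negative = [name for name, value in metadata.items() if value < 0]

print(f"bytes_may_use = {gib(metadata['bytes_may_use']):.2f} GiB")
print(f"total_bytes   = {gib(metadata['total_bytes']):.2f} GiB")
print(f"may_use/total = {ratio:.1f}x, negative counters: {negative}")
```

In this sketch the reservation works out to roughly 40 times the allocated metadata space, and total_bytes_pinned is the only negative counter.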

This explains why you still get ENOSPC even though you have a huge
amount of unallocated space. In the btrfs kernel code there is a
function, should_alloc_chunk(), that decides whether to allocate a
new chunk (new device space):
  num_bytes = total_bytes - bytes_readonly; it's 2147483648
  num_allocated = bytes_used + bytes_reserved; it's 977354752

If num_allocated < num_bytes * 0.8, it will not allocate new device
space :) even if you have a huge amount of unallocated space.

I think the root cause is that bytes_may_use has an accounting error
and is never converted to bytes_used or bytes_reserved.

This only explains, from the code, why you get ENOSPC even with a
huge amount of unallocated space :)
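
The check described above can be sketched in a few lines of Python. This is a simplified model of the decision as explained in this message, not the kernel's actual should_alloc_chunk(), which handles more cases; bytes_readonly is not in the sysfs dump and is assumed to be 0, and the function name below is hypothetical.

```python
def would_alloc_chunk(total_bytes, bytes_readonly, bytes_used, bytes_reserved):
    """Simplified model of the 80% rule described in the message above."""
    num_bytes = total_bytes - bytes_readonly
    num_allocated = bytes_used + bytes_reserved
    # Integer form of "num_allocated < num_bytes * 0.8": do not allocate.
    if num_allocated * 10 < num_bytes * 8:
        return False
    return True


# Metadata values from the report; bytes_readonly is assumed to be 0.
decision = would_alloc_chunk(
    total_bytes=2147483648,
    bytes_readonly=0,
    bytes_used=977354752,
    bytes_reserved=0,
)
print("allocate new metadata chunk:", decision)
```

Note that bytes_may_use does not enter the comparison at all in this model, so even an 80GB reservation leaves the metadata space looking less than half full and no new chunk is requested.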

>
> # btrfs fi usage /
>
> btrfs_destroy_inode+0x23f/0x2b0 [btrfs]
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff8121f6d1>]
> do_unlinkat+0x131/0x310
> Ago 29 12:06:07 ronanarraes-osd kernel:  [<ffffffff816bb4f6>]
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 12:06:07 ronanarraes-osd kernel: DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Ago 29 12:06:07 ronanarraes-osd kernel:
> Ago 29 12:06:07 ronanarraes-osd kernel: Leftover inexact backtrace:
> Ago 29 12:06:07 ronanarraes-osd kernel: ---[ end trace 5774bd3049f78a63
> ]---
Yes, I know these WARNINGs, but they are only symptoms; we don't know
the sequence of operations that causes them.

Can you work out a reproducer for this ENOSPC error? Then I can
dig into the code to figure out the real cause.

Regards,
Xiaoguang Wang

>
> Best regards,
> Ronan Arraes
>
>





* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-30  2:12     ` Wang Xiaoguang
@ 2016-08-30 12:50       ` Ronan Arraes Jardim Chagas
  2016-08-30 16:44         ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-30 12:50 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi!

On Tue, 2016-08-30 at 10:12 +0800, Wang Xiaoguang wrote:
> For metadata, "bytes_may_use" is about 80GB, which is very large.
> I think this value is abnormal.
> 
> This explains why you still get ENOSPC even though you have a huge
> amount of unallocated space. In the btrfs kernel code there is a
> function, should_alloc_chunk(), that decides whether to allocate a
> new chunk (new device space):
>   num_bytes = total_bytes - bytes_readonly; it's 2147483648
>   num_allocated = bytes_used + bytes_reserved; it's 977354752
> 
> If num_allocated < num_bytes * 0.8, it will not allocate new device
> space :) even if you have a huge amount of unallocated space.
> 
> I think the root cause is that bytes_may_use has an accounting error
> and is never converted to bytes_used or bytes_reserved.
> 
> This only explains, from the code, why you get ENOSPC even with a
> huge amount of unallocated space :)
> 

Thanks! At least we know why ENOSPC is happening.

> Can you work out a reproducer for this ENOSPC error? Then I can
> dig into the code to figure out the real cause.

Unfortunately, every attempt to trigger the problem has failed. It
happens randomly and I have not yet figured out what triggers it.
At first I thought it was related to a build process inside a chroot
jail, but then I saw the problem happen after the computer had been
idle for a long time (about 1h). So, no clues yet :(
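
One way to narrow down the trigger, sketched here in Python, is to sample the metadata counters periodically and log the moment bytes_may_use overtakes total_bytes. This is only an illustration built on the sysfs directory shown earlier in the thread; the function names, the comparison used as a heuristic, and the 60-second interval are assumptions, not something suggested on the list.

```python
import time
from pathlib import Path

# Counter files present in the sysfs dump posted in this thread.
COUNTERS = ("bytes_may_use", "bytes_used", "bytes_reserved", "total_bytes")


def read_counters(space_dir):
    """Read the counter files from one allocation directory into a dict."""
    base = Path(space_dir)
    return {name: int((base / name).read_text().strip()) for name in COUNTERS}


def is_suspicious(counters):
    """Assumed heuristic: reservations exceed the space already allocated."""
    return counters["bytes_may_use"] > counters["total_bytes"]


def watch(space_dir, interval=60):
    """Print a timestamped line whenever the heuristic fires (runs forever)."""
    while True:
        counters = read_counters(space_dir)
        if is_suspicious(counters):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, counters, flush=True)
        time.sleep(interval)
```

Calling watch() on /sys/fs/btrfs/$UUID/allocation/metadata, with $UUID replaced by the filesystem UUID, and redirecting the output to a file on another filesystem would give timestamps to compare against the journal and against whatever jobs were running at the time.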

Is there any workaround I can do?

Best regards,
Ronan Arraes




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-30 12:50       ` Ronan Arraes Jardim Chagas
@ 2016-08-30 16:44         ` Chris Murphy
  2016-08-30 16:57           ` Ronan Arraes Jardim Chagas
  2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
  0 siblings, 2 replies; 82+ messages in thread
From: Chris Murphy @ 2016-08-30 16:44 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Wang Xiaoguang, Btrfs BTRFS

On Tue, Aug 30, 2016 at 6:50 AM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi!
>
> On Tue, 2016-08-30 at 10:12 +0800, Wang Xiaoguang wrote:
>> For metadata, "bytes_may_use" is about 80GB, which is very big;
>> I think this value is abnormal.
>>
>> This explains why you still get ENOSPC errors even with huge
>> unallocated space. In kernel btrfs, there is a function,
>> should_alloc_chunk(), that determines whether to allocate new
>> chunks (new device space):
>>   num_bytes = total_bytes - bytes_readonly; it's 2147483648
>>   num_allocated = bytes_used + bytes_reserved; it's 977354752
>>
>> If num_allocated < num_bytes * 0.8, it will not allocate new device
>> space :) even if you have huge unallocated space.
>>
>> I think the root cause is that bytes_may_use has some computation
>> error and is not being converted to bytes_used or bytes_reserved.
>>
>> I am just explaining, from the code, why you get ENOSPC errors even
>> with huge unallocated space :)
>>
>
> Thanks! At least we know why ENOSPC is happening.
>
>> Can you work out a reproducer for this ENOSPC error, so that I can
>> dig into the code to figure out the true cause?
>
> Unfortunately, every attempt to trigger the problem has failed. It
> happens randomly, and I have not yet figured out what triggers it.
> At first, I thought it was related to a build process inside a chroot
> jail, but then I saw the problem happen after the computer had been
> idle for a long time (about 1h). So, no clues yet :(
>
> Is there any workaround I can do?

It sounds related to read-only snapshots to me. I wonder if this
system has something busy that's writing to a file, database, even
maybe something just spamming journald, and then there's a read-only
snapshot during the write, which then triggers the enospc.

Ronan, if you're given a work around, then it's even less likely the
bug gets fixed. But if you can disable snapper snapshots entirely and
the problem doesn't happen; or if you can increase the frequency of
snapper snapshots and the problem happens more often, that might help
narrow it down to a point where it's more easily reproduced. If it's
not related, that's still useful to know.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-30 16:44         ` Chris Murphy
@ 2016-08-30 16:57           ` Ronan Arraes Jardim Chagas
  2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
  1 sibling, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-30 16:57 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wang Xiaoguang, Btrfs BTRFS

On Tue, 2016-08-30 at 10:44 -0600, Chris Murphy wrote:
> It sounds related to read-only snapshots to me. I wonder if this
> system has something busy that's writing to a file, database, even
> maybe something just spamming journald, and then there's a read-only
> snapshot during the write, which then triggers the enospc.
> 

I saw the problem yesterday after lunch time (13:00) and the last
snapper snapshot was taken at 10:17:

snapper list
Type   | #  | Pre # | Date                         | User | Cleanup | Description           | Userdata
-------+----+-------+------------------------------+------+---------+-----------------------+--------------
single | 0  |       |                              | root |         | current               |
single | 1  |       | Tue 16 Aug 2016 15:07:25 BRT | root |         | first root filesystem |
single | 2  |       | Tue 16 Aug 2016 15:15:57 BRT | root | number  | after installation    | important=yes
pre    | 4  |       | Tue 16 Aug 2016 15:26:44 BRT | root | number  | zypp(y2base)          | important=yes
post   | 5  | 4     | Tue 16 Aug 2016 16:12:46 BRT | root | number  |                       | important=yes
pre    | 29 |       | Tue 16 Aug 2016 18:02:43 BRT | root | number  | zypp(zypper)          | important=yes
post   | 30 | 29    | Tue 16 Aug 2016 18:07:34 BRT | root | number  |                       | important=yes
pre    | 45 |       | Mon 22 Aug 2016 13:59:45 BRT | root | number  | zypp(zypper)          | important=yes
post   | 46 | 45    | Mon 22 Aug 2016 14:11:17 BRT | root | number  |                       | important=yes
pre    | 89 |       | Mon 29 Aug 2016 09:56:19 BRT | root | number  | yast sw_single        |
pre    | 90 |       | Mon 29 Aug 2016 10:00:00 BRT | root | number  | zypp(y2base)          | important=no
post   | 91 | 90    | Mon 29 Aug 2016 10:01:11 BRT | root | number  |                       | important=no
pre    | 92 |       | Mon 29 Aug 2016 10:07:01 BRT | root | number  | zypp(y2base)          | important=no
post   | 93 | 92    | Mon 29 Aug 2016 10:07:10 BRT | root | number  |                       | important=no
pre    | 94 |       | Mon 29 Aug 2016 10:12:32 BRT | root | number  | zypp(y2base)          | important=no
post   | 95 | 94    | Mon 29 Aug 2016 10:14:25 BRT | root | number  |                       | important=no
post   | 96 | 89    | Mon 29 Aug 2016 10:17:17 BRT | root | number  |                       |

> Ronan, if you're given a work around, then it's even less likely the
> bug gets fixed. But if you can disable snapper snapshots entirely and
> the problem doesn't happen; or if you can increase the frequency of
> snapper snapshots and the problem happens more often, that might help
> narrow it down to a point where it's more easily reproduced. If it's
> not related, that's still useful to know.

I agree with you. The problem is that, since this is a production
machine, it is very problematic to have so many reboots occurring
randomly.

I will install something using zypper, which will trigger snapper, and
see if the problem is triggered. I will be out of the office this
afternoon, so the machine will be idle.

Best regards,
Ronan Arraes

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-30 16:44         ` Chris Murphy
  2016-08-30 16:57           ` Ronan Arraes Jardim Chagas
@ 2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
  2016-08-31 21:44             ` Chris Murphy
  2016-09-02 14:09             ` Jeff Mahoney
  1 sibling, 2 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-08-31 20:49 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wang Xiaoguang, Btrfs BTRFS

Hi guys!

And the problem happened again. This time, I was only using Mozilla
Firefox. I could get the very first message after the error. I hope it
brings more information:

[28039.672199] ------------[ cut here ]------------
[28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
[28039.672255] Modules linked in: fuse nf_log_ipv6 xt_pkttype
nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4
iptable_raw xt_CT nvidia_drm(PO) nvidia_modeset(PO) iptable_filter
nvidia(PO) ip6table_mangle nf_conntrack_netbios_ns
nf_conntrack_broadcast drm_kms_helper nf_conntrack_ipv4 drm
nf_defrag_ipv4 fb_sys_fops snd_hda_codec_hdmi joydev
snd_hda_codec_realtek ip_tables syscopyarea snd_hda_codec_generic
xt_conntrack snd_hda_intel sysfillrect intel_rapl sb_edac edac_core
snd_hda_codec hp_wmi x86_pkg_temp_thermal intel_powerclamp snd_hda_core
snd_hwdep nf_conntrack sparse_keymap sysimgblt coretemp kvm_intel kvm
rfkill irqbypass snd_pcm snd_timer crct10dif_pclmul
[28039.672305]  e1000e crc32_pclmul ghash_clmulni_intel snd aesni_intel
ip6table_filter aes_x86_64 lrw gf128mul glue_helper ablk_helper
iTCO_wdt iTCO_vendor_support mei_wdt ioatdma pcspkr cryptd ip6_tables
ptp lpc_ich fjes i2c_i801 dca mfd_core soundcore pps_core shpchp
tpm_infineon tpm_tis tpm mei_me mei x_tables btrfs xor raid6_pq
hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci sr_mod
firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t isci
usbcore usb_common libsas ata_generic mpt3sas raid_class
scsi_transport_sas wmi button sg
[28039.672373] CPU: 3 PID: 31800 Comm: gnome-terminal- Tainted:
P        W  O    4.7.1-1-default #1
[28039.672375] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
BIOS J63 v03.65 12/19/2013
[28039.672378]  0000000000000000 ffffffff81393104 0000000000000000
0000000000000000
[28039.672382]  ffffffff8107ca1e ffff881008780800 0000000000014000
ffff881008780800
[28039.672386]  ffffffffffffffe4 ffff88100b297c00 ffff88053b7e3540
ffffffffa02c9f58
[28039.672390] Call Trace:
[28039.672406]  [<ffffffff8102ed5e>] dump_trace+0x5e/0x320
[28039.672413]  [<ffffffff8102f12c>] show_stack_log_lvl+0x10c/0x180
[28039.672419]  [<ffffffff8102fe41>] show_stack+0x21/0x40
[28039.672425]  [<ffffffff81393104>] dump_stack+0x5c/0x78
[28039.672430]  [<ffffffff8107ca1e>] __warn+0xbe/0xe0
[28039.672461]  [<ffffffffa02c9f58>] btrfs_qgroup_free_meta+0x88/0x90
[btrfs]
[28039.672492]  [<ffffffffa0261023>] start_transaction+0x3c3/0x4f0
[btrfs]
[28039.672521]  [<ffffffffa0271078>] btrfs_create+0x38/0x1d0 [btrfs]
[28039.672528]  [<ffffffff8121e8fb>] path_openat+0x139b/0x14a0
[28039.672535]  [<ffffffff8121fb2e>] do_filp_open+0x7e/0xe0
[28039.672541]  [<ffffffff8120e7a4>] do_sys_open+0x124/0x1f0
[28039.672547]  [<ffffffff816bb4f6>]
entry_SYSCALL_64_fastpath+0x1e/0xa8
[28039.676186] DWARF2 unwinder stuck at
entry_SYSCALL_64_fastpath+0x1e/0xa8

Best regards,
Ronan

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
@ 2016-08-31 21:44             ` Chris Murphy
  2016-08-31 21:48               ` Chris Murphy
  2016-09-02  0:37               ` Qu Wenruo
  2016-09-02 14:09             ` Jeff Mahoney
  1 sibling, 2 replies; 82+ messages in thread
From: Chris Murphy @ 2016-08-31 21:44 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Chris Murphy, Wang Xiaoguang, Btrfs BTRFS

On Wed, Aug 31, 2016 at 2:49 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi guys!
>
> And the problem happened again. This time, I was only using Mozilla
> Firefox. I could get the very first message after the error. I hope it
> brings more information:
>
> [28039.672199] ------------[ cut here ]------------
> [28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
> btrfs_qgroup_free_meta+0x88/0x90 [btrfs]


Does this file system have quota enabled?

I'm testing this right now and can't even figure out how to determine
when quota is enabled on a Btrfs file system. There's enable, disable,
and rescan. If it's enabled or disabled, I get the same message if I
rescan. If I mount the file system with quota previously enabled,
there is no mount time notification that quota is enabled.

I sincerely hope opensuse isn't enabling quota by default.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 21:44             ` Chris Murphy
@ 2016-08-31 21:48               ` Chris Murphy
  2016-08-31 22:47                 ` Jeff Mahoney
  2016-09-02  0:37               ` Qu Wenruo
  1 sibling, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-31 21:48 UTC (permalink / raw)
  Cc: Ronan Arraes Jardim Chagas, Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

OK it looks like with -w flag I can get a reliable indication of
whether quota is enabled or not:

[root@f24s ~]# btrfs quota enable /mnt/0
[root@f24s ~]# btrfs quota rescan -w /mnt/0
quota rescan started
[root@f24s ~]# btrfs quota disable /mnt/0
[root@f24s ~]# btrfs quota rescan -w /mnt/0
ERROR: quota rescan failed: Invalid argument

So if you did not enable quota support and aren't sure whether it's
enabled, you can try 'btrfs quota rescan -w <mp>'. This might actually
be a bad idea, though: a rescan could take a while if you're actually
using quotas. I have no idea, because I don't use them.

Perhaps someone can point out an easier way to determine whether
quotas are enabled?

Chris Murphy

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 21:48               ` Chris Murphy
@ 2016-08-31 22:47                 ` Jeff Mahoney
  2016-08-31 22:58                   ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-08-31 22:47 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Ronan Arraes Jardim Chagas, Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo



On 8/31/16 5:48 PM, Chris Murphy wrote:
> OK it looks like with -w flag I can get a reliable indication of
> whether quota is enabled or not:
> 
> [root@f24s ~]# btrfs quota enable /mnt/0
> [root@f24s ~]# btrfs quota rescan -w /mnt/0
> quota rescan started
> [root@f24s ~]# btrfs quota disable /mnt/0
> [root@f24s ~]# btrfs quota rescan -w /mnt/0
> ERROR: quota rescan failed: Invalid argument
> 
> 
> So if you did not enable quota support, and aren't sure if it's
> enabled you can try 'btrfs quota rescan -w <mp>' but this might
> actually be a bad idea, a rescan could take a while if you're actually
> using quotas, I have no idea because I don't use them.

It can take a while, but the code is smart enough not to get too much in
the way of other activity.  It maintains a progress marker and only does
live accounting on extents that have already been scanned.

> Perhaps someone can point out an easier way to determine whether
> quotas are enabled?

btrfs qgroup show <path>

If you get a message like:
ERROR: can't perform the search - No such file or directory
ERROR: can't list qgroups: No such file or directory

... it means there's no quota root and thus quotas aren't enabled.
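
The check above can be wrapped in a small script. The sketch below rests
on assumptions: it treats a non-zero exit status together with the
"can't list qgroups" message quoted above as "quotas disabled", and the
exact wording and exit status may differ between btrfs-progs versions.
The function names are hypothetical.

```python
import subprocess


def quotas_enabled_from_result(returncode, stderr_text):
    """Interpret the result of `btrfs qgroup show <path>`.

    Returns True if qgroups were listed, False if the error quoted in this
    thread was seen, and None if the result is something else (for example a
    permission error), because that case says nothing about quotas.
    """
    if returncode == 0:
        return True
    # Assumption: this substring matches the message quoted in the thread.
    if "can't list qgroups" in stderr_text:
        return False
    return None


def quotas_enabled(path):
    """Run `btrfs qgroup show <path>` (needs root) and interpret the result."""
    proc = subprocess.run(
        ["btrfs", "qgroup", "show", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return quotas_enabled_from_result(proc.returncode, proc.stderr)
```

Keeping the interpretation in a separate helper means it can be checked
against captured output without a btrfs filesystem at hand;
quotas_enabled("/mnt/0") would run the real command against a mount point.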

-Jeff

-- 
Jeff Mahoney
SUSE Labs



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 22:47                 ` Jeff Mahoney
@ 2016-08-31 22:58                   ` Chris Murphy
  2016-08-31 23:03                     ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-31 22:58 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Chris Murphy, Ronan Arraes Jardim Chagas, Wang Xiaoguang,
	Btrfs BTRFS, Qu Wenruo

On Wed, Aug 31, 2016 at 4:47 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 8/31/16 5:48 PM, Chris Murphy wrote:
>> OK it looks like with -w flag I can get a reliable indication of
>> whether quota is enabled or not:
>>
>> [root@f24s ~]# btrfs quota enable /mnt/0
>> [root@f24s ~]# btrfs quota rescan -w /mnt/0
>> quota rescan started
>> [root@f24s ~]# btrfs quota disable /mnt/0
>> [root@f24s ~]# btrfs quota rescan -w /mnt/0
>> ERROR: quota rescan failed: Invalid argument
>>
>>
>> So if you did not enable quota support, and aren't sure if it's
>> enabled you can try 'btrfs quota rescan -w <mp>' but this might
>> actually be a bad idea, a rescan could take a while if you're actually
>> using quotas, I have no idea because I don't use them.
>
> It can take a while, but the code is smart enough not to get too much in
> the way of other activity.  It maintains a progress marker and only does
> live accounting on extents that have already been scanned.
>
>> Perhaps someone can point out an easier way to determine whether
>> quotas are enabled?
>
> btrfs qgroup show <path>

Wow, thanks but that's not obvious at all. man btrfs quota is
described as "btrfs-quota - control the global quota status of a btrfs
filesystem" so it stands to reason the state command for whether it's
enabled or disabled would be in that subcommand not in some other
subcommand.

But this is sidetracking. Does Ronan's call trace, showing
/fs/btrfs/qgroup.c:2667 btrfs_qgroup_free_meta, implicate qgroups as a
possible source of his problem? That trace would only happen if quotas
were enabled, right?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 22:58                   ` Chris Murphy
@ 2016-08-31 23:03                     ` Jeff Mahoney
  2016-08-31 23:09                       ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-08-31 23:03 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Ronan Arraes Jardim Chagas, Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo



On 8/31/16 6:58 PM, Chris Murphy wrote:
> On Wed, Aug 31, 2016 at 4:47 PM, Jeff Mahoney <jeffm@suse.com> wrote:
>> On 8/31/16 5:48 PM, Chris Murphy wrote:
>>> OK it looks like with -w flag I can get a reliable indication of
>>> whether quota is enabled or not:
>>>
>>> [root@f24s ~]# btrfs quota enable /mnt/0
>>> [root@f24s ~]# btrfs quota rescan -w /mnt/0
>>> quota rescan started
>>> [root@f24s ~]# btrfs quota disable /mnt/0
>>> [root@f24s ~]# btrfs quota rescan -w /mnt/0
>>> ERROR: quota rescan failed: Invalid argument
>>>
>>>
>>> So if you did not enable quota support, and aren't sure if it's
>>> enabled you can try 'btrfs quota rescan -w <mp>' but this might
>>> actually be a bad idea, a rescan could take a while if you're actually
>>> using quotas, I have no idea because I don't use them.
>>
>> It can take a while, but the code is smart enough not to get too much in
>> the way of other activity.  It maintains a progress marker and only does
>> live accounting on extents that have already been scanned.
>>
>>> Perhaps someone can point out an easier way to determine whether
>>> quotas are enabled?
>>
>> btrfs qgroup show <path>
> 
> Wow, thanks but that's not obvious at all. man btrfs quota is
> described as "btrfs-quota - control the global quota status of a btrfs
> filesystem" so it stands to reason the state command for whether it's
> enabled or disabled would be in that subcommand not in some other
> subcommand.

Agreed.  The tools interface has some warts.

> But this is sidetracking. Does Ronan's call trace, showing
> /fs/btrfs/qgroup.c:2667 btrfs_qgroup_free_meta, implicate qgroups as a
> possible source of his problem? That trace would only happen if quotas
> were enabled, right?
> 

Yeah.  That warning doesn't get checked unless they're enabled.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 23:03                     ` Jeff Mahoney
@ 2016-08-31 23:09                       ` Chris Murphy
  2016-09-01 12:57                         ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-08-31 23:09 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Chris Murphy, Ronan Arraes Jardim Chagas, Wang Xiaoguang,
	Btrfs BTRFS, Qu Wenruo

On Wed, Aug 31, 2016 at 5:03 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 8/31/16 6:58 PM, Chris Murphy wrote:

>> Does Ronan's call trace, showing /fs/btrfs/qgroup.c:2667
>> btrfs_qgroup_free_meta, implicate qgroups as a possible source of his
>> problem? That trace would only happen if quotas were enabled, right?
>>
>
> Yeah.  That warning doesn't get checked unless they're enabled.

OK so Ronan, I'm gonna guess the simplest work around for your problem
is to disable quota support, and see if the problem happens again.

If it doesn't happen again, then it sounds like the reproduction steps are:

a. enable quota support
b. run a metadata-heavy workload that may also be hitting fsync; from
the opensuse list, an example that sometimes causes it:


  osc co home:Ronis_BR/julia
  cd home:Ronis_BR/julia
  osc build --root=`pwd`/jail openSUSE_Tumbleweed x86_64

I wonder if it's easier to hit it on a hard drive, slower fsyncs?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 23:09                       ` Chris Murphy
@ 2016-09-01 12:57                         ` Ronan Arraes Jardim Chagas
  2016-09-01 13:21                           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-01 12:57 UTC (permalink / raw)
  To: Chris Murphy, Jeff Mahoney; +Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi!

On Wed, 2016-08-31 at 17:09 -0600, Chris Murphy wrote:
> OK so Ronan, I'm gonna guess the simplest work around for your
> problem
> is to disable quota support, and see if the problem happens again.
> 

Look at the output of the command proposed by Jeff:

btrfs qgroup show /
qgroupid         rfer         excl 
--------         ----         ---- 
0/5          16.00KiB     16.00KiB 
0/257        16.00KiB     16.00KiB 
0/258        16.30MiB     16.30MiB 
0/259        11.65GiB    309.67MiB 
0/260         2.34MiB      2.34MiB 
0/261        16.00KiB     16.00KiB 
0/262        13.19GiB     13.19GiB 
0/263        16.00KiB     16.00KiB 
0/264        60.00KiB     60.00KiB 
0/265       480.00KiB    480.00KiB 
0/266        16.00KiB     16.00KiB 
0/267         2.00GiB      2.00GiB 
0/268        16.00KiB     16.00KiB 
0/269        16.00KiB     16.00KiB 
0/270        16.00KiB     16.00KiB 
0/271        16.00KiB     16.00KiB 
0/272        16.00KiB     16.00KiB 
0/273        16.00KiB     16.00KiB 
0/274        16.00KiB     16.00KiB 
0/275       205.78MiB    205.78MiB 
0/276        16.00KiB     16.00KiB 
0/277        48.00KiB     48.00KiB 
0/278       328.41MiB    328.41MiB 
0/283         3.92GiB     26.63MiB 
0/285         3.93GiB      4.10MiB 
0/294         7.84GiB    100.59MiB 
0/330         7.98GiB      6.61MiB 
0/332         8.32GiB     69.17MiB 
0/353         9.53GiB     49.46MiB 
0/355        10.51GiB    235.39MiB 
0/415        11.54GiB      3.38MiB 
0/416        11.54GiB    896.00KiB 
0/417        11.57GiB      2.68MiB 
0/418        11.57GiB    160.00KiB 
0/419        11.54GiB      2.40MiB 
0/420        11.54GiB    192.00KiB 
0/421        11.62GiB      4.61MiB 
0/422        11.83GiB    212.93MiB 
0/427        11.64GiB      1.27MiB 
0/428        11.65GiB      4.25MiB 
1/0          16.11GiB      4.77GiB 
255/262      13.19GiB     13.19GiB 

This system was installed from the Tumbleweed ISO and I did not change
any btrfs options. Hence, it seems that openSUSE is enabling
quotas by default. Now I need to disable them to avoid triggering the
problem. What is the best way to do this? Is it OK to just run:

btrfs quota disable /

? Or do I need to format and recreate btrfs without quotas?

> If it doesn't happen again then it sounds like the reproduce steps
> are:
> 
> a. enable quota support
> b. do something metadata heavy workload that's also maybe hitting
> fsync; from opensuse list the example that sometimes causes it:
> 
> 
>   osc co home:Ronis_BR/julia
>   cd home:Ronis_BR/julia
>   osc build --root=`pwd`/jail openSUSE_Tumbleweed x86_64
> 
> I wonder if it's easier to hit it on a hard drive, slower fsyncs?

This sounds good! Actually, I'm using a 7200RPM hard drive.

Thank you all very much for all the help,
Ronan Arraes

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 12:57                         ` Ronan Arraes Jardim Chagas
@ 2016-09-01 13:21                           ` Austin S. Hemmelgarn
  2016-09-01 16:34                             ` Ronan Arraes Jardim Chagas
  2016-09-01 17:07                             ` Chris Murphy
  0 siblings, 2 replies; 82+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-01 13:21 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy, Jeff Mahoney
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 2016-09-01 08:57, Ronan Arraes Jardim Chagas wrote:
> Hi!
>
> On Wed, 2016-08-31 at 17:09 -0600, Chris Murphy wrote:
>> OK so Ronan, I'm gonna guess the simplest work around for your
>> problem
>> is to disable quota support, and see if the problem happens again.
>>
>
> Look at the output of the command proposed by Jeff:
>
> btrfs qgroup show /
> qgroupid         rfer         excl
> --------         ----         ----
> 0/5          16.00KiB     16.00KiB
> 0/257        16.00KiB     16.00KiB
> 0/258        16.30MiB     16.30MiB
> 0/259        11.65GiB    309.67MiB
> 0/260         2.34MiB      2.34MiB
> 0/261        16.00KiB     16.00KiB
> 0/262        13.19GiB     13.19GiB
> 0/263        16.00KiB     16.00KiB
> 0/264        60.00KiB     60.00KiB
> 0/265       480.00KiB    480.00KiB
> 0/266        16.00KiB     16.00KiB
> 0/267         2.00GiB      2.00GiB
> 0/268        16.00KiB     16.00KiB
> 0/269        16.00KiB     16.00KiB
> 0/270        16.00KiB     16.00KiB
> 0/271        16.00KiB     16.00KiB
> 0/272        16.00KiB     16.00KiB
> 0/273        16.00KiB     16.00KiB
> 0/274        16.00KiB     16.00KiB
> 0/275       205.78MiB    205.78MiB
> 0/276        16.00KiB     16.00KiB
> 0/277        48.00KiB     48.00KiB
> 0/278       328.41MiB    328.41MiB
> 0/283         3.92GiB     26.63MiB
> 0/285         3.93GiB      4.10MiB
> 0/294         7.84GiB    100.59MiB
> 0/330         7.98GiB      6.61MiB
> 0/332         8.32GiB     69.17MiB
> 0/353         9.53GiB     49.46MiB
> 0/355        10.51GiB    235.39MiB
> 0/415        11.54GiB      3.38MiB
> 0/416        11.54GiB    896.00KiB
> 0/417        11.57GiB      2.68MiB
> 0/418        11.57GiB    160.00KiB
> 0/419        11.54GiB      2.40MiB
> 0/420        11.54GiB    192.00KiB
> 0/421        11.62GiB      4.61MiB
> 0/422        11.83GiB    212.93MiB
> 0/427        11.64GiB      1.27MiB
> 0/428        11.65GiB      4.25MiB
> 1/0          16.11GiB      4.77GiB
> 255/262      13.19GiB     13.19GiB
>
> This system was installed with Tumbleweed ISO and I did not change
> anything in btrfs options. Hence, it seems that openSUSE is enabling
> quotas by default. Now, I need to disable it and avoid triggering the
> problem. What is the best way I can do this? Is it OK to do just:
>
> btrfs quota disable /
>
> ? Or do I need to format and recreate btrfs without quotas?
Yes, you can just run `btrfs quota disable /` and it should work.  This 
ironically reiterates that one of the bigger problems with BTRFS is that 
distros are enabling unstable and known broken features by default on 
install.  I was pretty much dumbfounded when I first learned that 
OpenSUSE is enabling BTRFS qgroups by default since they are known to 
not work reliably and cause all kinds of issues.
>
>> If it doesn't happen again then it sounds like the reproduce steps
>> are:
>>
>> a. enable quota support
>> b. do something metadata heavy workload that's also maybe hitting
>> fsync; from opensuse list the example that sometimes causes it:
>>
>>
>>   osc co home:Ronis_BR/julia
>>   cd home:Ronis_BR/julia
>>   osc build --root=`pwd`/jail openSUSE_Tumbleweed x86_64
>>
>> I wonder if it's easier to hit it on a hard drive, slower fsyncs?
>
> This sounds good! Actually, I'm using a 7200RPM hard drive.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 13:21                           ` Austin S. Hemmelgarn
@ 2016-09-01 16:34                             ` Ronan Arraes Jardim Chagas
  2016-09-01 17:04                               ` Austin S. Hemmelgarn
  2016-09-01 17:07                             ` Chris Murphy
  1 sibling, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-01 16:34 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy, Jeff Mahoney
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On Thu, 2016-09-01 at 09:21 -0400, Austin S. Hemmelgarn wrote:
> Yes, you can just run `btrfs quota disable /` and it should work.  This
> ironically reiterates that one of the bigger problems with BTRFS is that
> distros are enabling unstable and known broken features by default on
> install.  I was pretty much dumbfounded when I first learned that
> OpenSUSE is enabling BTRFS qgroups by default since they are known to
> not work reliably and cause all kinds of issues.

Thanks Austin! I executed the command and now I get:

btrfs qgroup show /
ERROR: can't perform the search - No such file or directory
ERROR: can't list qgroups: No such file or directory

as expected. Now I will wait for about a week to see if the problem
occurs and, if not, I will send an e-mail to the openSUSE factory mailing
list to start a discussion about whether it would be better not to
enable qgroups by default.

Best regards and thanks everyone for the help,
Ronan Arraes

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 16:34                             ` Ronan Arraes Jardim Chagas
@ 2016-09-01 17:04                               ` Austin S. Hemmelgarn
  2016-09-01 17:12                                 ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-01 17:04 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy, Jeff Mahoney
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 2016-09-01 12:34, Ronan Arraes Jardim Chagas wrote:
> On Thu, 2016-09-01 at 09:21 -0400, Austin S. Hemmelgarn wrote:
>> Yes, you can just run `btrfs quota disable /` and it should work.  This
>> ironically reiterates that one of the bigger problems with BTRFS is that
>> distros are enabling unstable and known broken features by default on
>> install.  I was pretty much dumbfounded when I first learned that
>> OpenSUSE is enabling BTRFS qgroups by default since they are known to
>> not work reliably and cause all kinds of issues.
>
> Thanks Austin! I executed the command and now I get:
>
> btrfs qgroup show /
> ERROR: can't perform the search - No such file or directory
> ERROR: can't list qgroups: No such file or directory
>
> as expected. Now I will wait for about a week to see if the problem
> occurs and, if not, I will send an e-mail to the openSUSE factory mailing
> list to start a discussion about whether it would be better not to
> enable qgroups by default.
I have a feeling that you'll probably have no issues.

As far as having qgroups enabled by default, I think the reasoning is to 
emulate having separate filesystems with their own space limits.  I can 
entirely understand this use case, and TBH it's about the only use case 
I'd consider quota groups for (per-user subvolumes for home directories 
are great, but there are numerous perfectly legitimate reasons to have 
very large amounts of data in your home directory for very short periods 
of time, so I wouldn't personally use qgroups there).  The problem 
arises from the fact that it doesn't _look_ like separate filesystems 
(single entry in df, all the mounts point at the same device, etc), and 
the standard of overloading ENOSPC to mean you've hit your quota leads 
to lots of confusion in this particular case (especially considering the 
free space issues that BTRFS is known to have from time to time).

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 13:21                           ` Austin S. Hemmelgarn
  2016-09-01 16:34                             ` Ronan Arraes Jardim Chagas
@ 2016-09-01 17:07                             ` Chris Murphy
  1 sibling, 0 replies; 82+ messages in thread
From: Chris Murphy @ 2016-09-01 17:07 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Ronan Arraes Jardim Chagas, Chris Murphy, Jeff Mahoney,
	Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On Thu, Sep 1, 2016 at 7:21 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> Yes, you can just run `btrfs quota disable /` and it should work.  This
> ironically reiterates that one of the bigger problems with BTRFS is that
> distros are enabling unstable and known broken features by default on
> install.  I was pretty much dumbfounded when I first learned that OpenSUSE
> is enabling BTRFS qgroups by default since they are known to not work
> reliably and cause all kinds of issues.

Yes, I've just confirmed this on the OpenSUSE Factory mailing list.
[1] This is default on Tumbleweed (devel) and Leap (stable), and also
SLE 12 SP2.

The feature that depends on it, and that is actually enabling it, is snapper:
http://snapper.io/2016/05/18/space-aware-cleanup.html

That feature says "btrfs quota support looks mature enough" which is
big news to me. If it's that mature, why not make it the mkfs default?
Just turn it on for everyone out of the gate? And if it isn't that
mature, is it really appropriate for broad, by default, silent
deployment for opensuse stable, and SUSE enterprise? I'm surprised no
one said on this list that qgroups were stable enough for widespread
testing for list regulars first. It just suddenly ends up enabled
across three major distro outputs?

Even the fucking error messages were misleading. It wasn't until the
most recent call trace that qgroups was even considered as possibly
being related to this. How is it that busting a quota limit doesn't
cause a very explicit quota related message, rather than a generic
enospc?

[1] https://lists.opensuse.org/opensuse-factory/2016-09/msg00033.html

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:04                               ` Austin S. Hemmelgarn
@ 2016-09-01 17:12                                 ` Jeff Mahoney
  2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
                                                     ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-01 17:12 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Ronan Arraes Jardim Chagas, Chris Murphy
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo


On 9/1/16 1:04 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-01 12:34, Ronan Arraes Jardim Chagas wrote:
>> Em Qui, 2016-09-01 às 09:21 -0400, Austin S. Hemmelgarn escreveu:
>>> Yes, you can just run `btrfs quota disable /` and it should
>>> work.  This
>>> ironically reiterates that one of the bigger problems with BTRFS is
>>> that
>>> distros are enabling unstable and known broken features by default
>>> on
>>> install.  I was pretty much dumbfounded when I first learned that
>>> OpenSUSE is enabling BTRFS qgroups by default since they are known
>>> to
>>> not work reliably and cause all kinds of issues.
>>
>> Thanks Austin! I executed the command and now I get:
>>
>> btrfs qgroup show /
>> ERROR: can't perform the search - No such file or directory
>> ERROR: can't list qgroups: No such file or directory
>>
>> as expected. Now I will wait for +- 1 week to see if the problem will
>> occur and, if not, I will send an e-mail to openSUSE factory mailing
>> list to start a discussion if it is better to not enable qgroups by
>> default.
> I have a feeling that you'll probably have no issues.
> 
> As far as having qgroups enabled by default, I think the reasoning is to
> emulate having separate filesystems with their own space limits.  I can

It's not.  We use qgroups because that's the only way we can track how
much space each subvolume is using, regardless of whether anyone wants
to do enforcement.  When it's working properly, snapper can make use of
that information to make informed decisions on how much space will
actually be released when removing old snapshots.

> entirely understand this use case, and TBH it's about the only use case
> I'd consider quota groups for (per-user subvolumes for home directories
> are great, but there are numerous perfectly legitimate reasons to have
> very large amounts of data in your home directory for very short periods
> of time, so I wouldn't personally use qgroups there).  The problem
> arises from the fact that it doesn't _look_ like separate filesystems
> (single entry in df, all the mounts point at the same device, etc), and

On SUSE-based kernels, the inodes on different subvolumes report the
anonymous device associated with the subvolume.

That said, I have a WIP that creates (and auto-tears down) vfsmounts for
each subvolume.  It's not all the way to a working df that would use the
qgroup information to report space usage, but it's a start.

-Jeff


-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:12                                 ` Jeff Mahoney
@ 2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
  2016-09-01 17:43                                     ` Jeff Mahoney
  2016-09-01 17:45                                   ` Chris Murphy
  2016-09-01 18:47                                   ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-01 17:39 UTC (permalink / raw)
  To: Jeff Mahoney, Austin S. Hemmelgarn, Chris Murphy
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi Jeff,

Em Qui, 2016-09-01 às 13:12 -0400, Jeff Mahoney escreveu:
> It's not.  We use qgroups because that's the only way we can track
> how
> much space each subvolume is using, regardless of whether anyone
> wants
> to do enforcement.  When it's working properly, snapper can make use
> of
> that information to make informed decisions on how much space will
> actually be released when removing old snapshots.
> 

Given that, what am I losing by disabling qgroups here? Will I still
be able to recover my machine using snapshots (this saved me two or
three times)?

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
@ 2016-09-01 17:43                                     ` Jeff Mahoney
  2016-09-01 17:58                                       ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-01 17:43 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Austin S. Hemmelgarn, Chris Murphy
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 9/1/16 1:39 PM, Ronan Arraes Jardim Chagas wrote:
> Hi Jeff,
> 
> Em Qui, 2016-09-01 às 13:12 -0400, Jeff Mahoney escreveu:
>> It's not.  We use qgroups because that's the only way we can track
>> how
>> much space each subvolume is using, regardless of whether anyone
>> wants
>> to do enforcement.  When it's working properly, snapper can make use
>> of
>> that information to make informed decisions on how much space will
>> actually be released when removing old snapshots.
>>
> 
> Given that, what am I losing by disabling qgroups here? Will I still
> be able to recover my machine using snapshots (this saved me two or
> three times)?

Absolutely.  It doesn't affect the ability to take, retain, or recover
using snapshots.  It only affects the ability to see how much space a
particular snapshot is using on disk, both from the user wanting to know
and snapper using it to make retention decisions.  Snapper can handle
qgroups not being there.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:12                                 ` Jeff Mahoney
  2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
@ 2016-09-01 17:45                                   ` Chris Murphy
  2016-09-01 18:47                                   ` Austin S. Hemmelgarn
  2 siblings, 0 replies; 82+ messages in thread
From: Chris Murphy @ 2016-09-01 17:45 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Austin S. Hemmelgarn, Ronan Arraes Jardim Chagas, Chris Murphy,
	Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On Thu, Sep 1, 2016 at 11:12 AM, Jeff Mahoney <jeffm@suse.com> wrote:
> On 9/1/16 1:04 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-01 12:34, Ronan Arraes Jardim Chagas wrote:
>>> Em Qui, 2016-09-01 às 09:21 -0400, Austin S. Hemmelgarn escreveu:
>>>> Yes, you can just run `btrfs quota disable /` and it should
>>>> work.  This
>>>> ironically reiterates that one of the bigger problems with BTRFS is
>>>> that
>>>> distros are enabling unstable and known broken features by default
>>>> on
>>>> install.  I was pretty much dumbfounded when I first learned that
>>>> OpenSUSE is enabling BTRFS qgroups by default since they are known
>>>> to
>>>> not work reliably and cause all kinds of issues.
>>>
>>> Thanks Austin! I executed the command and now I get:
>>>
>>> btrfs qgroup show /
>>> ERROR: can't perform the search - No such file or directory
>>> ERROR: can't list qgroups: No such file or directory
>>>
>>> as expected. Now I will wait for +- 1 week to see if the problem will
>>> occur and, if not, I will send an e-mail to openSUSE factory mailing
>>> list to start a discussion if it is better to not enable qgroups by
>>> default.
>> I have a feeling that you'll probably have no issues.
>>
>> As far as having qgroups enabled by default, I think the reasoning is to
>> emulate having separate filesystems with their own space limits.  I can
>
> It's not.  We use qgroups because that's the only way we can track how
> much space each subvolume is using, regardless of whether anyone wants
> to do enforcement.  When it's working properly, snapper can make use of
> that information to make informed decisions on how much space will
> actually be released when removing old snapshots.
>
>> entirely understand this use case, and TBH it's about the only use case
>> I'd consider quota groups for (per-user subvolumes for home directories
>> are great, but there are numerous perfectly legitimate reasons to have
>> very large amounts of data in your home directory for very short periods
>> of time, so I wouldn't personally use qgroups there).  The problem
>> arises from the fact that it doesn't _look_ like separate filesystems
>> (single entry in df, all the mounts point at the same device, etc), and
>
> On SUSE-based kernels, the inodes on different subvolumes report the
> anonymous device associated with the subvolume.
>
> That said, I have a WIP that creates (and auto-tears down) vfsmounts for
> each subvolume.  It's not all the way to a working df that would use the
> qgroup information to report space usage, but it's a start.


Jeff, I'm a little bit irritated because I initially suspected in this
thread that this was an opensuse issue. That I questioned the kernel
as the source is really beside the point. You didn't even recognize
this might be quota related based on what was going on, because you
bounced him back to this list when I suggested he take the issue to
the opensuse-factory list.

What Ronan was reporting was behavior that no one on this list has
ever previously reported. And upstream does not have quotas enabled by
default so there is no reason why any regular testers here would have
come across this.

So now we come full circle and I have to call this a misfeature that's
trying to make up for another one, which is neurotic levels of
snapshots taken by snapper out of the box. There is no good goddamn
reason for it to take 100 read only snapshots in two fucking days.
It's no wonder the results are pathological.

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:43                                     ` Jeff Mahoney
@ 2016-09-01 17:58                                       ` Ronan Arraes Jardim Chagas
  0 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-01 17:58 UTC (permalink / raw)
  To: Jeff Mahoney, Austin S. Hemmelgarn, Chris Murphy
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi Jeff,

Em Qui, 2016-09-01 às 13:43 -0400, Jeff Mahoney escreveu:
> Absolutely.  It doesn't affect the ability to take, retain, or
> recover
> using snapshots.  It only affects the ability to see how much space a
> particular snapshot is using on disk, both from the user wanting to
> know
> and snapper using it to make retention decisions.  Snapper can handle
> qgroups not being there.
> 

Thanks for the prompt answer. I'm glad because space is not a concern
here, at least now :) Hence, I have plenty of time to wait for a proper
fix. Until then, I will try to keep my snapshot count low.

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 17:12                                 ` Jeff Mahoney
  2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
  2016-09-01 17:45                                   ` Chris Murphy
@ 2016-09-01 18:47                                   ` Austin S. Hemmelgarn
  2016-09-02  0:12                                     ` Chris Murphy
  2 siblings, 1 reply; 82+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-01 18:47 UTC (permalink / raw)
  To: Jeff Mahoney, Ronan Arraes Jardim Chagas, Chris Murphy
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 2016-09-01 13:12, Jeff Mahoney wrote:
> On 9/1/16 1:04 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-01 12:34, Ronan Arraes Jardim Chagas wrote:
>>> Em Qui, 2016-09-01 às 09:21 -0400, Austin S. Hemmelgarn escreveu:
>>>> Yes, you can just run `btrfs quota disable /` and it should
>>>> work.  This
>>>> ironically reiterates that one of the bigger problems with BTRFS is
>>>> that
>>>> distros are enabling unstable and known broken features by default
>>>> on
>>>> install.  I was pretty much dumbfounded when I first learned that
>>>> OpenSUSE is enabling BTRFS qgroups by default since they are known
>>>> to
>>>> not work reliably and cause all kinds of issues.
>>>
>>> Thanks Austin! I executed the command and now I get:
>>>
>>> btrfs qgroup show /
>>> ERROR: can't perform the search - No such file or directory
>>> ERROR: can't list qgroups: No such file or directory
>>>
>>> as expected. Now I will wait for +- 1 week to see if the problem will
>>> occur and, if not, I will send an e-mail to openSUSE factory mailing
>>> list to start a discussion if it is better to not enable qgroups by
>>> default.
>> I have a feeling that you'll probably have no issues.
>>
>> As far as having qgroups enabled by default, I think the reasoning is to
>> emulate having separate filesystems with their own space limits.  I can
>
> It's not.  We use qgroups because that's the only way we can track how
> much space each subvolume is using, regardless of whether anyone wants
> to do enforcement.  When it's working properly, snapper can make use of
> that information to make informed decisions on how much space will
> actually be released when removing old snapshots.
This is all well and good, but it ignores a few specific things:
1. There are numerous known issues with qgroups right now.  This 
includes among other things returning ENOSPC when it should return 
EDQUOT (this isn't your fault, but you haven't tried to fix it either), 
and all kinds of general usability issues (systems tend to misbehave 
when at or near the quotas for example).
2. Snapper's default snapshot creation configuration is absolutely 
pathological in nature, generating insane amounts of background resource 
usage and taking up huge amounts of space.  If this were changed, you 
would be a lot less dependent on being able to free up snapshots based 
on space usage.
3. It is fully possible (now, it may not have been when this choice was 
made) to get this info without using qgroups.  btrfs filesystem du can 
be used to determine essentially the same information (summing the 
values in the second column will give you a reasonable estimate of how 
much space deleting the snapshot will free).
4. Enabling such a marginal technology without user intervention with no 
warnings about it or other notice that it's being used is a pretty solid 
example of something that a developer should not do.
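
As a rough illustration of point 3, here is a minimal Python sketch that sums the second ("Exclusive") column of `btrfs filesystem du` style output. It assumes byte-valued output such as `btrfs filesystem du -s --raw <snapshot>...` would give; the sample text and the snapshot paths are made up for the example, and the result is only an estimate, for the reasons given above.

```python
def sum_exclusive(du_output: str) -> int:
    """Sum the second (Exclusive) column of `btrfs filesystem du` style text.

    Assumes sizes are plain byte counts. Header lines and any row whose
    second field is not an integer are skipped.
    """
    total = 0
    for line in du_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed line
        try:
            total += int(fields[1])
        except ValueError:
            continue  # header row or a non-numeric placeholder
    return total


# Hypothetical sample output; paths and sizes are invented.
SAMPLE = """\
     Total   Exclusive  Set shared  Filename
5368709120   104857600  5263851520  /.snapshots/10/snapshot
5368709120    52428800  5316280320  /.snapshots/11/snapshot
"""

print(sum_exclusive(SAMPLE))
```

Summing per-snapshot exclusive values ignores extents shared only between the snapshots being deleted, so the real amount freed can be larger than this figure.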

It's poor choices like this that fall into the category of 'Ooh, this 
looks cool, let's do it!' made by major distros that are most of the 
reason that BTRFS has such a bad reputation right now.  This is not 
something that should reasonably be on a production system, especially 
considering that even most of the BTRFS developers don't use qgroups, 
and that apparently your own customer support people couldn't tell that 
qgroups were to blame (seriously, your _ABSOLUTE FIRST SUGGESTION_ 
should have been to disable qgroups and see if the issue went away).

I get that you want something on par with Windows Restore Points or the 
bootable snapshot functionality provided by ZFS on Solaris, but qgroups 
really aren't at all essential to that, and even if they were, such 
functionality isn't even remotely ready for production usage on Linux yet.
>
>> entirely understand this use case, and TBH it's about the only use case
>> I'd consider quota groups for (per-user subvolumes for home directories
>> are great, but there are numerous perfectly legitimate reasons to have
>> very large amounts of data in your home directory for very short periods
>> of time, so I wouldn't personally use qgroups there).  The problem
>> arises from the fact that it doesn't _look_ like separate filesystems
>> (single entry in df, all the mounts point at the same device, etc), and
>
> On SUSE-based kernels, the inodes on different subvolumes report the
> anonymous device associated with the subvolume.
>
> That said, I have a WIP that creates (and auto-tears down) vfsmounts for
> each subvolume.  It's not all the way to a working df that would use the
> qgroup information to report space usage, but it's a start.
So in other words even more dependence on a feature that doesn't even 
work reliably?


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-01 18:47                                   ` Austin S. Hemmelgarn
@ 2016-09-02  0:12                                     ` Chris Murphy
  2016-09-02 14:26                                       ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-09-02  0:12 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Jeff Mahoney, Ronan Arraes Jardim Chagas, Chris Murphy,
	Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On Thu, Sep 1, 2016 at 12:47 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> 2. Snapper's default snapshot creation configuration is absolutely
> pathological in nature, generating insane amounts of background resource
> usage and taking up huge amounts of space.  If this were changed, you would
> be a lot less dependent on being able to free up snapshots based on space
> usage.

That's diplomatic.

They know all of this already though, but instead of toning down
snapper defaults, they're amping up the volume by enabling quotas
instead.

There is only one logical reason for this that I can think of. They're
trying to increase problem reports, presumably in order to smooth out
noisy data, maybe even by getting better bug reports like Ronan's. But
I think this is a specious policy.

> It's poor choices like this that fall into the category of 'Ooh, this looks
> cool, let's do it!' made by major distros that are most of the reason that
> BTRFS has such a bad reputation right now.

Over on the Factory list, they're trying to have this two ways. First
they're saying quotas are stable as they've implemented them in the
Leap 4.4 kernel. And they consider the btrfs-progs man page warning,
that quotas aren't yet stable even in 4.7 and aren't recommended
unless the user will use them, to be a bug that should be removed from
their copy of the man page.

So, what are they using? Pulling out such warnings doesn't make
upstream code backported to their 4.4 kernel magically stable. If
they're using out of tree quota code, fine, remove the warnings. But
then, what is this code? How does it interact with upstream kernels?

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 21:44             ` Chris Murphy
  2016-08-31 21:48               ` Chris Murphy
@ 2016-09-02  0:37               ` Qu Wenruo
  1 sibling, 0 replies; 82+ messages in thread
From: Qu Wenruo @ 2016-09-02  0:37 UTC (permalink / raw)
  To: Chris Murphy, Ronan Arraes Jardim Chagas; +Cc: Wang Xiaoguang, Btrfs BTRFS

At 09/01/2016 05:44 AM, Chris Murphy wrote:
> On Wed, Aug 31, 2016 at 2:49 PM, Ronan Arraes Jardim Chagas
> <ronisbr@gmail.com> wrote:
>> Hi guys!
>>
>> And the problem happened again. This time, I was only using Mozilla
>> Firefox. I could get the very first message after the error. I hope it
>> brings more information:
>>
>> [28039.672199] ------------[ cut here ]------------
>> [28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
>> btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
>
>
> Does this file system have quota enabled?
>
> I'm testing this right now and can't even figure out how to determine
> when quota is enabled on a Btrfs file system. There's enable, disable,
> and rescan. If it's enabled or disabled, I get the same message if I
> rescan. If I mount the file system with quota previously enabled,
> there is no mount time notification that quota is enabled.
>
> I sincerely hope opensuse isn't enabling quota by default.
>
>
>
The kernel warning is interesting.

It means qgroup is underflowing its reserved metadata space.
However, although it triggers a warning, it won't actually underflow the
numbers; it just decreases them to zero.

It shows there is something wrong with metadata allocation, but it won't
directly cause quota corruption.

Quota uses two separate, isolated systems: one extent-based, for qgroup
numbers, and one reservation-based, for reserved space.

The latter is only used to prevent the user from exceeding the qgroup limit,
and if the user doesn't set a limit, it won't cause any qgroup corruption or
ENOSPC.

Furthermore, if it's the qgroup reserved space that is causing something to
go wrong, it won't return -ENOSPC, but -EDQUOT.
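
For anyone trying to tell the two cases apart from userspace, a minimal Python sketch follows. It only inspects the errno carried by an OSError; the helper name `describe_write_error` is made up for this example, and it assumes a platform (such as Linux) where `errno.EDQUOT` is defined.

```python
import errno


def describe_write_error(exc: OSError) -> str:
    """Classify a failed write by the errno the kernel returned."""
    if exc.errno == errno.EDQUOT:
        return "quota exceeded (EDQUOT)"
    if exc.errno == errno.ENOSPC:
        return "no space left on device (ENOSPC)"
    return "other error"


# Usage sketch: wrap whatever write is failing and report the errno class.
try:
    raise OSError(errno.ENOSPC, "No space left on device")  # stand-in failure
except OSError as exc:
    print(describe_write_error(exc))
```

If the failing operation reports ENOSPC here, that is consistent with the point above that the qgroup reservation path is not what is returning the error.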

So, just as Wang suspected, there is something wrong with metadata 
allocation, causing the problem and triggering the qgroup warning.

Thanks,
Qu

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
  2016-08-31 21:44             ` Chris Murphy
@ 2016-09-02 14:09             ` Jeff Mahoney
  1 sibling, 0 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-02 14:09 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy; +Cc: Wang Xiaoguang, Btrfs BTRFS


On 8/31/16 4:49 PM, Ronan Arraes Jardim Chagas wrote:
> Hi guys!
> 
> And the problem happened again. This time, I was only using Mozilla
> Firefox. I could get the very first message after the error. I hope it
> brings more information:

Ok, so I think this is a race that can happen when one thread is
starting a transaction and another thread is committing a transaction
that involves creating a snapshot.

We reserve blocks at the top of start_transaction and that reservation
stays with the root.  In: btrfs_commit_transaction->
create_pending_snapshots-> create_pending_snapshot->
qgroup_account_snapshot-> commit_fs_roots, we clear that reservation
from the root via btrfs_qgroup_free_meta_all, potentially while
start_transaction is waiting to join a new transaction.  Or not.  It can
happen asynchronously, which is the point of having the reservation
prior to that.

So the thing is that this error can only occur if start_transaction
fails after this race occurs.  That, combined with your report that you
were seeing ENOSPC instead of EDQUOT, leads me to believe that this is
just a side effect of whatever is causing you to hit ENOSPC.  I
expect that you'll see it again -- you just won't see the WARN_ON
anymore since quotas are disabled.  I suspect it's probably the
btrfs_block_rsv_add call immediately after the reservation, but there's
no way to tell without tracing.
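
The ordering described above can be replayed with a toy model. This is a minimal Python sketch, not the kernel code: `RootReservation` is a made-up stand-in for the per-root reservation counter, the 16384-byte size is arbitrary, and the clamp-to-zero on an over-free follows the behaviour described elsewhere in this thread rather than anything checked against the source.

```python
import warnings


class RootReservation:
    """Toy stand-in for a root's qgroup metadata reservation counter."""

    def __init__(self):
        self.reserved = 0

    def reserve(self, nbytes):
        self.reserved += nbytes

    def free_all(self):
        # What the snapshot-creating commit does to the root's reservation.
        self.reserved = 0

    def free(self, nbytes):
        # Freeing more than is reserved warns and clamps to zero.
        if self.reserved < nbytes:
            warnings.warn("freeing more than is reserved")
            self.reserved = 0
        else:
            self.reserved -= nbytes


def replay_race(nbytes=16384):
    """Replay the interleaving in a fixed order; return (events, warning count)."""
    root = RootReservation()
    events = []
    root.reserve(nbytes)      # thread A: reservation at top of start_transaction
    events.append(("A reserve", root.reserved))
    root.free_all()           # thread B: commit that creates a snapshot
    events.append(("B free_all", root.reserved))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        root.free(nbytes)     # thread A: start_transaction fails, frees its reservation
    events.append(("A free", root.reserved))
    return events, len(caught)


events, warned = replay_race()
print(events, warned)
```

The warning only fires on the last step, which matches the claim that it needs a start_transaction failure after the commit has already cleared the reservation.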

-Jeff


> [28039.672199] ------------[ cut here ]------------
> [28039.672253] WARNING: CPU: 3 PID: 31800 at ../fs/btrfs/qgroup.c:2667
> btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
> [28039.672255] Modules linked in: fuse nf_log_ipv6 xt_pkttype
> nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
> iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4
> iptable_raw xt_CT nvidia_drm(PO) nvidia_modeset(PO) iptable_filter
> nvidia(PO) ip6table_mangle nf_conntrack_netbios_ns
> nf_conntrack_broadcast drm_kms_helper nf_conntrack_ipv4 drm
> nf_defrag_ipv4 fb_sys_fops snd_hda_codec_hdmi joydev
> snd_hda_codec_realtek ip_tables syscopyarea snd_hda_codec_generic
> xt_conntrack snd_hda_intel sysfillrect intel_rapl sb_edac edac_core
> snd_hda_codec hp_wmi x86_pkg_temp_thermal intel_powerclamp snd_hda_core
> snd_hwdep nf_conntrack sparse_keymap sysimgblt coretemp kvm_intel kvm
> rfkill irqbypass snd_pcm snd_timer crct10dif_pclmul
> [28039.672305]  e1000e crc32_pclmul ghash_clmulni_intel snd aesni_intel
> ip6table_filter aes_x86_64 lrw gf128mul glue_helper ablk_helper
> iTCO_wdt iTCO_vendor_support mei_wdt ioatdma pcspkr cryptd ip6_tables
> ptp lpc_ich fjes i2c_i801 dca mfd_core soundcore pps_core shpchp
> tpm_infineon tpm_tis tpm mei_me mei x_tables btrfs xor raid6_pq
> hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci sr_mod
> firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t isci
> usbcore usb_common libsas ata_generic mpt3sas raid_class
> scsi_transport_sas wmi button sg
> [28039.672373] CPU: 3 PID: 31800 Comm: gnome-terminal- Tainted:
> P        W  O    4.7.1-1-default #1
> [28039.672375] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
> BIOS J63 v03.65 12/19/2013
> [28039.672378]  0000000000000000 ffffffff81393104 0000000000000000
> 0000000000000000
> [28039.672382]  ffffffff8107ca1e ffff881008780800 0000000000014000
> ffff881008780800
> [28039.672386]  ffffffffffffffe4 ffff88100b297c00 ffff88053b7e3540
> ffffffffa02c9f58
> [28039.672390] Call Trace:
> [28039.672406]  [<ffffffff8102ed5e>] dump_trace+0x5e/0x320
> [28039.672413]  [<ffffffff8102f12c>] show_stack_log_lvl+0x10c/0x180
> [28039.672419]  [<ffffffff8102fe41>] show_stack+0x21/0x40
> [28039.672425]  [<ffffffff81393104>] dump_stack+0x5c/0x78
> [28039.672430]  [<ffffffff8107ca1e>] __warn+0xbe/0xe0
> [28039.672461]  [<ffffffffa02c9f58>] btrfs_qgroup_free_meta+0x88/0x90
> [btrfs]
> [28039.672492]  [<ffffffffa0261023>] start_transaction+0x3c3/0x4f0
> [btrfs]
> [28039.672521]  [<ffffffffa0271078>] btrfs_create+0x38/0x1d0 [btrfs]
> [28039.672528]  [<ffffffff8121e8fb>] path_openat+0x139b/0x14a0
> [28039.672535]  [<ffffffff8121fb2e>] do_filp_open+0x7e/0xe0
> [28039.672541]  [<ffffffff8120e7a4>] do_sys_open+0x124/0x1f0
> [28039.672547]  [<ffffffff816bb4f6>]
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> [28039.676186] DWARF2 unwinder stuck at
> entry_SYSCALL_64_fastpath+0x1e/0xa8


-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02  0:12                                     ` Chris Murphy
@ 2016-09-02 14:26                                       ` Jeff Mahoney
  2016-09-02 14:43                                         ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-02 14:26 UTC (permalink / raw)
  To: Chris Murphy, Austin S. Hemmelgarn
  Cc: Ronan Arraes Jardim Chagas, Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 9/1/16 8:12 PM, Chris Murphy wrote:
> On Thu, Sep 1, 2016 at 12:47 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
> 
>> 2. Snapper's default snapshot creation configuration is absolutely
>> pathological in nature, generating insane amounts of background resource
>> usage and taking up huge amounts of space.  If this were changed, you would
>> be a lot less dependent on being able to free up snapshots based on space
>> usage.
> 
> That's diplomatic.
> 
> They know all of this already though, but instead of toning down
> snapper defaults, they're amping up the volume by enabling quotas
> instead.
> 
> There is only one logical reason for this that I can think of. They're
> trying to increase problem reports, presumably in order to smooth out
> noisy data, maybe even by getting better bug reports like Ronan's. But
> I think this is a specious policy.

There's no conspiracy to leverage the openSUSE user base to generate bug
reports any more than enabling any other feature in Tumbleweed before
SLES is.  We've enabled qgroups by default so that snapper can make sane
decisions based on space usage.  That's it.

>> It's poor choices like this that fall into the category of 'Ooh, this looks
>> cool, let's do it!' made by major distros that are most of the reason that
>> BTRFS has such a bad reputation right now.
> 
> Over on the Factory list, they're trying to have this two ways. First
> they're saying quotas are stable as they've implemented them in the
> Leap 4.4 kernel. And they consider the btrfs-progs man page warning,
> that quotas aren't yet stable even in 4.7 and aren't recommended
> unless the user will use them, to be a bug that should be removed from
> their copy of the man page.

Yep.  That's a bug in the man page.  We do consider them stable.  I see
every btrfs bug that gets reported against SLE12 SP2, upon which the
Leap kernel is based.  Have there been qgroups bugs over the development
cycle?  You bet.  There's a reason that, if you look at the commit log for
qgroups over the past year, you'll see a bunch of fixes from SUSE
developers.

I explained what I think Ronan's issue is in another part of the thread
just now.  I don't think that's a severe issue at all.  Annoying?  Sure,
but I'm more concerned with the underlying ENOSPC issue.  Without more
info, I don't know what the cause of it is and when it was introduced.

We, like every other group of file system developers, run xfstests
pretty religiously.  Since qgroups are becoming a bigger part of the
btrfs experience for our products, we test them specifically.  Yes,
there are xfstests /just/ for qgroups, but we also make it a point to
run the entire xfstests suite with and without qgroups enabled.  Since
the requirement for snapper was to have accurate space tracking, that's
what we've focused on.

I obviously can't open up the SLES bugzilla to the world, so you're
going to have to take my word on this.  For our 4.4-based kernel there
are currently 3 qgroup related bugs.  The first is a report about how
annoying it is to see old qgroup items for removed subvolumes.  The
second is an accounting bug that is old and the developer just hasn't
gotten around to closing it yet.  The third is a real issue, where users
can hit the qgroup limit and are then stuck, similar to how it used to
be when you'd hit ENOSPC and couldn't remove files or subvolumes.  My
gut feeling is that it's the same kind of problem:  Removing files
involves allocating blocks to CoW the metadata and when you've hit your
quota limit, you can't allocate the blocks.  I expect the solution will
be similar to the ENOSPC issue except that rather than keeping a pool
around, we can just CoW knowing full well the intention is to release
space.  My team is working on that today and I expect a fix shortly.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 14:26                                       ` Jeff Mahoney
@ 2016-09-02 14:43                                         ` Ronan Arraes Jardim Chagas
  2016-09-02 14:48                                           ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-02 14:43 UTC (permalink / raw)
  To: Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi Jeff,

On Fri, 2016-09-02 at 10:26 -0400, Jeff Mahoney wrote:
> I explained what I think Ronan's issue is in another part of the thread
> just now.  I don't think that's a severe issue at all.  Annoying?  Sure,
> but I'm more concerned with the underlying ENOSPC issue.  Without more
> info, I don't know what the cause of it is and when it was introduced.

Sorry, but I really must humbly disagree with you. Look at what has
already happened to me when the problem occurred (which is almost every
day):

1) Firefox crashes;
2) LibreOffice crashes (auto-save stops working);
3) I can't save my work in any text editor (vim, neovim, gedit, etc.);
4) Sometimes I can't even log in as root (on a TTY or via `su`);
5) Sometimes only a hard reset solves the problem;
6) I was left with a broken operating system when the problem
occurred during a `zypper dup`.

I just can't tell you how much work I lost during those situations. So,
I think we cannot call this issue just annoying. I think it is very
severe.

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 14:43                                         ` Ronan Arraes Jardim Chagas
@ 2016-09-02 14:48                                           ` Jeff Mahoney
  2016-09-02 15:20                                             ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-02 14:48 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 9/2/16 10:43 AM, Ronan Arraes Jardim Chagas wrote:
> Hi Jeff,
> 
> On Fri, 2016-09-02 at 10:26 -0400, Jeff Mahoney wrote:
>> I explained what I think Ronan's issue is in another part of the thread
>> just now.  I don't think that's a severe issue at all.  Annoying?  Sure,
>> but I'm more concerned with the underlying ENOSPC issue.  Without more
>> info, I don't know what the cause of it is and when it was introduced.
> 
> Sorry, but I really must humbly disagree with you. Look at what has
> already happened to me when the problem occurred (which is almost every
> day):
> 
> 1) Firefox crashes;
> 2) LibreOffice crashes (auto-save stops working);
> 3) I can't save my work in any text editor (vim, neovim, gedit, etc.);
> 4) Sometimes I can't even log in as root (on a TTY or via `su`);
> 5) Sometimes only a hard reset solves the problem;
> 6) I was left with a broken operating system when the problem
> occurred during a `zypper dup`.
> 
> I just can't tell you how much work I lost during those situations. So,
> I think we cannot call this issue just annoying. I think it is very
> severe.

Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
underlying issue that's causing you to lose work that is the one that
concerns me.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 14:48                                           ` Jeff Mahoney
@ 2016-09-02 15:20                                             ` Ronan Arraes Jardim Chagas
  2016-09-02 15:26                                               ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-02 15:20 UTC (permalink / raw)
  To: Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi Jeff,

On Fri, 2016-09-02 at 10:48 -0400, Jeff Mahoney wrote:
> Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
> underlying issue that's causing you to lose work that is the one that
> concerns me.
> 

Oh, OK, I see, sorry about that :)

Thus, if disabling quotas does not help to fix my problem, is there any
workaround you can think of to avoid the problem you suggested in the
previous e-mail?

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 15:20                                             ` Ronan Arraes Jardim Chagas
@ 2016-09-02 15:26                                               ` Jeff Mahoney
  2016-09-02 19:25                                                 ` Ronan Arraes Jardim Chagas
  2016-09-02 19:56                                                 ` Ronan Arraes Jardim Chagas
  0 siblings, 2 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-02 15:26 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

On 9/2/16 11:20 AM, Ronan Arraes Jardim Chagas wrote:
> Hi Jeff,
> 
> On Fri, 2016-09-02 at 10:48 -0400, Jeff Mahoney wrote:
>> Sorry, I miscommunicated there.  The WARN_ON is annoying.  It's the
>> underlying issue that's causing you to lose work that is the one that
>> concerns me.
>>  
> 
> Oh, OK, I see, sorry about that :)
> 
> Thus, if disabling quotas does not help to fix my problem, is there any
> workaround you can think of to avoid the problem you suggested in the
> previous e-mail?

Which part?  The quota reservation race will go away with quotas
disabled, so you won't get the WARN_ON.  The ENOSPC issue needs more
investigation before I can suggest a workaround/fix.  I won't be able to
get into that until Tuesday.  (Start of a holiday weekend in the US).

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 15:26                                               ` Jeff Mahoney
@ 2016-09-02 19:25                                                 ` Ronan Arraes Jardim Chagas
  2016-09-05  8:49                                                   ` Qu Wenruo
  2016-09-02 19:56                                                 ` Ronan Arraes Jardim Chagas
  1 sibling, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-02 19:25 UTC (permalink / raw)
  To: Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi guys!

Jeff was right. I had the problem again today and quotas are disabled
now. I couldn't get any useful message in the log this time. Look at the
metadata:

btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		  43.07GiB
    Device unallocated:		   1.21TiB
    Device missing:		     0.00B
    Used:			  41.94GiB
    Free (estimated):		   1.21TiB	(min: 622.46GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 352.00MiB	(used: 0.00B)

Data,single: Size:40.01GiB, Used:39.94GiB
   /dev/sda6	  40.01GiB

Metadata,DUP: Size:1.50GiB, Used:1.00GiB
   /dev/sda6	   3.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.21TiB

Any ideas to help me?

Regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 15:26                                               ` Jeff Mahoney
  2016-09-02 19:25                                                 ` Ronan Arraes Jardim Chagas
@ 2016-09-02 19:56                                                 ` Ronan Arraes Jardim Chagas
  2016-09-02 21:34                                                   ` Chris Murphy
  1 sibling, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-02 19:56 UTC (permalink / raw)
  To: Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS, Qu Wenruo

Hi again guys!

After I rebooted the computer, I still can't run balance on metadata:

btrfs balance start -musage=1 /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail

dmesg shows:

[ 2022.530285] BTRFS info (device sda6): relocating block group 128509280256 flags 36
[ 2023.355206] BTRFS info (device sda6): relocating block group 127972409344 flags 36
[ 2024.265313] BTRFS info (device sda6): relocating block group 127435538432 flags 36
[ 2025.646712] BTRFS info (device sda6): relocating block group 126898667520 flags 36
[ 2026.794791] BTRFS info (device sda6): relocating block group 126361796608 flags 36
[ 2028.023517] BTRFS info (device sda6): relocating block group 125824925696 flags 36
[ 2028.881287] BTRFS info (device sda6): relocating block group 125288054784 flags 36
[ 2029.739342] BTRFS info (device sda6): relocating block group 124751183872 flags 36
[ 2030.631990] BTRFS info (device sda6): relocating block group 124214312960 flags 36
[ 2031.523176] BTRFS info (device sda6): relocating block group 123677442048 flags 36
[ 2032.407859] BTRFS info (device sda6): relocating block group 123140571136 flags 36
[ 2033.806672] BTRFS info (device sda6): relocating block group 122603700224 flags 36
[ 2035.237712] BTRFS info (device sda6): relocating block group 122066829312 flags 36
[ 2038.257268] BTRFS info (device sda6): relocating block group 122033274880 flags 34
[ 2039.911443] BTRFS info (device sda6): relocating block group 121496403968 flags 36
[ 2040.958106] BTRFS info (device sda6): relocating block group 120959533056 flags 36
[ 2041.841051] BTRFS info (device sda6): relocating block group 120422662144 flags 36
[ 2042.828359] BTRFS info (device sda6): relocating block group 119885791232 flags 36
[ 2044.297744] BTRFS info (device sda6): relocating block group 119348920320 flags 36
[ 2045.684932] BTRFS info (device sda6): relocating block group 118812049408 flags 36
[ 2046.761787] BTRFS info (device sda6): relocating block group 118275178496 flags 36
[ 2048.200756] BTRFS info (device sda6): relocating block group 117738307584 flags 36
[ 2049.806986] BTRFS info (device sda6): relocating block group 117201436672 flags 36
[ 2051.170470] BTRFS info (device sda6): relocating block group 116664565760 flags 36
[ 2051.910536] BTRFS info (device sda6): relocating block group 116127694848 flags 36
[ 2052.678395] BTRFS info (device sda6): relocating block group 115590823936 flags 36
[ 2053.737959] BTRFS info (device sda6): relocating block group 106363355136 flags 36
[ 2054.852065] BTRFS info (device sda6): relocating block group 105826484224 flags 36
[ 2055.911187] BTRFS info (device sda6): relocating block group 105222504448 flags 36
[ 2057.047407] BTRFS info (device sda6): 4 enospc errors during balance
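
A note for reading the log above: the number after "flags" on each "relocating block group" line is a bitmask of block group type and profile. The sketch below decodes it, assuming the bit values from the kernel's btrfs on-disk format headers (DATA=1, SYSTEM=2, METADATA=4, RAID0=8, RAID1=16, DUP=32, RAID10=64, RAID5=128, RAID6=256). The function name decode_bg_flags is made up for illustration and is not part of btrfs-progs.

```shell
# Decode the numeric "flags" value printed in "relocating block group" lines.
# Bit values are assumed from the kernel's btrfs on-disk format headers.
decode_bg_flags() {
    flags=$1
    out=""
    for pair in 1:DATA 2:SYSTEM 4:METADATA 8:RAID0 16:RAID1 32:DUP \
                64:RAID10 128:RAID5 256:RAID6; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # symbolic name after the colon
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out|}$name"
        fi
    done
    # Print NONE if no known bit is set.
    printf '%s\n' "${out:-NONE}"
}

decode_bg_flags 36
decode_bg_flags 34
```

With these bit values, flags 36 decodes to METADATA|DUP and flags 34 to SYSTEM|DUP, so the balance was relocating metadata chunks plus one system chunk, as requested by -musage=1.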

and I have:

btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		  80.07GiB
    Device unallocated:		   1.18TiB
    Device missing:		     0.00B
    Used:			  41.95GiB
    Free (estimated):		   1.18TiB	(min: 603.95GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 352.00MiB	(used: 576.00KiB)

Data,single: Size:40.01GiB, Used:39.95GiB
   /dev/sda6	  40.01GiB

Metadata,DUP: Size:20.00GiB, Used:1.00GiB
   /dev/sda6	  40.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.18TiB
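
To make the numbers above easier to compare between reports, here is a small sketch that turns the per-type lines of `btrfs fi usage` output into a used/allocated percentage. It only parses text in the format shown in this thread; the function name chunk_usage is hypothetical, and the unit handling assumes the binary suffixes (KiB, MiB, GiB, TiB) that appear in the output above.

```shell
# Read `btrfs fi usage` output on stdin and print, for each chunk type,
# how much of the allocated chunk space is actually used.
chunk_usage() {
    awk '
    function to_bytes(s,    n, u) {
        n = s + 0                        # numeric prefix, e.g. 20.00
        u = s; gsub(/[0-9.]/, "", u)     # unit suffix, e.g. GiB
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024 ^ 2
        if (u == "GiB") return n * 1024 ^ 3
        if (u == "TiB") return n * 1024 ^ 4
        return n                         # plain bytes ("B")
    }
    /^(Data|Metadata|System),/ {
        split($1, name, ":")             # "Metadata,DUP:" -> "Metadata,DUP"
        sub(/^Size:/, "", $2); sub(/,$/, "", $2)
        sub(/^Used:/, "", $3)
        size = to_bytes($2); used = to_bytes($3)
        printf "%s %.1f%%\n", name[1], (size > 0 ? 100 * used / size : 0)
    }'
}

# Example with the metadata line from the report above:
printf 'Metadata,DUP: Size:20.00GiB, Used:1.00GiB\n' | chunk_usage
```

For the metadata line above this prints "Metadata,DUP 5.0%": after the failed balance, 20 GiB of metadata chunks are allocated (40 GiB on disk with DUP) while only 1 GiB is in use. On a live system the function could be fed with `btrfs fi usage / | chunk_usage`.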

Hope this brings new information!

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 19:56                                                 ` Ronan Arraes Jardim Chagas
@ 2016-09-02 21:34                                                   ` Chris Murphy
  2016-09-02 22:13                                                     ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-09-02 21:34 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas
  Cc: Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn, Wang Xiaoguang,
	Btrfs BTRFS, Qu Wenruo

On Fri, Sep 2, 2016 at 1:56 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi again guys!
>
> After I rebooted the computer, I still can't run balance on metadata:

Except for your software build case, I have about the same workload
you have with two machines, one SSD one HDD, using 4.7.0 for a month,
and then 4.7.2 for the last week. I haven't had any enospc on these
two systems.

I think for you the path of least resistance that also permits further
testing is to see if you can track down the leap 42.2 beta kernel
which is 4.4.19-1-default. I'm not easily finding that particular one,
but I did find something a bit more recent:
http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/standard/x86_64/

-- 
Chris Murphy


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 21:34                                                   ` Chris Murphy
@ 2016-09-02 22:13                                                     ` Ronan Arraes Jardim Chagas
  2016-09-02 22:39                                                       ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-02 22:13 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Jeff Mahoney, Austin S. Hemmelgarn, Wang Xiaoguang, Btrfs BTRFS,
	Qu Wenruo

Hi!

On Fri, 2016-09-02 at 15:34 -0600, Chris Murphy wrote:
> Except for your software build case, I have about the same workload
> you have with two machines, one SSD one HDD, using 4.7.0 for a month,
> and then 4.7.2 for the last week. I haven't had any enospc on these
> two systems.
> 
> I think for you the path of least resistance that also permits further
> testing is to see if you can track down the leap 42.2 beta kernel
> which is 4.4.19-1-default. I'm not easily finding that particular one,
> but I did find something a bit more recent:
> http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/standard/x86_64/

Unfortunately, that will not be possible since my current hardware depends
on kernel >= 4.6 :(

Just now, I saw the problem again. For the first time, it happened
twice in a short period. I was copying e-mail from an IMAP server
to my local HD. I use offlineimap, but this time it changed the backend
to sqlite and started to create tons of database files, I think. My HDD
IO stayed at 60-70% for a very long period.

Hence, let's do a review of situations in which I saw the problem:

1) Local builds using `osc`;
2) During `zypper dup`;
3) When offlineimap created tons of database files;
4) During rsync-ing /home;
5) During usage of a virtual machine (the disk image was in an EXT4
partition).

I think we can conclude that this problem is tightly coupled with
actions that require a lot of writing to the HDD. Here is the
specification of my HDD:

hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       ST2000DM001-1CH164                      
	Serial Number:      W1E73CF5            
	Firmware Revision:  HP34    
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
	Used: unknown (minor revision code 0x001f) 
	Supported: 9 8 7 6 5 
	Likely used: 9
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors: 3907029168
	Logical  Sector size:                   512 bytes
	Physical Sector size:                  4096 bytes
	Logical Sector-0 offset:                  0 bytes
	device size with M = 1024*1024:     1907729 MBytes
	device size with M = 1000*1000:     2000398 MBytes (2000 GB)
	cache/buffer size  = unknown
	Form Factor: 3.5 inch
	Nominal Media Rotation Rate: 7200
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = ?
	Advanced power management level: 128
	Recommended acoustic management value: 208, current value: 0
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	64-bit World wide name
	   *	WRITE_UNCORRECTABLE_EXT command
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Gen3 signaling speed (6.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	   *	READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
	   *	DMA Setup Auto-Activate optimization
	    	Device-initiated interface power management
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Read/Write Long (AC1), obsolete
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	    	unknown 206[12] (vendor specific)
	    	unknown 206[13] (vendor specific)
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
		supported: enhanced erase
	212min for SECURITY ERASE UNIT. 212min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50072f7ce86
	NAA		: 5
	IEEE OUI	: 000c50
	Unique ID	: 072f7ce86
Checksum: correct

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 22:13                                                     ` Ronan Arraes Jardim Chagas
@ 2016-09-02 22:39                                                       ` Chris Murphy
  2016-09-03  2:47                                                         ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-09-02 22:39 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas
  Cc: Chris Murphy, Jeff Mahoney, Austin S. Hemmelgarn, Wang Xiaoguang,
	Btrfs BTRFS, Qu Wenruo

On Fri, Sep 2, 2016 at 4:13 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi!
>
> On Fri, 2016-09-02 at 15:34 -0600, Chris Murphy wrote:
>> Except for your software build case, I have about the same workload
>> you have with two machines, one SSD one HDD, using 4.7.0 for a month,
>> and then 4.7.2 for the last week. I haven't had any enospc on these
>> two systems.
>>
>> I think for you the path of least resistance that also permits further
>> testing is to see if you can track down the leap 42.2 beta kernel
>> which is 4.4.19-1-default. I'm not easily finding that particular one,
>> but I did find something a bit more recent:
>> http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/standard/x86_64/
>
> Unfortunately, that will not be possible since my current hardware depends
> on kernel >= 4.6 :(

Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
of backports. It seems unlikely to me opensuse intends to not support
your hardware (skylake?)



>
> Just now, I saw the problem again. For the first time, it happened
> twice in a short period. I was copying e-mail from an IMAP server
> to my local HD. I use offlineimap, but this time it changed the backend
> to sqlite and started to create tons of database files, I think. My HDD
> IO stayed at 60-70% for a very long period.
>
> Hence, let's do a review of situations in which I saw the problem:
>
> 1) Local builds using `osc`;
> 2) During `zypper dup`;
> 3) When offlineimap created tons of database files;
> 4) During rsync-ing /home;
> 5) During usage of a virtual machine (the disk image was in an EXT4
> partition).

I don't think there's anything remarkable about any of these. And I
even do VM stuff on Btrfs. I also don't think it's the drive.

What sounds possible is that the file system is now in some kind of
weird metadata state and keeps tripping up on it. There may be more
than one bug going on: one that gets it into this state, and then one
that face plants with enospc when that state is encountered.




-- 
Chris Murphy


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 22:39                                                       ` Chris Murphy
@ 2016-09-03  2:47                                                         ` Ronan Arraes Jardim Chagas
  2016-09-03  3:41                                                           ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-03  2:47 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Jeff Mahoney, Austin S. Hemmelgarn, Wang Xiaoguang, Btrfs BTRFS,
	Qu Wenruo

Hi guys!

On Fri, 2016-09-02 at 16:39 -0600, Chris Murphy wrote:
> Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
> of backports. It seems unlikely to me opensuse intends to not support
> your hardware (skylake?)

Actually it is a peripheral we use to program embedded systems here and
the (proprietary) driver requires kernel >= 4.6. I barely use it. I am
seriously thinking of moving it to another machine just to be able to
change my kernel.

I will post here one thing I already posted on the openSUSE mailing list:

I think I forgot to mention one very important thing: I have been using
Tumbleweed+BTRFS on this machine for a very long time. I think I
installed it just after it changed to the current model. At that time,
I was using the same machine but without the one peripheral that
requires a "new" kernel (HDD, processor, RAM, everything was the same).
AFAIK, the first time I saw this problem was this year. So, I think it
must be a regression after some kernel / btrfs-progs update.

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-03  2:47                                                         ` Ronan Arraes Jardim Chagas
@ 2016-09-03  3:41                                                           ` Chris Murphy
  2016-09-03  3:47                                                             ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-09-03  3:41 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas
  Cc: Chris Murphy, Jeff Mahoney, Austin S. Hemmelgarn, Wang Xiaoguang,
	Btrfs BTRFS, Qu Wenruo

On Fri, Sep 2, 2016 at 8:47 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi guys!
>
> On Fri, 2016-09-02 at 16:39 -0600, Chris Murphy wrote:
>> Worth a shot, considering the opensuse/SLE 4.4 kernel has a shittonne
>> of backports. It seems unlikely to me opensuse intends to not support
>> your hardware (skylake?)
>
> Actually it is a peripheral we use to program embedded systems here and
> the (proprietary) driver requires kernel >= 4.6. I barely use it. I am
> seriously thinking of moving it to another machine just to be able to
> change my kernel.

I suggest removing the hardware, and the proprietary driver, and
retest the system with the existing Tumbleweed 4.7.0 kernel; and if
that still fails, then try the Leap 4.4 kernel.

Proprietary kernel modules can do all kinds of crazy things they
shouldn't, so it's entirely possible that driver is a factor in the problem.


-- 
Chris Murphy


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-03  3:41                                                           ` Chris Murphy
@ 2016-09-03  3:47                                                             ` Ronan Arraes Jardim Chagas
  2016-09-03  4:14                                                               ` Chris Murphy
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-03  3:47 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Jeff Mahoney, Austin S. Hemmelgarn, Wang Xiaoguang, Btrfs BTRFS,
	Qu Wenruo

Hi Chris,

On Fri, 2016-09-02 at 21:41 -0600, Chris Murphy wrote:
> I suggest removing the hardware, and the proprietary driver, and
> retest the system with the existing Tumbleweed 4.7.0 kernel; and if
> that still fails, then try the Leap 4.4 kernel.
> 
> Proprietary kernel modules can do all kinds of crazy things they
> shouldn't, so it's entirely possible that driver is a factor in the
> problem.

Actually it is just a module that I load, and only when I need to work
with it. However, I can assure you this is not the problem, because I
installed the board about a month ago, but I have been seeing ENOSPC
since the beginning of the year IIRC. I am using the default Tumbleweed
kernel right now, and I can only try Leap once 42.2 is released.


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-03  3:47                                                             ` Ronan Arraes Jardim Chagas
@ 2016-09-03  4:14                                                               ` Chris Murphy
  0 siblings, 0 replies; 82+ messages in thread
From: Chris Murphy @ 2016-09-03  4:14 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Btrfs BTRFS

On Fri, Sep 2, 2016 at 9:47 PM, Ronan Arraes Jardim Chagas
<ronisbr@gmail.com> wrote:
> Hi Chris,
>
> On Fri, 2016-09-02 at 21:41 -0600, Chris Murphy wrote:
>> I suggest removing the hardware, and the proprietary driver, and
>> retest the system with the existing Tumbleweed 4.7.0 kernel; and if
>> that still fails, then try the Leap 4.4 kernel.
>>
>> Proprietary kernel modules can do all kinds of crazy things they
>> shouldn't, so it's entirely possible that driver is a factor in the
>> problem.
>
> Actually it is just a module that I load, and only when I need to work
> with it. However, I can assure you this is not the problem, because I
> installed the board about a month ago, but I have been seeing ENOSPC
> since the beginning of the year IIRC. I am using the default Tumbleweed
> kernel right now, and I can only try Leap once 42.2 is released.

If you want a workaround sooner rather than later, pick up one of the
latest Leap 42.2 kernels from the URL I provided. I haven't tried it,
but it ought to work. Leap 42.2 isn't going to be released for another
2.5 months.


-- 
Chris Murphy


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-02 19:25                                                 ` Ronan Arraes Jardim Chagas
@ 2016-09-05  8:49                                                   ` Qu Wenruo
  2016-09-08 18:24                                                     ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Qu Wenruo @ 2016-09-05  8:49 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Jeff Mahoney, Chris Murphy,
	Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS

As Wang mentioned, would you please paste the contents of every file
under /sys/fs/btrfs/<your fs uuid>/allocation?

It's recommended to use "grep . -IR <path>" to get all the data, as it
will also show the file names.
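
The request above can be wrapped in a small helper. This is only a sketch: the function name dump_allocation is made up, and using `findmnt` to look up the filesystem UUID of / is an assumption about the reporter's setup, not something stated in this thread.

```shell
# Print every text file under a directory as "path:content".
# grep matches any non-empty line ("."), -I skips binary files and
# -R recurses, so each value is prefixed with the file it came from.
dump_allocation() {
    grep . -IR "$1"
}

# Assumed usage for the root filesystem (needs a mounted btrfs at /):
#   dump_allocation "/sys/fs/btrfs/$(findmnt -no UUID /)/allocation"
```

Each output line then looks like <path>/metadata/bytes_used:<value>, which keeps the counter names attached to their numbers when the output is pasted into a mail.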

Thanks,
Qu

At 09/03/2016 03:25 AM, Ronan Arraes Jardim Chagas wrote:
> Hi guys!
>
> Jeff was right. I had the problem again today and quotas are disabled
> now. I couldn't get any useful message in the log this time. Look at the
> metadata:
>
> btrfs fi usage /
> Overall:
>     Device size:		   1.26TiB
>     Device allocated:		  43.07GiB
>     Device unallocated:		   1.21TiB
>     Device missing:		     0.00B
>     Used:			  41.94GiB
>     Free (estimated):		   1.21TiB	(min: 622.46GiB)
>     Data ratio:			      1.00
>     Metadata ratio:		      2.00
>     Global reserve:		 352.00MiB	(used: 0.00B)
>
> Data,single: Size:40.01GiB, Used:39.94GiB
>    /dev/sda6	  40.01GiB
>
> Metadata,DUP: Size:1.50GiB, Used:1.00GiB
>    /dev/sda6	   3.00GiB
>
> System,DUP: Size:32.00MiB, Used:16.00KiB
>    /dev/sda6	  64.00MiB
>
> Unallocated:
>    /dev/sda6	   1.21TiB
>
> Any ideas to help me?
>
> Regards,
> Ronan Arraes
>
>




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-05  8:49                                                   ` Qu Wenruo
@ 2016-09-08 18:24                                                     ` Ronan Arraes Jardim Chagas
  2016-09-08 18:49                                                       ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-08 18:24 UTC (permalink / raw)
  To: Qu Wenruo, Jeff Mahoney, Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS

Hi all!

On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
> Just like what Wang has mentioned, would you please paste all the
> output 
> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
> 
> It's recommended to use "grep . -IR <path>" to get all the data as
> it 
> will show the file name.

I hit the problem once again. This time I was just using Firefox, and I
could not recover with `btrfs balance`. I think I will need to reboot
this machine yet again. This problem is really causing me a lot of
trouble :(

I have disabled the quotas and the first error message after the
problem was:

[ 2444.592255] ------------[ cut here ]------------
[ 2444.592314] WARNING: CPU: 4 PID: 289 at ../fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
[ 2444.592317] Modules linked in: fuse nf_log_ipv6 xt_pkttype
nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw nvidia_drm(PO) ipt_REJECT
nf_reject_ipv4 snd_hda_codec_hdmi nvidia_modeset(PO) intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp nvidia(PO) coretemp
snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic iptable_raw
drm_kms_helper snd_hda_intel drm xt_CT snd_hda_codec snd_hda_core
snd_hwdep kvm_intel snd_pcm snd_timer joydev mei_wdt fb_sys_fops
iTCO_vendor_support i2c_i801 lpc_ich kvm syscopyarea snd sysfillrect
irqbypass mei_me hp_wmi sysimgblt iptable_filter crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper
[ 2444.592386]  cryptd soundcore mei sparse_keymap rfkill e1000e shpchp
pcspkr ioatdma mfd_core tpm_infineon tpm_tis dca tpm fjes ptp pps_core
ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast
nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack
ip6table_filter ip6_tables x_tables btrfs xor raid6_pq hid_generic
usbhid crc32c_intel serio_raw xhci_pci ehci_pci xhci_hcd ehci_hcd
firewire_ohci sr_mod firewire_core cdrom crc_itu_t usbcore isci
usb_common libsas ata_generic mpt3sas raid_class scsi_transport_sas wmi
button sg
[ 2444.592447] CPU: 4 PID: 289 Comm: kworker/u65:7 Tainted: P        W  O    4.7.1-1-default #1
[ 2444.592450] Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.65 12/19/2013
[ 2444.592458] Workqueue: writeback wb_workfn (flush-btrfs-1)
[ 2444.592462]  0000000000000000 ffffffff81393104 0000000000000000
0000000000000000
[ 2444.592468]  ffffffff8107ca1e ffff88080de6d800 0000000000009000
ffff88080c437a00
[ 2444.592472]  ffff880634b379ac 0000000000009000 ffff88080dcfb73c
ffffffffa02af98e
[ 2444.592477] Call Trace:
[ 2444.592499]  [<ffffffff8102ed5e>] dump_trace+0x5e/0x320
[ 2444.592507]  [<ffffffff8102f12c>] show_stack_log_lvl+0x10c/0x180
[ 2444.592514]  [<ffffffff8102fe41>] show_stack+0x21/0x40
[ 2444.592523]  [<ffffffff81393104>] dump_stack+0x5c/0x78
[ 2444.592531]  [<ffffffff8107ca1e>] __warn+0xbe/0xe0
[ 2444.592561]  [<ffffffffa02af98e>] btrfs_free_reserved_data_space_noquota+0xfe/0x110 [btrfs]
[ 2444.592602]  [<ffffffffa02cc036>] btrfs_clear_bit_hook+0x296/0x380 [btrfs]
[ 2444.592642]  [<ffffffffa02e9755>] clear_state_bit+0x55/0x1d0 [btrfs]
[ 2444.592676]  [<ffffffffa02e9a0d>] __clear_extent_bit+0x13d/0x3f0 [btrfs]
[ 2444.592707]  [<ffffffffa02ea8d2>] extent_clear_unlock_delalloc+0x62/0x280 [btrfs]
[ 2444.592739]  [<ffffffffa02d1c19>] cow_file_range+0x299/0x440 [btrfs]
[ 2444.592768]  [<ffffffffa02d2cf2>] run_delalloc_range+0x392/0x3b0 [btrfs]
[ 2444.592801]  [<ffffffffa02eb090>] writepage_delalloc.isra.40+0x100/0x170 [btrfs]
[ 2444.592834]  [<ffffffffa02ed9d3>] __extent_writepage+0xc3/0x340 [btrfs]
[ 2444.592864]  [<ffffffffa02ede8b>] extent_write_cache_pages.isra.36.constprop.53+0x23b/0x350 [btrfs]
[ 2444.592894]  [<ffffffffa02ee4fe>] extent_writepages+0x4e/0x60 [btrfs]
[ 2444.592900]  [<ffffffff8123c64d>] __writeback_single_inode+0x3d/0x3b0
[ 2444.592907]  [<ffffffff8123ce8a>] writeback_sb_inodes+0x20a/0x440
[ 2444.592914]  [<ffffffff8123d147>] __writeback_inodes_wb+0x87/0xb0
[ 2444.592921]  [<ffffffff8123d49d>] wb_writeback+0x28d/0x330
[ 2444.592927]  [<ffffffff8123dbe2>] wb_workfn+0x222/0x3f0
[ 2444.592934]  [<ffffffff810950ed>] process_one_work+0x1ed/0x4e0
[ 2444.592942]  [<ffffffff81095427>] worker_thread+0x47/0x4c0
[ 2444.592947]  [<ffffffff8109affd>] kthread+0xbd/0xe0
[ 2444.592954]  [<ffffffff816bb71f>] ret_from_fork+0x1f/0x40
[ 2444.596679] DWARF2 unwinder stuck at ret_from_fork+0x1f/0x40

[ 2444.596683] Leftover inexact backtrace:

[ 2444.596689]  [<ffffffff8109af40>] ? kthread_worker_fn+0x170/0x170

I will also provide the information requested by Qu:

grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
allocation/data/flags:1
allocation/data/bytes_pinned:0
allocation/data/bytes_may_use:0
allocation/data/total_bytes_pinned:202973265920
allocation/data/bytes_reserved:0
allocation/data/bytes_used:45623730176
allocation/data/single/used_bytes:45623730176
allocation/data/single/total_bytes:46179287040
allocation/data/total_bytes:46179287040
allocation/data/disk_total:46179287040
allocation/data/disk_used:45623730176
allocation/metadata/dup/used_bytes:1120698368
allocation/metadata/dup/total_bytes:6979321856
allocation/metadata/flags:4
allocation/metadata/bytes_pinned:0
allocation/metadata/bytes_may_use:88521768960
allocation/metadata/total_bytes_pinned:-44285952
allocation/metadata/bytes_reserved:0
allocation/metadata/bytes_used:1120698368
allocation/metadata/total_bytes:6979321856
allocation/metadata/disk_total:13958643712
allocation/metadata/disk_used:2241396736
allocation/global_rsv_size:385875968
allocation/global_rsv_reserved:385875968
allocation/system/dup/used_bytes:16384
allocation/system/dup/total_bytes:33554432
allocation/system/flags:2
allocation/system/bytes_pinned:0
allocation/system/bytes_may_use:0
allocation/system/total_bytes_pinned:0
allocation/system/bytes_reserved:0
allocation/system/bytes_used:16384
allocation/system/total_bytes:33554432
allocation/system/disk_total:67108864
allocation/system/disk_used:32768

Additional information:

btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		  56.07GiB
    Device unallocated:		   1.20TiB
    Device missing:		     0.00B
    Used:			  44.58GiB
    Free (estimated):		   1.20TiB	(min: 616.41GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 368.00MiB	(used: 0.00B)

Data,single: Size:43.01GiB, Used:42.49GiB
   /dev/sda6	  43.01GiB

Metadata,DUP: Size:6.50GiB, Used:1.04GiB
   /dev/sda6	  13.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.20TiB

Can anyone help me?

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-08 18:24                                                     ` Ronan Arraes Jardim Chagas
@ 2016-09-08 18:49                                                       ` Jeff Mahoney
  2016-09-08 23:02                                                         ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-08 18:49 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Qu Wenruo, Chris Murphy,
	Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS



On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
> Hi all!
> 
> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>> Just like what Wang has mentioned, would you please paste all the
>> output 
>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>
>> It's recommended to use "grep . -IR <path>" to get all the data as
>> it 
>> will show the file name.
> 
> So, one more time, I see the problem. This time I was just using
> Firefox and I cannot recover using `btrfs balance`. I think that, one
> more time, I will need to reboot this machine. This problem is really
> causing me a lot of troubles :(

I have a hunch the list is about to be flooded with similar reports if
we don't find this one before 4.8.

commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
Author: Josef Bacik <jbacik@fb.com>
Date:   Fri Mar 25 13:25:51 2016 -0400

    Btrfs: warn_on for unaccounted spaces

This commit isn't the source of the bug, but it's making it a lot more
noisy.  I spent a few hours last night trying to track down why xfstests
was throwing these warnings and I was able to reproduce them at least as
far back as 4.4-vanilla with -oenospc_debug enabled.

Speaking of which, can you turn on mounting with -oenospc_debug if you
haven't already?

In my case, space_info->bytes_may_use was getting accounted incorrectly.

I am able to reproduce that even with the following commit:
commit 18513091af9483ba84328d42092bd4d42a3c958f
Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Date:   Mon Jul 25 15:51:40 2016 +0800

    btrfs: update btrfs_space_info's bytes_may_use timely


> grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
> allocation/data/flags:1
> allocation/data/bytes_pinned:0
> allocation/data/bytes_may_use:0
> allocation/data/total_bytes_pinned:202973265920

That adds up to ~189 GiB.  total_bytes is only about 43 GiB.
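
For reference, a small Python sketch of that conversion, using the two
byte counts from the sysfs dump and assuming binary units (GiB, as
`btrfs fi usage` prints them). to_gib() is a hypothetical helper name.

```python
GIB = 1 << 30  # bytes per GiB


def to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


total_bytes_pinned = 202973265920  # allocation/data/total_bytes_pinned
total_bytes = 46179287040          # allocation/data/total_bytes

print(f"total_bytes_pinned = {to_gib(total_bytes_pinned):.2f} GiB")  # 189.03 GiB
print(f"total_bytes        = {to_gib(total_bytes):.2f} GiB")         # 43.01 GiB
```

The second figure matches the 43.01GiB data size in the `btrfs fi usage`
output, so the pinned counter is more than four times the size of the
whole data space.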

> allocation/data/bytes_reserved:0
> allocation/data/bytes_used:45623730176
> allocation/data/single/used_bytes:45623730176
> allocation/data/single/total_bytes:46179287040
> allocation/data/total_bytes:46179287040
> allocation/data/disk_total:46179287040
> allocation/data/disk_used:45623730176
> allocation/metadata/dup/used_bytes:1120698368
> allocation/metadata/dup/total_bytes:6979321856
> allocation/metadata/flags:4
> allocation/metadata/bytes_pinned:0
> allocation/metadata/bytes_may_use:88521768960
> allocation/metadata/total_bytes_pinned:-44285952

... well that's certainly interesting.  It looks like we'll need to see
how that happened.  It seems like we've messed up at least that portion
of accounting.
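
A rough way to flag this kind of inconsistency automatically is sketched
here in Python. The two rules (no negative counters; bytes_may_use and
total_bytes_pinned should not exceed total_bytes) are rules of thumb
assumed for this sketch, not invariants documented by the kernel, and
check_space_info() is a hypothetical helper name.

```python
def check_space_info(name, counters):
    """Return a list of human-readable problems for one space_info's counters."""
    problems = []
    total = counters["total_bytes"]
    # Rule 1 (assumed): no counter should ever be negative.
    for key, value in counters.items():
        if value < 0:
            problems.append(f"{name}/{key} is negative ({value})")
    # Rule 2 (assumed): reservations and pinned bytes should fit in total_bytes.
    for key in ("bytes_may_use", "total_bytes_pinned"):
        if counters.get(key, 0) > total:
            problems.append(
                f"{name}/{key} ({counters[key]}) exceeds total_bytes ({total})"
            )
    return problems


# Values copied from the sysfs dump quoted in this message.
data = {
    "total_bytes": 46179287040,
    "bytes_used": 45623730176,
    "bytes_may_use": 0,
    "total_bytes_pinned": 202973265920,
}
metadata = {
    "total_bytes": 6979321856,
    "bytes_used": 1120698368,
    "bytes_may_use": 88521768960,
    "total_bytes_pinned": -44285952,
}

for name, counters in (("data", data), ("metadata", metadata)):
    for problem in check_space_info(name, counters):
        print(problem)
```

With these numbers the sketch reports three problems: the oversized
data total_bytes_pinned, the negative metadata total_bytes_pinned, and a
metadata bytes_may_use far larger than the metadata total_bytes.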

-Jeff

> allocation/metadata/bytes_reserved:0
> allocation/metadata/bytes_used:1120698368
> allocation/metadata/total_bytes:6979321856
> allocation/metadata/disk_total:13958643712
> allocation/metadata/disk_used:2241396736
> allocation/global_rsv_size:385875968
> allocation/global_rsv_reserved:385875968
> allocation/system/dup/used_bytes:16384
> allocation/system/dup/total_bytes:33554432
> allocation/system/flags:2
> allocation/system/bytes_pinned:0
> allocation/system/bytes_may_use:0
> allocation/system/total_bytes_pinned:0
> allocation/system/bytes_reserved:0
> allocation/system/bytes_used:16384
> allocation/system/total_bytes:33554432
> allocation/system/disk_total:67108864
> allocation/system/disk_used:32768
> 
> Additional information:
> 
> btrfs fi usage /
> Overall:
>     Device size:		   1.26TiB
>     Device allocated:		  56.07GiB
>     Device unallocated:		   1.20TiB
>     Device missing:		     0.00B
>     Used:			  44.58GiB
>     Free (estimated):		   1.20TiB	(min: 616.41GiB)
>     Data ratio:			      1.00
>     Metadata ratio:		      2.00
>     Global reserve:		 368.00MiB	(used: 0.00B)
> 
> Data,single: Size:43.01GiB, Used:42.49GiB
>    /dev/sda6	  43.01GiB
> 
> Metadata,DUP: Size:6.50GiB, Used:1.04GiB
>    /dev/sda6	  13.00GiB
> 
> System,DUP: Size:32.00MiB, Used:16.00KiB
>    /dev/sda6	  64.00MiB
> 
> Unallocated:
>    /dev/sda6	   1.20TiB
> 
> Can anyone help me?
> 
> Best regards,
> Ronan Arraes
> 


-- 
Jeff Mahoney
SUSE Labs




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-08 18:49                                                       ` Jeff Mahoney
@ 2016-09-08 23:02                                                         ` Jeff Mahoney
  2016-09-13 20:24                                                           ` Josef Bacik
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-08 23:02 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Qu Wenruo, Chris Murphy,
	Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS



On 9/8/16 2:49 PM, Jeff Mahoney wrote:
> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>> Hi all!
>>
>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>> Just like what Wang has mentioned, would you please paste all the
>>> output 
>>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>>
>>> It's recommended to use "grep . -IR <path>" to get all the data as
>>> it 
>>> will show the file name.
>>
>> So, one more time, I see the problem. This time I was just using
>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>> more time, I will need to reboot this machine. This problem is really
>> causing me a lot of troubles :(
> 
> I have a hunch the list is about to be flooded with similar reports if
> we don't find this one before 4.8.
> 
> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
> Author: Josef Bacik <jbacik@fb.com>
> Date:   Fri Mar 25 13:25:51 2016 -0400
> 
>     Btrfs: warn_on for unaccounted spaces
> 
> This commit isn't the source of the bug, but it's making it a lot more
> noisy.  I spent a few hours last night trying to track down why xfstests
> was throwing these warnings and I was able to reproduce them at least as
> far back as 4.4-vanilla with -oenospc_debug enabled.
> 
> Speaking of which, can you turn on mounting with -oenospc_debug if you
> haven't already?
> 
> In my case, space_info->bytes_may_use was getting accounted incorrectly.
> 
> I am able to reproduce that even with the following commit:
> commit 18513091af9483ba84328d42092bd4d42a3c958f
> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Date:   Mon Jul 25 15:51:40 2016 +0800
> 
>     btrfs: update btrfs_space_info's bytes_may_use timely

And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
fixed by:

commit ed7a6948394305b810d0c6203268648715e5006f
Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Date:   Fri Aug 26 11:33:14 2016 +0800

    btrfs: do not decrease bytes_may_use when replaying extents

... which shouldn't change anything for your issue, unfortunately.

I still see these:
WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582 btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod autofs4
CPU: 2 PID: 8166 Comm: umount Tainted: G        W       4.4.19-11.g81405db-vanilla #1
Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
 0000000000000000 ffff880230317d10 ffffffff813170ec 0000000000000000
 ffffffffa0472528 ffff880230317d48 ffffffff8107d816 0000000000000000
 ffff88009ab03600 ffff8800ba106288 ffff8800ab75a000 ffff8800ba106200
Call Trace:
 [<ffffffff813170ec>] dump_stack+0x63/0x87
 [<ffffffff8107d816>] warn_slowpath_common+0x86/0xc0
 [<ffffffff8107d90a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa03de3a8>] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
 [<ffffffffa03ef24b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffa03bfeb9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffff811fe5bf>] generic_shutdown_super+0x6f/0x100
 [<ffffffff811fe662>] kill_anon_super+0x12/0x20
 [<ffffffffa03c4fa8>] btrfs_kill_super+0x18/0x120 [btrfs]
 [<ffffffff811fe003>] deactivate_locked_super+0x43/0x70
 [<ffffffff811fe076>] deactivate_super+0x46/0x60
 [<ffffffff81219dcf>] cleanup_mnt+0x3f/0x80
 [<ffffffff81219e62>] __cleanup_mnt+0x12/0x20
 [<ffffffff81099fb6>] task_work_run+0x86/0xb0
 [<ffffffff81078806>] exit_to_usermode_loop+0x73/0xa2
 [<ffffffff81003b2d>] syscall_return_slowpath+0x8d/0xa0
 [<ffffffff815f928c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace 09a0cc2892b6305c ]---
BTRFS: space_info 1 has 7946240 free, is not full
BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0, may_use=4096, readonly=0

... where the value of may_use varies.
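
To compare these lines across runs, the counters can be pulled out with
a few lines of Python. This is a sketch: parse_space_info() is a
hypothetical helper, and the "free" formula (total minus used, pinned,
reserved and readonly) is inferred from the numbers in this log rather
than taken from the kernel source.

```python
import re


def parse_space_info(line):
    """Extract the key=value counters from a 'BTRFS: space_info total=...' line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


line = ("BTRFS: space_info total=8388608, used=442368, pinned=0, "
        "reserved=0, may_use=4096, readonly=0")
info = parse_space_info(line)

# Assumed formula; it reproduces the "has 7946240 free" figure in the log.
free = (info["total"] - info["used"] - info["pinned"]
        - info["reserved"] - info["readonly"])
print(free)  # 7946240

# Counters that are still non-zero at unmount time.
leaked = {k: info[k] for k in ("pinned", "reserved", "may_use") if info[k] > 0}
print(leaked)  # {'may_use': 4096}
```

In this run the only counter left over at unmount is may_use, at 4096
bytes.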

-Jeff

> 
>> grep . -IR /sys/fs/btrfs/e9efaa0c-d477-4249-830f-ee5956768b29/allocation
>> allocation/data/flags:1
>> allocation/data/bytes_pinned:0
>> allocation/data/bytes_may_use:0
>> allocation/data/total_bytes_pinned:202973265920
> 
> That adds up to ~189 GiB.  total_bytes is only about 43 GiB.
> 
>> allocation/data/bytes_reserved:0
>> allocation/data/bytes_used:45623730176
>> allocation/data/single/used_bytes:45623730176
>> allocation/data/single/total_bytes:46179287040
>> allocation/data/total_bytes:46179287040
>> allocation/data/disk_total:46179287040
>> allocation/data/disk_used:45623730176
>> allocation/metadata/dup/used_bytes:1120698368
>> allocation/metadata/dup/total_bytes:6979321856
>> allocation/metadata/flags:4
>> allocation/metadata/bytes_pinned:0
>> allocation/metadata/bytes_may_use:88521768960
>> allocation/metadata/total_bytes_pinned:-44285952
> 
> ... well that's certainly interesting.  It looks like we'll need to see
> how that happened.  It seems like we've messed up at least that portion
> of accounting.
> 
> -Jeff
> 
>> allocation/metadata/bytes_reserved:0
>> allocation/metadata/bytes_used:1120698368
>> allocation/metadata/total_bytes:6979321856
>> allocation/metadata/disk_total:13958643712
>> allocation/metadata/disk_used:2241396736
>> allocation/global_rsv_size:385875968
>> allocation/global_rsv_reserved:385875968
>> allocation/system/dup/used_bytes:16384
>> allocation/system/dup/total_bytes:33554432
>> allocation/system/flags:2
>> allocation/system/bytes_pinned:0
>> allocation/system/bytes_may_use:0
>> allocation/system/total_bytes_pinned:0
>> allocation/system/bytes_reserved:0
>> allocation/system/bytes_used:16384
>> allocation/system/total_bytes:33554432
>> allocation/system/disk_total:67108864
>> allocation/system/disk_used:32768
>>
>> Additional information:
>>
>> btrfs fi usage /
>> Overall:
>>     Device size:		   1.26TiB
>>     Device allocated:		  56.07GiB
>>     Device unallocated:		   1.20TiB
>>     Device missing:		     0.00B
>>     Used:			  44.58GiB
>>     Free (estimated):		   1.20TiB	(min: 616.41GiB)
>>     Data ratio:			      1.00
>>     Metadata ratio:		      2.00
>>     Global reserve:		 368.00MiB	(used: 0.00B)
>>
>> Data,single: Size:43.01GiB, Used:42.49GiB
>>    /dev/sda6	  43.01GiB
>>
>> Metadata,DUP: Size:6.50GiB, Used:1.04GiB
>>    /dev/sda6	  13.00GiB
>>
>> System,DUP: Size:32.00MiB, Used:16.00KiB
>>    /dev/sda6	  64.00MiB
>>
>> Unallocated:
>>    /dev/sda6	   1.20TiB
>>
>> Can anyone help me?
>>
>> Best regards,
>> Ronan Arraes
>>
> 
> 


-- 
Jeff Mahoney
SUSE Labs




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-12 17:36 BTRFS constantly reports "No space left on device" even with a huge unallocated space Ronan Arraes Jardim Chagas
  2016-08-12 18:02 ` Chris Murphy
  2016-08-29 12:12 ` Wang Xiaoguang
@ 2016-09-13  3:17 ` Wang Xiaoguang
  2016-09-13 12:54   ` Ronan Arraes Jardim Chagas
  2016-09-13 20:49   ` Ronan Arraes Jardim Chagas
  2 siblings, 2 replies; 82+ messages in thread
From: Wang Xiaoguang @ 2016-09-13  3:17 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, linux-btrfs

hello,

On 08/13/2016 01:36 AM, Ronan Arraes Jardim Chagas wrote:
> Hi guys,
>
> I'm facing a daily problem with BTRFS. Almost everyday, I get the
> message "No space left on device". Sometimes I can recover by balancing
> the system but sometimes even balancing does not work due to the lack
> of space. In this case, only a hard reset works if I can't delete some
> files. The problem is that I have a huge unallocated space as you can
> see here:
>
> # btrfs fi usage /
> Overall:
>      Device size:		   1.26TiB
>      Device allocated:		 119.07GiB
>      Device unallocated:		   1.14TiB
>      Device missing:		     0.00B
>      Used:			 115.08GiB
>      Free (estimated):		   1.14TiB	(min: 586.21GiB)
>      Data ratio:			      1.00
>      Metadata ratio:		      2.00
>      Global reserve:		 512.00MiB	(used: 0.00B)
>
> Data,single: Size:113.01GiB, Used:111.19GiB
>     /dev/sda6	 113.01GiB
>
> Metadata,DUP: Size:3.00GiB, Used:1.94GiB
>     /dev/sda6	   6.00GiB
>
> System,DUP: Size:32.00MiB, Used:16.00KiB
>     /dev/sda6	  64.00MiB
>
> Unallocated:
>     /dev/sda6	   1.14TiB
>
> It is not easy to trigger the problem. But I do find some correlation
> between two things:
>
> 1) When I started to create jails to build openSUSE packages locally,
> then the problem happens more often. In these jails, some directories
> like /dev/, /dev/pts, /proc, are mounted inside the jail.
>
> 2) When I open my KVM, I also see this problem more often. Notice,
> however, that the KVM disk is stored in another EXT4 partition.
>
> I would be glad if anyone can help me to fix it. In the following, I'm
> providing more information about my system:
>
> # uname -a
> Linux ronanarraes-osd 4.7.0-1-default #1 SMP PREEMPT Mon Jul 25
> 08:42:47 UTC 2016 (89a2ada) x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.6.1+20160714
>
> # btrfs fi show
> Label: none  uuid: 80381f7f-8cef-4bd8-bdbc-3487253ee566
> 	Total devices 1 FS bytes used 113.13GiB
> 	devid    1 size 1.26TiB used 119.07GiB path /dev/sda6
>
> # btrfs fi df /
> Data, single: total=113.01GiB, used=111.19GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=3.00GiB, used=1.94GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Regards,
> Ronan Arraes
It may be an irrelevant question, but do you have compression enabled?

Regards,
Xiaoguang Wang






* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-13  3:17 ` Wang Xiaoguang
@ 2016-09-13 12:54   ` Ronan Arraes Jardim Chagas
  2016-09-13 20:49   ` Ronan Arraes Jardim Chagas
  1 sibling, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-13 12:54 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi!

On Tue, 2016-09-13 at 11:17 +0800, Wang Xiaoguang wrote:
> It may be an irrelevant question, but do you have compression enabled?
> 
> Regards,
> Xiaoguang Wang

No, I do not have compression enabled. I'm using openSUSE's default
configuration.

By the way, I had not actually been mounting the filesystem with
`enospc_debug`: it turns out I had edited the fstab in a backup
directory, sorry :) I have fixed that now, so hopefully we will have
much more information about the problem the next time I see it!
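
One way to confirm that the option is really active is to look at the
live mount table rather than at fstab. A minimal Python sketch, assuming
the usual /proc/self/mounts layout (device, mount point, type, options,
...) and assuming btrfs lists `enospc_debug` among the options there;
both helper names are hypothetical.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list of the last entry mounted at mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, comma-separated options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def has_enospc_debug(mounts_text, mount_point="/"):
    """True if the filesystem mounted at mount_point lists enospc_debug."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "enospc_debug" in opts
```

On a running system, passing the contents of /proc/self/mounts to
has_enospc_debug() should return True once the corrected fstab entry
has taken effect.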

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-08 23:02                                                         ` Jeff Mahoney
@ 2016-09-13 20:24                                                           ` Josef Bacik
  2016-09-14 14:25                                                             ` Jeff Mahoney
  0 siblings, 1 reply; 82+ messages in thread
From: Josef Bacik @ 2016-09-13 20:24 UTC (permalink / raw)
  To: Jeff Mahoney, Ronan Arraes Jardim Chagas, Qu Wenruo,
	Chris Murphy, Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS

On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
> On 9/8/16 2:49 PM, Jeff Mahoney wrote:
>> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>>> Hi all!
>>>
>>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>>> Just like what Wang has mentioned, would you please paste all the
>>>> output
>>>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>>>
>>>> It's recommended to use "grep . -IR <path>" to get all the data as
>>>> it
>>>> will show the file name.
>>>
>>> So, one more time, I see the problem. This time I was just using
>>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>>> more time, I will need to reboot this machine. This problem is really
>>> causing me a lot of troubles :(
>>
>> I have a hunch the list is about to be flooded with similar reports if
>> we don't find this one before 4.8.
>>
>> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>> Author: Josef Bacik <jbacik@fb.com>
>> Date:   Fri Mar 25 13:25:51 2016 -0400
>>
>>     Btrfs: warn_on for unaccounted spaces
>>
>> This commit isn't the source of the bug, but it's making it a lot more
>> noisy.  I spent a few hours last night trying to track down why xfstests
>> was throwing these warnings and I was able to reproduce them at least as
>> far back as 4.4-vanilla with -oenospc_debug enabled.
>>
>> Speaking of which, can you turn on mounting with -oenospc_debug if you
>> haven't already?
>>
>> In my case, space_info->bytes_may_use was getting accounted incorrectly.
>>
>> I am able to reproduce that even with the following commit:
>> commit 18513091af9483ba84328d42092bd4d42a3c958f
>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> Date:   Mon Jul 25 15:51:40 2016 +0800
>>
>>     btrfs: update btrfs_space_info's bytes_may_use timely
>
> And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
> fixed by:
>
> commit ed7a6948394305b810d0c6203268648715e5006f
> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Date:   Fri Aug 26 11:33:14 2016 +0800
>
>     btrfs: do not decrease bytes_may_use when replaying extents
>
> ... which shouldn't change anything for your issue, unfortunately.
>
> I still see these:
> WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
> btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
> Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
> msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
> acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
> 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
> amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
> ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
> ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
> ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod autofs4
> CPU: 2 PID: 8166 Comm: umount Tainted: G        W
> 4.4.19-11.g81405db-vanilla #1
> Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
>  0000000000000000 ffff880230317d10 ffffffff813170ec 0000000000000000
>  ffffffffa0472528 ffff880230317d48 ffffffff8107d816 0000000000000000
>  ffff88009ab03600 ffff8800ba106288 ffff8800ab75a000 ffff8800ba106200
> Call Trace:
>  [<ffffffff813170ec>] dump_stack+0x63/0x87
>  [<ffffffff8107d816>] warn_slowpath_common+0x86/0xc0
>  [<ffffffff8107d90a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffffa03de3a8>] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
>  [<ffffffffa03ef24b>] close_ctree+0x15b/0x330 [btrfs]
>  [<ffffffffa03bfeb9>] btrfs_put_super+0x19/0x20 [btrfs]
>  [<ffffffff811fe5bf>] generic_shutdown_super+0x6f/0x100
>  [<ffffffff811fe662>] kill_anon_super+0x12/0x20
>  [<ffffffffa03c4fa8>] btrfs_kill_super+0x18/0x120 [btrfs]
>  [<ffffffff811fe003>] deactivate_locked_super+0x43/0x70
>  [<ffffffff811fe076>] deactivate_super+0x46/0x60
>  [<ffffffff81219dcf>] cleanup_mnt+0x3f/0x80
>  [<ffffffff81219e62>] __cleanup_mnt+0x12/0x20
>  [<ffffffff81099fb6>] task_work_run+0x86/0xb0
>  [<ffffffff81078806>] exit_to_usermode_loop+0x73/0xa2
>  [<ffffffff81003b2d>] syscall_return_slowpath+0x8d/0xa0
>  [<ffffffff815f928c>] int_ret_from_sys_call+0x25/0x8f
> ---[ end trace 09a0cc2892b6305c ]---
> BTRFS: space_info 1 has 7946240 free, is not full
> BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
> may_use=4096, readonly=0
>
> ... where the value of may_use varies.
>

What test are you seeing this with?  Thanks,

Josef



* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-13  3:17 ` Wang Xiaoguang
  2016-09-13 12:54   ` Ronan Arraes Jardim Chagas
@ 2016-09-13 20:49   ` Ronan Arraes Jardim Chagas
  2016-09-13 21:01     ` Josef Bacik
  1 sibling, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-13 20:49 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi guys,

I saw the problem once more. It now happens on a daily basis.
Unfortunately the `enospc_debug` flag did not help: I did not see any
new information in the logs. This time, only a hard reset worked. I
could not even reboot from the GNOME panel.

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-13 20:49   ` Ronan Arraes Jardim Chagas
@ 2016-09-13 21:01     ` Josef Bacik
  2016-09-14 14:40       ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Josef Bacik @ 2016-09-13 21:01 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Wang Xiaoguang, linux-btrfs

On 09/13/2016 04:49 PM, Ronan Arraes Jardim Chagas wrote:
> Hi guys,
>
> One more time I saw the problem. It begins to happen on a daily basis
> now. Unfortunately the `enospc_debug` flag did not help. I did not see
> any new information in the logs. This time, only a hard reset worked. I
> could not even reboot using gnome panel.

I just started paying attention to this. The last kernel I saw you
running was 4.7. Have you tried a more recent kernel, such as Chris's
tree? If not, this is what I would like you to try:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.8

Thanks,

Josef


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-13 20:24                                                           ` Josef Bacik
@ 2016-09-14 14:25                                                             ` Jeff Mahoney
  2016-09-19  2:38                                                               ` Wang Xiaoguang
       [not found]                                                               ` <57DF4E44.2040506@cn.fujitsu.com>
  0 siblings, 2 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-14 14:25 UTC (permalink / raw)
  To: Josef Bacik, Ronan Arraes Jardim Chagas, Qu Wenruo, Chris Murphy,
	Austin S. Hemmelgarn
  Cc: Wang Xiaoguang, Btrfs BTRFS



On 9/13/16 10:24 PM, Josef Bacik wrote:
> On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
>> On 9/8/16 2:49 PM, Jeff Mahoney wrote:
>>> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>>>> Hi all!
>>>>
>>>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>>>> Just like what Wang has mentioned, would you please paste all the
>>>>> output
>>>>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>>>>
>>>>> It's recommended to use "grep . -IR <path>" to get all the data as
>>>>> it
>>>>> will show the file name.
>>>>
>>>> So, one more time, I see the problem. This time I was just using
>>>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>>>> more time, I will need to reboot this machine. This problem is really
>>>> causing me a lot of troubles :(
>>>
>>> I have a hunch the list is about to be flooded with similar reports if
>>> we don't find this one before 4.8.
>>>
>>> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>>> Author: Josef Bacik <jbacik@fb.com>
>>> Date:   Fri Mar 25 13:25:51 2016 -0400
>>>
>>>     Btrfs: warn_on for unaccounted spaces
>>>
>>> This commit isn't the source of the bug, but it's making it a lot more
>>> noisy.  I spent a few hours last night trying to track down why xfstests
>>> was throwing these warnings and I was able to reproduce them at least as
>>> far back as 4.4-vanilla with -oenospc_debug enabled.
>>>
>>> Speaking of which, can you turn on mounting with -oenospc_debug if you
>>> haven't already?
>>>
>>> In my case, space_info->bytes_may_use was getting accounted incorrectly.
>>>
>>> I am able to reproduce that even with the following commit:
>>> commit 18513091af9483ba84328d42092bd4d42a3c958f
>>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> Date:   Mon Jul 25 15:51:40 2016 +0800
>>>
>>>     btrfs: update btrfs_space_info's bytes_may_use timely
>>
>> And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
>> fixed by:
>>
>> commit ed7a6948394305b810d0c6203268648715e5006f
>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> Date:   Fri Aug 26 11:33:14 2016 +0800
>>
>>     btrfs: do not decrease bytes_may_use when replaying extents
>>
>> ... which shouldn't change anything for your issue, unfortunately.
>>
>> I still see these:
>> WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
>> btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
>> Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
>> msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
>> acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
>> 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
>> amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
>> ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
>> ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
>> ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
>> autofs4
>> CPU: 2 PID: 8166 Comm: umount Tainted: G        W
>> 4.4.19-11.g81405db-vanilla #1
>> Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
>>  0000000000000000 ffff880230317d10 ffffffff813170ec 0000000000000000
>>  ffffffffa0472528 ffff880230317d48 ffffffff8107d816 0000000000000000
>>  ffff88009ab03600 ffff8800ba106288 ffff8800ab75a000 ffff8800ba106200
>> Call Trace:
>>  [<ffffffff813170ec>] dump_stack+0x63/0x87
>>  [<ffffffff8107d816>] warn_slowpath_common+0x86/0xc0
>>  [<ffffffff8107d90a>] warn_slowpath_null+0x1a/0x20
>>  [<ffffffffa03de3a8>] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
>>  [<ffffffffa03ef24b>] close_ctree+0x15b/0x330 [btrfs]
>>  [<ffffffffa03bfeb9>] btrfs_put_super+0x19/0x20 [btrfs]
>>  [<ffffffff811fe5bf>] generic_shutdown_super+0x6f/0x100
>>  [<ffffffff811fe662>] kill_anon_super+0x12/0x20
>>  [<ffffffffa03c4fa8>] btrfs_kill_super+0x18/0x120 [btrfs]
>>  [<ffffffff811fe003>] deactivate_locked_super+0x43/0x70
>>  [<ffffffff811fe076>] deactivate_super+0x46/0x60
>>  [<ffffffff81219dcf>] cleanup_mnt+0x3f/0x80
>>  [<ffffffff81219e62>] __cleanup_mnt+0x12/0x20
>>  [<ffffffff81099fb6>] task_work_run+0x86/0xb0
>>  [<ffffffff81078806>] exit_to_usermode_loop+0x73/0xa2
>>  [<ffffffff81003b2d>] syscall_return_slowpath+0x8d/0xa0
>>  [<ffffffff815f928c>] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace 09a0cc2892b6305c ]---
>> BTRFS: space_info 1 has 7946240 free, is not full
>> BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
>> may_use=4096, readonly=0
>>
>> ... where the value of may_use varies.
>>
> 
> What test are you seeing this with?  Thanks,

btrfs/022 hits it every time for me.
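
For anyone reading along, the numbers in the space_info dump above can be cross-checked with a few lines of Python. This is only a sketch: `parse_space_info` is a made-up helper name, and the "free" formula (total minus used, pinned, reserved and readonly, without subtracting may_use) is an assumption inferred from the figures in this one dump, not something taken from the kernel source.

```python
import re

def parse_space_info(line):
    """Collect the key=value counters from a 'BTRFS: space_info total=...' line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

# The line from the dump above, joined back into a single string.
LOG = ("BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0, "
       "may_use=4096, readonly=0")

info = parse_space_info(LOG)

# Assumption: "free" as printed in the dump does not subtract may_use.
free = (info["total"] - info["used"] - info["pinned"]
        - info["reserved"] - info["readonly"])

# A non-zero may_use at unmount time is the leftover reservation that the
# warning complains about.
leaked = info["may_use"]
print(free, leaked)
```

With this dump the computed free value matches the "7946240 free" line, and the 4096 bytes still counted in may_use are what triggers the warning at unmount.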

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-13 21:01     ` Josef Bacik
@ 2016-09-14 14:40       ` Ronan Arraes Jardim Chagas
  0 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-14 14:40 UTC (permalink / raw)
  To: Josef Bacik, Wang Xiaoguang, linux-btrfs

Hi Josef,

On Tue, 2016-09-13 at 17:01 -0400, Josef Bacik wrote:
> I just started paying attention to this. The last kernel I saw you were
> running was 4.7.  Have you tried a more recent kernel, like Chris's tree?
> 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.8
> 
> is what I would like you to try if not.  Thanks,
> 
> Josef

Unfortunately, since this is a production machine, I am not allowed to
install unreleased kernels. If this is the only solution, I will need
to wait for 4.8 or check whether anyone has already backported the BTRFS
patches to 4.7.

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-08-29 12:12 ` Wang Xiaoguang
  2016-08-29 13:20   ` Ronan Arraes Jardim Chagas
  2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
@ 2016-09-14 20:15   ` Ronan Arraes Jardim Chagas
  2016-09-14 22:25     ` Chris Murphy
  2 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-14 20:15 UTC (permalink / raw)
  To: Wang Xiaoguang, linux-btrfs

Hi guys,

The problem happened again, but now it was way more serious. I was
doing a big Tumbleweed update (4680 packages) and I got the ENOSPC
during the update. To avoid being left with a broken system, as has
already happened in the past, I unfortunately had to delete data
that I really was not planning to delete. This is a disaster, because I
have more than 1 TiB of **free space**.

After deleting 7GiB of data, I could run rebalance and the update
finished successfully. However, the ENOSPC happened 3 more times (!)
and I always needed to run rebalance to keep the update going.

Sometimes, during the rebalance, I saw the message:

[28736.688266] BTRFS info (device sda6): relocating block group 389998968832 flags 34
[28737.376302] BTRFS info (device sda6): found 4 extents
[28737.712815] BTRFS info (device sda6): relocating block group 343760961536 flags 36
[28738.010030] BTRFS info (device sda6): relocating block group 343224090624 flags 36
[28738.343461] BTRFS info (device sda6): relocating block group 342687219712 flags 36
[28738.660023] BTRFS info (device sda6): relocating block group 342150348800 flags 36
[28738.665241] use_block_rsv: 11 callbacks suppressed
[28738.665247] ------------[ cut here ]------------
[28738.665290] WARNING: CPU: 10 PID: 639 at ../fs/btrfs/extent-tree.c:8097 btrfs_alloc_tree_block+0x3f1/0x4c0 [btrfs]
[28738.665292] BTRFS: block rsv returned -28
[28738.665295] Modules linked in: dm_mod fuse nf_log_ipv6 xt_pkttype
nf_log_ipv4 nf_log_common xt_LOG xt_limit af_packet iscsi_ibft
iscsi_boot_sysfs msr ip6t_REJECT nf_reject_ipv6 xt_tcpudp
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT nf_reject_ipv4
iptable_raw xt_CT snd_hda_codec_hdmi snd_hda_codec_realtek
nvidia_drm(PO) snd_hda_codec_generic snd_hda_intel nvidia_modeset(PO)
snd_hda_codec snd_hda_core snd_hwdep iptable_filter nvidia(PO) joydev
drm_kms_helper intel_rapl drm fb_sys_fops iTCO_wdt mei_wdt syscopyarea
snd_pcm snd_timer iTCO_vendor_support sysfillrect sb_edac snd i2c_i801
mei_me lpc_ich edac_core sysimgblt ip6table_mangle x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel soundcore mei aes_x86_64
[28738.665359]  lrw gf128mul glue_helper ablk_helper cryptd e1000e
hp_wmi ioatdma fjes nf_conntrack_netbios_ns ptp shpchp pps_core
sparse_keymap pcspkr mfd_core nf_conntrack_broadcast rfkill
tpm_infineon tpm_tis dca tpm nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables
xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables btrfs xor
raid6_pq hid_generic usbhid crc32c_intel serio_raw xhci_pci ehci_pci
sr_mod firewire_ohci xhci_hcd ehci_hcd cdrom firewire_core crc_itu_t
usbcore isci usb_common libsas ata_generic mpt3sas raid_class
scsi_transport_sas wmi button sg
[28738.665419] CPU: 10 PID: 639 Comm: systemd-journal Tainted:
P        W  O    4.7.1-1-default #1
[28738.665421] Hardware name: Hewlett-Packard HP Z820 Workstation/158B,
BIOS J63 v03.65 12/19/2013
[28738.665425]  0000000000000000 ffffffff81393104 ffff88080bc63a68
0000000000000000
[28738.665430]  ffffffff8107ca1e ffff8804eaa73300 ffff88080bc63ab8
0000000000004000
[28738.665434]  0000000000000000 ffff88017be9a000 ffff880f51b31760
ffffffff8107ca8f
[28738.665438] Call Trace:
[28738.665464]  [<ffffffff8102ed5e>] dump_trace+0x5e/0x320
[28738.665472]  [<ffffffff8102f12c>] show_stack_log_lvl+0x10c/0x180
[28738.665478]  [<ffffffff8102fe41>] show_stack+0x21/0x40
[28738.665486]  [<ffffffff81393104>] dump_stack+0x5c/0x78
[28738.665496]  [<ffffffff8107ca1e>] __warn+0xbe/0xe0
[28738.665503]  [<ffffffff8107ca8f>] warn_slowpath_fmt+0x4f/0x60
[28738.665529]  [<ffffffffa029d911>] btrfs_alloc_tree_block+0x3f1/0x4c0 [btrfs]
[28738.665560]  [<ffffffffa02846a2>] btrfs_copy_root+0xf2/0x280 [btrfs]
[28738.665593]  [<ffffffffa02fd141>] create_reloc_root+0x171/0x1e0 [btrfs]
[28738.665623]  [<ffffffffa030316f>] btrfs_init_reloc_root+0x8f/0xa0 [btrfs]
[28738.665652]  [<ffffffffa02ac992>] record_root_in_trans+0xb2/0x110 [btrfs]
[28738.665679]  [<ffffffffa02adb11>] btrfs_record_root_in_trans+0x41/0x70 [btrfs]
[28738.665704]  [<ffffffffa02afd00>] start_transaction+0xa0/0x4f0 [btrfs]
[28738.665732]  [<ffffffffa02b6153>] btrfs_dirty_inode+0x33/0xc0 [btrfs]
[28738.665741]  [<ffffffff8122aa59>] file_update_time+0x99/0xf0
[28738.665770]  [<ffffffffa02c11a3>] btrfs_page_mkwrite+0xa3/0x450 [btrfs]
[28738.665779]  [<ffffffff811bd2c9>] do_page_mkwrite+0x69/0xc0
[28738.665785]  [<ffffffff811c00f4>] handle_pte_fault+0xf4/0x1760
[28738.665792]  [<ffffffff811c1bfe>] handle_mm_fault+0x29e/0x5a0
[28738.665798]  [<ffffffff81064fc0>] __do_page_fault+0x1e0/0x510
[28738.665809]  [<ffffffff816bd608>] page_fault+0x28/0x30
[28738.669296] DWARF2 unwinder stuck at page_fault+0x28/0x30

[28738.669300] Leftover inexact backtrace:

[28738.669327] ---[ end trace 8ef9cfba38cc9bfc ]---
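
As a reading aid for the log above: the "flags" value on each "relocating block group" line is a bitmask of block group type and profile bits, and the -28 in "block rsv returned -28" is -ENOSPC. The sketch below decodes the two flag values seen here. The bit values follow the btrfs on-disk format; the helper name `decode_flags` is only illustrative.

```python
# Block group type and profile bits from the btrfs on-disk format.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}

def decode_flags(flags):
    """Return the names of all bits set in a block group 'flags' value."""
    names = [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]
    return "|".join(names)

print(decode_flags(34))  # SYSTEM|DUP
print(decode_flags(36))  # METADATA|DUP
```

So the balance was relocating one system block group and then a series of DUP metadata block groups when the metadata reservation failed.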

Look at what happened to my metadata allocation during the update:

1) When the problem occurred:

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		  63.07GiB
    Device unallocated:		   1.20TiB
    Device missing:		     0.00B
    Used:			  50.21GiB
    Free (estimated):		   1.20TiB	(min: 612.49GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 400.00MiB	(used: 0.00B)

Data,single: Size:48.01GiB, Used:47.91GiB
   /dev/sda6	  48.01GiB

Metadata,DUP: Size:7.50GiB, Used:1.15GiB
   /dev/sda6	  15.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.20TiB

2) After deleting 7 GiB of data and running a rebalance:

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		 133.07GiB
    Device unallocated:		   1.13TiB
    Device missing:		     0.00B
    Used:			  43.16GiB
    Free (estimated):		   1.13TiB	(min: 584.46GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 384.00MiB	(used: 0.00B)

Data,single: Size:48.01GiB, Used:40.94GiB
   /dev/sda6	  48.01GiB

Metadata,DUP: Size:42.50GiB, Used:1.11GiB
   /dev/sda6	  85.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.13TiB

3) After another rebalance (I saw the ENOSPC again):

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		 207.07GiB
    Device unallocated:		   1.05TiB
    Device missing:		     0.00B
    Used:			  43.87GiB
    Free (estimated):		   1.06TiB	(min: 540.83GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 400.00MiB	(used: 0.00B)

Data,single: Size:42.01GiB, Used:41.57GiB
   /dev/sda6	  42.01GiB

Metadata,DUP: Size:82.50GiB, Used:1.15GiB
   /dev/sda6	 165.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	   1.05TiB

4) After another rebalance (I saw the ENOSPC again):

# btrfs fi usage /
Overall:
    Device size:		   1.26TiB
    Device allocated:		 344.07GiB
    Device unallocated:		 943.79GiB
    Device missing:		     0.00B
    Used:			  44.69GiB
    Free (estimated):		 944.45GiB	(min: 472.55GiB)
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 416.00MiB	(used: 0.00B)

Data,single: Size:43.01GiB, Used:42.34GiB
   /dev/sda6	  43.01GiB

Metadata,DUP: Size:150.50GiB, Used:1.17GiB
   /dev/sda6	 301.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
   /dev/sda6	  64.00MiB

Unallocated:
   /dev/sda6	 943.79GiB

Yes, 150 GiB allocated to metadata (of which only 1.17 GiB is used),
more than 3x my actual data.
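
To put numbers on the last snapshot, here is a rough sketch that parses the per-type lines of `btrfs fi usage` and compares allocated metadata with what is actually used. The helper `chunk_usage` is hypothetical and only understands values printed in GiB, which is an assumption that happens to hold for the Data and Metadata lines shown here.

```python
import re

def chunk_usage(line):
    """Parse 'Type,Profile: Size:<n>GiB, Used:<n>GiB'; return None otherwise."""
    m = re.match(r"(\w+),(\w+): Size:([\d.]+)GiB, Used:([\d.]+)GiB", line)
    if m is None:
        return None  # e.g. the System line, which is printed in MiB/KiB
    kind, profile, size, used = m.groups()
    return kind, profile, float(size), float(used)

# Lines copied from snapshot 4 above.
_, _, meta_size, meta_used = chunk_usage("Metadata,DUP: Size:150.50GiB, Used:1.17GiB")
_, _, _, data_used = chunk_usage("Data,single: Size:43.01GiB, Used:42.34GiB")

overallocation = meta_size / meta_used  # allocated vs. actually used metadata
vs_data = meta_size / data_used         # allocated metadata vs. data in use
print(f"metadata over-allocated {overallocation:.0f}x; "
      f"{vs_data:.2f}x the data in use")
```

Since the metadata profile is DUP, the raw space consumed on /dev/sda6 is twice the logical size, which is where the 301.00GiB figure comes from.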

This issue is really causing me problems. I am starting to think that
Tumbleweed, at least, should not choose BTRFS as the default file
system, since this distribution is supposed to be stable. I think that
BTRFS has some serious problems, at least in kernels 4.6 and 4.7.

I reported this problem more than a month ago, and yet nobody has been
able to provide even a workaround so I can keep working here. I think
the best option will be to reformat this machine (**again**) and use EXT4
or XFS if nobody can help me fix or avoid this problem in the
coming days.

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-14 20:15   ` Ronan Arraes Jardim Chagas
@ 2016-09-14 22:25     ` Chris Murphy
  2016-09-15  0:56       ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Chris Murphy @ 2016-09-14 22:25 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas; +Cc: Wang Xiaoguang, Btrfs BTRFS

All I can think of is the file system has gotten into a unique state
through a combination of events. I'm still suspicious that qgroups is
contributing to the problem even after being disabled. The workload
you're talking about is completely ordinary and trivial.

The openSUSE layout is basically impossible to back up and restore:
there's an astronomical number of snapshots, and there's no recursive
btrfs send/receive to migrate it to a new file system intact, so
you'd pretty much have to reinstall no matter what. If it were
me, I'd reinstall with Btrfs same as now and, first thing before
anything else, disable quotas. Or yeah, it's completely reasonable for
you to move to a different file system. It's really a coin toss for
ext4 vs XFS, but at least XFS now checksums metadata and the journal
by default, so if I thought about it at the time of the installation
I'd choose XFS.

> Look at what happened to my metadata allocation during the update:
>
> 1) When the problem occurred:
>
> # btrfs fi usage /

Yeah FWIW, the devs seem to prefer the output from 'grep . -IR
/sys/fs/btrfs/<fsuuid>/allocation/' so for these kinds of problems I'd
report that.
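
If grep is not handy, roughly the same "filename:content" dump can be produced with a short Python script. This is a sketch, not an official tool: `dump_tree` is a made-up name, and the demo runs on a throwaway directory whose file names imitate the sysfs allocation layout and whose values are invented. On a real system you would pass the allocation directory of your own file system instead.

```python
import os
import tempfile

def dump_tree(root):
    """Return 'path:line' for every non-empty line of every text file under root."""
    lines = []
    for dirpath, _dirs, files in sorted(os.walk(root)):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r") as fh:
                    text = fh.read()
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or binary file: skip it, like grep -I
            for row in text.splitlines():
                if row:  # `grep .` only prints non-empty lines
                    lines.append(f"{path}:{row}")
    return lines

# Demo on a temporary stand-in tree; the numbers are made up.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "metadata"))
    with open(os.path.join(tmp, "metadata", "bytes_may_use"), "w") as fh:
        fh.write("4096\n")
    with open(os.path.join(tmp, "global_rsv_size"), "w") as fh:
        fh.write("536870912\n")
    demo = [line[len(tmp) + 1:] for line in dump_tree(tmp)]

print("\n".join(demo))
```

Keeping the file name next to each value is the point of the exercise: the counters are bare numbers, so without the path nobody can tell bytes_may_use from bytes_reserved.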

>
> 4) After another rebalance (I saw the ENOSPC again):

> Metadata,DUP: Size:150.50GiB, Used:1.17GiB
>    /dev/sda6     301.00GiB

Yeah holy crap weird.

But the fs is already in some funky state so at this point it's not
surprising it continues to do crazy things. If the devs knew exactly
what was going on, they'd say so. If they had a fix, they'd post it or
at least an ETA. And while ostensibly the enospc work in 4.8 would
work around this problem, it's unknown until it's tested.

If you *really* want to, you could grab a Fedora Rawhide nightly that
has kernel 4.8 rc6 on it, with debug stuff enabled. If it face plants,
it should catch useful stuff for Josef. If it doesn't, maybe it fixes
enough things that you can get back to work for a while longer until a
long term fix becomes available. The only way to know for sure is to
test it. But it's completely sane to just switch to XFS and get back
to work also.

Current
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso.n.0.iso

Use 'dd if=ISO of=USBstick bs=256K' that will boot anything, BIOS or
UEFI. At the menu, choose Troubleshooting, then the Rescue option, at
the next text menu choose 3 to get to a shell. And from there you can
mount with enospc_debug, and do a balance of the file system. To get
logs off the system, use a 2nd USB stick, or if you have wired
ethernet use scp, or if you know nmcli you can maybe get the wireless
up by command line.

> This issue is really causing me problems. I am starting to think that
> Tumbleweed, at least, should not choose BTRFS as the default file
> system, since this distribution is supposed to be stable. I think that
> BTRFS has some serious problems, at least in kernels 4.6 and 4.7.
>
> I reported this problem more than a month ago, and yet nobody has been
> able to provide even a workaround so I can keep working here. I think
> the best option will be to reformat this machine (**again**) and use EXT4
> or XFS if nobody can help me fix or avoid this problem in the
> coming days.

Yep, completely reasonable.

-- 
Chris Murphy

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-14 22:25     ` Chris Murphy
@ 2016-09-15  0:56       ` Ronan Arraes Jardim Chagas
  0 siblings, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-15  0:56 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Wang Xiaoguang, Btrfs BTRFS

Hi Chris,

On Wed, 2016-09-14 at 16:25 -0600, Chris Murphy wrote:
> All I can think of is the file system has gotten into a unique state
> through a combination of events. I'm still suspicious that qgroups is
> contributing to the problem even after being disabled. The workload
> you're talking about is completely ordinary and trivial.

This seems reasonable. However, I reformatted the computer and, if I
remember correctly, started to see the problems again after two days.
I still think it may also be related to my HDD (7200 RPM). On all my
other computers, which use SSDs, everything is fine.

> The openSUSE layout is basically impossible to back up and restore:
> there's an astronomical number of snapshots, and there's no recursive
> btrfs send/receive to migrate it to a new file system intact, so
> you'd pretty much have to reinstall no matter what. If it were
> me, I'd reinstall with Btrfs same as now and, first thing before
> anything else, disable quotas. Or yeah, it's completely reasonable for
> you to move to a different file system. It's really a coin toss for
> ext4 vs XFS, but at least XFS now checksums metadata and the journal
> by default, so if I thought about it at the time of the installation
> I'd choose XFS.

Thanks! 

> Yeah FWIW, the devs seem to prefer the output from 'grep . -IR
> /sys/fs/btrfs/<fsuuid>/allocation/' so for these kinds of problems I'd
> report that.

Yeah, unfortunately I forgot this one today :(

> If you *really* want to, you could grab a Fedora Rawhide nightly that
> has kernel 4.8 rc6 on it, with debug stuff enabled. If it face plants,
> it should catch useful stuff for Josef. If it doesn't, maybe it fixes
> enough things that you can get back to work for a while longer until a
> long term fix becomes available. The only way to know for sure is to
> test it. But it's completely sane to just switch to XFS and get back
> to work also.
> 
> Current
> https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso.n.0.iso
> 
> Use 'dd if=ISO of=USBstick bs=256K' that will boot anything, BIOS or
> UEFI. At the menu, choose Troubleshooting, then the Rescue option, at
> the next text menu choose 3 to get to a shell. And from there you can
> mount with enospc_debug, and do a balance of the file system. To get
> logs off the system, use a 2nd USB stick, or if you have wired
> ethernet use scp, or if you know nmcli you can maybe get the wireless
> up by command line.

This seems good. However, I only have access to that machine during
working hours, and I just do not have time to test this, sorry :(

Nevertheless, when you mentioned the `dd` command, I had a great idea
that can help me to live with this problem until I have access to
kernel 4.8. I will use `dd` to create, let's say, 100 files with 3 GiB
each in my /home directory. Hence, when I see ENOSPC, I will just need
to delete some of these files. I think this should work.
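
A rough sketch of that ballast-file idea in Python, for anyone who wants to script it rather than loop over `dd`. Everything here is illustrative: the function names and the `ballast_NNN.bin` naming are made up, and the files are filled with random bytes on the assumption that this keeps compression from shrinking them. Whether freeing data space this way really avoids the ENOSPC is untested.

```python
import os
import tempfile

def create_ballast(directory, count, size_bytes, chunk=1 << 20):
    """Write `count` files of `size_bytes` random bytes each; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"ballast_{i:03d}.bin")
        remaining = size_bytes
        with open(path, "wb") as fh:
            while remaining > 0:
                n = min(chunk, remaining)
                fh.write(os.urandom(n))
                remaining -= n
            fh.flush()
            os.fsync(fh.fileno())  # make sure the space is really allocated
        paths.append(path)
    return paths

def release_ballast(paths, how_many):
    """Delete up to `how_many` ballast files; return the paths that remain."""
    for path in paths[:how_many]:
        os.remove(path)
    return paths[how_many:]

# Small demo: 4 files of 64 KiB in a temp dir instead of 100 files of 3 GiB.
with tempfile.TemporaryDirectory() as tmp:
    files = create_ballast(tmp, count=4, size_bytes=64 * 1024)
    sizes = [os.path.getsize(p) for p in files]
    left = release_ballast(files, 2)
    remaining_on_disk = sorted(os.listdir(tmp))

print(sizes, remaining_on_disk)
```

For the real thing the call would be something like `create_ballast("/home/<user>/ballast", 100, 3 * 1024**3)`, followed by `release_ballast(...)` whenever ENOSPC shows up.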

Thanks for all the advice, Chris!

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-14 14:25                                                             ` Jeff Mahoney
@ 2016-09-19  2:38                                                               ` Wang Xiaoguang
  2016-09-22 13:40                                                                 ` Jeff Mahoney
       [not found]                                                               ` <57DF4E44.2040506@cn.fujitsu.com>
  1 sibling, 1 reply; 82+ messages in thread
From: Wang Xiaoguang @ 2016-09-19  2:38 UTC (permalink / raw)
  To: Jeff Mahoney, Josef Bacik, Ronan Arraes Jardim Chagas, Qu Wenruo,
	Chris Murphy, Austin S. Hemmelgarn
  Cc: Btrfs BTRFS

hi,

On 09/14/2016 10:25 PM, Jeff Mahoney wrote:
> On 9/13/16 10:24 PM, Josef Bacik wrote:
>> On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
>>> On 9/8/16 2:49 PM, Jeff Mahoney wrote:
>>>> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>>>>> Hi all!
>>>>>
>>>>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>>>>> Just like what Wang has mentioned, would you please paste all the
>>>>>> output
>>>>>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>>>>>
>>>>>> It's recommended to use "grep . -IR <path>" to get all the data as
>>>>>> it
>>>>>> will show the file name.
>>>>> So, one more time, I see the problem. This time I was just using
>>>>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>>>>> more time, I will need to reboot this machine. This problem is really
>>>>> causing me a lot of troubles :(
>>>> I have a hunch the list is about to be flooded with similar reports if
>>>> we don't find this one before 4.8.
>>>>
>>>> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>>>> Author: Josef Bacik <jbacik@fb.com>
>>>> Date:   Fri Mar 25 13:25:51 2016 -0400
>>>>
>>>>      Btrfs: warn_on for unaccounted spaces
>>>>
>>>> This commit isn't the source of the bug, but it's making it a lot more
>>>> noisy.  I spent a few hours last night trying to track down why xfstests
>>>> was throwing these warnings and I was able to reproduce them at least as
>>>> far back as 4.4-vanilla with -oenospc_debug enabled.
>>>>
>>>> Speaking of which, can you turn on mounting with -oenospc_debug if you
>>>> haven't already?
>>>>
>>>> In my case, space_info->bytes_may_use was getting accounted incorrectly.
>>>>
>>>> I am able to reproduce that even with the following commit:
>>>> commit 18513091af9483ba84328d42092bd4d42a3c958f
>>>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>> Date:   Mon Jul 25 15:51:40 2016 +0800
>>>>
>>>>      btrfs: update btrfs_space_info's bytes_may_use timely
>>> And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
>>> fixed by:
>>>
>>> commit ed7a6948394305b810d0c6203268648715e5006f
>>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> Date:   Fri Aug 26 11:33:14 2016 +0800
>>>
>>>      btrfs: do not decrease bytes_may_use when replaying extents
>>>
>>> ... which shouldn't change anything for your issue, unfortunately.
>>>
>>> I still see these:
>>> WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
>>> btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
>>> Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
>>> msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
>>> acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr ipmi_msghandler
>>> 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
>>> amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
>>> ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
>>> ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
>>> ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
>>> autofs4
>>> CPU: 2 PID: 8166 Comm: umount Tainted: G        W
>>> 4.4.19-11.g81405db-vanilla #1
>>> Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
>>>   0000000000000000 ffff880230317d10 ffffffff813170ec 0000000000000000
>>>   ffffffffa0472528 ffff880230317d48 ffffffff8107d816 0000000000000000
>>>   ffff88009ab03600 ffff8800ba106288 ffff8800ab75a000 ffff8800ba106200
>>> Call Trace:
>>>   [<ffffffff813170ec>] dump_stack+0x63/0x87
>>>   [<ffffffff8107d816>] warn_slowpath_common+0x86/0xc0
>>>   [<ffffffff8107d90a>] warn_slowpath_null+0x1a/0x20
>>>   [<ffffffffa03de3a8>] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
>>>   [<ffffffffa03ef24b>] close_ctree+0x15b/0x330 [btrfs]
>>>   [<ffffffffa03bfeb9>] btrfs_put_super+0x19/0x20 [btrfs]
>>>   [<ffffffff811fe5bf>] generic_shutdown_super+0x6f/0x100
>>>   [<ffffffff811fe662>] kill_anon_super+0x12/0x20
>>>   [<ffffffffa03c4fa8>] btrfs_kill_super+0x18/0x120 [btrfs]
>>>   [<ffffffff811fe003>] deactivate_locked_super+0x43/0x70
>>>   [<ffffffff811fe076>] deactivate_super+0x46/0x60
>>>   [<ffffffff81219dcf>] cleanup_mnt+0x3f/0x80
>>>   [<ffffffff81219e62>] __cleanup_mnt+0x12/0x20
>>>   [<ffffffff81099fb6>] task_work_run+0x86/0xb0
>>>   [<ffffffff81078806>] exit_to_usermode_loop+0x73/0xa2
>>>   [<ffffffff81003b2d>] syscall_return_slowpath+0x8d/0xa0
>>>   [<ffffffff815f928c>] int_ret_from_sys_call+0x25/0x8f
>>> ---[ end trace 09a0cc2892b6305c ]---
>>> BTRFS: space_info 1 has 7946240 free, is not full
>>> BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
>>> may_use=4096, readonly=0
>>>
>>> ... where the value of may_use varies.
>>>
>> What test are you seeing this with?  Thanks,
> btrfs/022 hits it every time for me.
btrfs/022 is not related to this enospc error.
Qu Wenruo's patch "btrfs: Fix leaking bytes_may_use after hitting
EDQUOTA" has fixed this warning; please see the commit message of that
patch for details.

Regards,
Xiaoguang Wang
>
> -Jeff
>




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
       [not found]                                                               ` <57DF4E44.2040506@cn.fujitsu.com>
@ 2016-09-22 13:20                                                                 ` Ronan Arraes Jardim Chagas
  2016-09-22 13:41                                                                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-22 13:20 UTC (permalink / raw)
  To: Wang Xiaoguang, Jeff Mahoney, Josef Bacik, Qu Wenruo,
	Chris Murphy, Austin S. Hemmelgarn
  Cc: Btrfs BTRFS

Guys,

Something very strange happened. I have not seen the problem since
Monday, which is pretty much the first time I have ever worked more than
3 days without seeing it.

OK, it could be a coincidence. Notice that I did not change anything
about how I work. However, I did do two things:

- Updated the kernel to 4.7.2; and
- Created 50 dummy files of 3.0 GiB each.

Can anyone please tell me whether either of these changes could be
related to the improvement?

Best regards,
Ronan Arraes

* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-19  2:38                                                               ` Wang Xiaoguang
@ 2016-09-22 13:40                                                                 ` Jeff Mahoney
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Mahoney @ 2016-09-22 13:40 UTC (permalink / raw)
  To: Wang Xiaoguang, Josef Bacik, Ronan Arraes Jardim Chagas,
	Qu Wenruo, Chris Murphy, Austin S. Hemmelgarn
  Cc: Btrfs BTRFS


On 9/18/16 10:38 PM, Wang Xiaoguang wrote:
> hi,
> 
> On 09/14/2016 10:25 PM, Jeff Mahoney wrote:
>> On 9/13/16 10:24 PM, Josef Bacik wrote:
>>> On 09/08/2016 07:02 PM, Jeff Mahoney wrote:
>>>> On 9/8/16 2:49 PM, Jeff Mahoney wrote:
>>>>> On 9/8/16 2:24 PM, Ronan Arraes Jardim Chagas wrote:
>>>>>> Hi all!
>>>>>>
>>>>>> On Mon, 2016-09-05 at 16:49 +0800, Qu Wenruo wrote:
>>>>>>> Just like what Wang has mentioned, would you please paste all the
>>>>>>> output
>>>>>>> of the contents of /sys/fs/btrfs/<your fs uuid>/allocation?
>>>>>>>
>>>>>>> It's recommended to use "grep . -IR <path>" to get all the data as
>>>>>>> it
>>>>>>> will show the file name.
>>>>>> So, one more time, I see the problem. This time I was just using
>>>>>> Firefox and I cannot recover using `btrfs balance`. I think that, one
>>>>>> more time, I will need to reboot this machine. This problem is really
>>>>>> causing me a lot of troubles :(
>>>>> I have a hunch the list is about to be flooded with similar reports if
>>>>> we don't find this one before 4.8.
>>>>>
>>>>> commit d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>>>>> Author: Josef Bacik <jbacik@fb.com>
>>>>> Date:   Fri Mar 25 13:25:51 2016 -0400
>>>>>
>>>>>      Btrfs: warn_on for unaccounted spaces
>>>>>
>>>>> This commit isn't the source of the bug, but it's making it a lot more
>>>>> noisy.  I spent a few hours last night trying to track down why
>>>>> xfstests
>>>>> was throwing these warnings and I was able to reproduce them at
>>>>> least as
>>>>> far back as 4.4-vanilla with -oenospc_debug enabled.
>>>>>
>>>>> Speaking of which, can you turn on mounting with -oenospc_debug if you
>>>>> haven't already?
>>>>>
>>>>> In my case, space_info->bytes_may_use was getting accounted
>>>>> incorrectly.
>>>>>
>>>>> I am able to reproduce that even with the following commit:
>>>>> commit 18513091af9483ba84328d42092bd4d42a3c958f
>>>>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>>> Date:   Mon Jul 25 15:51:40 2016 +0800
>>>>>
>>>>>      btrfs: update btrfs_space_info's bytes_may_use timely
>>>> And the btrfs_free_reserved_data_space_noquota WARN_ON I was seeing is
>>>> fixed by:
>>>>
>>>> commit ed7a6948394305b810d0c6203268648715e5006f
>>>> Author: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>> Date:   Fri Aug 26 11:33:14 2016 +0800
>>>>
>>>>      btrfs: do not decrease bytes_may_use when replaying extents
>>>>
>>>> ... which shouldn't change anything for your issue, unfortunately.
>>>>
>>>> I still see these:
>>>> WARNING: CPU: 2 PID: 8166 at ../fs/btrfs/extent-tree.c:9582
>>>> btrfs_free_block_groups+0x2a8/0x400 [btrfs]()
>>>> Modules linked in: loop dm_flakey af_packet iscsi_ibft iscsi_boot_sysfs
>>>> msr ext4 crc16 mbcache jbd2 ipmi_ssif dm_mod igb ptp pps_core
>>>> acpi_cpufreq tpm_infineon kvm_amd ipmi_si kvm dca pcspkr
>>>> ipmi_msghandler
>>>> 8250_fintek sp5100_tco fjes irqbypass i2c_piix4 shpchp processor button
>>>> amd64_edac_mod edac_mce_amd edac_core k10temp btrfs xor raid6_pq sd_mod
>>>> ata_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
>>>> ohci_pci sysimgblt ehci_pci serio_raw ohci_hcd fb_sys_fops pata_atiixp
>>>> ehci_hcd ttm ahci libahci drm usbcore libata usb_common sg scsi_mod
>>>> autofs4
>>>> CPU: 2 PID: 8166 Comm: umount Tainted: G        W
>>>> 4.4.19-11.g81405db-vanilla #1
>>>> Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
>>>>   0000000000000000 ffff880230317d10 ffffffff813170ec 0000000000000000
>>>>   ffffffffa0472528 ffff880230317d48 ffffffff8107d816 0000000000000000
>>>>   ffff88009ab03600 ffff8800ba106288 ffff8800ab75a000 ffff8800ba106200
>>>> Call Trace:
>>>>   [<ffffffff813170ec>] dump_stack+0x63/0x87
>>>>   [<ffffffff8107d816>] warn_slowpath_common+0x86/0xc0
>>>>   [<ffffffff8107d90a>] warn_slowpath_null+0x1a/0x20
>>>>   [<ffffffffa03de3a8>] btrfs_free_block_groups+0x2a8/0x400 [btrfs]
>>>>   [<ffffffffa03ef24b>] close_ctree+0x15b/0x330 [btrfs]
>>>>   [<ffffffffa03bfeb9>] btrfs_put_super+0x19/0x20 [btrfs]
>>>>   [<ffffffff811fe5bf>] generic_shutdown_super+0x6f/0x100
>>>>   [<ffffffff811fe662>] kill_anon_super+0x12/0x20
>>>>   [<ffffffffa03c4fa8>] btrfs_kill_super+0x18/0x120 [btrfs]
>>>>   [<ffffffff811fe003>] deactivate_locked_super+0x43/0x70
>>>>   [<ffffffff811fe076>] deactivate_super+0x46/0x60
>>>>   [<ffffffff81219dcf>] cleanup_mnt+0x3f/0x80
>>>>   [<ffffffff81219e62>] __cleanup_mnt+0x12/0x20
>>>>   [<ffffffff81099fb6>] task_work_run+0x86/0xb0
>>>>   [<ffffffff81078806>] exit_to_usermode_loop+0x73/0xa2
>>>>   [<ffffffff81003b2d>] syscall_return_slowpath+0x8d/0xa0
>>>>   [<ffffffff815f928c>] int_ret_from_sys_call+0x25/0x8f
>>>> ---[ end trace 09a0cc2892b6305c ]---
>>>> BTRFS: space_info 1 has 7946240 free, is not full
>>>> BTRFS: space_info total=8388608, used=442368, pinned=0, reserved=0,
>>>> may_use=4096, readonly=0
>>>>
>>>> ... where the value of may_use varies.
>>>>
>>> What test are you seeing this with?  Thanks,
>> btrfs/022 hits it every time for me.
> btrfs/022 is not related to this ENOSPC error.
> Qu Wenruo's patch "btrfs: Fix leaking bytes_may_use after hitting
> EDQUOT" has fixed this warning; please check his patch for the detailed
> commit message.

Yep, that's understood.  This was just something I happened to encounter
while looking at this.

-Jeff


-- 
Jeff Mahoney
SUSE Labs
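
For readers following the numbers: the "free" figure in the space_info
dump quoted above matches total minus used, pinned, reserved and
readonly, so the nonzero may_use is not reflected in it. The short
Python sketch below only redoes that arithmetic with the values from the
dump; the formula is inferred from these numbers and is an assumption,
not a quote of the kernel source.

```python
# Values copied from the space_info dump quoted above.
total = 8388608
used = 442368
pinned = 0
reserved = 0
may_use = 4096
readonly = 0

# Assumption: "free" is total minus used, pinned, reserved and readonly.
free_as_printed = total - used - pinned - reserved - readonly
print(free_as_printed)

# For comparison: the same figure with the outstanding may_use subtracted too.
free_minus_may_use = free_as_printed - may_use
print(free_minus_may_use)
```

The first value agrees with the "7946240 free" in the log line; the
second differs from it by exactly the 4096 bytes still held in may_use.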




* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 13:20                                                                 ` Ronan Arraes Jardim Chagas
@ 2016-09-22 13:41                                                                   ` Austin S. Hemmelgarn
  2016-09-22 14:03                                                                     ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-22 13:41 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Wang Xiaoguang, Jeff Mahoney,
	Josef Bacik, Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

On 2016-09-22 09:20, Ronan Arraes Jardim Chagas wrote:
> Guys,
>
> Something very strange happened. I have not seen the problem since
> Monday, which is pretty much the first time I have ever worked more
> than 3 days without seeing it.
>
> OK, it could be a coincidence. Note that I did not change anything
> about how I work. However, I did do two things:
>
> - Updated the kernel to 4.7.2; and
> - Created 50 dummy files of 3.0 GiB each.
>
> Can anyone please tell me whether these things seem to be related?
Most likely the kernel upgrade fixed things.  It's possible that the 
large allocation is impacting something and making it work, but I don't 
think that that is very likely.



* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 13:41                                                                   ` Austin S. Hemmelgarn
@ 2016-09-22 14:03                                                                     ` Ronan Arraes Jardim Chagas
  2016-09-22 14:39                                                                       ` Josef Bacik
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-22 14:03 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Wang Xiaoguang, Jeff Mahoney, Josef Bacik,
	Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

On Thu, 2016-09-22 at 09:41 -0400, Austin S. Hemmelgarn wrote:
> Most likely the kernel upgrade fixed things.  It's possible that the
> large allocation is impacting something and making it work, but I
> don't think that that is very likely.

The btrfs-related patches I could find in the kernel 4.7.2 and 4.7.3
changelogs are:

commit 8d32aaa89067225d4202a362dc201280e2514952
Author: Chris Mason <clm@fb.com>
Date:   Tue Jul 19 05:52:36 2016 -0700

    Btrfs: fix delalloc accounting after copy_from_user faults

commit f495a60eb6351bf2f29fdbc1854375df9fe4022b
Author: Paolo Valente <paolo.valente@linaro.org>
Date:   Wed Jul 27 07:22:05 2016 +0200

    block: add missing group association in bio-cloning functions
    Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")

commit ff3235105fc7e4ecf04eb308940821d4a098c08d
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Wed Aug 17 21:58:33 2016 -0400

    btrfs: don't create or leak aliased root while cleaning up orphans

commit 64563a38fde57a26f4d68d488d0d4918f843547c
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Mon Aug 15 12:10:33 2016 -0400

    btrfs: properly track when rescan worker is running

commit 69b69167965e108a775ef20decabcc76fbe4fc08
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Mon Aug 8 22:08:06 2016 -0400

    btrfs: waiting on qgroup rescan should not always be interruptible

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 14:03                                                                     ` Ronan Arraes Jardim Chagas
@ 2016-09-22 14:39                                                                       ` Josef Bacik
  2016-09-22 17:06                                                                         ` Ronan Arraes Jardim Chagas
  0 siblings, 1 reply; 82+ messages in thread
From: Josef Bacik @ 2016-09-22 14:39 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Austin S. Hemmelgarn, Wang Xiaoguang,
	Jeff Mahoney, Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

On 09/22/2016 10:03 AM, Ronan Arraes Jardim Chagas wrote:
> On Thu, 2016-09-22 at 09:41 -0400, Austin S. Hemmelgarn wrote:
>> Most likely the kernel upgrade fixed things.  It's possible that the
>> large allocation is impacting something and making it work, but I
>> don't think that that is very likely.
>
> The btrfs-related patches I could find in the kernel 4.7.2 and 4.7.3
> changelogs are:
>
> commit 8d32aaa89067225d4202a362dc201280e2514952
> Author: Chris Mason <clm@fb.com>
> Date:   Tue Jul 19 05:52:36 2016 -0700
>
>     Btrfs: fix delalloc accounting after copy_from_user faults

This is what fixed it.  I thought it was in 4.7 which is why I started paying 
attention, but I guess I was wrong.  Glad your problem is resolved.  Thanks,

Josef


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 14:39                                                                       ` Josef Bacik
@ 2016-09-22 17:06                                                                         ` Ronan Arraes Jardim Chagas
  2016-09-22 17:49                                                                           ` Josef Bacik
  0 siblings, 1 reply; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-22 17:06 UTC (permalink / raw)
  To: Josef Bacik, Austin S. Hemmelgarn, Wang Xiaoguang, Jeff Mahoney,
	Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

Hi Josef,

On Thu, 2016-09-22 at 10:39 -0400, Josef Bacik wrote:
> This is what fixed it.  I thought it was in 4.7 which is why I
> started paying attention, but I guess I was wrong.  Glad your problem
> is resolved.  Thanks,

Do you have any explanation of why the problem fixed by that patch was
causing the ENOSPC errors for me? Also, do I need to reformat my
partition, or can I consider it safe to use once the new kernel is
installed?

Best regards,
Ronan Arraes


* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 17:06                                                                         ` Ronan Arraes Jardim Chagas
@ 2016-09-22 17:49                                                                           ` Josef Bacik
  2016-09-22 17:54                                                                             ` Ronan Arraes Jardim Chagas
  2016-09-23 15:20                                                                             ` [SOLVED] " Ronan Arraes Jardim Chagas
  0 siblings, 2 replies; 82+ messages in thread
From: Josef Bacik @ 2016-09-22 17:49 UTC (permalink / raw)
  To: Ronan Arraes Jardim Chagas, Austin S. Hemmelgarn, Wang Xiaoguang,
	Jeff Mahoney, Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

On 09/22/2016 01:06 PM, Ronan Arraes Jardim Chagas wrote:
> Hi Josef,
>
> On Thu, 2016-09-22 at 10:39 -0400, Josef Bacik wrote:
>> This is what fixed it.  I thought it was in 4.7 which is why I
>> started paying attention, but I guess I was wrong.  Glad your problem
>> is resolved.  Thanks,
>
> Do you have any explanation of why the problem fixed by that patch was
> causing the ENOSPC errors for me? Also, do I need to reformat my
> partition, or can I consider it safe to use once the new kernel is
> installed?

That patch fixed a problem where we would screw up the ENOSPC accounting, and 
would slowly leak space into one of the counters.  So eventually (or often in 
your case) you'd hit ENOSPC, but have plenty of space available.  If you 
unmounted and mounted again, or simply rebooted, everything would have been 
fine.  You can still use the fs, the accounting is purely in memory so it's not 
like your FS is permanently screwed.  Thanks,

Josef
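
Josef's description above can be pictured with a toy model. The Python
sketch below is hypothetical throughout: SpaceInfo, reserve, release and
buggy_write are illustrative names, not kernel code, and the real btrfs
accounting (and the actual bug fixed by the patch) is considerably more
involved. It only shows how a reservation counter that lives in memory
can fill up from unreleased reservations, return ENOSPC while nothing
has been written, and be cleared by starting over with fresh counters,
as a remount would.

```python
import errno
import os


class SpaceInfo:
    """Toy in-memory space accounting (hypothetical, not kernel code)."""

    def __init__(self, total, used=0):
        self.total = total    # bytes this pool can hold
        self.used = used      # bytes actually written (persistent)
        self.may_use = 0      # bytes reserved for pending writes (memory only)

    def reserve(self, nbytes):
        # Refuse the reservation if used + reserved space would exceed the pool.
        if self.used + self.may_use + nbytes > self.total:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.may_use += nbytes

    def release(self, nbytes):
        self.may_use -= nbytes


def buggy_write(info, nbytes, fault):
    """Reserve, then write; the error path forgets to release (the leak)."""
    info.reserve(nbytes)
    if fault:
        return False          # bug: reservation stays in may_use
    info.release(nbytes)
    info.used += nbytes
    return True


info = SpaceInfo(total=100)
failed_at = None
for attempt in range(20):
    try:
        buggy_write(info, 10, fault=True)   # every write faults and leaks 10
    except OSError as exc:
        assert exc.errno == errno.ENOSPC
        failed_at = attempt
        break

print(failed_at, info.used, info.may_use)

# A "remount" rebuilds the in-memory counters from what is really on disk.
remounted = SpaceInfo(total=info.total, used=info.used)
remounted_ok = buggy_write(remounted, 10, fault=False)
print(remounted_ok, remounted.used, remounted.may_use)
```

In this model the pool reports ENOSPC with used still at zero, and the
same pool accepts writes again once the counters are rebuilt, which is
the behaviour described for unmounting and mounting again.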



* Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 17:49                                                                           ` Josef Bacik
@ 2016-09-22 17:54                                                                             ` Ronan Arraes Jardim Chagas
  2016-09-23 15:20                                                                             ` [SOLVED] " Ronan Arraes Jardim Chagas
  1 sibling, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-22 17:54 UTC (permalink / raw)
  To: Josef Bacik, Austin S. Hemmelgarn, Wang Xiaoguang, Jeff Mahoney,
	Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

Hi Josef,

On Thu, 2016-09-22 at 13:49 -0400, Josef Bacik wrote:
> That patch fixed a problem where we would screw up the ENOSPC accounting,
> and would slowly leak space into one of the counters.  So eventually (or
> often in your case) you'd hit ENOSPC, but have plenty of space available.
> If you unmounted and mounted again, or simply rebooted, everything would
> have been fine.  You can still use the fs, the accounting is purely in
> memory so it's not like your FS is permanently screwed.  Thanks,


Thank you very much for the explanation. I am very glad it is finally
fixed here :)

Best regards,
Ronan Arraes


* Re: [SOLVED] BTRFS constantly reports "No space left on device" even with a huge unallocated space
  2016-09-22 17:49                                                                           ` Josef Bacik
  2016-09-22 17:54                                                                             ` Ronan Arraes Jardim Chagas
@ 2016-09-23 15:20                                                                             ` Ronan Arraes Jardim Chagas
  1 sibling, 0 replies; 82+ messages in thread
From: Ronan Arraes Jardim Chagas @ 2016-09-23 15:20 UTC (permalink / raw)
  To: Josef Bacik, Austin S. Hemmelgarn, Wang Xiaoguang, Jeff Mahoney,
	Qu Wenruo, Chris Murphy
  Cc: Btrfs BTRFS

Hi guys!

After a week without experiencing the problem, I think we can mark it
as solved. I want to thank all the devs on this list. You were
always very helpful. For anyone who is still experiencing the reported
problem, upgrade to kernel 4.7.3 and I think you will be fine :)

Best regards and thank you all,
Ronan Arraes


Thread overview: 82+ messages
2016-08-12 17:36 BTRFS constantly reports "No space left on device" even with a huge unallocated space Ronan Arraes Jardim Chagas
2016-08-12 18:02 ` Chris Murphy
2016-08-12 19:00   ` Ronan Arraes Jardim Chagas
2016-08-12 19:37     ` Chris Murphy
2016-08-12 20:34       ` Chris Murphy
     [not found]         ` <CAKdnfRJeOXHmrumDkfxLTf-nU=KwZ0f7ybET-3o7kwwJDOZ2aw@mail.gmail.com>
2016-08-15 23:24           ` Chris Murphy
2016-08-16 17:49             ` Ronan Arraes Jardim Chagas
2016-08-22 19:11             ` Ronan Arraes Jardim Chagas
2016-08-22 20:39             ` Ronan Arraes Jardim Chagas
2016-08-22 20:49               ` Chris Murphy
2016-08-22 21:04                 ` Ronan Arraes Jardim Chagas
2016-08-24  0:40                   ` Jeff Mahoney
2016-08-25 15:58             ` Lutz Vieweg
2016-08-25 23:56               ` Chris Murphy
2016-08-26  5:59                 ` Marc Haber
2016-08-29 12:12 ` Wang Xiaoguang
2016-08-29 13:20   ` Ronan Arraes Jardim Chagas
2016-08-29 15:52   ` Ronan Arraes Jardim Chagas
2016-08-29 22:25     ` Jeff Mahoney
2016-08-30  2:12     ` Wang Xiaoguang
2016-08-30 12:50       ` Ronan Arraes Jardim Chagas
2016-08-30 16:44         ` Chris Murphy
2016-08-30 16:57           ` Ronan Arraes Jardim Chagas
2016-08-31 20:49           ` Ronan Arraes Jardim Chagas
2016-08-31 21:44             ` Chris Murphy
2016-08-31 21:48               ` Chris Murphy
2016-08-31 22:47                 ` Jeff Mahoney
2016-08-31 22:58                   ` Chris Murphy
2016-08-31 23:03                     ` Jeff Mahoney
2016-08-31 23:09                       ` Chris Murphy
2016-09-01 12:57                         ` Ronan Arraes Jardim Chagas
2016-09-01 13:21                           ` Austin S. Hemmelgarn
2016-09-01 16:34                             ` Ronan Arraes Jardim Chagas
2016-09-01 17:04                               ` Austin S. Hemmelgarn
2016-09-01 17:12                                 ` Jeff Mahoney
2016-09-01 17:39                                   ` Ronan Arraes Jardim Chagas
2016-09-01 17:43                                     ` Jeff Mahoney
2016-09-01 17:58                                       ` Ronan Arraes Jardim Chagas
2016-09-01 17:45                                   ` Chris Murphy
2016-09-01 18:47                                   ` Austin S. Hemmelgarn
2016-09-02  0:12                                     ` Chris Murphy
2016-09-02 14:26                                       ` Jeff Mahoney
2016-09-02 14:43                                         ` Ronan Arraes Jardim Chagas
2016-09-02 14:48                                           ` Jeff Mahoney
2016-09-02 15:20                                             ` Ronan Arraes Jardim Chagas
2016-09-02 15:26                                               ` Jeff Mahoney
2016-09-02 19:25                                                 ` Ronan Arraes Jardim Chagas
2016-09-05  8:49                                                   ` Qu Wenruo
2016-09-08 18:24                                                     ` Ronan Arraes Jardim Chagas
2016-09-08 18:49                                                       ` Jeff Mahoney
2016-09-08 23:02                                                         ` Jeff Mahoney
2016-09-13 20:24                                                           ` Josef Bacik
2016-09-14 14:25                                                             ` Jeff Mahoney
2016-09-19  2:38                                                               ` Wang Xiaoguang
2016-09-22 13:40                                                                 ` Jeff Mahoney
     [not found]                                                               ` <57DF4E44.2040506@cn.fujitsu.com>
2016-09-22 13:20                                                                 ` Ronan Arraes Jardim Chagas
2016-09-22 13:41                                                                   ` Austin S. Hemmelgarn
2016-09-22 14:03                                                                     ` Ronan Arraes Jardim Chagas
2016-09-22 14:39                                                                       ` Josef Bacik
2016-09-22 17:06                                                                         ` Ronan Arraes Jardim Chagas
2016-09-22 17:49                                                                           ` Josef Bacik
2016-09-22 17:54                                                                             ` Ronan Arraes Jardim Chagas
2016-09-23 15:20                                                                             ` [SOLVED] " Ronan Arraes Jardim Chagas
2016-09-02 19:56                                                 ` Ronan Arraes Jardim Chagas
2016-09-02 21:34                                                   ` Chris Murphy
2016-09-02 22:13                                                     ` Ronan Arraes Jardim Chagas
2016-09-02 22:39                                                       ` Chris Murphy
2016-09-03  2:47                                                         ` Ronan Arraes Jardim Chagas
2016-09-03  3:41                                                           ` Chris Murphy
2016-09-03  3:47                                                             ` Ronan Arraes Jardim Chagas
2016-09-03  4:14                                                               ` Chris Murphy
2016-09-01 17:07                             ` Chris Murphy
2016-09-02  0:37               ` Qu Wenruo
2016-09-02 14:09             ` Jeff Mahoney
2016-09-14 20:15   ` Ronan Arraes Jardim Chagas
2016-09-14 22:25     ` Chris Murphy
2016-09-15  0:56       ` Ronan Arraes Jardim Chagas
2016-09-13  3:17 ` Wang Xiaoguang
2016-09-13 12:54   ` Ronan Arraes Jardim Chagas
2016-09-13 20:49   ` Ronan Arraes Jardim Chagas
2016-09-13 21:01     ` Josef Bacik
2016-09-14 14:40       ` Ronan Arraes Jardim Chagas
