* 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
@ 2015-12-30 21:44 cheater00 .
  2015-12-30 22:13 ` Chris Murphy
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2015-12-30 21:44 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,
I have a 6TB partition here; it filled up while still holding just under
2TB. btrfs fi df showed that Data is only 1.92TiB:

Data, single: total=1.92TiB, used=1.92TiB
System, DUP: total=8.00MiB, used=224.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=5.00GiB, used=3.32GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

btrfs fs resize max . did nothing; I also tried resize -1T and resize
+1T, and those did nothing either. On IRC I was directed to this:

https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

"When you haven't hit the "usual" problem

If the conditions above aren't true (i.e. there's plenty of
unallocated space, or there's lots of unused metadata allocation),
then you may have hit a known but unresolved bug. If this is the case,
please report it to either the mailing list, or IRC. In some cases, it
has been possible to deal with the problem, but the approach is new,
and we would like more direct contact with people experiencing this
particular bug."

What do I do now? It's kind of important to me to get that free space
back; I'm really jonesing for it.

Thanks.

(btw, so far I haven't been able to follow up on that unrelated thread
from a while back. But I hope to be able to do that sometime in
January.)

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2015-12-30 21:44 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem cheater00 .
@ 2015-12-30 22:13 ` Chris Murphy
  2016-01-02  2:09   ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2015-12-30 22:13 UTC (permalink / raw)
  To: cheater00 .; +Cc: Btrfs BTRFS

kernel and btrfs-progs versions
and output from:
'btrfs fi show <mp>'
'btrfs fi usage <mp>'
'btrfs-show-super <blockdev>'
'df -h'

Then umount the volume, and mount with option enospc_debug, and try to
reproduce the problem, then include everything from dmesg from the
time the volume was mounted.
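
Spelled out as a rough sketch (same placeholders as above; adjust the
paths and device to your setup):

  uname -r ; btrfs --version            # kernel and btrfs-progs versions
  btrfs fi show <mp>
  btrfs fi usage <mp>
  btrfs-show-super <blockdev>
  df -h
  umount <mp>
  mount -o enospc_debug <blockdev> <mp>
  # ...try to reproduce the ENOSPC...
  dmesg                                 # everything since the volume was mounted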

-- 
Chris Murphy

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2015-12-30 22:13 ` Chris Murphy
@ 2016-01-02  2:09   ` cheater00 .
  2016-01-02  2:10     ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-02  2:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

I have been unable to reproduce so far.

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-02  2:09   ` cheater00 .
@ 2016-01-02  2:10     ` cheater00 .
       [not found]       ` <CA+9GZUiWQ2tAotFuq2Svkjnk+2Quz5B8UwZSSpm4SJfhqfoStQ@mail.gmail.com>
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-02  2:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Here is the info requested, in case it helps anyone.

# uname -a
Linux SX20S 4.3.0-040300rc7-generic #201510260712 SMP Mon Oct 26
11:27:59 UTC 2015 i686 i686 i686 GNU/Linux
# aptitude show btrfs-tools
Package: btrfs-tools
State: installed
Automatically installed: no
Version: 4.2.1+ppa1-1~ubuntu15.10.1
# btrfs --version
btrfs-progs v4.2.1
# btrfs fi show Media
Label: 'Media'  uuid: b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8
Total devices 1 FS bytes used 1.92TiB
devid    1 size 5.46TiB used 1.93TiB path /dev/sdd1

btrfs-progs v4.2.1
# btrfs fi usage Media
Overall:
    Device size:   5.46TiB
    Device allocated:   1.93TiB
    Device unallocated:   3.52TiB
    Device missing:     0.00B
    Used:   1.93TiB
    Free (estimated):   3.53TiB (min: 1.76TiB)
    Data ratio:      1.00
    Metadata ratio:      2.00
    Global reserve: 512.00MiB (used: 0.00B)

Data,single: Size:1.92TiB, Used:1.92TiB
   /dev/sdd1   1.92TiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sdd1   8.00MiB

Metadata,DUP: Size:5.00GiB, Used:3.32GiB
   /dev/sdd1  10.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sdd1   4.00MiB

System,DUP: Size:8.00MiB, Used:224.00KiB
   /dev/sdd1  16.00MiB

Unallocated:
   /dev/sdd1   3.52TiB



# btrfs-show-super /dev/sdd1
superblock: bytenr=65536, device=/dev/sdd1
---------------------------------------------------------
csum 0xae174f16 [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8
label Media
generation 11983
root 34340864
sys_array_size 226
chunk_root_generation 11982
root_level 1
chunk_root 21135360
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 6001173463040
bytes_used 2115339448320
sectorsize 4096
nodesize 16384
leafsize 16384
stripesize 4096
root_dir 6
num_devices 1
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x61
( MIXED_BACKREF |
 BIG_METADATA |
 EXTENDED_IREF )
csum_type 0
csum_size 4
cache_generation 11983
uuid_tree_generation 11983
dev_item.uuid 819e1c8a-5e55-4992-81d3-f22fdd088dc9
dev_item.fsid b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8 [match]
dev_item.type 0
dev_item.total_bytes 6001173463040
dev_item.bytes_used 2124972818432
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0

I did mount Media -o enospc_debug and now mount shows:
/dev/sdd1 on /media/cheater/Media type btrfs
(rw,nosuid,nodev,enospc_debug,_netdev)


On Wed, Dec 30, 2015 at 11:13 PM, Chris Murphy <lists@colorremedies.com> wrote:
> kernel and btrfs-progs versions
> and output from:
> 'btrfs fi show <mp>'
> 'btrfs fi usage <mp>'
> 'btrfs-show-super <blockdev>'
> 'df -h'
>
> Then umount the volume, and mount with option enospc_debug, and try to
> reproduce the problem, then include everything from dmesg from the
> time the volume was mounted.
>
> --
> Chris Murphy

On Sat, Jan 2, 2016 at 3:09 AM, cheater00 . <cheater00@gmail.com> wrote:
> I have been unable to reproduce so far.

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
       [not found]       ` <CA+9GZUiWQ2tAotFuq2Svkjnk+2Quz5B8UwZSSpm4SJfhqfoStQ@mail.gmail.com>
@ 2016-01-07 21:55         ` Chris Murphy
       [not found]           ` <CA+9GZUjLcRnRX_mwO-McXWFd+G4o3jtBENMLnszg-rJTn6vL1w@mail.gmail.com>
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-01-07 21:55 UTC (permalink / raw)
  To: cheater00 .; +Cc: Chris Murphy, Btrfs BTRFS

I'm not finding much about this.

http://www.spinics.net/lists/linux-btrfs/msg46537.html
http://www.spinics.net/lists/linux-btrfs/msg46279.html

To the last one, Anand Jain (a developer) replied:
http://www.spinics.net/lists/linux-btrfs/msg46303.html

So I'm not sure whether this problem is the same, or whether the cause is
known. The best I can suggest is to try a newer kernel. Even 4.4.0rc8 is
stable enough to try, but there were quite a few btrfs patches for
4.3.3 as well. I would try 4.4.0 first to see if it reproduces; if
not, then try 4.3.3.

>4.3.0-040300rc7-generic #201510260712 SMP Mon Oct 26
>11:27:59 UTC 2015 i686 i686 i686 GNU/Linux

I think the other two cases are x86-64 so I don't think i686 is related.

There are quite a few messages:
swapper/6: page allocation failure: order:0, mode:0x20
[1129393.648245] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE
  4.3.0-040300rc7-generic #201510260712

So the kernel is in a tainted state; I can't tell if that's related.

And also
usb 4-1.2: reset high-speed USB device number 7 using ehci-pci

One happens less than a minute before the first btrfs call trace. I've
also had these messages with USB drives; they usually don't cause a
problem, and from what I've read they can usually be ignored unless
there are subsequent problems. I ended up getting a powered hub to put
all the USB drives on, and since then I don't get these messages anymore
(so far anyway) and I also haven't had any Btrfs errors, which I rarely
did have prior to moving to the USB 3 powered hub; even though power
shouldn't have been a factor, since the drives want 900mA at most
(usually just when spinning up) and standard USB 3 supplies 900mA.

Anyway, the easiest thing is to move to a newer kernel, try to reproduce,
and then start looking at hardware issues. What specific drive is
/dev/sdd? I'd make a note of that particular drive and check whether it's
the same drive (not drive letter) if the problem happens again.
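
A sketch of one way to record which physical drive that currently is
(smartctl assumes the smartmontools package; some USB-SATA bridges need
'-d sat'):

  ls -l /dev/disk/by-id/ | grep sdd   # stable ids include vendor, model, serial
  smartctl -i /dev/sdd                # prints model/serial; try '-d sat' if it errors out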


-- 
Chris Murphy

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
       [not found]             ` <CAJCQCtRhYZi9nqWP_LYmZeg1yRQVkpnmUDQ-P5o1-gc-3w+Pdg@mail.gmail.com>
@ 2016-01-09 20:00               ` cheater00 .
  2016-01-09 20:26                 ` Hugo Mills
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-09 20:00 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

Hello,
I can repeatedly trigger this bug by making the "data" portion fill
up. If you remember, the partition is 6 TB, but in btrfs filesystem df
Data is shown as only 2TB when in fact it should be nearly 6TB. So
this has nothing to do with kernel bugs. The filesystem on disk is
structured incorrectly. How do I fix this? How do I make "Data"
bigger? What is it exactly?

Thanks

P.S. Sorry about reposting twice; apparently Google's "Inbox" app
doesn't allow posting plain text at all, and the mail got rejected from
the list.

On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
>
> On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
> > Yes, both times it was the same drive. I only have one usb drive now.
>
> That it's the same drive is suspicious. But I don't know what
> errno=-28 means or what could trigger it, if some USB weirdness could
> cause Btrfs to get confused somehow. I have one 7200rpm drive that
> wants 1.15A compared to all the others that have a 900mA spec, and
> while it behaves find 99% of the time like the others, rarely I would
> get the reset message and most of the time it was that drive (and less
> often one other). Now that doesn't happen anymore.
>
> >
> > I am not sure if chasing the kernel makes sense unless you think there is a
> > specific commit that would have foxed it. I only reported here in case
> > anyone here wanted to do some form of debugging before i reset the drive and
> > rescan the fs to make it writeable again. But since there seems to be no
> > interest i will go forward.
>
> I'd chase the hardware problem then first. It's just that the kernel
> switch is easier from my perspective. And it's just as unclear this is
> hardware related than just a bug. And since there are hundreds to
> thousands of Btrfs bugs being fixed per kernel release, I have no way
> to tell you whether it's fixed and maybe even a developer wouldn't
> either, you'd just have to try it.
>
>
> --
> Chris Murphy

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 20:00               ` cheater00 .
@ 2016-01-09 20:26                 ` Hugo Mills
  2016-01-09 20:59                   ` cheater00 .
  2016-01-10 14:14                   ` Henk Slager
  0 siblings, 2 replies; 55+ messages in thread
From: Hugo Mills @ 2016-01-09 20:26 UTC (permalink / raw)
  To: cheater00 .; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2845 bytes --]

On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
> Hello,
> I can repeatedly trigger this bug by making the "data" portion fill
> up. If you remember the partition is 6 TB but in btrfs filesystem df
> Data is shown as only 2TB when in fact it should be nearly 6TB. So
> this has nothing to do with kernel bugs. The filesystem on disk is
> structured incorrectly. How do i fix this? How do I make "Data"
> bigger? What is it exactly?

   This is *exactly* the behaviour of the known kernel bug. The bug is
that the FS *should* be extending the data allocation when it gets
near to full, and it's not. There is no way of manually allocating
more (because the FS should be doing it automatically). There is no
known way of persuading the FS to do it when it isn't.

   The only good solution I know of is to reformat the FS and restore
from backups. Even then, some people manage to repeatedly hit this
with newly-created filesystems.

   Hugo.

> Thanks
> 
> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
> doesn't allow posting plain text at all and the mail got rejected from
> the list.
> 
> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
> > > Yes, both times it was the same drive. I only have one usb drive now.
> >
> > That it's the same drive is suspicious. But I don't know what
> > errno=-28 means or what could trigger it, if some USB weirdness could
> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
> > wants 1.15A compared to all the others that have a 900mA spec, and
> > while it behaves find 99% of the time like the others, rarely I would
> > get the reset message and most of the time it was that drive (and less
> > often one other). Now that doesn't happen anymore.
> >
> > >
> > > I am not sure if chasing the kernel makes sense unless you think there is a
> > > specific commit that would have foxed it. I only reported here in case
> > > anyone here wanted to do some form of debugging before i reset the drive and
> > > rescan the fs to make it writeable again. But since there seems to be no
> > > interest i will go forward.
> >
> > I'd chase the hardware problem then first. It's just that the kernel
> > switch is easier from my perspective. And it's just as unclear this is
> > hardware related than just a bug. And since there are hundreds to
> > thousands of Btrfs bugs being fixed per kernel release, I have no way
> > to tell you whether it's fixed and maybe even a developer wouldn't
> > either, you'd just have to try it.
> >
> >

-- 
Hugo Mills             | Hey, Virtual Memory! Now I can have a *really big*
hugo@... carfax.org.uk | ramdisk!
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 20:26                 ` Hugo Mills
@ 2016-01-09 20:59                   ` cheater00 .
  2016-01-09 21:04                     ` Hugo Mills
  2016-01-10 14:14                   ` Henk Slager
  1 sibling, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-09 20:59 UTC (permalink / raw)
  To: Hugo Mills, cheater00 ., Chris Murphy, Btrfs BTRFS

OK. How do we track down that bug and get it fixed?

On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>> Hello,
>> I can repeatedly trigger this bug by making the "data" portion fill
>> up. If you remember the partition is 6 TB but in btrfs filesystem df
>> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>> this has nothing to do with kernel bugs. The filesystem on disk is
>> structured incorrectly. How do i fix this? How do I make "Data"
>> bigger? What is it exactly?
>
>    This is *exactly* the behaviour of the known kernel bug. The bug is
> that the FS *should* be extending the data allocation when it gets
> near to full, and it's not. There is no way of manually allocating
> more (because the FS should be doing it automatically). There is no
> known way of persuading the FS to it when it isn't.
>
>    The only good solution I know of is to reformat the FS and restore
> from backups. Even then, some people manage to repeatedly hit this
> with newly-created filesystems.
>
>    Hugo.
>
>> Thanks
>>
>> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
>> doesn't allow posting plain text at all and the mail got rejected from
>> the list.
>>
>> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
>> >
>> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
>> > > Yes, both times it was the same drive. I only have one usb drive now.
>> >
>> > That it's the same drive is suspicious. But I don't know what
>> > errno=-28 means or what could trigger it, if some USB weirdness could
>> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
>> > wants 1.15A compared to all the others that have a 900mA spec, and
>> > while it behaves find 99% of the time like the others, rarely I would
>> > get the reset message and most of the time it was that drive (and less
>> > often one other). Now that doesn't happen anymore.
>> >
>> > >
>> > > I am not sure if chasing the kernel makes sense unless you think there is a
>> > > specific commit that would have foxed it. I only reported here in case
>> > > anyone here wanted to do some form of debugging before i reset the drive and
>> > > rescan the fs to make it writeable again. But since there seems to be no
>> > > interest i will go forward.
>> >
>> > I'd chase the hardware problem then first. It's just that the kernel
>> > switch is easier from my perspective. And it's just as unclear this is
>> > hardware related than just a bug. And since there are hundreds to
>> > thousands of Btrfs bugs being fixed per kernel release, I have no way
>> > to tell you whether it's fixed and maybe even a developer wouldn't
>> > either, you'd just have to try it.
>> >
>> >
>
> --
> Hugo Mills             | Hey, Virtual Memory! Now I can have a *really big*
> hugo@... carfax.org.uk | ramdisk!
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 20:59                   ` cheater00 .
@ 2016-01-09 21:04                     ` Hugo Mills
  2016-01-09 21:07                       ` cheater00 .
  2016-01-11  0:13                       ` Chris Murphy
  0 siblings, 2 replies; 55+ messages in thread
From: Hugo Mills @ 2016-01-09 21:04 UTC (permalink / raw)
  To: cheater00 .; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3493 bytes --]

On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> OK. How do we track down that bug and get it fixed?

   I have no idea. I'm not a btrfs dev, I'm afraid.

   It's been around for a number of years. None of the devs has, I
think, had the time to look at it. When Josef was still (publicly)
active, he had it second on his list of bugs to look at for many
months -- but it always got trumped by some new bug that could cause
data loss.

   Hugo.

> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
> >> Hello,
> >> I can repeatedly trigger this bug by making the "data" portion fill
> >> up. If you remember the partition is 6 TB but in btrfs filesystem df
> >> Data is shown as only 2TB when in fact it should be nearly 6TB. So
> >> this has nothing to do with kernel bugs. The filesystem on disk is
> >> structured incorrectly. How do i fix this? How do I make "Data"
> >> bigger? What is it exactly?
> >
> >    This is *exactly* the behaviour of the known kernel bug. The bug is
> > that the FS *should* be extending the data allocation when it gets
> > near to full, and it's not. There is no way of manually allocating
> > more (because the FS should be doing it automatically). There is no
> > known way of persuading the FS to it when it isn't.
> >
> >    The only good solution I know of is to reformat the FS and restore
> > from backups. Even then, some people manage to repeatedly hit this
> > with newly-created filesystems.
> >
> >    Hugo.
> >
> >> Thanks
> >>
> >> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
> >> doesn't allow posting plain text at all and the mail got rejected from
> >> the list.
> >>
> >> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
> >> >
> >> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
> >> > > Yes, both times it was the same drive. I only have one usb drive now.
> >> >
> >> > That it's the same drive is suspicious. But I don't know what
> >> > errno=-28 means or what could trigger it, if some USB weirdness could
> >> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
> >> > wants 1.15A compared to all the others that have a 900mA spec, and
> >> > while it behaves find 99% of the time like the others, rarely I would
> >> > get the reset message and most of the time it was that drive (and less
> >> > often one other). Now that doesn't happen anymore.
> >> >
> >> > >
> >> > > I am not sure if chasing the kernel makes sense unless you think there is a
> >> > > specific commit that would have foxed it. I only reported here in case
> >> > > anyone here wanted to do some form of debugging before i reset the drive and
> >> > > rescan the fs to make it writeable again. But since there seems to be no
> >> > > interest i will go forward.
> >> >
> >> > I'd chase the hardware problem then first. It's just that the kernel
> >> > switch is easier from my perspective. And it's just as unclear this is
> >> > hardware related than just a bug. And since there are hundreds to
> >> > thousands of Btrfs bugs being fixed per kernel release, I have no way
> >> > to tell you whether it's fixed and maybe even a developer wouldn't
> >> > either, you'd just have to try it.
> >> >
> >> >
> >

-- 
Hugo Mills             | Hey, Virtual Memory! Now I can have a *really big*
hugo@... carfax.org.uk | ramdisk!
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:04                     ` Hugo Mills
@ 2016-01-09 21:07                       ` cheater00 .
  2016-01-09 21:15                         ` Hugo Mills
                                           ` (2 more replies)
  2016-01-11  0:13                       ` Chris Murphy
  1 sibling, 3 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-09 21:07 UTC (permalink / raw)
  To: Hugo Mills, cheater00 ., Chris Murphy, Btrfs BTRFS

I would like to point out that this can cause data loss. If I'm writing
to disk and the disk unexpectedly becomes read-only, that data will
be lost, because who in their right mind makes their code expect this
and builds a contingency (e.g. caching, backpressure, etc.)...

There's no loss of data on the disk because the data doesn't make it
to disk in the first place. But it's exactly the same as if the data
had been written to disk, and then lost.

On Sat, Jan 9, 2016 at 10:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> OK. How do we track down that bug and get it fixed?
>
>    I have no idea. I'm not a btrfs dev, I'm afraid.
>
>    It's been around for a number of years. None of the devs has, I
> think, had the time to look at it. When Josef was still (publicly)
> active, he had it second on his list of bugs to look at for many
> months -- but it always got trumped by some new bug that could cause
> data loss.
>
>    Hugo.
>
>> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>> >> Hello,
>> >> I can repeatedly trigger this bug by making the "data" portion fill
>> >> up. If you remember the partition is 6 TB but in btrfs filesystem df
>> >> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>> >> this has nothing to do with kernel bugs. The filesystem on disk is
>> >> structured incorrectly. How do i fix this? How do I make "Data"
>> >> bigger? What is it exactly?
>> >
>> >    This is *exactly* the behaviour of the known kernel bug. The bug is
>> > that the FS *should* be extending the data allocation when it gets
>> > near to full, and it's not. There is no way of manually allocating
>> > more (because the FS should be doing it automatically). There is no
>> > known way of persuading the FS to it when it isn't.
>> >
>> >    The only good solution I know of is to reformat the FS and restore
>> > from backups. Even then, some people manage to repeatedly hit this
>> > with newly-created filesystems.
>> >
>> >    Hugo.
>> >
>> >> Thanks
>> >>
>> >> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
>> >> doesn't allow posting plain text at all and the mail got rejected from
>> >> the list.
>> >>
>> >> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
>> >> >
>> >> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
>> >> > > Yes, both times it was the same drive. I only have one usb drive now.
>> >> >
>> >> > That it's the same drive is suspicious. But I don't know what
>> >> > errno=-28 means or what could trigger it, if some USB weirdness could
>> >> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
>> >> > wants 1.15A compared to all the others that have a 900mA spec, and
>> >> > while it behaves find 99% of the time like the others, rarely I would
>> >> > get the reset message and most of the time it was that drive (and less
>> >> > often one other). Now that doesn't happen anymore.
>> >> >
>> >> > >
>> >> > > I am not sure if chasing the kernel makes sense unless you think there is a
>> >> > > specific commit that would have foxed it. I only reported here in case
>> >> > > anyone here wanted to do some form of debugging before i reset the drive and
>> >> > > rescan the fs to make it writeable again. But since there seems to be no
>> >> > > interest i will go forward.
>> >> >
>> >> > I'd chase the hardware problem then first. It's just that the kernel
>> >> > switch is easier from my perspective. And it's just as unclear this is
>> >> > hardware related than just a bug. And since there are hundreds to
>> >> > thousands of Btrfs bugs being fixed per kernel release, I have no way
>> >> > to tell you whether it's fixed and maybe even a developer wouldn't
>> >> > either, you'd just have to try it.
>> >> >
>> >> >
>> >
>
> --
> Hugo Mills             | Hey, Virtual Memory! Now I can have a *really big*
> hugo@... carfax.org.uk | ramdisk!
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:07                       ` cheater00 .
@ 2016-01-09 21:15                         ` Hugo Mills
  2016-01-10  3:59                           ` cheater00 .
  2016-01-10  6:16                         ` Russell Coker
  2016-01-11 13:05                         ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 55+ messages in thread
From: Hugo Mills @ 2016-01-09 21:15 UTC (permalink / raw)
  To: cheater00 .; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 4615 bytes --]

On Sat, Jan 09, 2016 at 10:07:50PM +0100, cheater00 . wrote:
> Would like to point out that this can cause data loss. If I'm writing
> to disk and the disk becomes unexpectedly read only - that data will
> be lost, because who in their right mind makes their code expect this
> and builds a contingency (e.g. caching, backpressure, etc)...
> 
> There's no loss of data on the disk because the data doesn't make it
> to disk in the first place. But it's exactly the same as if the data
> had been written to disk, and then lost.

   That's only the same kind of data loss that you'd encounter if the
power went out unexpectedly at the same point. The application isn't
told that data has been written to disk when it hasn't. It's far from
a good situation, but it's not a failure of data durability.

   Hugo.

> On Sat, Jan 9, 2016 at 10:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> >> OK. How do we track down that bug and get it fixed?
> >
> >    I have no idea. I'm not a btrfs dev, I'm afraid.
> >
> >    It's been around for a number of years. None of the devs has, I
> > think, had the time to look at it. When Josef was still (publicly)
> > active, he had it second on his list of bugs to look at for many
> > months -- but it always got trumped by some new bug that could cause
> > data loss.
> >
> >    Hugo.
> >
> >> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
> >> >> Hello,
> >> >> I can repeatedly trigger this bug by making the "data" portion fill
> >> >> up. If you remember the partition is 6 TB but in btrfs filesystem df
> >> >> Data is shown as only 2TB when in fact it should be nearly 6TB. So
> >> >> this has nothing to do with kernel bugs. The filesystem on disk is
> >> >> structured incorrectly. How do i fix this? How do I make "Data"
> >> >> bigger? What is it exactly?
> >> >
> >> >    This is *exactly* the behaviour of the known kernel bug. The bug is
> >> > that the FS *should* be extending the data allocation when it gets
> >> > near to full, and it's not. There is no way of manually allocating
> >> > more (because the FS should be doing it automatically). There is no
> >> > known way of persuading the FS to it when it isn't.
> >> >
> >> >    The only good solution I know of is to reformat the FS and restore
> >> > from backups. Even then, some people manage to repeatedly hit this
> >> > with newly-created filesystems.
> >> >
> >> >    Hugo.
> >> >
> >> >> Thanks
> >> >>
> >> >> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
> >> >> doesn't allow posting plain text at all and the mail got rejected from
> >> >> the list.
> >> >>
> >> >> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
> >> >> >
> >> >> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
> >> >> > > Yes, both times it was the same drive. I only have one usb drive now.
> >> >> >
> >> >> > That it's the same drive is suspicious. But I don't know what
> >> >> > errno=-28 means or what could trigger it, if some USB weirdness could
> >> >> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
> >> >> > wants 1.15A compared to all the others that have a 900mA spec, and
> >> >> > while it behaves find 99% of the time like the others, rarely I would
> >> >> > get the reset message and most of the time it was that drive (and less
> >> >> > often one other). Now that doesn't happen anymore.
> >> >> >
> >> >> > >
> >> >> > > I am not sure if chasing the kernel makes sense unless you think there is a
> >> >> > > specific commit that would have foxed it. I only reported here in case
> >> >> > > anyone here wanted to do some form of debugging before i reset the drive and
> >> >> > > rescan the fs to make it writeable again. But since there seems to be no
> >> >> > > interest i will go forward.
> >> >> >
> >> >> > I'd chase the hardware problem then first. It's just that the kernel
> >> >> > switch is easier from my perspective. And it's just as unclear this is
> >> >> > hardware related than just a bug. And since there are hundreds to
> >> >> > thousands of Btrfs bugs being fixed per kernel release, I have no way
> >> >> > to tell you whether it's fixed and maybe even a developer wouldn't
> >> >> > either, you'd just have to try it.
> >> >> >
> >> >> >
> >> >
> >

-- 
Hugo Mills             | make bzImage, not war
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                       Markus Reichelt

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:15                         ` Hugo Mills
@ 2016-01-10  3:59                           ` cheater00 .
  0 siblings, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-10  3:59 UTC (permalink / raw)
  To: Hugo Mills, cheater00 ., Chris Murphy, Btrfs BTRFS

Not really - at power loss the whole system is unreachable and cannot
accept incoming jobs/RPCs. When btrfs bugs out, the system can still
accept them, and it won't be able to say "oops, I messed up", because
this isn't the kind of error that you'd be expected to handle as a
programmer. So, say, an email server will not accept incoming emails
during a power loss, whereas if btrfs bugs out it will accept emails
but put them in a black hole. So it's more urgent than "hey, this is
like power loss, and we can't really do much about power loss, so
let's ignore this issue for now".

On Sat, Jan 9, 2016 at 10:15 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sat, Jan 09, 2016 at 10:07:50PM +0100, cheater00 . wrote:
>> Would like to point out that this can cause data loss. If I'm writing
>> to disk and the disk becomes unexpectedly read only - that data will
>> be lost, because who in their right mind makes their code expect this
>> and builds a contingency (e.g. caching, backpressure, etc)...
>>
>> There's no loss of data on the disk because the data doesn't make it
>> to disk in the first place. But it's exactly the same as if the data
>> had been written to disk, and then lost.
>
>    That's only the same kind of data loss that you'd encounter if the
> power went out unexpectedly at the same point. The application isn't
> told that data has been written to disk when it hasn't. It's far from
> a good situation, but it's not a failure of data durability.
>
>    Hugo.
>
>> On Sat, Jan 9, 2016 at 10:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> OK. How do we track down that bug and get it fixed?
>> >
>> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >
>> >    It's been around for a number of years. None of the devs has, I
>> > think, had the time to look at it. When Josef was still (publicly)
>> > active, he had it second on his list of bugs to look at for many
>> > months -- but it always got trumped by some new bug that could cause
>> > data loss.
>> >
>> >    Hugo.
>> >
>> >> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> > On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>> >> >> Hello,
>> >> >> I can repeatedly trigger this bug by making the "data" portion fill
>> >> >> up. If you remember the partition is 6 TB but in btrfs filesystem df
>> >> >> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>> >> >> this has nothing to do with kernel bugs. The filesystem on disk is
>> >> >> structured incorrectly. How do i fix this? How do I make "Data"
>> >> >> bigger? What is it exactly?
>> >> >
>> >> >    This is *exactly* the behaviour of the known kernel bug. The bug is
>> >> > that the FS *should* be extending the data allocation when it gets
>> >> > near to full, and it's not. There is no way of manually allocating
>> >> > more (because the FS should be doing it automatically). There is no
>> >> > known way of persuading the FS to it when it isn't.
>> >> >
>> >> >    The only good solution I know of is to reformat the FS and restore
>> >> > from backups. Even then, some people manage to repeatedly hit this
>> >> > with newly-created filesystems.
>> >> >
>> >> >    Hugo.
>> >> >
>> >> >> Thanks
>> >> >>
>> >> >> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
>> >> >> doesn't allow posting plain text at all and the mail got rejected from
>> >> >> the list.
>> >> >>
>> >> >> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
>> >> >> >
>> >> >> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
>> >> >> > > Yes, both times it was the same drive. I only have one usb drive now.
>> >> >> >
>> >> >> > That it's the same drive is suspicious. But I don't know what
>> >> >> > errno=-28 means or what could trigger it, if some USB weirdness could
>> >> >> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
>> >> >> > wants 1.15A compared to all the others that have a 900mA spec, and
>> >> >> > while it behaves find 99% of the time like the others, rarely I would
>> >> >> > get the reset message and most of the time it was that drive (and less
>> >> >> > often one other). Now that doesn't happen anymore.
>> >> >> >
>> >> >> > >
>> >> >> > > I am not sure if chasing the kernel makes sense unless you think there is a
>> >> >> > > specific commit that would have foxed it. I only reported here in case
>> >> >> > > anyone here wanted to do some form of debugging before i reset the drive and
>> >> >> > > rescan the fs to make it writeable again. But since there seems to be no
>> >> >> > > interest i will go forward.
>> >> >> >
>> >> >> > I'd chase the hardware problem then first. It's just that the kernel
>> >> >> > switch is easier from my perspective. And it's just as unclear this is
>> >> >> > hardware related than just a bug. And since there are hundreds to
>> >> >> > thousands of Btrfs bugs being fixed per kernel release, I have no way
>> >> >> > to tell you whether it's fixed and maybe even a developer wouldn't
>> >> >> > either, you'd just have to try it.
>> >> >> >
>> >> >> >
>> >> >
>> >
>
> --
> Hugo Mills             | make bzImage, not war
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |                                       Markus Reichelt

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:07                       ` cheater00 .
  2016-01-09 21:15                         ` Hugo Mills
@ 2016-01-10  6:16                         ` Russell Coker
  2016-01-10 22:24                           ` cheater00 .
  2016-01-11 13:05                         ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 55+ messages in thread
From: Russell Coker @ 2016-01-10  6:16 UTC (permalink / raw)
  To: cheater00 ., Btrfs BTRFS

On Sun, 10 Jan 2016 08:07:50 AM cheater00 . wrote:
> Would like to point out that this can cause data loss. If I'm writing
> to disk and the disk becomes unexpectedly read only - that data will
> be lost, because who in their right mind makes their code expect this
> and builds a contingency (e.g. caching, backpressure, etc)...

There are lots of situations in which a filesystem can become read-only;
it's a fairly standard response to disk errors.  You should be able to
handle that if your application deals with important data.

I was under the impression that this bug didn't make the disk read-only
(i.e. you can delete/truncate files to free space) but instead incorrectly
told the application that there was no space.  ENOSPC is very common and
all apps have to deal with it.

> There's no loss of data on the disk because the data doesn't make it
> to disk in the first place. But it's exactly the same as if the data
> had been written to disk, and then lost.

No, it's not.  If you write data and an fsync() or fdatasync() call succeeds
then it's on disk, otherwise not.  All apps which depend on data being written
to disk (e.g. database servers and mail servers) use fsync() and fdatasync().
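
As a rough illustration of the difference (a sketch, not taken from this
thread; the paths are only examples): a writer that forces an fsync()
before claiming success sees the failure instead of silently losing data.

  # dd's conv=fsync issues fsync() on the output file before exiting, so an
  # ENOSPC or a forced read-only remount during the copy shows up as a
  # non-zero exit status rather than a silent loss
  dd if=./somefile of=/media/cheater/Media/testfile bs=1M conv=fsync \
      || echo "write failed - the data never made it to disk"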

On Sun, 10 Jan 2016 02:59:46 PM cheater00 . wrote:
> Not really - at power loss the whole system is unreachable and can not
> accept incoming jobs/RPCs. When btrfs bugs out, it can still do that,
> and will not be able to say "oops i messed up" because it's not the
> kind of error that you'd be expected to handle as a programmer. So say
> an email server will not accept incoming emails at power loss, whereas
> if btrfs bugs out it will accept emails but will put them in a black
> hole. So it's more urgent than "hey this is like power loss and we
> can't really do much about power loss so let's ignore this issue for
> now"

Please test this with the common mail server software, e.g. Postfix, Exim,
Procmail, Maildrop, Dovecot, etc.  The BTRFS bug as described won't cause data 
loss with any of them.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 20:26                 ` Hugo Mills
  2016-01-09 20:59                   ` cheater00 .
@ 2016-01-10 14:14                   ` Henk Slager
  2016-01-10 23:47                     ` cheater00 .
  1 sibling, 1 reply; 55+ messages in thread
From: Henk Slager @ 2016-01-10 14:14 UTC (permalink / raw)
  To: cheater00 .; +Cc: Btrfs BTRFS, Hugo Mills, Chris Murphy

On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>> Hello,
>> I can repeatedly trigger this bug by making the "data" portion fill
>> up. If you remember the partition is 6 TB but in btrfs filesystem df
>> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>> this has nothing to do with kernel bugs. The filesystem on disk is
>> structured incorrectly. How do i fix this? How do I make "Data"
>> bigger? What is it exactly?
>
>    This is *exactly* the behaviour of the known kernel bug. The bug is
> that the FS *should* be extending the data allocation when it gets
> near to full, and it's not. There is no way of manually allocating
> more (because the FS should be doing it automatically). There is no
> known way of persuading the FS to it when it isn't.

Probably this is 'the'  bug we talk about:
https://bugzilla.kernel.org/show_bug.cgi?id=74101

The fs there is much smaller, but the problem likewise occurs at a fill level of <50%.

You mention that btrfs fs resize did nothing, but AFAIK you should see
something in dmesg when you do that.
And what is the output of gdisk -l /dev/sdd, just to check?

Have you already had the fs filled up to e.g. 95% before, or has it
always been no more than 2TiB?

Why are the (empty) single profiles for metadata and system still
there?  They should already have been removed by the various balancing
operations that are advised in the btrfs wiki.
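
(For reference, the usage-filtered balances the wiki recommends look
roughly like this - a sketch only; the mount point is the one from this
thread, and balancing is not known to cure this particular ENOSPC bug:)

  btrfs balance start -musage=0 /media/cheater/Media   # drop completely empty metadata chunks
  btrfs balance start -dusage=5 /media/cheater/Media   # compact nearly-empty data chunks
  btrfs fi df /media/cheater/Media   # the empty 'single' metadata line should be gone
                                     # (the system one may need the -s filter plus -f)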

What is the output of btrfs check /dev/sdd?  The USB resets mentioned
might have introduced some errors into the fs (that is what I have
experienced, at least, but it depends on timing etc.).

What you could try is to create an image+'copy' of the fs with
btrfs-image just after you get ENOSPC and then do various tests with
that (make sure to unmount, or even better unplug, the physical hdd!).
For example, mount it and try to add a file, or convert all metadata +
system from dup to single and then try to add a file. None of this
gives real space, but it might give hints as to what could be wrong.
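
Roughly, as a sketch (the scratch device /dev/sdX and mount point
/mnt/test are placeholders; btrfs-image dumps metadata only, so file
contents in the restored copy will be garbage):

  btrfs-image /dev/sdd1 /tmp/media-meta.img     # metadata-only dump of the sick fs (unmounted)
  btrfs-image -r /tmp/media-meta.img /dev/sdX   # rebuild it on a scratch device
  mount /dev/sdX /mnt/test
  # test 1: does a small write still hit ENOSPC?
  # test 2: convert metadata (and system, with -sconvert=single -f) from
  #         dup to single, then try the write again
  btrfs balance start -mconvert=single /mnt/test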

If you somehow manage to shrink the fs by, let's say, 100G, and the
partition as well, you could install or copy a newer Linux + kernel to
partition(s) in that 100G of space and boot from there.


>    The only good solution I know of is to reformat the FS and restore
> from backups. Even then, some people manage to repeatedly hit this
> with newly-created filesystems.
>
>    Hugo.
>
>> Thanks
>>
>> P.S. Sorry about reposting twice, apparently Google's "Inbox" app
>> doesn't allow posting plain text at all and the mail got rejected from
>> the list.
>>
>> On Thu, 7 Jan 2016 23:22 Chris Murphy <lists@colorremedies.com> wrote:
>> >
>> > On Thu, Jan 7, 2016 at 3:04 PM, cheater00 . <cheater00@gmail.com> wrote:
>> > > Yes, both times it was the same drive. I only have one usb drive now.
>> >
>> > That it's the same drive is suspicious. But I don't know what
>> > errno=-28 means or what could trigger it, if some USB weirdness could
>> > cause Btrfs to get confused somehow. I have one 7200rpm drive that
>> > wants 1.15A compared to all the others that have a 900mA spec, and
>> > while it behaves find 99% of the time like the others, rarely I would
>> > get the reset message and most of the time it was that drive (and less
>> > often one other). Now that doesn't happen anymore.
>> >
>> > >
>> > > I am not sure if chasing the kernel makes sense unless you think there is a
>> > > specific commit that would have foxed it. I only reported here in case
>> > > anyone here wanted to do some form of debugging before i reset the drive and
>> > > rescan the fs to make it writeable again. But since there seems to be no
>> > > interest i will go forward.
>> >
>> > I'd chase the hardware problem then first. It's just that the kernel
>> > switch is easier from my perspective. And it's just as unclear this is
>> > hardware related than just a bug. And since there are hundreds to
>> > thousands of Btrfs bugs being fixed per kernel release, I have no way
>> > to tell you whether it's fixed and maybe even a developer wouldn't
>> > either, you'd just have to try it.
>> >
>> >
>
> --
> Hugo Mills             | Hey, Virtual Memory! Now I can have a *really big*
> hugo@... carfax.org.uk | ramdisk!
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-10  6:16                         ` Russell Coker
@ 2016-01-10 22:24                           ` cheater00 .
  2016-01-10 22:32                             ` Lionel Bouton
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-10 22:24 UTC (permalink / raw)
  To: Russell Coker; +Cc: Btrfs BTRFS

On Sun, Jan 10, 2016 at 7:16 AM, Russell Coker <russell@coker.com.au> wrote:
> On Sun, 10 Jan 2016 08:07:50 AM cheater00 . wrote:
>> Would like to point out that this can cause data loss. If I'm writing
>> to disk and the disk becomes unexpectedly read only - that data will
>> be lost, because who in their right mind makes their code expect this
>> and builds a contingency (e.g. caching, backpressure, etc)...
>
> I was under the impression that this bug didn't make the disk read-only (IE
> you can delete/truncate files to free space) but instead incorrectly told the
> application that there was no space.  ENOSPACE is very common and all apps
> have to deal with it.

The kernel remounts the filesystem read-only when this bug happens.

>> There's no loss of data on the disk because the data doesn't make it
>> to disk in the first place. But it's exactly the same as if the data
>> had been written to disk, and then lost.

> No it's not.  If you write data and a fsync() or fdatasync() call succeeds
> then it's on disk, otherwise not.  All apps which depend on data being written
> to disk (EG database servers and mail servers) use fsync() and fdatasync().

> Please test this with the common mail server software, EG Postfix, Exim,
> Procmail, Maildrop, Dovecot, etc.  The BTRFS bug as described won't cause data
> loss with any of them.

It's easy to imagine this scenario: you don't want your server to run
out of space, so you put alerts in place for when there's, say, only
10 GB left, and you plan to provision new servers at that point. So you
keep doing df (or even btrfs filesystem df) and that works. But then
this bug shows up and you run out of space even though you still have
3 terabytes left according to your metrics. At this point your server
breaks down and cannot accept jobs any more, even though according to
everything you've done it should. So even though your system
(including the btrfs code) assures you that you should be able to
receive a message within, say, 60 seconds (the maximum time to provision
a server on your cloud provider), the outage actually extends
indefinitely - that is, until you notice there's an issue and intervene,
which might take, say, half an hour. During that time messages are lost,
because the sender only retries for 60 seconds and then throws away
the message forever. There you go: loss of data and service.
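
To make that concrete, a naive check of the kind described above (a
sketch; the path and threshold are made up) keeps reporting terabytes
of free space while the filesystem is already returning ENOSPC, so the
alert never fires:

  # warn when less than 10 GiB appears to be left (df reports 1K blocks)
  avail=$(df --output=avail /media/cheater/Media | tail -1)
  [ "$avail" -lt $((10 * 1024 * 1024)) ] && echo "low on space - provision a new server"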

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-10 22:24                           ` cheater00 .
@ 2016-01-10 22:32                             ` Lionel Bouton
  0 siblings, 0 replies; 55+ messages in thread
From: Lionel Bouton @ 2016-01-10 22:32 UTC (permalink / raw)
  To: cheater00 ., Russell Coker; +Cc: Btrfs BTRFS

On 10/01/2016 23:24, cheater00 . wrote:
>> Please test this with the common mail server software, EG Postfix, Exim,
>> Procmail, Maildrop, Dovecot, etc.  The BTRFS bug as described won't cause data
>> loss with any of them.
> [...]
> because the sender only retries for 60 seconds and then throws away
> the message forever. There you go, loss of data and service.

What Russell was trying to say is that this is not how SMTP (and, in
general, any system/protocol where data is important) works. If the
sender behaves like this, he is the one with a defective system.

Lionel

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-10 14:14                   ` Henk Slager
@ 2016-01-10 23:47                     ` cheater00 .
  2016-01-11  0:24                       ` Chris Murphy
  2016-01-11 19:50                       ` Henk Slager
  0 siblings, 2 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-10 23:47 UTC (permalink / raw)
  To: Henk Slager; +Cc: Btrfs BTRFS, Hugo Mills, Chris Murphy

[-- Attachment #1: Type: text/plain, Size: 5019 bytes --]

On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>>> Hello,
>>> I can repeatedly trigger this bug by making the "data" portion fill
>>> up. If you remember the partition is 6 TB but in btrfs filesystem df
>>> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>>> this has nothing to do with kernel bugs. The filesystem on disk is
>>> structured incorrectly. How do i fix this? How do I make "Data"
>>> bigger? What is it exactly?
>>
>>    This is *exactly* the behaviour of the known kernel bug. The bug is
>> that the FS *should* be extending the data allocation when it gets
>> near to full, and it's not. There is no way of manually allocating
>> more (because the FS should be doing it automatically). There is no
>> known way of persuading the FS to it when it isn't.
>
> Probably this is 'the'  bug we talk about:
> https://bugzilla.kernel.org/show_bug.cgi?id=74101

Yes, would seem like it.

> Size of the fs is much smaller, but also problem occurs when fill-level is <50%
>
> btrfs fs resize  did nothing you mention, but AFAIK you should see
> something in dmesg when you do that.


I remounted (to make the filesystem rw again) and got the attached
log. There are some errors in there. But bear in mind that this disk
has been hanging on in ro mode for a few days now.

I did this:
# btrfs fi resize -1T Media
Resize 'Media' of '-1T'
# btrfs filesystem resize -1G Media
Resize 'Media' of '-1G'
# btrfs filesystem resize -1T Media
Resize 'Media' of '-1T'
# btrfs filesystem resize max Media
Resize 'Media' of 'max'

and it resulted in the following lines in dmesg:
[189115.919160] BTRFS: new size for /dev/sdc1 is 4901661835264
[189177.306291] BTRFS: new size for /dev/sdc1 is 4900588093440
[189181.950289] BTRFS: new size for /dev/sdc1 is 3801076465664
[189232.064357] BTRFS: new size for /dev/sdc1 is 6001173463040

(note the device changed from sdd to sdc when I rebooted last)

> And what is the output of   gdisk -l /dev/sdd  , just to check?

(the device changed to sdc since I've rebooted)

# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 0.8.8

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdc: 11721045168 sectors, 5.5 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0DEF5509-8730-4AB4-A846-79DA3C376F66
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 11721045134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3181 sectors (1.6 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048     11721043967   5.5 TiB     8300


> Have you had the fs already filled up to e.g. 95% before or has is
> always been not more than 2TiB?

It has never been filled to more than 2TB; I've had it for quite some
time now, but it's always hovered around 1TB.

> Why are the single (empty )profiles for metadata and system still
> there?  They should have been removed already by the various balancing
> operations that are advised in the btrfs-wiki.

Not sure. Would this happen automatically? Or is this something I
should have done?
I have another fs on the same model/size disk, which isn't exhibiting
this bug, and it has those profiles as well.

Here's what both look like:

buggy fs ("Media"):
Data, single: total=1.98TiB, used=1.98TiB
System, DUP: total=8.00MiB, used=240.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=5.50GiB, used=3.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

non-buggy fs:
Data, single: total=5.43TiB, used=5.40TiB
System, DUP: total=8.00MiB, used=608.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=12.50GiB, used=11.21GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


> What is the output of   btrfs check  /dev/sdd  ?  The usb resets
> mentioned might have introduced some errors to the fs (that is what I
> have experienced at least, but it depends on timing etc)

I'll run that overnight and report tomorrow.
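
(A sketch of one way to capture that: plain btrfs check runs read-only,
but the fs has to be unmounted first.)

  umount /media/cheater/Media
  btrfs check /dev/sdc1 2>&1 | tee btrfs-check-media.log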

> What you could try is to create an image+'copy' of the fs with
> btrfs-image just after you get ENOSPC abd then do various tests with
> that (make sure unmount or even better unplug the physical hdd!). Like
> mounting and then try to add a file, convert all metadata + system
> from dup to single and then try to add a file. It all doesn't give
> real space, but it might give hints to what could be wrong.

I can't do that because I would have to buy an extra disk which is 300 euro.

> If you somehow manage to reduce the fs by lets say 100G and also the
> partition, you could install or copy a newer linux+kernel to
> partition(s) in that 100G space and boot from there.

Let me try finding the latest kernel then. There are backports.

[-- Attachment #2: remount to make disk rw.txt --]
[-- Type: text/plain, Size: 7965 bytes --]

[188936.772198] BTRFS error (device sdc1): cleaner transaction attach returned -30
[188936.776188] ------------[ cut here ]------------
[188936.776215] WARNING: CPU: 3 PID: 623 at /home/kernel/COD/linux/fs/btrfs/inode.c:8968 btrfs_destroy_inode+0x298/0x2c0 [btrfs]()
[188936.776217] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[188936.776312] CPU: 3 PID: 623 Comm: umount Tainted: G        W  OE   4.3.0-040300rc7-generic #201510260712
[188936.776314] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[188936.776317]  00000000 00000000 e831dddc c13610e8 00000000 e831de0c c1068107 c1957804
[188936.776322]  00000003 0000026f f8ae5c34 00002308 f8a6e248 f8a6e248 ceabb1e4 00000002
[188936.776326]  ea887800 e831de1c c10681e2 00000009 00000000 e831de4c f8a6e248 e831de38
[188936.776331] Call Trace:
[188936.776338]  [<c13610e8>] dump_stack+0x41/0x59
[188936.776343]  [<c1068107>] warn_slowpath_common+0x87/0xc0
[188936.776360]  [<f8a6e248>] ? btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.776375]  [<f8a6e248>] ? btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.776379]  [<c10681e2>] warn_slowpath_null+0x22/0x30
[188936.776393]  [<f8a6e248>] btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.776397]  [<c1201fe8>] ? fsnotify_destroy_marks+0x58/0x70
[188936.776400]  [<c11dd8df>] destroy_inode+0x2f/0x60
[188936.776402]  [<c11dda0a>] evict+0xfa/0x170
[188936.776405]  [<c11ddc23>] iput+0x153/0x1c0
[188936.776419]  [<f8a44b64>] btrfs_put_block_group_cache+0x94/0xd0 [btrfs]
[188936.776434]  [<f8a569b1>] close_ctree+0x131/0x300 [btrfs]
[188936.776437]  [<c11ddab7>] ? dispose_list+0x37/0x50
[188936.776440]  [<c11de769>] ? evict_inodes+0x139/0x150
[188936.776450]  [<f8a27056>] btrfs_put_super+0x16/0x20 [btrfs]
[188936.776454]  [<c11c6850>] generic_shutdown_super+0x60/0xe0
[188936.776458]  [<c11ab1ed>] ? kfree+0x12d/0x140
[188936.776462]  [<c116df00>] ? unregister_shrinker+0x40/0x50
[188936.776465]  [<c11c6af1>] kill_anon_super+0x11/0x20
[188936.776475]  [<f8a27f65>] btrfs_kill_super+0x15/0xf0 [btrfs]
[188936.776479]  [<c11c6c4d>] deactivate_locked_super+0x3d/0x70
[188936.776482]  [<c11c70b7>] deactivate_super+0x57/0x60
[188936.776485]  [<c11e1d49>] cleanup_mnt+0x39/0x90
[188936.776488]  [<c11e1de0>] __cleanup_mnt+0x10/0x20
[188936.776491]  [<c10822cf>] task_work_run+0x7f/0xa0
[188936.776495]  [<c10035e5>] prepare_exit_to_usermode+0xf5/0x120
[188936.776498]  [<c1003646>] syscall_return_slowpath+0x36/0x120
[188936.776501]  [<c11e3365>] ? SyS_oldumount+0x75/0xb0
[188936.776506]  [<c1743786>] syscall_exit_work+0x7/0xc
[188936.776509] ---[ end trace dc3cf6814526c7c8 ]---
[188936.777408] ------------[ cut here ]------------
[188936.777426] WARNING: CPU: 3 PID: 623 at /home/kernel/COD/linux/fs/btrfs/inode.c:8968 btrfs_destroy_inode+0x298/0x2c0 [btrfs]()
[188936.777427] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[188936.777502] CPU: 3 PID: 623 Comm: umount Tainted: G        W  OE   4.3.0-040300rc7-generic #201510260712
[188936.777504] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[188936.777505]  00000000 00000000 e831dddc c13610e8 00000000 e831de0c c1068107 c1957804
[188936.777510]  00000003 0000026f f8ae5c34 00002308 f8a6e248 f8a6e248 d40395b4 00000002
[188936.777515]  ea887800 e831de1c c10681e2 00000009 00000000 e831de4c f8a6e248 e831de38
[188936.777520] Call Trace:
[188936.777524]  [<c13610e8>] dump_stack+0x41/0x59
[188936.777527]  [<c1068107>] warn_slowpath_common+0x87/0xc0
[188936.777542]  [<f8a6e248>] ? btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.777556]  [<f8a6e248>] ? btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.777559]  [<c10681e2>] warn_slowpath_null+0x22/0x30
[188936.777572]  [<f8a6e248>] btrfs_destroy_inode+0x298/0x2c0 [btrfs]
[188936.777575]  [<c1201fe8>] ? fsnotify_destroy_marks+0x58/0x70
[188936.777578]  [<c11dd8df>] destroy_inode+0x2f/0x60
[188936.777581]  [<c11dda0a>] evict+0xfa/0x170
[188936.777583]  [<c11ddc23>] iput+0x153/0x1c0
[188936.777597]  [<f8a44b64>] btrfs_put_block_group_cache+0x94/0xd0 [btrfs]
[188936.777611]  [<f8a569b1>] close_ctree+0x131/0x300 [btrfs]
[188936.777614]  [<c11ddab7>] ? dispose_list+0x37/0x50
[188936.777617]  [<c11de769>] ? evict_inodes+0x139/0x150
[188936.777628]  [<f8a27056>] btrfs_put_super+0x16/0x20 [btrfs]
[188936.777631]  [<c11c6850>] generic_shutdown_super+0x60/0xe0
[188936.777634]  [<c11ab1ed>] ? kfree+0x12d/0x140
[188936.777637]  [<c116df00>] ? unregister_shrinker+0x40/0x50
[188936.777640]  [<c11c6af1>] kill_anon_super+0x11/0x20
[188936.777650]  [<f8a27f65>] btrfs_kill_super+0x15/0xf0 [btrfs]
[188936.777653]  [<c11c6c4d>] deactivate_locked_super+0x3d/0x70
[188936.777656]  [<c11c70b7>] deactivate_super+0x57/0x60
[188936.777659]  [<c11e1d49>] cleanup_mnt+0x39/0x90
[188936.777662]  [<c11e1de0>] __cleanup_mnt+0x10/0x20
[188936.777665]  [<c10822cf>] task_work_run+0x7f/0xa0
[188936.777668]  [<c10035e5>] prepare_exit_to_usermode+0xf5/0x120
[188936.777671]  [<c1003646>] syscall_return_slowpath+0x36/0x120
[188936.777674]  [<c11e3365>] ? SyS_oldumount+0x75/0xb0
[188936.777679]  [<c1743786>] syscall_exit_work+0x7/0xc
[188936.777682] ---[ end trace dc3cf6814526c7c9 ]---
[188939.222403] BTRFS info (device sdc1): disk space caching is enabled
[189002.588577] BTRFS info (device sdc1): The free space cache file (2159324168192) is invalid. skip it
[189002.588577] 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:04                     ` Hugo Mills
  2016-01-09 21:07                       ` cheater00 .
@ 2016-01-11  0:13                       ` Chris Murphy
  2016-01-11  9:03                         ` Hugo Mills
  1 sibling, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-01-11  0:13 UTC (permalink / raw)
  To: Hugo Mills, cheater00 ., Btrfs BTRFS

On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> OK. How do we track down that bug and get it fixed?
>
>    I have no idea. I'm not a btrfs dev, I'm afraid.
>
>    It's been around for a number of years. None of the devs has, I
> think, had the time to look at it. When Josef was still (publicly)
> active, he had it second on his list of bugs to look at for many
> months -- but it always got trumped by some new bug that could cause
> data loss.


Interesting. I did not know of this bug. It's pretty rare.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-10 23:47                     ` cheater00 .
@ 2016-01-11  0:24                       ` Chris Murphy
  2016-01-11  6:07                         ` cheater00 .
  2016-01-11 19:50                       ` Henk Slager
  1 sibling, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-01-11  0:24 UTC (permalink / raw)
  To: cheater00 .; +Cc: Henk Slager, Btrfs BTRFS, Hugo Mills, Chris Murphy

On Sun, Jan 10, 2016 at 4:47 PM, cheater00 . <cheater00@gmail.com> wrote:
> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:

> buggy fs ("Media"):
> Data, single: total=1.98TiB, used=1.98TiB
> System, DUP: total=8.00MiB, used=240.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=5.50GiB, used=3.49GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>> What you could try is to create an image+'copy' of the fs with
>> btrfs-image just after you get ENOSPC abd then do various tests with
>> that (make sure unmount or even better unplug the physical hdd!). Like
>> mounting and then try to add a file, convert all metadata + system
>> from dup to single and then try to add a file. It all doesn't give
>> real space, but it might give hints to what could be wrong.
>
> I can't do that because I would have to buy an extra disk which is 300 euro.

You have 4TB unused on that disk. You could shrink the fs to ~3TB, and
then change partition size to match and create a 2nd partition with
whatever FS you want as the target for btrfs-image.

After that, migrate your data from the broken fs to the new one. If
you use Btrfs for the new volume on that 2nd partition you can use
btrfs send-receive for this. After it's successful, you can wipefs the
1st partition, and then add it as an additional device to the new
volume.
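
Roughly, and only as an untested sketch with hypothetical device names
and mount points (not a recipe tailored to this particular disk), that
sequence could look like:

# 1) shrink the filesystem well below the planned new partition size
btrfs filesystem resize 3T /mnt/media
# 2) shrink partition 1 and create partition 2 in the freed space
#    (parted/gdisk; the new end of partition 1 must stay beyond the
#    resized filesystem)
# 3) make and mount the new fs
mkfs.btrfs /dev/sdc2
mount /dev/sdc2 /mnt/new
# 4) send/receive works from a read-only snapshot
btrfs subvolume snapshot -r /mnt/media /mnt/media/snap
btrfs send /mnt/media/snap | btrfs receive /mnt/new
# 5) once the copy is verified, wipe the old fs and add its device
umount /mnt/media
wipefs -a /dev/sdc1
btrfs device add /dev/sdc1 /mnt/new
# optionally spread existing data across both devices
btrfs balance start /mnt/new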

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  0:24                       ` Chris Murphy
@ 2016-01-11  6:07                         ` cheater00 .
  2016-01-11  6:24                           ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11  6:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Henk Slager, Btrfs BTRFS, Hugo Mills

Here is the fsck log:

# btrfs check /dev/sdc1
Checking filesystem on /dev/sdc1
UUID: b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 2178647877626 bytes used err is 0
total csum bytes: 2123923832
total tree bytes: 3749871616
total fs tree bytes: 645742592
total extent tree bytes: 680394752
btree space waste bytes: 284237552
file data blocks allocated: 2178788139008
 referenced 2173089497088
btrfs-progs v4.2.1

On Mon, Jan 11, 2016 at 1:24 AM, Chris Murphy <lists@colorremedies.com> wrote:
> On Sun, Jan 10, 2016 at 4:47 PM, cheater00 . <cheater00@gmail.com> wrote:
>> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
>
>> buggy fs ("Media"):
>> Data, single: total=1.98TiB, used=1.98TiB
>> System, DUP: total=8.00MiB, used=240.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, DUP: total=5.50GiB, used=3.49GiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>> What you could try is to create an image+'copy' of the fs with
>>> btrfs-image just after you get ENOSPC abd then do various tests with
>>> that (make sure unmount or even better unplug the physical hdd!). Like
>>> mounting and then try to add a file, convert all metadata + system
>>> from dup to single and then try to add a file. It all doesn't give
>>> real space, but it might give hints to what could be wrong.
>>
>> I can't do that because I would have to buy an extra disk which is 300 euro.
>
> You have 4TB unused on that disk. You could shrink the fs to ~3TB, and
> then change partition size to match and create a 2nd partition with
> whatever FS you want as the target for btrfs-image.
>
> After that, migrate your data from the broken fs to the new one. If
> you use Btrfs for the new volume on that 2nd partition you can use
> btrfs send-receive for this. After it's successful, you can wipefs the
> 1st partition, and then add it as an additional device to the new
> volume.
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  6:07                         ` cheater00 .
@ 2016-01-11  6:24                           ` cheater00 .
  2016-01-11  7:54                             ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11  6:24 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Henk Slager, Btrfs BTRFS, Hugo Mills

When re-mounting now, this line was not present any more in dmesg:
"BTRFS info (device sdc1): The free space cache file (2159324168192)
is invalid. skip it"

dmesg only showed:
"[216798.144518] BTRFS info (device sdc1): disk space caching is enabled"

On Mon, Jan 11, 2016 at 7:07 AM, cheater00 . <cheater00@gmail.com> wrote:
> Here is the fsck log:
>
> # btrfs check /dev/sdc1
> Checking filesystem on /dev/sdc1
> UUID: b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 2178647877626 bytes used err is 0
> total csum bytes: 2123923832
> total tree bytes: 3749871616
> total fs tree bytes: 645742592
> total extent tree bytes: 680394752
> btree space waste bytes: 284237552
> file data blocks allocated: 2178788139008
>  referenced 2173089497088
> btrfs-progs v4.2.1
>
> On Mon, Jan 11, 2016 at 1:24 AM, Chris Murphy <lists@colorremedies.com> wrote:
>> On Sun, Jan 10, 2016 at 4:47 PM, cheater00 . <cheater00@gmail.com> wrote:
>>> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
>>
>>> buggy fs ("Media"):
>>> Data, single: total=1.98TiB, used=1.98TiB
>>> System, DUP: total=8.00MiB, used=240.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, DUP: total=5.50GiB, used=3.49GiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>>> What you could try is to create an image+'copy' of the fs with
>>>> btrfs-image just after you get ENOSPC abd then do various tests with
>>>> that (make sure unmount or even better unplug the physical hdd!). Like
>>>> mounting and then try to add a file, convert all metadata + system
>>>> from dup to single and then try to add a file. It all doesn't give
>>>> real space, but it might give hints to what could be wrong.
>>>
>>> I can't do that because I would have to buy an extra disk which is 300 euro.
>>
>> You have 4TB unused on that disk. You could shrink the fs to ~3TB, and
>> then change partition size to match and create a 2nd partition with
>> whatever FS you want as the target for btrfs-image.
>>
>> After that, migrate your data from the broken fs to the new one. If
>> you use Btrfs for the new volume on that 2nd partition you can use
>> btrfs send-receive for this. After it's successful, you can wipefs the
>> 1st partition, and then add it as an additional device to the new
>> volume.
>>
>> --
>> Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  6:24                           ` cheater00 .
@ 2016-01-11  7:54                             ` cheater00 .
  2016-01-12  0:35                               ` Duncan
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11  7:54 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Henk Slager, Btrfs BTRFS, Hugo Mills

After the fsck, the Data allocation is being resized correctly again.
It would seem that the fact Data stood at 2TB when this bug occurred
was just a coincidence.

Perhaps this line:
"BTRFS info (device sdc1): The free space cache file (2159324168192)
is invalid. skip it"

should not be "info" but an error, and should instruct the user to
fsck the file system.
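
For what it's worth, the free space cache can also be rebuilt without
a full check by mounting once with the clear_cache option; a sketch
using this thread's device name and an assumed mount point:

mount -o clear_cache /dev/sdc1 /mnt/media

The cache is then regenerated in the background, and the option only
needs to be given for that one mount.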

I have requested an account on the btrfs wiki in order to add this info to:
https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_like_I_should_have_lots_left.21

On Mon, Jan 11, 2016 at 7:24 AM, cheater00 . <cheater00@gmail.com> wrote:
> When re-mounting now, this line was not present any more in dmesg:
> "BTRFS info (device sdc1): The free space cache file (2159324168192)
> is invalid. skip it"
>
> dmesg only showed:
> "[216798.144518] BTRFS info (device sdc1): disk space caching is enabled"
>
> On Mon, Jan 11, 2016 at 7:07 AM, cheater00 . <cheater00@gmail.com> wrote:
>> Here is the fsck log:
>>
>> # btrfs check /dev/sdc1
>> Checking filesystem on /dev/sdc1
>> UUID: b397b7ef-6754-4ba4-8b1a-fbf235aa1cf8
>> checking extents
>> checking free space cache
>> checking fs roots
>> checking csums
>> checking root refs
>> found 2178647877626 bytes used err is 0
>> total csum bytes: 2123923832
>> total tree bytes: 3749871616
>> total fs tree bytes: 645742592
>> total extent tree bytes: 680394752
>> btree space waste bytes: 284237552
>> file data blocks allocated: 2178788139008
>>  referenced 2173089497088
>> btrfs-progs v4.2.1
>>
>> On Mon, Jan 11, 2016 at 1:24 AM, Chris Murphy <lists@colorremedies.com> wrote:
>>> On Sun, Jan 10, 2016 at 4:47 PM, cheater00 . <cheater00@gmail.com> wrote:
>>>> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
>>>
>>>> buggy fs ("Media"):
>>>> Data, single: total=1.98TiB, used=1.98TiB
>>>> System, DUP: total=8.00MiB, used=240.00KiB
>>>> System, single: total=4.00MiB, used=0.00B
>>>> Metadata, DUP: total=5.50GiB, used=3.49GiB
>>>> Metadata, single: total=8.00MiB, used=0.00B
>>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>
>>>>> What you could try is to create an image+'copy' of the fs with
>>>>> btrfs-image just after you get ENOSPC abd then do various tests with
>>>>> that (make sure unmount or even better unplug the physical hdd!). Like
>>>>> mounting and then try to add a file, convert all metadata + system
>>>>> from dup to single and then try to add a file. It all doesn't give
>>>>> real space, but it might give hints to what could be wrong.
>>>>
>>>> I can't do that because I would have to buy an extra disk which is 300 euro.
>>>
>>> You have 4TB unused on that disk. You could shrink the fs to ~3TB, and
>>> then change partition size to match and create a 2nd partition with
>>> whatever FS you want as the target for btrfs-image.
>>>
>>> After that, migrate your data from the broken fs to the new one. If
>>> you use Btrfs for the new volume on that 2nd partition you can use
>>> btrfs send-receive for this. After it's successful, you can wipefs the
>>> 1st partition, and then add it as an additional device to the new
>>> volume.
>>>
>>> --
>>> Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  0:13                       ` Chris Murphy
@ 2016-01-11  9:03                         ` Hugo Mills
  2016-01-11 13:04                           ` cheater00 .
  2016-01-11 21:31                           ` Chris Murphy
  0 siblings, 2 replies; 55+ messages in thread
From: Hugo Mills @ 2016-01-11  9:03 UTC (permalink / raw)
  To: Chris Murphy; +Cc: cheater00 ., Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 960 bytes --]

On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> >> OK. How do we track down that bug and get it fixed?
> >
> >    I have no idea. I'm not a btrfs dev, I'm afraid.
> >
> >    It's been around for a number of years. None of the devs has, I
> > think, had the time to look at it. When Josef was still (publicly)
> > active, he had it second on his list of bugs to look at for many
> > months -- but it always got trumped by some new bug that could cause
> > data loss.
> 
> 
> Interesting. I did not know of this bug. It's pretty rare.

   Not really. It shows up maybe on average once a week on IRC. It
gets reported much less on the mailing list.

   Hugo.

-- 
Hugo Mills             | What part of "gestalt" don't you understand?
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  9:03                         ` Hugo Mills
@ 2016-01-11 13:04                           ` cheater00 .
  2016-01-11 21:31                           ` Chris Murphy
  1 sibling, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 13:04 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, cheater00 ., Btrfs BTRFS

The way I understood this line in dmesg:
"BTRFS info (device sdc1): The free space cache file (2159324168192)
is invalid. skip it"

was:

"here's some info. There was a file caching some work, but the cache
turned out to be invalid, so we won't be using it. We'll do the work
that it was caching, instead. This is not an error, it's just for your
information. If it were an error we would have told you."

On Mon, Jan 11, 2016 at 10:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> OK. How do we track down that bug and get it fixed?
>> >
>> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >
>> >    It's been around for a number of years. None of the devs has, I
>> > think, had the time to look at it. When Josef was still (publicly)
>> > active, he had it second on his list of bugs to look at for many
>> > months -- but it always got trumped by some new bug that could cause
>> > data loss.
>>
>>
>> Interesting. I did not know of this bug. It's pretty rare.
>
>    Not really. It shows up maybe on average once a week on IRC. It
> gets reported much less on the mailing list.
>
>    Hugo.
>
> --
> Hugo Mills             | What part of "gestalt" don't you understand?
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-09 21:07                       ` cheater00 .
  2016-01-09 21:15                         ` Hugo Mills
  2016-01-10  6:16                         ` Russell Coker
@ 2016-01-11 13:05                         ` Austin S. Hemmelgarn
  2016-01-11 13:11                           ` cheater00 .
  2 siblings, 1 reply; 55+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-11 13:05 UTC (permalink / raw)
  To: cheater00 ., Hugo Mills, Chris Murphy, Btrfs BTRFS

On 2016-01-09 16:07, cheater00 . wrote:
> Would like to point out that this can cause data loss. If I'm writing
> to disk and the disk becomes unexpectedly read only - that data will
> be lost, because who in their right mind makes their code expect this
> and builds a contingency (e.g. caching, backpressure, etc)...
If a data critical application (mail server, database server, anything 
similar) can't gracefully handle ENOSPC, then that application is 
broken, not the FS.  As an example, set up a small VM with an SMTP 
server, then force the FS the server uses for queuing mail read-only, 
and see if you can submit mail, then go read the RFCs for SMTP and see 
what clients are supposed to do when they can't submit mail.  A properly 
designed piece of software is supposed to be resilient against common 
failure modes of the resources it depends on (which includes ENOSPC and 
read-only filesystems for anything that works with data on disk).
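
(One way to force the queue filesystem read-only for such a test,
assuming the spool directory sits on its own filesystem, is simply a
remount, e.g.:)

mount -o remount,ro /var/spool
# ... run the submission test ...
mount -o remount,rw /var/spool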
>
> There's no loss of data on the disk because the data doesn't make it
> to disk in the first place. But it's exactly the same as if the data
> had been written to disk, and then lost.
>
No, it isn't.  If you absolutely need the data on disk, you should be 
calling fsync or fdatasync, and then assuming if those return an error 
that none of the data written since the last call has gotten to the disk 
(some of it might have, but you need to assume it hasn't).  Every piece 
of software in wide usage that requires data to be on the disk does 
this, because otherwise it can't guarantee that the data is on disk.
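
From a shell, the same discipline can be approximated with GNU dd's
conv=fsync, which fsyncs the output file and reports an error (and a
non-zero exit status) if that fsync fails; the paths here are made up:

dd if=/src/big.img of=/mnt/media/big.img bs=1M conv=fsync \
  || echo "write or fsync failed; assume the data is NOT on disk" >&2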

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:05                         ` Austin S. Hemmelgarn
@ 2016-01-11 13:11                           ` cheater00 .
  2016-01-11 13:30                             ` cheater00 .
  2016-01-11 14:10                             ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 13:11 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-01-09 16:07, cheater00 . wrote:
>>
>> Would like to point out that this can cause data loss. If I'm writing
>> to disk and the disk becomes unexpectedly read only - that data will
>> be lost, because who in their right mind makes their code expect this
>> and builds a contingency (e.g. caching, backpressure, etc)...
>
> If a data critical application (mail server, database server, anything
> similar) can't gracefully handle ENOSPC, then that application is broken,
> not the FS.  As an example, set up a small VM with an SMTP server, then
> force the FS the server uses for queuing mail read-only, and see if you can
> submit mail, then go read the RFCs for SMTP and see what clients are
> supposed to do when they can't submit mail.  A properly designed piece of
> software is supposed to be resilient against common failure modes of the
> resources it depends on (which includes ENOSPC and read-only filesystems for
> anything that works with data on disk).
>>
>>
>> There's no loss of data on the disk because the data doesn't make it
>> to disk in the first place. But it's exactly the same as if the data
>> had been written to disk, and then lost.
>>
> No, it isn't.  If you absolutely need the data on disk, you should be
> calling fsync or fdatasync, and then assuming if those return an error that
> none of the data written since the last call has gotten to the disk (some of
> it might have, but you need to assume it hasn't).  Every piece of software
> in wide usage that requires data to be on the disk does this, because
> otherwise it can't guarantee that the data is on disk.

I agree that a lot of stuff goes right in a perfect world. But most of
the time what you're running isn't a mail server used by billions of
users, but instead a bash script someone wrote once that's supposed to
do something, and no one knows how it works.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:11                           ` cheater00 .
@ 2016-01-11 13:30                             ` cheater00 .
  2016-01-11 13:45                               ` cheater00 .
  2016-01-11 14:10                             ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11 13:30 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 4422 bytes --]

The bug just happened again. Attached is a log since the time I
mounted the FS right after the fsck.

Note that the only things between the message I got while mounting:
[216798.144518] BTRFS info (device sdc1): disk space caching is enabled

and the beginning of the crash dump:
[241534.760651] ------------[ cut here ]------------

are these:
[218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci

I am not sure why those resets happen, though. I bought a few cables
and experimented with them, and the usb ports themselves are located
directly on the motherboard.
Also, they happened a considerable time before the crash dump, so I'm
not sure they're even related. That seems especially unlikely given
that I was copying a lot of very small files, and they all copied onto
the disk fine for the roughly two and a half hours between the last
usb reset and the crash dump. In fact, just past 9 am I pressed ctrl-z
on a move operation and ran something like sleep $(echo '60*60*3' | bc) ; fg,
so the mv resumed past 12 pm; adding things up, the last usb reset
happened even before the mv was resumed with fg.

I unmounted the fs and re-mounted it to make it writeable again.
This showed up in dmesg:

[241766.485365] BTRFS error (device sdc1): cleaner transaction attach
returned -30
[241770.115897] BTRFS info (device sdc1): disk space caching is enabled

this time there was no "info" line about the free space cache file. So
maybe it wasn't important for the bug to occur at all.

The new output of btrfs fi df -g is:
Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

I could swap this disk onto sata and the other disk back onto usb to
see if the usb resets have anything to do with this. But I'm
skeptical. Maybe btrfs has some other issue related simply to the disk
being on usb, resets or not; in that case, if the bug doesn't trigger
on sata we'll conclude "aha, it was the resets, buggy hardware etc",
when in fact it was something else that just has to do with the disk
being on usb and operating normally.

On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@gmail.com> wrote:
> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-01-09 16:07, cheater00 . wrote:
>>>
>>> Would like to point out that this can cause data loss. If I'm writing
>>> to disk and the disk becomes unexpectedly read only - that data will
>>> be lost, because who in their right mind makes their code expect this
>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>
>> If a data critical application (mail server, database server, anything
>> similar) can't gracefully handle ENOSPC, then that application is broken,
>> not the FS.  As an example, set up a small VM with an SMTP server, then
>> force the FS the server uses for queuing mail read-only, and see if you can
>> submit mail, then go read the RFCs for SMTP and see what clients are
>> supposed to do when they can't submit mail.  A properly designed piece of
>> software is supposed to be resilient against common failure modes of the
>> resources it depends on (which includes ENOSPC and read-only filesystems for
>> anything that works with data on disk).
>>>
>>>
>>> There's no loss of data on the disk because the data doesn't make it
>>> to disk in the first place. But it's exactly the same as if the data
>>> had been written to disk, and then lost.
>>>
>> No, it isn't.  If you absolutely need the data on disk, you should be
>> calling fsync or fdatasync, and then assuming if those return an error that
>> none of the data written since the last call has gotten to the disk (some of
>> it might have, but you need to assume it hasn't).  Every piece of software
>> in wide usage that requires data to be on the disk does this, because
>> otherwise it can't guarantee that the data is on disk.
>
> I agree that a lot of stuff goes right in a perfect world. But most of
> the time what you're running isn't a mail server used by billions of
> users, but instead a bash script someone wrote once that's supposed to
> do something, and no one knows how it works.

[-- Attachment #2: bug again after fsck and remount.txt --]
[-- Type: text/plain, Size: 3773 bytes --]

[216798.144518] BTRFS info (device sdc1): disk space caching is enabled
[218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[241534.760651] ------------[ cut here ]------------
[241534.760675] WARNING: CPU: 0 PID: 19845 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2851 btrfs_run_delayed_refs+0x227/0x250 [btrfs]()
[241534.760677] BTRFS: Transaction aborted (error -28)
[241534.760679] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[241534.760775] CPU: 0 PID: 19845 Comm: btrfs-transacti Tainted: G        W  OE   4.3.0-040300rc7-generic #201510260712
[241534.760776] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[241534.760778]  00000000 00000000 e4c9ddd8 c13610e8 e4c9de18 e4c9de08 c1068107 f8ae4190
[241534.760782]  e4c9de34 00004d85 f8ae3ff0 00000b23 f8a45a77 f8a45a77 ea674078 ffffffe4
[241534.760786]  cafaf400 e4c9de20 c1068173 00000009 e4c9de18 f8ae4190 e4c9de34 e4c9de5c
[241534.760790] Call Trace:
[241534.760796]  [<c13610e8>] dump_stack+0x41/0x59
[241534.760800]  [<c1068107>] warn_slowpath_common+0x87/0xc0
[241534.760810]  [<f8a45a77>] ? btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760818]  [<f8a45a77>] ? btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760820]  [<c1068173>] warn_slowpath_fmt+0x33/0x40
[241534.760828]  [<f8a45a77>] btrfs_run_delayed_refs+0x227/0x250 [btrfs]
[241534.760836]  [<f8a467f0>] btrfs_write_dirty_block_groups+0x170/0x2a0 [btrfs]
[241534.760848]  [<f8adb3c8>] commit_cowonly_roots+0x1e9/0x26a [btrfs]
[241534.760859]  [<f8a5b6ba>] btrfs_commit_transaction+0x87a/0xe90 [btrfs]
[241534.760869]  [<f8a5bd4d>] ? start_transaction+0x7d/0x5b0 [btrfs]
[241534.760878]  [<f8a56865>] transaction_kthread+0x215/0x230 [btrfs]
[241534.760887]  [<f8a56650>] ? btrfs_cleanup_transaction+0x490/0x490 [btrfs]
[241534.760890]  [<c1083c3b>] kthread+0x9b/0xb0
[241534.760894]  [<c1743581>] ret_from_kernel_thread+0x21/0x30
[241534.760897]  [<c1083ba0>] ? kthread_create_on_node+0x110/0x110
[241534.760899] ---[ end trace dc3cf6814526c7ca ]---
[241534.760923] BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2851: errno=-28 No space left
[241534.760927] BTRFS info (device sdc1): forced readonly
[241534.761198] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[241534.761201] BTRFS: error (device sdc1) in cleanup_transaction:1741: errno=-28 No space left


[-- Attachment #3: lsusb.txt --]
[-- Type: text/plain, Size: 2204 bytes --]

# lsusb
Bus 004 Device 007: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 004 Device 006: ID 046d:c31d Logitech, Inc. Media Keyboard K200
Bus 004 Device 005: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 004 Device 004: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 004 Device 003: ID 067b:2773 Prolific Technology, Inc. PL2773 SATAII bridge controller
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 04fc:0c05 Sunplus Technology Co., Ltd 
Bus 001 Device 002: ID 04e8:3252 Samsung Electronics Co., Ltd 
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 003: ID 138a:0018 Validity Sensors, Inc. Fingerprint scanner
Bus 003 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

# lsusb -t
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M
        |__ Port 2: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 1: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 12M
            |__ Port 1: Dev 5, If 1, Class=Human Interface Device, Driver=usbhid, 12M
            |__ Port 1: Dev 5, If 2, Class=Human Interface Device, Driver=usbhid, 12M
            |__ Port 2: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
            |__ Port 2: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
            |__ Port 3: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
        |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Printer, Driver=usblp, 480M
    |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:30                             ` cheater00 .
@ 2016-01-11 13:45                               ` cheater00 .
  2016-01-11 14:04                                 ` cheater00 .
  2016-08-04 16:53                                 ` Lutz Vieweg
  0 siblings, 2 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 13:45 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

After remounting, the bug doesn't occur any more and Data gets resized.

In my experience this bug will go untriggered for weeks at a time
until I write a lot to that disk, at which point it happens very
quickly. I believe this has more to do with the amount of data written
to the disk than anything else. It took about 48 GB of writes to
trigger the last instance, and I don't think that's very different
from what happened before, though I didn't keep track exactly.

On Mon, Jan 11, 2016 at 2:30 PM, cheater00 . <cheater00@gmail.com> wrote:
> The bug just happened again. Attached is a log since the time I
> mounted the FS right after the fsck.
>
> Note the only things between the message I got while mounting:
> [216798.144518] BTRFS info (device sdc1): disk space caching is enabled
>
> and the beginning of the crash dump:
> [241534.760651] ------------[ cut here ]------------
>
> is this:
> [218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
> [233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>
> I am not sure why those resets happen, though. I bought a few cables
> and experimented with them, and the usb ports themselves are located
> directly on the motherboard.
> Also, they happened some considerable time before the crash dump. So
> I'm not sure they're even related. Especially given that I was copying
> a lot of very small files, and they all copied onto the disk fine all
> the time between the last usb reset and the crash dump, which is
> roughly two and a half hours. In fact I pressed ctrl-z on a move
> operation and then wrote something like sleep $(echo '60*60*3' | bc) ;
> fg and ran it just past 9 am, so the mv resumed past 12 pm, so as
> things add up the last usb reset happened even before the mv was
> resumed with fg.
>
> I unmounted the fs and re-mounted the it to make it writeable again.
> This showed up in dmesg:
>
> [241766.485365] BTRFS error (device sdc1): cleaner transaction attach
> returned -30
> [241770.115897] BTRFS info (device sdc1): disk space caching is enabled
>
> this time there was no "info" line about the free space cache file. So
> maybe it wasn't important for the bug to occur at all.
>
> The new output of btrfs fi df -g is:
> Data, single: total=2080.01GiB, used=2078.80GiB
> System, DUP: total=0.01GiB, used=0.00GiB
> System, single: total=0.00GiB, used=0.00GiB
> Metadata, DUP: total=5.50GiB, used=3.73GiB
> Metadata, single: total=0.01GiB, used=0.00GiB
> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>
> I could swap this disk onto sata and the other disk back onto usb to
> see if the usb resets have anything to do with this. But I'm skeptic.
> Also maybe btrfs has some other issues related to just the disk being
> on usb, resets or not, and this way if the bug doesn't trigger on sata
> we'll think "aha it was the resets, buggy hardware etc" but instead
> it'll have been something else that just has to do with the disk being
> on usb operating normally.
>
> On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@gmail.com> wrote:
>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>
>>>> Would like to point out that this can cause data loss. If I'm writing
>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>> be lost, because who in their right mind makes their code expect this
>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>
>>> If a data critical application (mail server, database server, anything
>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>> force the FS the server uses for queuing mail read-only, and see if you can
>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>> supposed to do when they can't submit mail.  A properly designed piece of
>>> software is supposed to be resilient against common failure modes of the
>>> resources it depends on (which includes ENOSPC and read-only filesystems for
>>> anything that works with data on disk).
>>>>
>>>>
>>>> There's no loss of data on the disk because the data doesn't make it
>>>> to disk in the first place. But it's exactly the same as if the data
>>>> had been written to disk, and then lost.
>>>>
>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>> calling fsync or fdatasync, and then assuming if those return an error that
>>> none of the data written since the last call has gotten to the disk (some of
>>> it might have, but you need to assume it hasn't).  Every piece of software
>>> in wide usage that requires data to be on the disk does this, because
>>> otherwise it can't guarantee that the data is on disk.
>>
>> I agree that a lot of stuff goes right in a perfect world. But most of
>> the time what you're running isn't a mail server used by billions of
>> users, but instead a bash script someone wrote once that's supposed to
>> do something, and no one knows how it works.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:45                               ` cheater00 .
@ 2016-01-11 14:04                                 ` cheater00 .
  2016-01-12  2:18                                   ` Duncan
  2016-08-04 16:53                                 ` Lutz Vieweg
  1 sibling, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11 14:04 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

I noticed that every time Data gets bumped, it only gets bumped by a
couple of GB. I rarely store files on that disk larger than 2 GB, but
the last time it crashed I was moving a file that was 4.3 GB, so maybe
that's conducive to the crash happening? Maybe the file being larger
than what btrfs would allocate has something to do with this. I will
keep track of the amount of data written since the last crash, and of
the file size when the crash occurs.
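
A trivial way to keep track, assuming the fs is mounted at /mnt/media,
would be to log the allocation every few minutes and correlate it with
the next crash, e.g.:

while sleep 300; do date; btrfs fi df -g /mnt/media; done >> ~/btrfs-df.log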

On Mon, Jan 11, 2016 at 2:45 PM, cheater00 . <cheater00@gmail.com> wrote:
> After remounting, the bug doesn't transpire any more, Data gets resized.
>
> It is my experience that this bug will go untriggered for weeks at a
> time until I write a lot to that disk there, at which point it'll
> happen very quickly. I believe this has more to do with the amount of
> data that's been written to disk than anything else. It has been about
> 48 GB to trigger the last instance and I don't think that's very
> different from what happened before but I didn't keep track exactly.
>
> On Mon, Jan 11, 2016 at 2:30 PM, cheater00 . <cheater00@gmail.com> wrote:
>> The bug just happened again. Attached is a log since the time I
>> mounted the FS right after the fsck.
>>
>> Note the only things between the message I got while mounting:
>> [216798.144518] BTRFS info (device sdc1): disk space caching is enabled
>>
>> and the beginning of the crash dump:
>> [241534.760651] ------------[ cut here ]------------
>>
>> is this:
>> [218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>> [233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>>
>> I am not sure why those resets happen, though. I bought a few cables
>> and experimented with them, and the usb ports themselves are located
>> directly on the motherboard.
>> Also, they happened some considerable time before the crash dump. So
>> I'm not sure they're even related. Especially given that I was copying
>> a lot of very small files, and they all copied onto the disk fine all
>> the time between the last usb reset and the crash dump, which is
>> roughly two and a half hours. In fact I pressed ctrl-z on a move
>> operation and then wrote something like sleep $(echo '60*60*3' | bc) ;
>> fg and ran it just past 9 am, so the mv resumed past 12 pm, so as
>> things add up the last usb reset happened even before the mv was
>> resumed with fg.
>>
>> I unmounted the fs and re-mounted the it to make it writeable again.
>> This showed up in dmesg:
>>
>> [241766.485365] BTRFS error (device sdc1): cleaner transaction attach
>> returned -30
>> [241770.115897] BTRFS info (device sdc1): disk space caching is enabled
>>
>> this time there was no "info" line about the free space cache file. So
>> maybe it wasn't important for the bug to occur at all.
>>
>> The new output of btrfs fi df -g is:
>> Data, single: total=2080.01GiB, used=2078.80GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.73GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>>
>> I could swap this disk onto sata and the other disk back onto usb to
>> see if the usb resets have anything to do with this. But I'm skeptic.
>> Also maybe btrfs has some other issues related to just the disk being
>> on usb, resets or not, and this way if the bug doesn't trigger on sata
>> we'll think "aha it was the resets, buggy hardware etc" but instead
>> it'll have been something else that just has to do with the disk being
>> on usb operating normally.
>>
>> On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@gmail.com> wrote:
>>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>>
>>>>> Would like to point out that this can cause data loss. If I'm writing
>>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>>> be lost, because who in their right mind makes their code expect this
>>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>>
>>>> If a data critical application (mail server, database server, anything
>>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>>> force the FS the server uses for queuing mail read-only, and see if you can
>>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>>> supposed to do when they can't submit mail.  A properly designed piece of
>>>> software is supposed to be resilient against common failure modes of the
>>>> resources it depends on (which includes ENOSPC and read-only filesystems for
>>>> anything that works with data on disk).
>>>>>
>>>>>
>>>>> There's no loss of data on the disk because the data doesn't make it
>>>>> to disk in the first place. But it's exactly the same as if the data
>>>>> had been written to disk, and then lost.
>>>>>
>>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>>> calling fsync or fdatasync, and then assuming if those return an error that
>>>> none of the data written since the last call has gotten to the disk (some of
>>>> it might have, but you need to assume it hasn't).  Every piece of software
>>>> in wide usage that requires data to be on the disk does this, because
>>>> otherwise it can't guarantee that the data is on disk.
>>>
>>> I agree that a lot of stuff goes right in a perfect world. But most of
>>> the time what you're running isn't a mail server used by billions of
>>> users, but instead a bash script someone wrote once that's supposed to
>>> do something, and no one knows how it works.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:11                           ` cheater00 .
  2016-01-11 13:30                             ` cheater00 .
@ 2016-01-11 14:10                             ` Austin S. Hemmelgarn
  2016-01-11 16:02                               ` cheater00 .
  1 sibling, 1 reply; 55+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-11 14:10 UTC (permalink / raw)
  To: cheater00 .; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

On 2016-01-11 08:11, cheater00 . wrote:
> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-01-09 16:07, cheater00 . wrote:
>>>
>>> Would like to point out that this can cause data loss. If I'm writing
>>> to disk and the disk becomes unexpectedly read only - that data will
>>> be lost, because who in their right mind makes their code expect this
>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>
>> If a data critical application (mail server, database server, anything
>> similar) can't gracefully handle ENOSPC, then that application is broken,
>> not the FS.  As an example, set up a small VM with an SMTP server, then
>> force the FS the server uses for queuing mail read-only, and see if you can
>> submit mail, then go read the RFCs for SMTP and see what clients are
>> supposed to do when they can't submit mail.  A properly designed piece of
>> software is supposed to be resilient against common failure modes of the
>> resources it depends on (which includes ENOSPC and read-only filesystems for
>> anything that works with data on disk).
>>>
>>>
>>> There's no loss of data on the disk because the data doesn't make it
>>> to disk in the first place. But it's exactly the same as if the data
>>> had been written to disk, and then lost.
>>>
>> No, it isn't.  If you absolutely need the data on disk, you should be
>> calling fsync or fdatasync, and then assuming if those return an error that
>> none of the data written since the last call has gotten to the disk (some of
>> it might have, but you need to assume it hasn't).  Every piece of software
>> in wide usage that requires data to be on the disk does this, because
>> otherwise it can't guarantee that the data is on disk.
>
> I agree that a lot of stuff goes right in a perfect world. But most of
> the time what you're running isn't a mail server used by billions of
> users, but instead a bash script someone wrote once that's supposed to
> do something, and no one knows how it works.
>
And that's why no sane person does stuff like that on enterprise level 
systems.  And even then, if the person writing the bash script actually 
knows what they're doing, they will be using the 'sync' command to 
ensure data integrity when they actually need it, or they will write 
their script in such a way that it gracefully handles a partial run.
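
As one sketch of what that can look like (the paths are made up):

#!/bin/bash
# copy onto the big disk, but cope with a partial run
src=/data/export
dst=/mnt/media/export

if ! cp -a "$src" "$dst"; then
    echo "copy failed (ENOSPC? fs gone read-only?); removing partial copy" >&2
    rm -rf "$dst"
    exit 1
fi
sync              # flush writes before declaring success
rm -rf "$src"     # delete the source only after a clean copy and sync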

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 14:10                             ` Austin S. Hemmelgarn
@ 2016-01-11 16:02                               ` cheater00 .
  2016-01-11 16:33                                 ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11 16:02 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3339 bytes --]

I triggered the bug again, attaching log. There were some usb resets,
but they happened 23 minutes before the fs crashed.

At mount, the output of btrfs fi df -g was like this:
Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

Now it is:
Data, single: total=2094.01GiB, used=2092.26GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.79GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB

The file being copied at the time was 954 MB.



On Mon, Jan 11, 2016 at 3:10 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-01-11 08:11, cheater00 . wrote:
>>
>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>>
>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>
>>>>
>>>> Would like to point out that this can cause data loss. If I'm writing
>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>> be lost, because who in their right mind makes their code expect this
>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>
>>>
>>> If a data critical application (mail server, database server, anything
>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>> force the FS the server uses for queuing mail read-only, and see if you
>>> can
>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>> supposed to do when they can't submit mail.  A properly designed piece of
>>> software is supposed to be resilient against common failure modes of the
>>> resources it depends on (which includes ENOSPC and read-only filesystems
>>> for
>>> anything that works with data on disk).
>>>>
>>>>
>>>>
>>>> There's no loss of data on the disk because the data doesn't make it
>>>> to disk in the first place. But it's exactly the same as if the data
>>>> had been written to disk, and then lost.
>>>>
>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>> calling fsync or fdatasync, and then assuming if those return an error
>>> that
>>> none of the data written since the last call has gotten to the disk (some
>>> of
>>> it might have, but you need to assume it hasn't).  Every piece of
>>> software
>>> in wide usage that requires data to be on the disk does this, because
>>> otherwise it can't guarantee that the data is on disk.
>>
>>
>> I agree that a lot of stuff goes right in a perfect world. But most of
>> the time what you're running isn't a mail server used by billions of
>> users, but instead a bash script someone wrote once that's supposed to
>> do something, and no one knows how it works.
>>
> And that's why no sane person does stuff like that on enterprise level
> systems.  And even then, if the person writing the bash script actually
> knows what they're doing, they will be using the 'sync' command to ensure
> data integrity when they actually need it, or they will write their script
> in such a way that it gracefully handles a partial run.
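
(Spelled out as a minimal shell sketch of the write-then-fsync-then-check
pattern quoted above -- the paths are made up for illustration:)

# dd if=bigfile of=/mnt/Media/bigfile bs=1M conv=fsync
# echo $?

conv=fsync makes dd call fsync() on the output file before it exits, so a
non-zero exit status means you have to assume the data did not reach the
disk, which is exactly the assumption described above.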

[-- Attachment #2: bug yet again.txt --]
[-- Type: text/plain, Size: 4326 bytes --]

[241770.115897] BTRFS info (device sdc1): disk space caching is enabled
[242773.777365] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[248064.722181] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
[249457.369166] ------------[ cut here ]------------
[249457.369215] WARNING: CPU: 4 PID: 7358 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:6360 __btrfs_free_extent+0x354/0xe70 [btrfs]()
[249457.369220] BTRFS: Transaction aborted (error -28)
[249457.369224] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c nls_utf8 isofs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables usblp snd_hda_codec_hdmi hp_wmi sparse_keymap snd_hda_codec_idt snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp dm_multipath snd_hda_intel snd_hda_codec coretemp snd_hda_core kvm_intel radeon snd_hwdep hid_logitech_hidpp kvm snd_pcm i915 snd_seq_midi crc32_pclmul snd_seq_midi_event snd_rawmidi aesni_intel aes_i586 ttm xts snd_seq lrw drm_kms_helper snd_seq_device gf128mul snd_timer ablk_helper drm joydev cryptd bnep rfcomm snd input_leds i2c_algo_bit fb_sys_fops rtsx_pci_ms bluetooth serio_raw soundcore syscopyarea memstick sysfillrect sysimgblt hp_accel mei_me lis3lv02d lpc_ich wmi shpchp input_polldev mei video mac_hid nfsd auth_rpcgss nfs_acl parport_pc nfs ppdev lockd lp grace sunrpc parport fscache binfmt_misc hid_generic hid_logitech_dj usbhid hid btrfs xor uas usb_storage raid6_pq rtsx_pci_sdmmc ahci r8169 sdhci_pci psmouse libahci sdhci rtsx_pci mii fjes
[249457.369455] CPU: 4 PID: 7358 Comm: btrfs-transacti Tainted: G        W  OE   4.3.0-040300rc7-generic #201510260712
[249457.369460] Hardware name: Hewlett-Packard HP Pavilion dv6 Notebook PC/17FA, BIOS F.02 10/03/2011
[249457.369464]  00000000 00000000 d6d1bc40 c13610e8 d6d1bc80 d6d1bc70 c1068107 f8ae4190
[249457.369490]  d6d1bc9c 00001cbe f8ae3ff0 000018d8 f8a3d8d4 f8a3d8d4 ea42f2a0 ffffffe4
[249457.369503]  00000000 d6d1bc88 c1068173 00000009 d6d1bc80 f8ae4190 d6d1bc9c d6d1bd4c
[249457.369516] Call Trace:
[249457.369530]  [<c13610e8>] dump_stack+0x41/0x59
[249457.369542]  [<c1068107>] warn_slowpath_common+0x87/0xc0
[249457.369574]  [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369610]  [<f8a3d8d4>] ? __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369620]  [<c1068173>] warn_slowpath_fmt+0x33/0x40
[249457.369655]  [<f8a3d8d4>] __btrfs_free_extent+0x354/0xe70 [btrfs]
[249457.369666]  [<c10d6001>] ? ktime_get+0x41/0x120
[249457.369715]  [<f8aad26b>] ? btrfs_delayed_ref_lock+0x2b/0x200 [btrfs]
[249457.369749]  [<f8a42370>] __btrfs_run_delayed_refs+0x970/0x1110 [btrfs]
[249457.369763]  [<c11674a1>] ? set_page_dirty+0x31/0x70
[249457.369814]  [<f8a837cc>] ? set_extent_buffer_dirty+0x7c/0xd0 [btrfs]
[249457.369847]  [<f8a458bd>] btrfs_run_delayed_refs+0x6d/0x250 [btrfs]
[249457.369879]  [<f8a467f0>] btrfs_write_dirty_block_groups+0x170/0x2a0 [btrfs]
[249457.369926]  [<f8adb3c8>] commit_cowonly_roots+0x1e9/0x26a [btrfs]
[249457.369974]  [<f8a5b6ba>] btrfs_commit_transaction+0x87a/0xe90 [btrfs]
[249457.370012]  [<f8a5bd4d>] ? start_transaction+0x7d/0x5b0 [btrfs]
[249457.370026]  [<c10a5060>] ? wake_atomic_t_function+0x70/0x70
[249457.370066]  [<f8a56865>] transaction_kthread+0x215/0x230 [btrfs]
[249457.370101]  [<f8a56650>] ? btrfs_cleanup_transaction+0x490/0x490 [btrfs]
[249457.370113]  [<c1083c3b>] kthread+0x9b/0xb0
[249457.370125]  [<c1743581>] ret_from_kernel_thread+0x21/0x30
[249457.370136]  [<c1083ba0>] ? kthread_create_on_node+0x110/0x110
[249457.370144] ---[ end trace dc3cf6814526c7cb ]---
[249457.370203] BTRFS: error (device sdc1) in __btrfs_free_extent:6360: errno=-28 No space left
[249457.370211] BTRFS info (device sdc1): forced readonly
[249457.370220] BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2851: errno=-28 No space left
[249457.419978] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[249457.419984] BTRFS: error (device sdc1) in cleanup_transaction:1741: errno=-28 No space left


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 16:02                               ` cheater00 .
@ 2016-01-11 16:33                                 ` cheater00 .
  2016-01-11 20:29                                   ` Henk Slager
  0 siblings, 1 reply; 55+ messages in thread
From: cheater00 . @ 2016-01-11 16:33 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS

After unmounting:
[251818.992992] BTRFS error (device sdc1): cleaner transaction attach
returned -30

and remounting:
[251837.393750] BTRFS info (device sdc1): disk space caching is enabled

the disk again resizes Data.

On Mon, Jan 11, 2016 at 5:02 PM, cheater00 . <cheater00@gmail.com> wrote:
> I triggered the bug again, attaching log. There were some usb resets,
> but they happened 23 minutes before the fs crashed.
>
> At mount, the output of btrfs fi df -g was like this:
> Data, single: total=2080.01GiB, used=2078.80GiB
> System, DUP: total=0.01GiB, used=0.00GiB
> System, single: total=0.00GiB, used=0.00GiB
> Metadata, DUP: total=5.50GiB, used=3.73GiB
> Metadata, single: total=0.01GiB, used=0.00GiB
> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>
> Now it is:
> Data, single: total=2094.01GiB, used=2092.26GiB
> System, DUP: total=0.01GiB, used=0.00GiB
> System, single: total=0.00GiB, used=0.00GiB
> Metadata, DUP: total=5.50GiB, used=3.79GiB
> Metadata, single: total=0.01GiB, used=0.00GiB
> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>
> The file being copied at the time was 954 MB.
>
>
>
> On Mon, Jan 11, 2016 at 3:10 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-01-11 08:11, cheater00 . wrote:
>>>
>>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>>
>>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>>
>>>>>
>>>>> Would like to point out that this can cause data loss. If I'm writing
>>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>>> be lost, because who in their right mind makes their code expect this
>>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>>
>>>>
>>>> If a data critical application (mail server, database server, anything
>>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>>> force the FS the server uses for queuing mail read-only, and see if you
>>>> can
>>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>>> supposed to do when they can't submit mail.  A properly designed piece of
>>>> software is supposed to be resilient against common failure modes of the
>>>> resources it depends on (which includes ENOSPC and read-only filesystems
>>>> for
>>>> anything that works with data on disk).
>>>>>
>>>>>
>>>>>
>>>>> There's no loss of data on the disk because the data doesn't make it
>>>>> to disk in the first place. But it's exactly the same as if the data
>>>>> had been written to disk, and then lost.
>>>>>
>>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>>> calling fsync or fdatasync, and then assuming if those return an error
>>>> that
>>>> none of the data written since the last call has gotten to the disk (some
>>>> of
>>>> it might have, but you need to assume it hasn't).  Every piece of
>>>> software
>>>> in wide usage that requires data to be on the disk does this, because
>>>> otherwise it can't guarantee that the data is on disk.
>>>
>>>
>>> I agree that a lot of stuff goes right in a perfect world. But most of
>>> the time what you're running isn't a mail server used by billions of
>>> users, but instead a bash script someone wrote once that's supposed to
>>> do something, and no one knows how it works.
>>>
>> And that's why no sane person does stuff like that on enterprise level
>> systems.  And even then, if the person writing the bash script actually
>> knows what they're doing, they will be using the 'sync' command to ensure
>> data integrity when they actually need it, or they will write their script
>> in such a way that it gracefully handles a partial run.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-10 23:47                     ` cheater00 .
  2016-01-11  0:24                       ` Chris Murphy
@ 2016-01-11 19:50                       ` Henk Slager
  2016-01-11 23:03                         ` cheater00 .
  1 sibling, 1 reply; 55+ messages in thread
From: Henk Slager @ 2016-01-11 19:50 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Jan 11, 2016 at 12:47 AM, cheater00 . <cheater00@gmail.com> wrote:
> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
>> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>>> On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>>>> Hello,
>>>> I can repeatedly trigger this bug by making the "data" portion fill
>>>> up. If you remember the partition is 6 TB but in btrfs filesystem df
>>>> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>>>> this has nothing to do with kernel bugs. The filesystem on disk is
>>>> structured incorrectly. How do i fix this? How do I make "Data"
>>>> bigger? What is it exactly?
>>>
>>>    This is *exactly* the behaviour of the known kernel bug. The bug is
>>> that the FS *should* be extending the data allocation when it gets
>>> near to full, and it's not. There is no way of manually allocating
>>> more (because the FS should be doing it automatically). There is no
>>> known way of persuading the FS to it when it isn't.
>>
>> Probably this is 'the'  bug we talk about:
>> https://bugzilla.kernel.org/show_bug.cgi?id=74101
>
> Yes, would seem like it.
>
>> Size of the fs is much smaller, but also problem occurs when fill-level is <50%
>>
>> btrfs fs resize  did nothing you mention, but AFAIK you should see
>> something in dmesg when you do that.
>
>
> I remounted (to make the filesystem rw again) and got the attached
> log. There are some errors in there. But bear in mind that this disk
> has been hanging on in ro mode for a few days now.
>
> I did this:
> # btrfs fi resize -1T Media
> Resize 'Media' of '-1T'
> # btrfs filesystem resize -1G Media
> Resize 'Media' of '-1G'
> # btrfs filesystem resize -1T Media
> Resize 'Media' of '-1T'
> # btrfs filesystem resize max Media
> Resize 'Media' of 'max'
>
> and it resulted in the following lines in dmesg:
> [189115.919160] BTRFS: new size for /dev/sdc1 is 4901661835264
> [189177.306291] BTRFS: new size for /dev/sdc1 is 4900588093440
> [189181.950289] BTRFS: new size for /dev/sdc1 is 3801076465664
> [189232.064357] BTRFS: new size for /dev/sdc1 is 6001173463040
>
> (note the device changed from sdd to sdc when I rebooted last)
>
>> And what is the output of   gdisk -l /dev/sdd  , just to check?
>
> (the device changed to sdc since I've rebooted)
>
> # gdisk -l /dev/sdc
> GPT fdisk (gdisk) version 0.8.8
>
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
>
> Found valid GPT with protective MBR; using GPT.
> Disk /dev/sdc: 11721045168 sectors, 5.5 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): 0DEF5509-8730-4AB4-A846-79DA3C376F66
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 11721045134
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 3181 sectors (1.6 MiB)
>
> Number  Start (sector)    End (sector)  Size       Code  Name
>    1            2048     11721043967   5.5 TiB     8300
>
>
>> Have you had the fs already filled up to e.g. 95% before or has is
>> always been not more than 2TiB?
>
> It has never been more than 2TB, I've had it for quite some time now
> but it's always hovered around 1TB.
>
>> Why are the single (empty )profiles for metadata and system still
>> there?  They should have been removed already by the various balancing
>> operations that are advised in the btrfs-wiki.
>
> Not sure. Would this happen automatically? Or is this something I
> should have done?
> I have another fs on the same model/size disk, which isn't exhibiting
> this bug, and it has those profiles as well.

There is nothing really wrong (just by quickly browsing the numbers)
as far as I can see. So it seems the 'btrfs space allocation
mechanism' somehow gives up under certain circumstances. The fs was
created with a somewhat older tools/kernel combination; that is why these
dummy single chunks are there. They should not do any harm, but a full
balance will remove them. A partial balance with -musage=0 and recent
tools should also remove them, though I don't remember exactly. If they
are not needed, I would keep the fs as simple as possible and remove them.
I was just wondering what balancing options you have tried.
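
(Concretely, something along these lines -- the mount point is a
placeholder; -musage=0 only touches metadata block groups that are
completely empty, so it is quick:)

# btrfs balance start -musage=0 /mnt/Media
# btrfs fi df /mnt/Media

If the dummy single entries are still listed afterwards, a full
'btrfs balance start /mnt/Media' should get rid of them.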

> Here's what both look like:
>
> buggy fs ("Media"):
> Data, single: total=1.98TiB, used=1.98TiB
> System, DUP: total=8.00MiB, used=240.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=5.50GiB, used=3.49GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> non-buggy fs:
> Data, single: total=5.43TiB, used=5.40TiB
> System, DUP: total=8.00MiB, used=608.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=12.50GiB, used=11.21GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
>> What is the output of   btrfs check  /dev/sdd  ?  The usb resets
>> mentioned might have introduced some errors to the fs (that is what I
>> have experienced at least, but it depends on timing etc)
>
> I'll run that overnight and report tomorrow.
>
>> What you could try is to create an image+'copy' of the fs with
>> btrfs-image just after you get ENOSPC abd then do various tests with
>> that (make sure unmount or even better unplug the physical hdd!). Like
>> mounting and then try to add a file, convert all metadata + system
>> from dup to single and then try to add a file. It all doesn't give
>> real space, but it might give hints to what could be wrong.
>
> I can't do that because I would have to buy an extra disk which is 300 euro.

btrfs-image only images/dumps the metadata. In your case it is ~6GiB.
# umount /dev/sdc1
# cd /sparse-file-featured-fs-mount/
# btrfs-image /dev/sdc1 Mediafs.metadump
# btrfs-image -r Mediafs.metadump Mediafs.img
# losetup -f Mediafs.img
# mount /dev/loopX /mnt
# dd if=/dev/zero of=/mnt/testfile bs=1M count=5000

The file Mediafs.img initially uses ~6GiB of space; its reported size
is the same as the real partition on the harddisk. The dd command grows
it by 5GB and simulates a large file copy. So you need up to a few tens
of GB of free space to do these tests, fast SSD preferred. You can also
copy some real files and see if you hit ENOSPC.

If the test filesystem is itself btrfs, you can easily snapshot or
cp --reflink the original Mediafs.img and try various metadata
balancing. You can also enable btrfs compression there.
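
(Roughly like this, assuming the filesystem holding the image is btrfs
and /mnt is free; /dev/loopX is whatever 'losetup -f --show' prints:)

# cp --reflink=always Mediafs.img Mediafs-test.img
# losetup -f --show Mediafs-test.img
# mount /dev/loopX /mnt
# btrfs balance start -musage=0 /mnt
# dd if=/dev/zero of=/mnt/testfile bs=1M count=5000
# umount /mnt
# losetup -d /dev/loopX

That way every attempt starts from the same restored metadata, and the
reflinked copy costs almost no extra space.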

>
>> If you somehow manage to reduce the fs by lets say 100G and also the
>> partition, you could install or copy a newer linux+kernel to
>> partition(s) in that 100G space and boot from there.
>
> let me try finding the latest kernel then. There are backports.
As suggested earlier, try kernel 4.4 (released today). People have
reported that they weren't bugged anymore by ENOSPC aborts after
switching from some older kernel.
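
(On Ubuntu the least-painful route is usually the mainline kernel builds
under http://kernel.ubuntu.com/~kernel-ppa/mainline/ -- pick the v4.4
directory, download the linux-image and matching linux-headers .debs for
your architecture, then, with the exact file names depending on the
build:)

# dpkg -i linux-headers-4.4*.deb linux-image-4.4*.deb
# reboot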

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 16:33                                 ` cheater00 .
@ 2016-01-11 20:29                                   ` Henk Slager
  2016-01-12  1:16                                     ` Duncan
  0 siblings, 1 reply; 55+ messages in thread
From: Henk Slager @ 2016-01-11 20:29 UTC (permalink / raw)
  To: Btrfs BTRFS

On Mon, Jan 11, 2016 at 5:33 PM, cheater00 . <cheater00@gmail.com> wrote:
> After unmounting:
> [251818.992992] BTRFS error (device sdc1): cleaner transaction attach
> returned -30
>
> and remounting:
> [251837.393750] BTRFS info (device sdc1): disk space caching is enabled
>
> the disk again resizes Data.
>
> On Mon, Jan 11, 2016 at 5:02 PM, cheater00 . <cheater00@gmail.com> wrote:
>> I triggered the bug again, attaching log. There were some usb resets,
>> but they happened 23 minutes before the fs crashed.
>>
>> At mount, the output of btrfs fi df -g was like this:
>> Data, single: total=2080.01GiB, used=2078.80GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.73GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>>
>> Now it is:
>> Data, single: total=2094.01GiB, used=2092.26GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.79GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB

The Data increments with 1G (or 10G?) chunks if there is no
well-fitting space in existing chunks.

>> The file being copied at the time was 954 MB.

The space cache faults get repaired automatically, so that is not an
issue. Your fs also gives no errors when doing a check.
The sort of files you write is nothing difficult for btrfs, even over
a stable usb2 connection. I have been writing/appending files of ~50G,
as well as C sources etc., over an old usb2 link for 2 years, and also
balancing for many hours over that link. I have not seen resets,
however (grepped /var/log/messages). The usb2 link is a 2T usb3 disk
plugged into a usb2 port; in a usb3 port it isn't even recognised. I
also have a usb3 3.5 inch dock that produces resets with a 4T harddisk
since kernel 4.x (it was 4.1.x or so, during a dd full-disk copy).
Also, if you have too many ATA resets, caused by whatever, or short
disconnects, btrfs might get in trouble; it can't handle bad disks/IO
very well currently.
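
(The grep referred to above is roughly:)

# grep -i 'reset.*usb device' /var/log/messages
# dmesg | grep -i 'reset.*usb device'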

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  9:03                         ` Hugo Mills
  2016-01-11 13:04                           ` cheater00 .
@ 2016-01-11 21:31                           ` Chris Murphy
  2016-01-11 22:10                             ` Hugo Mills
  2016-01-11 22:57                             ` cheater00 .
  1 sibling, 2 replies; 55+ messages in thread
From: Chris Murphy @ 2016-01-11 21:31 UTC (permalink / raw)
  To: Hugo Mills, Btrfs BTRFS

On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> OK. How do we track down that bug and get it fixed?
>> >
>> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >
>> >    It's been around for a number of years. None of the devs has, I
>> > think, had the time to look at it. When Josef was still (publicly)
>> > active, he had it second on his list of bugs to look at for many
>> > months -- but it always got trumped by some new bug that could cause
>> > data loss.
>>
>>
>> Interesting. I did not know of this bug. It's pretty rare.
>
>    Not really. It shows up maybe on average once a week on IRC. It
> gets reported much less on the mailing list.

Is there a pattern? Does it only happen at a 2TiB threshold?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 21:31                           ` Chris Murphy
@ 2016-01-11 22:10                             ` Hugo Mills
  2016-01-11 22:20                               ` Chris Murphy
  2016-01-11 22:57                             ` cheater00 .
  1 sibling, 1 reply; 55+ messages in thread
From: Hugo Mills @ 2016-01-11 22:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> >> >> OK. How do we track down that bug and get it fixed?
> >> >
> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
> >> >
> >> >    It's been around for a number of years. None of the devs has, I
> >> > think, had the time to look at it. When Josef was still (publicly)
> >> > active, he had it second on his list of bugs to look at for many
> >> > months -- but it always got trumped by some new bug that could cause
> >> > data loss.
> >>
> >>
> >> Interesting. I did not know of this bug. It's pretty rare.
> >
> >    Not really. It shows up maybe on average once a week on IRC. It
> > gets reported much less on the mailing list.
> 
> Is there a pattern? Does it only happen at a 2TiB threshold?

   No, and no.

   There is, as far as I can tell from some years of seeing reports of
this bug, no correlation with RAID level, hardware, OS, kernel
version, FS size, usage of the FS at failure, or allocation level of
either data or metadata at failure.

   I haven't tried correlating with the phase of the moon or the
losses on Lloyds Register yet.

   Hugo.

-- 
Hugo Mills             | "What are we going to do tonight?"
hugo@... carfax.org.uk | "The same thing we do every night, Pinky. Try to
http://carfax.org.uk/  | take over the world!"
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:10                             ` Hugo Mills
@ 2016-01-11 22:20                               ` Chris Murphy
  2016-01-11 22:30                                 ` Hugo Mills
  0 siblings, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-01-11 22:20 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Btrfs BTRFS

On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> >> OK. How do we track down that bug and get it fixed?
>> >> >
>> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >> >
>> >> >    It's been around for a number of years. None of the devs has, I
>> >> > think, had the time to look at it. When Josef was still (publicly)
>> >> > active, he had it second on his list of bugs to look at for many
>> >> > months -- but it always got trumped by some new bug that could cause
>> >> > data loss.
>> >>
>> >>
>> >> Interesting. I did not know of this bug. It's pretty rare.
>> >
>> >    Not really. It shows up maybe on average once a week on IRC. It
>> > gets reported much less on the mailing list.
>>
>> Is there a pattern? Does it only happen at a 2TiB threshold?
>
>    No, and no.
>
>    There is, as far as I can tell from some years of seeing reports of
> this bug, no correlation with RAID level, hardware, OS, kernel
> version, FS size, usage of the FS at failure, or allocation level of
> either data or metadata at failure.
>
>    I haven't tried correlating with the phase of the moon or the
> losses on Lloyds Register yet.

Huh. So it's goofy cakes.

This is specifically where btrfs_free_extent produces errno -28 no
space left, and then the fs goes read-only?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:20                               ` Chris Murphy
@ 2016-01-11 22:30                                 ` Hugo Mills
  2016-01-11 22:39                                   ` Chris Murphy
                                                     ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Hugo Mills @ 2016-01-11 22:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2712 bytes --]

On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> >> >> >> OK. How do we track down that bug and get it fixed?
> >> >> >
> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
> >> >> >
> >> >> >    It's been around for a number of years. None of the devs has, I
> >> >> > think, had the time to look at it. When Josef was still (publicly)
> >> >> > active, he had it second on his list of bugs to look at for many
> >> >> > months -- but it always got trumped by some new bug that could cause
> >> >> > data loss.
> >> >>
> >> >>
> >> >> Interesting. I did not know of this bug. It's pretty rare.
> >> >
> >> >    Not really. It shows up maybe on average once a week on IRC. It
> >> > gets reported much less on the mailing list.
> >>
> >> Is there a pattern? Does it only happen at a 2TiB threshold?
> >
> >    No, and no.
> >
> >    There is, as far as I can tell from some years of seeing reports of
> > this bug, no correlation with RAID level, hardware, OS, kernel
> > version, FS size, usage of the FS at failure, or allocation level of
> > either data or metadata at failure.
> >
> >    I haven't tried correlating with the phase of the moon or the
> > losses on Lloyds Register yet.
> 
> Huh. So it's goofy cakes.
> 
> This is specifically where btrfs_free_extent produces errno -28 no
> space left, and then the fs goes read-only?

   The symptoms I'm using for a diagnosis of this bug are that the FS
runs out of (usually data) space when there's still unallocated space
remaining that it could use for another block group.

   Forced RO isn't usually a symptom, although the FS can get into a
state where you can't modify it (as distinct from being explicitly
read-only).

   Block-group level operations, like balance, device delete, device
add sometimes seem to have some kind of (usually small) effect on the
point at which the error occurs. If you hit the problem and run a
balance, you might end up making things worse by a couple of
gigabytes, or making things better by the same amount, or having no
effect at all.
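
   (The quickest way to see whether a filesystem is in that state -- the
mount point is only an example:

# btrfs fi usage /mnt/Media

If 'Device unallocated' is still well above zero while writes fail with
ENOSPC, that is this bug. On older progs without 'fi usage', compare the
device size from 'btrfs fi show' with the sum of the totals from
'btrfs fi df'.)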

   Hugo.

-- 
Hugo Mills             | "What are we going to do tonight?"
hugo@... carfax.org.uk | "The same thing we do every night, Pinky. Try to
http://carfax.org.uk/  | take over the world!"
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:30                                 ` Hugo Mills
@ 2016-01-11 22:39                                   ` Chris Murphy
  2016-01-11 23:07                                     ` Hugo Mills
  2016-01-11 23:05                                   ` cheater00 .
  2016-01-12  2:05                                   ` Duncan
  2 siblings, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-01-11 22:39 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Btrfs BTRFS

On Mon, Jan 11, 2016 at 3:30 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
>> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> >> >> OK. How do we track down that bug and get it fixed?
>> >> >> >
>> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >> >> >
>> >> >> >    It's been around for a number of years. None of the devs has, I
>> >> >> > think, had the time to look at it. When Josef was still (publicly)
>> >> >> > active, he had it second on his list of bugs to look at for many
>> >> >> > months -- but it always got trumped by some new bug that could cause
>> >> >> > data loss.
>> >> >>
>> >> >>
>> >> >> Interesting. I did not know of this bug. It's pretty rare.
>> >> >
>> >> >    Not really. It shows up maybe on average once a week on IRC. It
>> >> > gets reported much less on the mailing list.
>> >>
>> >> Is there a pattern? Does it only happen at a 2TiB threshold?
>> >
>> >    No, and no.
>> >
>> >    There is, as far as I can tell from some years of seeing reports of
>> > this bug, no correlation with RAID level, hardware, OS, kernel
>> > version, FS size, usage of the FS at failure, or allocation level of
>> > either data or metadata at failure.
>> >
>> >    I haven't tried correlating with the phase of the moon or the
>> > losses on Lloyds Register yet.
>>
>> Huh. So it's goofy cakes.
>>
>> This is specifically where btrfs_free_extent produces errno -28 no
>> space left, and then the fs goes read-only?
>
>    The symptoms I'm using for a diagnosis of this bug are that the FS
> runs out of (usually data) space when there's still unallocated space
> remaining that it could use for another block group.
>
>    Forced RO isn't usually a symptom, although the FS can get into a
> state where you can't modify it (as distinct from being explicitly
> read-only).
>
>    Block-group level operations, like balance, device delete, device
> add sometimes seem to have some kind of (usually small) effect on the
> point at which the error occurs. If you hit the problem and run a
> balance, you might end up making things worse by a couple of
> gigabytes, or making things better by the same amount, or having no
> effect at all.

Are there any compile time options not normally set that would help find it?
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set

Once it starts to happen, it sounds like it's straightforward to
reproduce in a short amount of time. I'm kinda surprised I've never
run into this.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 21:31                           ` Chris Murphy
  2016-01-11 22:10                             ` Hugo Mills
@ 2016-01-11 22:57                             ` cheater00 .
  1 sibling, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 22:57 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Hugo Mills, Btrfs BTRFS

On Mon, Jan 11, 2016 at 10:31 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>>> Interesting. I did not know of this bug. It's pretty rare.
>> Not really. It shows up maybe on average once a week on IRC. It
>> gets reported much less on the mailing list.
>
> Is there a pattern? Does it only happen at a 2TiB threshold?

In my case, it seems to me that there are two bugs that I encountered:

1. The free space cache being invalid blocked any and all resizes.
This was fixed with fsck (maybe resizing the fs to smaller and then
back to max size contributed to the fix as well).

2. When I copy files onto the fs, most of the time the resizes (of the
Data allocation) work correctly, but every now and then they don't, and
it seems to me like this only happens when there's a lot of data being
copied, the cpu is hot, likely the usb controller is hot, etc. It seems
to me like at that point the hardware just becomes less robust (without
exactly becoming buggy).
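
(For completeness, the two knobs involved in point 1, with the device
and mount point as placeholders; clear_cache is a one-shot mount option
that throws the free space cache away and lets it be rebuilt on the
next mount:)

# umount /mnt/Media
# btrfs check /dev/sdc1
# mount -o clear_cache /dev/sdc1 /mnt/Media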

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 19:50                       ` Henk Slager
@ 2016-01-11 23:03                         ` cheater00 .
  0 siblings, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 23:03 UTC (permalink / raw)
  To: Henk Slager; +Cc: Btrfs BTRFS

On Mon, Jan 11, 2016 at 8:50 PM, Henk Slager <eye1tm@gmail.com> wrote:
> On Mon, Jan 11, 2016 at 12:47 AM, cheater00 . <cheater00@gmail.com> wrote:
>> On Sun, Jan 10, 2016 at 3:14 PM, Henk Slager <eye1tm@gmail.com> wrote:
>>> On Sat, Jan 9, 2016 at 9:26 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>>>> On Sat, Jan 09, 2016 at 09:00:47PM +0100, cheater00 . wrote:
>>>>> Hello,
>>>>> I can repeatedly trigger this bug by making the "data" portion fill
>>>>> up. If you remember the partition is 6 TB but in btrfs filesystem df
>>>>> Data is shown as only 2TB when in fact it should be nearly 6TB. So
>>>>> this has nothing to do with kernel bugs. The filesystem on disk is
>>>>> structured incorrectly. How do i fix this? How do I make "Data"
>>>>> bigger? What is it exactly?
>>>>
>>>>    This is *exactly* the behaviour of the known kernel bug. The bug is
>>>> that the FS *should* be extending the data allocation when it gets
>>>> near to full, and it's not. There is no way of manually allocating
>>>> more (because the FS should be doing it automatically). There is no
>>>> known way of persuading the FS to it when it isn't.
>>>
>>> Probably this is 'the'  bug we talk about:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=74101
>>
>> Yes, would seem like it.
>>
>>> Size of the fs is much smaller, but also problem occurs when fill-level is <50%
>>>
>>> btrfs fs resize  did nothing you mention, but AFAIK you should see
>>> something in dmesg when you do that.
>>
>>
>> I remounted (to make the filesystem rw again) and got the attached
>> log. There are some errors in there. But bear in mind that this disk
>> has been hanging on in ro mode for a few days now.
>>
>> I did this:
>> # btrfs fi resize -1T Media
>> Resize 'Media' of '-1T'
>> # btrfs filesystem resize -1G Media
>> Resize 'Media' of '-1G'
>> # btrfs filesystem resize -1T Media
>> Resize 'Media' of '-1T'
>> # btrfs filesystem resize max Media
>> Resize 'Media' of 'max'
>>
>> and it resulted in the following lines in dmesg:
>> [189115.919160] BTRFS: new size for /dev/sdc1 is 4901661835264
>> [189177.306291] BTRFS: new size for /dev/sdc1 is 4900588093440
>> [189181.950289] BTRFS: new size for /dev/sdc1 is 3801076465664
>> [189232.064357] BTRFS: new size for /dev/sdc1 is 6001173463040
>>
>> (note the device changed from sdd to sdc when I rebooted last)
>>
>>> And what is the output of   gdisk -l /dev/sdd  , just to check?
>>
>> (the device changed to sdc since I've rebooted)
>>
>> # gdisk -l /dev/sdc
>> GPT fdisk (gdisk) version 0.8.8
>>
>> Partition table scan:
>>   MBR: protective
>>   BSD: not present
>>   APM: not present
>>   GPT: present
>>
>> Found valid GPT with protective MBR; using GPT.
>> Disk /dev/sdc: 11721045168 sectors, 5.5 TiB
>> Logical sector size: 512 bytes
>> Disk identifier (GUID): 0DEF5509-8730-4AB4-A846-79DA3C376F66
>> Partition table holds up to 128 entries
>> First usable sector is 34, last usable sector is 11721045134
>> Partitions will be aligned on 2048-sector boundaries
>> Total free space is 3181 sectors (1.6 MiB)
>>
>> Number  Start (sector)    End (sector)  Size       Code  Name
>>    1            2048     11721043967   5.5 TiB     8300
>>
>>
>>> Have you had the fs already filled up to e.g. 95% before or has is
>>> always been not more than 2TiB?
>>
>> It has never been more than 2TB, I've had it for quite some time now
>> but it's always hovered around 1TB.
>>
>>> Why are the single (empty )profiles for metadata and system still
>>> there?  They should have been removed already by the various balancing
>>> operations that are advised in the btrfs-wiki.
>>
>> Not sure. Would this happen automatically? Or is this something I
>> should have done?
>> I have another fs on the same model/size disk, which isn't exhibiting
>> this bug, and it has those profiles as well.
>
> There is nothing really wrong (just by quickly browsing the numbers)
> as far as I can see. So it seems the 'btrfs space allocation
> mechanism' somehow gives up under certain circumstances.

Can we somehow make this thing tell us what it's doing and why it's
breaking? Some sort of debug log of when things are being allocated?

> The fs was
> created by bit older tools/kernel, that is why these dummy single
> chunks are there. They should not harm, but at least a full balance
> will remove them. Also a parttial balance -musage=0 with latest tools
> should also remove them, i don't remember exactly. If they are not
> needed, keep the fs as simple as possible and remove them I would say.
> I was just wondering what balancing options you have tried.

That's one thing to try.

>>> What you could try is to create an image+'copy' of the fs with
>>> btrfs-image just after you get ENOSPC abd then do various tests with
>>> that (make sure unmount or even better unplug the physical hdd!). Like
>>> mounting and then try to add a file, convert all metadata + system
>>> from dup to single and then try to add a file. It all doesn't give
>>> real space, but it might give hints to what could be wrong.
>>
>> I can't do that because I would have to buy an extra disk which is 300 euro.
>
> btrfs-image only images/dumps the metadata. In your case it is ~6GiB.
> # umount /dev/sdc1
> # cd /sparse-file-featured-fs-mount/
> # btrfs-image /dev/sdc1 Mediafs.metadump
> # btrfs-image -r Mediafs.metadump Mediafs.img
> # losetup -f /Mediafs.img
> # mount /dev/loopX /mnt
> # dd if=/dev/zero of=/mnt/testfile bs=1M count=5000
>
> The file Mediafs.img is initially using ~6GiB space, its reported size
> is same as real partition on the harddisk. The dd command increases it
> with 5GB and simulates a large file copy. So you need upto 10's of GBs
> freespace to do this tests, fast SSD preferred. You can also copy some
> real files and see if you hit ENOSPC.
>
> If the test is on btrfs, you can easily snapshot or copy--reflink the
> original Mediafs.img and try various metadata balancing. And use the
> btrfs compression.

That's another thing to try, thanks. This seems very manageable.

>>> If you somehow manage to reduce the fs by lets say 100G and also the
>>> partition, you could install or copy a newer linux+kernel to
>>> partition(s) in that 100G space and boot from there.
>>
>> let me try finding the latest kernel then. There are backports.
> As suggested earlier, try kernel 4.4 (released today). People have
> reported that they weren't bugged anymore by ENOSPC aborts after
> switching from some older kernel.

I'll see if I can get a hold of this for my version of Ubuntu. This
will be the first thing I'll try. Then I'll try the imaging, and then
the rebalancing on the image.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:30                                 ` Hugo Mills
  2016-01-11 22:39                                   ` Chris Murphy
@ 2016-01-11 23:05                                   ` cheater00 .
  2016-01-12  2:05                                   ` Duncan
  2 siblings, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 23:05 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Btrfs BTRFS

On Mon, Jan 11, 2016 at 11:30 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
>> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> >> >> OK. How do we track down that bug and get it fixed?
>> >> >> >
>> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >> >> >
>> >> >> >    It's been around for a number of years. None of the devs has, I
>> >> >> > think, had the time to look at it. When Josef was still (publicly)
>> >> >> > active, he had it second on his list of bugs to look at for many
>> >> >> > months -- but it always got trumped by some new bug that could cause
>> >> >> > data loss.
>> >> >>
>> >> >>
>> >> >> Interesting. I did not know of this bug. It's pretty rare.
>> >> >
>> >> >    Not really. It shows up maybe on average once a week on IRC. It
>> >> > gets reported much less on the mailing list.
>> >>
>> >> Is there a pattern? Does it only happen at a 2TiB threshold?
>> >
>> >    No, and no.
>> >
>> >    There is, as far as I can tell from some years of seeing reports of
>> > this bug, no correlation with RAID level, hardware, OS, kernel
>> > version, FS size, usage of the FS at failure, or allocation level of
>> > either data or metadata at failure.
>> >
>> >    I haven't tried correlating with the phase of the moon or the
>> > losses on Lloyds Register yet.
>>
>> Huh. So it's goofy cakes.
>>
>> This is specifically where btrfs_free_extent produces errno -28 no
>> space left, and then the fs goes read-only?
>
>    The symptoms I'm using for a diagnosis of this bug are that the FS
> runs out of (usually data) space when there's still unallocated space
> remaining that it could use for another block group.
>
>    Forced RO isn't usually a symptom, although the FS can get into a
> state where you can't modify it (as distinct from being explicitly
> read-only).

In my case, the fs always remounts as RO immediately, so maybe you're
encountering another bug. It might make sense to keep those separate
in our heads.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:39                                   ` Chris Murphy
@ 2016-01-11 23:07                                     ` Hugo Mills
  2016-01-11 23:12                                       ` cheater00 .
  0 siblings, 1 reply; 55+ messages in thread
From: Hugo Mills @ 2016-01-11 23:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3979 bytes --]

On Mon, Jan 11, 2016 at 03:39:43PM -0700, Chris Murphy wrote:
> On Mon, Jan 11, 2016 at 3:30 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
> >> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
> >> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> >> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
> >> >> >> >> OK. How do we track down that bug and get it fixed?
> >> >> >> >
> >> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
> >> >> >> >
> >> >> >> >    It's been around for a number of years. None of the devs has, I
> >> >> >> > think, had the time to look at it. When Josef was still (publicly)
> >> >> >> > active, he had it second on his list of bugs to look at for many
> >> >> >> > months -- but it always got trumped by some new bug that could cause
> >> >> >> > data loss.
> >> >> >>
> >> >> >>
> >> >> >> Interesting. I did not know of this bug. It's pretty rare.
> >> >> >
> >> >> >    Not really. It shows up maybe on average once a week on IRC. It
> >> >> > gets reported much less on the mailing list.
> >> >>
> >> >> Is there a pattern? Does it only happen at a 2TiB threshold?
> >> >
> >> >    No, and no.
> >> >
> >> >    There is, as far as I can tell from some years of seeing reports of
> >> > this bug, no correlation with RAID level, hardware, OS, kernel
> >> > version, FS size, usage of the FS at failure, or allocation level of
> >> > either data or metadata at failure.
> >> >
> >> >    I haven't tried correlating with the phase of the moon or the
> >> > losses on Lloyds Register yet.
> >>
> >> Huh. So it's goofy cakes.
> >>
> >> This is specifically where btrfs_free_extent produces errno -28 no
> >> space left, and then the fs goes read-only?
> >
> >    The symptoms I'm using for a diagnosis of this bug are that the FS
> > runs out of (usually data) space when there's still unallocated space
> > remaining that it could use for another block group.
> >
> >    Forced RO isn't usually a symptom, although the FS can get into a
> > state where you can't modify it (as distinct from being explicitly
> > read-only).
> >
> >    Block-group level operations, like balance, device delete, device
> > add sometimes seem to have some kind of (usually small) effect on the
> > point at which the error occurs. If you hit the problem and run a
> > balance, you might end up making things worse by a couple of
> > gigabytes, or making things better by the same amount, or having no
> > effect at all.
> 
> Are there any compile time options not normally set that would help find it?
> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
> # CONFIG_BTRFS_DEBUG is not set
> # CONFIG_BTRFS_ASSERT is not set
> 
> Once it starts to happen, it sounds like it's straightforward to
> reproduce in a short amount of time. I'm kinda surprised I've never
> run into this.

   It does sometimes have a repeating nature: I'm reasonably sure
we've seen a few people get it repeatedly on different filesystems.
This might point at a particular workload needed to trigger it. (Or
just bad luck / statistical likelihood). Some people have never hit
it.

   There is (or at least, was) an ENOSPC debugging option. I think
that's a mount option. That's probably the most useful one, but the
range of usefulness of existing debug output may be very small. :)

   (Sorry for the vague nature of this reply -- it's been a very long
day).

   Hugo.

-- 
Hugo Mills             | "What are we going to do tonight?"
hugo@... carfax.org.uk | "The same thing we do every night, Pinky. Try to
http://carfax.org.uk/  | take over the world!"
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 23:07                                     ` Hugo Mills
@ 2016-01-11 23:12                                       ` cheater00 .
  0 siblings, 0 replies; 55+ messages in thread
From: cheater00 . @ 2016-01-11 23:12 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Btrfs BTRFS

yeah it's -o enospc_debug. I forgot to enable it this time. I'll
enable it and see where that goes. I'll put it in fstab.
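
Something like this, with UUID and mount point as placeholders:

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /media/Media  btrfs  defaults,enospc_debug  0  0

or, without touching fstab, remount once with the option:

# umount /media/Media
# mount -o enospc_debug /dev/sdc1 /media/Media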

On Tue, Jan 12, 2016 at 12:07 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Mon, Jan 11, 2016 at 03:39:43PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 3:30 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> > On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
>> >> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
>> >> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> >> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> >> >> >> OK. How do we track down that bug and get it fixed?
>> >> >> >> >
>> >> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >> >> >> >
>> >> >> >> >    It's been around for a number of years. None of the devs has, I
>> >> >> >> > think, had the time to look at it. When Josef was still (publicly)
>> >> >> >> > active, he had it second on his list of bugs to look at for many
>> >> >> >> > months -- but it always got trumped by some new bug that could cause
>> >> >> >> > data loss.
>> >> >> >>
>> >> >> >>
>> >> >> >> Interesting. I did not know of this bug. It's pretty rare.
>> >> >> >
>> >> >> >    Not really. It shows up maybe on average once a week on IRC. It
>> >> >> > gets reported much less on the mailing list.
>> >> >>
>> >> >> Is there a pattern? Does it only happen at a 2TiB threshold?
>> >> >
>> >> >    No, and no.
>> >> >
>> >> >    There is, as far as I can tell from some years of seeing reports of
>> >> > this bug, no correlation with RAID level, hardware, OS, kernel
>> >> > version, FS size, usage of the FS at failure, or allocation level of
>> >> > either data or metadata at failure.
>> >> >
>> >> >    I haven't tried correlating with the phase of the moon or the
>> >> > losses on Lloyds Register yet.
>> >>
>> >> Huh. So it's goofy cakes.
>> >>
>> >> This is specifically where btrfs_free_extent produces errno -28 no
>> >> space left, and then the fs goes read-only?
>> >
>> >    The symptoms I'm using for a diagnosis of this bug are that the FS
>> > runs out of (usually data) space when there's still unallocated space
>> > remaining that it could use for another block group.
>> >
>> >    Forced RO isn't usually a symptom, although the FS can get into a
>> > state where you can't modify it (as distinct from being explicitly
>> > read-only).
>> >
>> >    Block-group level operations, like balance, device delete, device
>> > add sometimes seem to have some kind of (usually small) effect on the
>> > point at which the error occurs. If you hit the problem and run a
>> > balance, you might end up making things worse by a couple of
>> > gigabytes, or making things better by the same amount, or having no
>> > effect at all.
>>
>> Are there any compile time options not normally set that would help find it?
>> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
>> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>> # CONFIG_BTRFS_DEBUG is not set
>> # CONFIG_BTRFS_ASSERT is not set
>>
>> Once it starts to happen, it sounds like it's straightforward to
>> reproduce in a short amount of time. I'm kinda surprised I've never
>> run into this.
>
>    It does sometimes have a repeating nature: I'm reasonably sure
> we've seen a few people get it repeatedly on different filesystems.
> This might point at a particular workload needed to trigger it. (Or
> just bad luck / statistical likelihood). Some people have never hit
> it.
>
>    There is (or at least, was) an ENOSPC debugging option. I think
> that's a mount option. That's probably the most useful one, but the
> range of usefulness of existing debug output may be very small. :)
>
>    (Sorry for the vague nature of this reply -- it's been a very long
> day).
>
>    Hugo.
>
> --
> Hugo Mills             | "What are we going to do tonight?"
> hugo@... carfax.org.uk | "The same thing we do every night, Pinky. Try to
> http://carfax.org.uk/  | take over the world!"
> PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11  7:54                             ` cheater00 .
@ 2016-01-12  0:35                               ` Duncan
  0 siblings, 0 replies; 55+ messages in thread
From: Duncan @ 2016-01-12  0:35 UTC (permalink / raw)
  To: linux-btrfs

cheater00 . posted on Mon, 11 Jan 2016 08:54:55 +0100 as excerpted:

> After the fsck, the Data segment is being resized correctly. It would
> seem to me that the fact Data was 2TB when this bug transpired was just
> a coincidence.
> 
> Perhaps this line:
> "BTRFS info (device sdc1): The free space cache file (2159324168192)
> is invalid. skip it"
> 
> should not be "info" but an error, and should instruct the user to fsck
> the file system.

[Please reply in context, under the context you're quoting, like this, 
not above it, which makes following the context very difficult and 
replying to multi-level context even more so, especially when others are 
trying to reply in context so some of the conversation is at the end, 
some at the beginning.]

It really /is/ info, as the filesystem simply regenerates that section of 
the free-space cache.  That the line disappeared in the remount after the 
fsck would appear to be coincidence as well.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 20:29                                   ` Henk Slager
@ 2016-01-12  1:16                                     ` Duncan
  0 siblings, 0 replies; 55+ messages in thread
From: Duncan @ 2016-01-12  1:16 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Mon, 11 Jan 2016 21:29:37 +0100 as excerpted:

> The Data increments with 1G (or 10G?) chunks if there is no well-fitting
> space in existing chunks.

One of the devs (Qu?) explained this rather well in a followup to 
something or other, probably a week or two ago.  I remember replying to 
the effect of how much more I then knew about the exact numbers.

Based on that, the nominal data chunk size is 1 GiB on btrfs filesystems 
up to 100 GiB in size (tho really small ones will have smaller data chunk 
sizes as well -- not sure of the small-end numbers -- and from when 
mixed-mode was introduced until progs 4.3(.1, I think), mixed-mode was 
the default below 1 GiB, and it uses metadata-sized chunks).  Above 100 
GiB, the data chunk size increases to 10 GiB.  Similarly, metadata chunk 
sizes are 256 MiB below 100 GiB and, I think, 1 GiB above.

As we're talking multi-TB in this case, obviously well above 100 GiB, 
data chunk sizes are likely to be 10 GiB, tho I'm not 100% sure if that 
applies on single device, or only to striped chunks, with 1 GiB strips 
and stripes of up to 10 GiB if there's 10 devices in the stripe.

(As regulars here will know if they've kept track, I use multiple, much 
smaller btrfs here, the largest of which is 24 GiB per device, pair-
device raid1 for both data/metadata, so 48 GiB total fs size, 24 GiB each 
on two devices in raid1, 24 GiB capacity.  Obviously I see only 1 GiB 
data chunks, smaller of course on my 256 MiB /boot and its backup on the 
other device, both of which are single-device mixed-mode dup, so 128 MiB 
capacity.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 22:30                                 ` Hugo Mills
  2016-01-11 22:39                                   ` Chris Murphy
  2016-01-11 23:05                                   ` cheater00 .
@ 2016-01-12  2:05                                   ` Duncan
  2 siblings, 0 replies; 55+ messages in thread
From: Duncan @ 2016-01-12  2:05 UTC (permalink / raw)
  To: linux-btrfs

Hugo Mills posted on Mon, 11 Jan 2016 22:30:17 +0000 as excerpted:

> On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
>> >
>> > There is, as far as I can tell from some years of seeing reports of
>> > this bug, no correlation with RAID level, hardware, OS, kernel
>> > version, FS size, usage of the FS at failure, or allocation level of
>> > either data or metadata at failure.
>> >
>> > I haven't tried correlating with the phase of the moon or the
>> > losses on Lloyds Register yet.
>> 
>> Huh. So it's goofy cakes.
>> 
>> This is specifically where btrfs_free_extent produces errno -28 no
>> space left, and then the fs goes read-only?
> 
> The symptoms I'm using for a diagnosis of this bug are that the FS
> runs out of (usually data) space when there's still unallocated space
> remaining that it could use for another block group.
> 
> Forced RO isn't usually a symptom, although the FS can get into a
> state where you can't modify it (as distinct from being explicitly
> read-only).
> 
> Block-group level operations, like balance, device delete, device
> add sometimes seem to have some kind of (usually small) effect on the
> point at which the error occurs. If you hit the problem and run a
> balance, you might end up making things worse by a couple of gigabytes,
> or making things better by the same amount, or having no effect at all.

I had the problem for some kernels on my 256 MiB mixed-mode dup (so 128 
MiB capacity) /boot and its backup on my other ssd, when I'd recreate 
them to eliminate any hidden historic issues and take advantage of newer 
btrfs features, as I do all my btrfs from time to time, as an extension 
of my regular backups procedures.

My newly mkfs.btrfsed /boot or backup would NOT go read-only, but would 
ENOSPC as I attempted to copy files over from the older one -- same size 
and both btrfs so obviously the files should all fit.

The problem was obviously due to btrfs refusing to create a new chunk 
when the existing chunk (mixed-mode, so both data and metadata; at this 
filesystem size we're talking 16 MiB chunks or so) ran out of space.
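
For reference, a filesystem like that can be recreated with something along 
these lines (device name and label here are just examples):

  mkfs.btrfs --mixed -m dup -d dup -L boot /dev/sdb1

With --mixed, data and metadata share the same (small) block groups, which 
is why the chunks are only a few MiB on a 256 MiB filesystem.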

The first time it happened I think I fiddled with balance, etc, maybe 
umount/remount, and eventually was able to copy the files.  I don't 
remember exactly.

The second time, I had been copying everything over in one go, and some 
of it copied while some didn't.  In particular, it was the grub2 modules 
subdir, grub/modules, that failed with some copied and some not.  So in mc 
I did a directory diff between source and destination, which selected the 
files that hadn't copied in the source.  I then tried copying them again, 
and I think a few copied before I got another ENOSPC.  At some point I 
think I fell back to trying one at a time.

Eventually they all copied.  Apparently, under some conditions a file 
copy that crosses the chunk threshold will trigger an ENOSPC instead of 
creation of another chunk, despite free space being available for 
creation of those chunks.  But by trying smaller files first, that would 
fit into the existing chunks, then trying a file that would force 
creation of a new chunk again, I eventually no longer triggered the 
failure to create chunk problem, and it created one as it should have, 
thereby allowing me to continue copying files normally.  But I'm not sure 
if it was simply chance based (maybe a race between the chunk creation 
and the attempt to copy data into it) and I tried enough times that 
eventually one succeeded, or if it was some filesystem condition that 
somehow eventually changed and let the new chunk be created, or if, 
perhaps, it was time based and the chunk creation eventually 
"registered", so files could then copy without issue.

But the last time I redid my /boot, perhaps 3.16 or 3.18 timeframe, the 
problem didn't occur at all, so I thought it must have been fixed.  Now 
I'm reading that no, it's still triggering for many.

Anyway, you can add really small (256 MiB) mixed-mode dup btrfs to the 
list of btrfs where it is known to sometimes trigger, if that combination 
wasn't on the list already.

I've not had the problem occur on my other btrfs.


One thing that occurs to me is that given that it seems to be a 
relatively straightforward failure under certain conditions to allocate a 
new chunk, the fact that btrfs post 3.17 or whatever now cleans up empty 
chunks, should in turn mean that btrfs has to create new chunks to 
accommodate new or growing files much more often, which should mean that 
people run into this issue much more frequently as well... unless there's 
some other limiting characteristic that keeps it from happening in the 
same proportion of chunk creations now, that it did back when empty 
chunks were kept around and thus fewer chunk creations were needed.


Meanwhile, this case seems to have the additional complication of forcing 
the btrfs to read-only, something that doesn't seem to occur in many cases 
and certainly didn't happen in mine.  With the USB resets and etc, it 
could be considered a different bug, but it could also be that they 
simply create an environment much more likely to trigger the bug than 
normally working hardware.  If so, it could be some clue to grasp at to 
try (again) to track this thing down.


Meanwhile(2), on a personal note, I'm not particularly happy to find that 
this bug still exists, and that my last /boot remake simply didn't 
trigger it for some reason, while the previous two did.  That means I 
have to look forward to the possibility of it happening again.  And I can 
tell you from experience, it's a pretty frustrating bug, particularly 
when you're copying from a btrfs of the exact same size and 
configuration, so you KNOW the files fit!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 14:04                                 ` cheater00 .
@ 2016-01-12  2:18                                   ` Duncan
  0 siblings, 0 replies; 55+ messages in thread
From: Duncan @ 2016-01-12  2:18 UTC (permalink / raw)
  To: linux-btrfs

cheater00 . posted on Mon, 11 Jan 2016 15:04:42 +0100 as excerpted:

> I noticed that every time Data gets bumped, it only gets bumped by a
> couple GB. I rarely ever store files on that disk that are larger than 2
> GB, But the last time it crashed, I was moving a file that was 4.3 GB,
> so maybe that's conducive to the crash happening? Maybe the file being
> larger than what btrfs would allocate has something to do with this. I
> will keep track of the amount of data since last crash, and the file
> size when the crash occurred.

See my just completed post to a different subthread of this now rather 
large thread, where I describe my experience with this bug.  The bug 
seems to be related to chunk creation, and those chunks are 1-10 GiB in 
size, depending on factors such as the size of the filesystem, chunk mode 
(raidx vs single or dup), etc.

Assuming your filesystem is creating 2 GiB data chunks and that existing 
chunks are close to full, a 4.3 GiB file would likely force creation of 
two such chunks, which, other factors being equal, would give it twice 
the chance of failure as attempting to copy a smaller file that only 
forces creation of one such chunk.  Similarly, attempting to copy 
(perhaps using --reflink=never or copying in from a different filesystem 
to ensure that it's actually copying the data, not just reflinking it) a 
really large file of 10+ GiB would be several times more likely to fail, 
while attempting to copy files of under say half a GiB in size would most 
of the time succeed, since they'd be rather unlikely to force creation of 
a new data chunk.
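
One way to exercise that deliberately (sizes and paths are only examples) is 
to write a brand-new file larger than one data chunk, which cannot be 
reflinked and so has to allocate fresh space:

  # ~4.3 GiB of new data; should force at least one new data chunk
  dd if=/dev/urandom of=/mnt/test/bigfile bs=1M count=4400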

[And again, as requested elsewhere, please reply inline in quote-context, 
under the part of the quote providing the context for what you're 
replying to, instead of above, out of context.  It makes both reading in 
context, and further replies in context, /so/ much easier.]

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-01-11 13:45                               ` cheater00 .
  2016-01-11 14:04                                 ` cheater00 .
@ 2016-08-04 16:53                                 ` Lutz Vieweg
  2016-08-04 20:30                                   ` Chris Murphy
  2016-08-05 20:03                                   ` Gabriel C
  1 sibling, 2 replies; 55+ messages in thread
From: Lutz Vieweg @ 2016-08-04 16:53 UTC (permalink / raw)
  To: cheater00 .,
	Austin S. Hemmelgarn, Hugo Mills, Chris Murphy, Btrfs BTRFS,
	russell

Hi,

I was today hit by what I think is probably the same bug:
A btrfs on a close-to-4TB sized block device, only half filled
to almost exactly 2 TB, suddenly says "no space left on device"
upon any attempt to write to it. The filesystem was NOT automatically
switched to read-only by the kernel, I should mention.

Re-mounting (which is a pain as this filesystem is used for
$HOMEs of a multitude of active users who I have to kick from
the server for doing things like re-mounting) removed the symptom
for now, but from what I can read in linux-btrfs mailing list
archives, it is pretty likely the symptom will re-appear.

Here are some more details:

Software versions:
> linux-4.6.1 (vanilla from kernel.org)
> btrfs-progs v4.1

Info obtained while the symptom occurred (before re-mount):
> > btrfs filesystem show /data3
> Label: 'data3'  uuid: f4c69d29-62ac-4e15-a825-c6283c8fd74c
>         Total devices 1 FS bytes used 2.05TiB
>         devid    1 size 3.64TiB used 2.16TiB path /dev/mapper/cryptedResourceData3

(/dev/mapper/cryptedResourceData3 is a dm-crypt device,
which is based on a DRBD block device, which is based
on locally attached SATA disks on two servers - no trouble
with that setup for years, no I/O-errors or such, same
kind of block-device stack also used for another btrfs
and some XFS filesystems.)

> > btrfs filesystem df /data3
> Data, single: total=2.11TiB, used=2.01TiB
> System, single: total=4.00MiB, used=256.00KiB
> Metadata, single: total=48.01GiB, used=36.67GiB
> GlobalReserve, single: total=512.00MiB, used=5.52MiB

Currently, and at the time the bug occurred, no snapshots existed
on "/data3". A snapshot is created once per night, a backup
created, then the snapshot is removed again.
There is lots of mixed I/O-activity during the day, both from interactive
users and from automatic build processes and such.
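
In sketch form the nightly cycle looks something like the following (the 
actual backup command and snapshot path differ; this is just to illustrate):

  btrfs subvolume snapshot -r /data3 /data3/.nightly
  rsync -aHAX /data3/.nightly/ backuphost:/backup/data3/
  btrfs subvolume delete /data3/.nightly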

dmesg output from the time the "no space left on device"-symptom
appeared:

> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]
> [5171203.602719] Modules linked in: dm_snapshot dm_bufio fuse btrfs xor raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter drbd lru_cache bridge stp llc kvm_amd kvm irqbypass ghash_clmulni_intel amd64_edac_mod ses edac_mce_amd enclosure edac_core sp5100_tco pcspkr k10temp fam15h_power sg i2c_piix4 shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c dm_crypt mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe crct10dif_pclmul crc32_pclmul crc32c_intel igb ahci libahci aesni_intel glue_helper libata lrw gf128mul ablk_helper mdio cryptd ptp serio_raw i2c_algo_bit pps_core i2c_core dca sd_mod dm_mirror dm_region_hash dm_log dm_mod
...
> [5171203.617358] Call Trace:
> [5171203.618543]  [<ffffffff812faac1>] dump_stack+0x4d/0x6c
> [5171203.619568]  [<ffffffff8106baf3>] __warn+0xe3/0x100
> [5171203.620660]  [<ffffffff8106bc2d>] warn_slowpath_null+0x1d/0x20
> [5171203.621779]  [<ffffffffa0728a03>] btrfs_destroy_inode+0x263/0x2a0 [btrfs]
> [5171203.622716]  [<ffffffff812090bb>] destroy_inode+0x3b/0x60
> [5171203.623774]  [<ffffffff812091fc>] evict+0x11c/0x180
...
> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]
> [5171230.310298] Modules linked in: dm_snapshot dm_bufio fuse btrfs xor raid6_pq nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter drbd lru_cache bridge stp llc kvm_amd kvm irqbypass ghash_clmulni_intel amd64_edac_mod ses edac_mce_amd enclosure edac_core sp5100_tco pcspkr k10temp fam15h_power sg i2c_piix4 shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c dm_crypt mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe crct10dif_pclmul crc32_pclmul crc32c_intel igb ahci libahci aesni_intel glue_helper libata lrw gf128mul ablk_helper mdio cryptd ptp serio_raw i2c_algo_bit pps_core i2c_core dca sd_mod dm_mirror dm_region_hash dm_log dm_mod
...
> [5171230.341755] Call Trace:
> [5171230.344119]  [<ffffffff812faac1>] dump_stack+0x4d/0x6c
> [5171230.346444]  [<ffffffff8106baf3>] __warn+0xe3/0x100
> [5171230.348709]  [<ffffffff8106bc2d>] warn_slowpath_null+0x1d/0x20
> [5171230.350976]  [<ffffffffa06fb863>] btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]
> [5171230.353212]  [<ffffffffa071b10f>] btrfs_clear_bit_hook+0x27f/0x350 [btrfs]
> [5171230.355392]  [<ffffffffa073527a>] ? free_extent_state+0x1a/0x20 [btrfs]
> [5171230.357556]  [<ffffffffa0735bd6>] clear_state_bit+0x66/0x1d0 [btrfs]
> [5171230.359698]  [<ffffffffa0735f64>] __clear_extent_bit+0x224/0x3a0 [btrfs]
> [5171230.361810]  [<ffffffffa06f7e35>] ? btrfs_update_reserved_bytes+0x45/0x130 [btrfs]
> [5171230.363960]  [<ffffffffa0736cfa>] extent_clear_unlock_delalloc+0x7a/0x2d0 [btrfs]
> [5171230.366079]  [<ffffffff811c9ccd>] ? kmem_cache_alloc+0x17d/0x1f0
> [5171230.368204]  [<ffffffffa0732243>] ? __btrfs_add_ordered_extent+0x43/0x310 [btrfs]
> [5171230.370350]  [<ffffffffa07323fb>] ? __btrfs_add_ordered_extent+0x1fb/0x310 [btrfs]
> [5171230.372491]  [<ffffffffa071e89a>] cow_file_range+0x28a/0x460 [btrfs]
> [5171230.374636]  [<ffffffffa071f712>] run_delalloc_range+0x102/0x3e0 [btrfs]
> [5171230.376785]  [<ffffffffa073754c>] writepage_delalloc.isra.40+0x10c/0x170 [btrfs]
> [5171230.378941]  [<ffffffffa0738729>] __extent_writepage+0xd9/0x2e0 [btrfs]
> [5171230.381107]  [<ffffffffa0738bd3>] extent_write_cache_pages.isra.36.constprop.60+0x2a3/0x450 [btrfs]
> [5171230.383319]  [<ffffffffa071a225>] ? btrfs_submit_bio_hook+0xe5/0x1c0 [btrfs]
> [5171230.385542]  [<ffffffffa0719020>] ? btrfs_fiemap+0x70/0x70 [btrfs]
> [5171230.387764]  [<ffffffffa0718770>] ? btrfs_init_inode_security+0x60/0x60 [btrfs]
> [5171230.390010]  [<ffffffffa073a9cb>] extent_writepages+0x5b/0x90 [btrfs]
> [5171230.392270]  [<ffffffffa071c240>] ? btrfs_submit_direct+0x840/0x840 [btrfs]
> [5171230.394545]  [<ffffffffa0718ba8>] btrfs_writepages+0x28/0x30 [btrfs]
> [5171230.396796]  [<ffffffff8117526e>] do_writepages+0x1e/0x30
> [5171230.399041]  [<ffffffff8121bd55>] __writeback_single_inode+0x45/0x320
> [5171230.401289]  [<ffffffff8121c566>] writeback_sb_inodes+0x266/0x550
> [5171230.403542]  [<ffffffff8121c8dc>] __writeback_inodes_wb+0x8c/0xc0
> [5171230.405799]  [<ffffffff8121cb89>] wb_writeback+0x279/0x310
> [5171230.408047]  [<ffffffff8121d32a>] wb_workfn+0x22a/0x3f0
> [5171230.410274]  [<ffffffff81084f57>] process_one_work+0x147/0x3f0
> [5171230.412506]  [<ffffffff81085324>] worker_thread+0x124/0x480
> [5171230.414686]  [<ffffffff8167ff4c>] ? __schedule+0x29c/0x8b0
> [5171230.416807]  [<ffffffff81085200>] ? process_one_work+0x3f0/0x3f0
> [5171230.418875]  [<ffffffff8108b0c8>] kthread+0xd8/0xf0
> [5171230.420880]  [<ffffffff81684052>] ret_from_fork+0x22/0x40
...
> [5255978.507519] WARNING: CPU: 8 PID: 2678 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]
...
> [5255978.538796] Call Trace:
> [5255978.541263]  [<ffffffff812faac1>] dump_stack+0x4d/0x6c
> [5255978.543762]  [<ffffffff8106baf3>] __warn+0xe3/0x100
> [5255978.546192]  [<ffffffff8106bc2d>] warn_slowpath_null+0x1d/0x20
> [5255978.548639]  [<ffffffffa0728a03>] btrfs_destroy_inode+0x263/0x2a0 [btrfs]
> [5255978.551044]  [<ffffffff812090bb>] destroy_inode+0x3b/0x60
> [5255978.553435]  [<ffffffff812091fc>] evict+0x11c/0x180
...
> 5256861.709481] WARNING: CPU: 16 PID: 21344 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]
...

Then, as I did "umount /data3":
> [5335831.813498] WARNING: CPU: 5 PID: 16957 at fs/btrfs/extent-tree.c:5436 btrfs_free_block_groups+0x30b/0x3b0 [btrfs]
...
> [5335831.835835] Call Trace:
> [5335831.837585]  [<ffffffff812faac1>] dump_stack+0x4d/0x6c
> [5335831.839335]  [<ffffffff8106baf3>] __warn+0xe3/0x100
> [5335831.841063]  [<ffffffff8106bc2d>] warn_slowpath_null+0x1d/0x20
> [5335831.842818]  [<ffffffffa070018b>] btrfs_free_block_groups+0x30b/0x3b0 [btrfs]
> [5335831.844576]  [<ffffffffa0711d7c>] close_ctree+0x18c/0x390 [btrfs]
> [5335831.846309]  [<ffffffff8120a07b>] ? evict_inodes+0x13b/0x160
> [5335831.848051]  [<ffffffffa06df279>] btrfs_put_super+0x19/0x20 [btrfs]
> [5335831.849781]  [<ffffffff811f043f>] generic_shutdown_super+0x6f/0xf0
> [5335831.851507]  [<ffffffff811f10b2>] kill_anon_super+0x12/0x20
> [5335831.853246]  [<ffffffffa06e5128>] btrfs_kill_super+0x18/0x110 [btrfs]
> [5335831.854979]  [<ffffffff811effe1>] deactivate_locked_super+0x51/0x90
> [5335831.856714]  [<ffffffff811f0066>] deactivate_super+0x46/0x60
> [5335831.858447]  [<ffffffff8120d06f>] cleanup_mnt+0x3f/0x80
> [5335831.860178]  [<ffffffff8120d102>] __cleanup_mnt+0x12/0x20
> [5335831.861908]  [<ffffffff81089683>] task_work_run+0x83/0xa0
> [5335831.863633]  [<ffffffff81066814>] exit_to_usermode_loop+0x6d/0x96
> [5335831.865361]  [<ffffffff81002d2d>] do_syscall_64+0xfd/0x110
> [5335831.867088]  [<ffffffff81683efc>] entry_SYSCALL64_slow_path+0x25/0x25
...
> [5335831.872552] WARNING: CPU: 5 PID: 16957 at fs/btrfs/extent-tree.c:5437 btrfs_free_block_groups+0x3a5/0x3b0 [btrfs]
...
> [5335831.896906] Call Trace:
> [5335831.898668]  [<ffffffff812faac1>] dump_stack+0x4d/0x6c
> [5335831.900414]  [<ffffffff8106baf3>] __warn+0xe3/0x100
> [5335831.902117]  [<ffffffff8106bc2d>] warn_slowpath_null+0x1d/0x20
> [5335831.903805]  [<ffffffffa0700225>] btrfs_free_block_groups+0x3a5/0x3b0 [btrfs]
> [5335831.905467]  [<ffffffffa0711d7c>] close_ctree+0x18c/0x390 [btrfs]
> [5335831.907067]  [<ffffffff8120a07b>] ? evict_inodes+0x13b/0x160
> [5335831.908642]  [<ffffffffa06df279>] btrfs_put_super+0x19/0x20 [btrfs]
> [5335831.910169]  [<ffffffff811f043f>] generic_shutdown_super+0x6f/0xf0
> [5335831.911680]  [<ffffffff811f10b2>] kill_anon_super+0x12/0x20
> [5335831.913186]  [<ffffffffa06e5128>] btrfs_kill_super+0x18/0x110 [btrfs]
> [5335831.914664]  [<ffffffff811effe1>] deactivate_locked_super+0x51/0x90
> [5335831.916152]  [<ffffffff811f0066>] deactivate_super+0x46/0x60
> [5335831.917633]  [<ffffffff8120d06f>] cleanup_mnt+0x3f/0x80
> [5335831.919094]  [<ffffffff8120d102>] __cleanup_mnt+0x12/0x20
> [5335831.920549]  [<ffffffff81089683>] task_work_run+0x83/0xa0
> [5335831.922004]  [<ffffffff81066814>] exit_to_usermode_loop+0x6d/0x96
> [5335831.923465]  [<ffffffff81002d2d>] do_syscall_64+0xfd/0x110
> [5335831.924917]  [<ffffffff81683efc>] entry_SYSCALL64_slow_path+0x25/0x25

After re-mounting, so far the only messages to "dmesg" were:
> [5335997.784884] BTRFS info (device dm-7): disk space caching is enabled
> [5335997.787007] BTRFS: has skinny extents

Does it make sense to amend
> https://bugzilla.kernel.org/show_bug.cgi?id=74101
- that report doesn't look like it sparked attention, right?

The number of threads on "lost or unused free space" without resolution
in the btrfs mailing list archive is really frightening. If these
symptoms commonly re-appear with no fix in sight, I'm afraid I'll have
to either resort to using XFS (with ugly block-device based snapshots
for backup) or try my luck with OpenZFS :-(

Regards,

Lutz Vieweg




On 01/11/2016 02:45 PM, cheater00 . wrote:
> After remounting, the bug doesn't occur any more; Data gets resized.
>
> It is my experience that this bug will go untriggered for weeks at a
> time until I write a lot to that disk there, at which point it'll
> happen very quickly. I believe this has more to do with the amount of
> data that's been written to disk than anything else. It took about
> 48 GB of writes to trigger the last instance, and I don't think that's very
> different from what happened before, but I didn't keep track exactly.
>
> On Mon, Jan 11, 2016 at 2:30 PM, cheater00 . <cheater00@gmail.com> wrote:
>> The bug just happened again. Attached is a log since the time I
>> mounted the FS right after the fsck.
>>
>> Note the only things between the message I got while mounting:
>> [216798.144518] BTRFS info (device sdc1): disk space caching is enabled
>>
>> and the beginning of the crash dump:
>> [241534.760651] ------------[ cut here ]------------
>>
>> is this:
>> [218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>> [233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>>
>> I am not sure why those resets happen, though. I bought a few cables
>> and experimented with them, and the usb ports themselves are located
>> directly on the motherboard.
>> Also, they happened some considerable time before the crash dump. So
>> I'm not sure they're even related. Especially given that I was copying
>> a lot of very small files, and they all copied onto the disk fine all
>> the time between the last usb reset and the crash dump, which is
>> roughly two and a half hours. In fact I pressed ctrl-z on a move
>> operation and then wrote something like sleep $(echo '60*60*3' | bc) ;
>> fg and ran it just past 9 am, so the mv resumed past 12 pm, so as
>> things add up the last usb reset happened even before the mv was
>> resumed with fg.
>>
>> I unmounted the fs and re-mounted the it to make it writeable again.
>> This showed up in dmesg:
>>
>> [241766.485365] BTRFS error (device sdc1): cleaner transaction attach
>> returned -30
>> [241770.115897] BTRFS info (device sdc1): disk space caching is enabled
>>
>> this time there was no "info" line about the free space cache file. So
>> maybe it wasn't important for the bug to occur at all.
>>
>> The new output of btrfs fi df -g is:
>> Data, single: total=2080.01GiB, used=2078.80GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.73GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>>
>> I could swap this disk onto sata and the other disk back onto usb to
>> see if the usb resets have anything to do with this. But I'm skeptical.
>> Also, maybe btrfs has other issues related simply to the disk being on
>> usb, resets or not. In that case, if the bug doesn't trigger on sata,
>> we'll think "aha, it was the resets, buggy hardware etc." when it was
>> actually something else that just has to do with the disk being on usb
>> and operating normally.
>>
>> On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@gmail.com> wrote:
>>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>>
>>>>> Would like to point out that this can cause data loss. If I'm writing
>>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>>> be lost, because who in their right mind makes their code expect this
>>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>>
>>>> If a data critical application (mail server, database server, anything
>>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>>> force the FS the server uses for queuing mail read-only, and see if you can
>>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>>> supposed to do when they can't submit mail.  A properly designed piece of
>>>> software is supposed to be resilient against common failure modes of the
>>>> resources it depends on (which includes ENOSPC and read-only filesystems for
>>>> anything that works with data on disk).
>>>>>
>>>>>
>>>>> There's no loss of data on the disk because the data doesn't make it
>>>>> to disk in the first place. But it's exactly the same as if the data
>>>>> had been written to disk, and then lost.
>>>>>
>>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>>> calling fsync or fdatasync, and then assuming if those return an error that
>>>> none of the data written since the last call has gotten to the disk (some of
>>>> it might have, but you need to assume it hasn't).  Every piece of software
>>>> in wide usage that requires data to be on the disk does this, because
>>>> otherwise it can't guarantee that the data is on disk.
>>>
>>> I agree that a lot of stuff goes right in a perfect world. But most of
>>> the time what you're running isn't a mail server used by billions of
>>> users, but instead a bash script someone wrote once that's supposed to
>>> do something, and no one knows how it works.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-04 16:53                                 ` Lutz Vieweg
@ 2016-08-04 20:30                                   ` Chris Murphy
  2016-08-05 10:56                                     ` Lutz Vieweg
  2016-08-05 20:03                                   ` Gabriel C
  1 sibling, 1 reply; 55+ messages in thread
From: Chris Murphy @ 2016-08-04 20:30 UTC (permalink / raw)
  To: Lutz Vieweg
  Cc: cheater00 .,
	Austin S. Hemmelgarn, Hugo Mills, Chris Murphy, Btrfs BTRFS,
	Russell Coker

On Thu, Aug 4, 2016 at 10:53 AM, Lutz Vieweg <lvml@5t9.de> wrote:

> The amount of threads on "lost or unused free space" without resolutions
> in the btrfs mailing list archive is really frightening. If these
> symptoms commonly re-appear with no fix in sight, I'm afraid I'll have
> to either resort to using XFS (with ugly block-device based snapshots
> for backup) or try my luck with OpenZFS :-(

Keep in mind the list is rather self-selecting for problems. People
who aren't having problems are unlikely to post their non-problems to
the list.

It'll be interesting to see what other suggestions you get, but I see
it as basically three options in order of increasing risk+effort.

a. Try the clear_cache mount option (one time) and let the file system
stay mounted so the cache is recreated. If the problem happens soon
after again, try nospace_cache. This might buy you time before 4.8 is
out, which has a bunch of new enospc code in it.

b. Recreate the file system. For reasons not well understood, some
file systems just get stuck in this state with bogus enospc claims.

c. Take some risk and use 4.8 rc1 once it's out. Just make sure to
keep backups. I have no idea to what degree the new enospc code can
help well used existing systems already having enospc issues, vs the
code prevents the problem from happening in the first place. So you
may end up at b. anyway.
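
For (a), that would look roughly like the following (device and mount point 
taken from the earlier report; adjust to your setup):

  umount /data3
  mount -o clear_cache /dev/mapper/cryptedResourceData3 /data3
  # leave it mounted so the free-space cache is rebuilt; if the problem
  # comes back soon after, mount with -o nospace_cache instead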


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-04 20:30                                   ` Chris Murphy
@ 2016-08-05 10:56                                     ` Lutz Vieweg
  2016-08-05 12:12                                       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 55+ messages in thread
From: Lutz Vieweg @ 2016-08-05 10:56 UTC (permalink / raw)
  To: linux-btrfs

On 08/04/2016 10:30 PM, Chris Murphy wrote:
> Keep in mind the list is rather self-selecting for problems. People
> who aren't having problems are unlikely to post their non-problems to
> the list.

True, but the number of people inclined to post a bug report to
the list is also a lot smaller than the number of people who
experienced problems.

Personally, I know at least 2 Linux users who happened to
get a btrfs filesystem as part of upgrading to a newer Suse
distribution on their PC, and both of them experienced
trouble with their filesystems that caused them to re-install
without using btrfs. They weren't interested in what filesystem
they use enough to bother investigating what happened
in detail or to issue bug-reports.

I'm afraid that btrfs' reputation has already taken damage
from the combination of "early deployment as a root filesystem
to unsuspecting users" and "being at a development stage where
users are likely to experience trouble at some time".

> c. Take some risk and use 4.8 rc1 once it's out. Just make sure to
> keep backups.

We sure do - actually, the possibility to "run daily backups from a
snapshot while write performance remains acceptable" is the one and
only reason for me to use btrfs rather than xfs for those $HOME dirs.
In every other aspect (stability, performance, suitability for
storing VM-images or database-files) xfs wins for me.
And the btrfs advantage "file system based snapshot being more
performant than block device based snapshot" may fade away
with the replacement of magnetic disks with SSDs in the long run.


Regards,

Lutz Vieweg


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-05 10:56                                     ` Lutz Vieweg
@ 2016-08-05 12:12                                       ` Austin S. Hemmelgarn
  2016-08-05 13:14                                         ` Lutz Vieweg
  0 siblings, 1 reply; 55+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-05 12:12 UTC (permalink / raw)
  To: Lutz Vieweg, linux-btrfs

On 2016-08-05 06:56, Lutz Vieweg wrote:
> On 08/04/2016 10:30 PM, Chris Murphy wrote:
>> Keep in mind the list is rather self-selecting for problems. People
>> who aren't having problems are unlikely to post their non-problems to
>> the list.
>
> True, but the number of people inclined to post a bug report to
> the list is also a lot smaller than the number of people who
> experienced problems.
>
> Personally, I know at least 2 Linux users who happened to
> get a btrfs filesystem as part of upgrading to a newer Suse
> distribution on their PC, and both of them experienced
> trouble with their filesystems that caused them to re-install
> without using btrfs. They weren't interested in what filesystem
> they use enough to bother investigating what happened
> in detail or to issue bug-reports.
>
> I'm afraid that btrfs' reputation has already taken damage
> from the combination of "early deployment as a root filesystem
> to unsuspecting users" and "being at a development stage where
> users are likely to experience trouble at some time".
FWIW, the 'early deployment' thing is an issue of the distributions 
themselves, and most people who have come to me personally complaining 
about BTRFS have understood this after I explained it to them.

As far as the rest, it's hit or miss whether you have issues.  I've been 
using BTRFS on all my personal systems since about 3.14, and have had 
zero issues with data loss or filesystem corruption (or horrible 
performance) since about 3.18 that were actually BTRFS issues (it's 
helped me ID a lot of marginal hardware though), and in fact, I had more 
issues trying to use ZFS for a year than I've had in the now multiple 
years of using BTRFS, and in the case of BTRFS, I was actually able to 
fix things.  I know quite a few people (and a number of big companies 
for that matter) who have been running BTRFS for longer and had fewer 
issues too.  The biggest issue is that the risks involved aren't well 
characterized, although most filesystems have that same issue.

If you stick to single disk or raid1 mode, don't use quota groups (which 
at least SUSE does by default now), stick to reasonably sized 
filesystems (not more than a few TB), and avoid a couple of specific 
unconventional storage configurations below it, BTRFS works fine.  The 
whole issue with databases is often a non-issue for desktop users in my 
experience, and if you think VM image performance is bad, you should 
really be looking at using real block storage instead of a file 
(seriously, this will usually get you a bigger performance boost than 
using ext4 or XFS over BTRFS as an underlying filesystem will).
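As a sketch (the VG, LV, and memory size below are placeholders, not a 
recommendation for any particular hypervisor setup), handing a VM a logical 
volume instead of an image file looks like:

  lvcreate -L 40G -n vm1-disk vg0
  qemu-system-x86_64 -m 2048 -drive file=/dev/vg0/vm1-disk,format=raw,cache=none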
>
>> c. Take some risk and use 4.8 rc1 once it's out. Just make sure to
>> keep backups.
>
> We sure do - actually, the possibility to "run daily backups from a
> snapshot while write performance remains acceptable" is the one and
> only reason for me to use btrfs rather than xfs for those $HOME dirs.
> In every other aspect (stability, performance, suitability for
> storing VM-images or database-files) xfs wins for me.
> And the btrfs advantage "file system based snapshot being more
> performant than block device based snapshot" may fade away
> with the replacement of magnetic disks with SSDs in the long run.
I'm going to respond to the two parts of this separately:
1. As far as snapshot performance, you'd be surprised. I've got pretty 
good consumer grade SSD's that can do a sustained 250MB/s write speed, 
which means that to be as fast as a snapshot, the data set would have to 
be less than 25MB (and that's being generous, snapshots usually take 
less than 0.1s to create on my system).  Where the turnover point occurs 
varies of course based on storage bandwidth, but I don't see it being 
very likely that SSD's will obsolete snapshotting any time soon.  Even 
if disks suddenly get the ability to run at full bandwidth of the link 
they're on, a SAS3 disk (12Gbit/s signaling, practical bandwidth of 
about 1GB/s) would have a turn over point of about 100MB, and a NVMe 
device on a PCIe 4.0 X16 link (3.151GB/s theoretical bandwidth) would 
have a turn over point of 3.1GB.  In theory, a high-end NVDIMM might be 
able to do better than a snapshot, but it probably couldn't get much 
faster right now than twice the speed of a PCIe 4.0 X16 link, which 
means that it would likely have a turn over point of about 6.2GB.  In 
comparison, it's not unusual to need a snapshot of a data set in excess 
of a terabyte in size.
2. As far as snapshots being the only advantage of BTRFS, that's just 
bogus. XFS does have metadata checksumming now, but that provides no 
protection for data, just metadata.  XFS also doesn't have transparent 
compression support, filesystems can't be shrunk, and it stores no 
backups of any metadata except super-blocks.  While the compression and 
filesystem shrinking may not be needed in your use case, the data 
integrity features are almost certainly an advantage.
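
For point 1 above, the turnover points are simply storage bandwidth 
multiplied by the snapshot-creation time, e.g. (assuming the ~0.1 s figure):

  echo '250*0.1' | bc    # ~25 MB at 250 MB/s (SATA SSD)
  echo '1000*0.1' | bc   # ~100 MB at ~1 GB/s (SAS3)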

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-05 12:12                                       ` Austin S. Hemmelgarn
@ 2016-08-05 13:14                                         ` Lutz Vieweg
  0 siblings, 0 replies; 55+ messages in thread
From: Lutz Vieweg @ 2016-08-05 13:14 UTC (permalink / raw)
  To: linux-btrfs

On 08/05/2016 02:12 PM, Austin S. Hemmelgarn wrote:

 > If you stick to single disk

We do, all our btrfs filesystems reside on one single block device,
redundancy is provided by a DRBD layer below.

 > don't use quota groups

We don't use any quotas.

 > stick to reasonably sized filesystems (not more than a few TB)

We do, currently 4 TB max, because that's the only way to utilize
different physical storage devices for different filesystem instances
such that we can back them up in parallel within reasonable time.

 > and avoid a couple of specific unconventional storage configurations below it

Configurations like what?

 > The whole issue with
 > databases is often a non-issue for desktop users in my experience

Well, try a "cat" on a sqlite file that has been used by some ordinary
desktop software (like a browser) for a year - and you'll experience
horrible performance, due to the extreme number of fragments.

(Having to manually "de-fragment" a filesystem periodically is something
that I had considered a thing of the past when I started using BSD's hfs
instead of the Amiga FFS in the late 1980s... ;-)
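
You can see the effect with filefrag on such a database file, and btrfs does 
at least offer manual defragmentation (the path below is just an example):

  filefrag ~/.mozilla/firefox/*/places.sqlite
  btrfs filesystem defragment -v ~/.mozilla/firefox/*/places.sqlite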

 > and if you think VM image
 > performance is bad, you should really be looking at using real block storage instead of a file
 > (seriously, this will usually get you a bigger performance boost than using ext4 or XFS
 > over BTRFS as an underlying filesystem will).

Sure, assigning block devices to each VM would be even better, but
also much less convenient for operations. It's a feature here that any
user can start a new VM instance (without root privileges) at any
time, and that the images used by those VMs are part of the incremental
backup that stores only differences, not "whole files that have been changed".

 >> We sure do - actually, the possibility to "run daily backups from a
 >> snapshot while write performance remains acceptable" is the one and
 >> only reason for me to use btrfs rather than xfs for those $HOME dirs.
 >> In every other aspect (stability, performance, suitability for
 >> storing VM-images or database-files) xfs wins for me.
 >> And the btrfs advantage "file system based snapshot being more
 >> performant than block device based snapshot" may fade away
 >> with the replacement of magnetic disks with SSDs in the long run.
 > I'm going to respond to the two parts of this separately:
 > 1. As far as snapshot performance, you'd be surprised. I've got pretty good consumer grade SSD's
 > that can do a sustained 250MB/s write speed, which means that to be as fast as a snapshot,
 > the data set would have to be less than 25MB

No, I'm talking about LVM snapshots, which utilize Copy-On-Write
on the block device level. Creating such an LVM snapshot is
as quick as creating a btrfs snapshot, regardless of the size.
The only significant draw-back of the LVM snapshot is that whenever
data is written to the filesystem, that causes copy operations from
one part of the (currently magnetic) storage to another part, and
that seriously hurts the write performance.

(Of course, it would not be a reasonable option to take a block device
snapshot by first copying all the data on it.)
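
For reference, such a block-level snapshot is simply (VG/LV names and the 
COW-area size are placeholders):

  lvcreate --snapshot --size 20G --name data3-snap /dev/vg0/data3
  # ... run the backup from the snapshot ...
  lvremove vg0/data3-snap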

 > 2. As far as snapshots being the only advantage of BTRFS, that's just bogus.
 > XFS does have metadata checksumming now, but that provides no protection for
 > data, just metadata.

We check for bit-rot on the block device level, DRBD verifies the integrity
of the data by reading from both redundant storage devices and comparing the
checksums, periodically every week.

So far, we never encountered a single bit-rot error, even though the underlying
physical storage devices are "cheap SATA disks".
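
The weekly check is essentially just (resource name is a placeholder):

  drbdadm verify data3
  # any out-of-sync blocks it finds are reported in the kernel log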

 > XFS also doesn't have transparent compression support

I have no use for that. Disk space is relatively cheap, cheap enough
that we don't bother with RAID-5 or such, but use the "full redundancy"
provided by a shared-nothing DRBD setup.

 > filesystems can't be shrunk

I enlarged XFS filesystems multiple times while in use, which worked well.
I never had to shrink a filesystem, and I cannot imagine how such a use case
could occur to me.
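
Growing online is a one-liner (mount point is a placeholder):

  # after enlarging the underlying block device:
  xfs_growfs /home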

 > and it stores no backups of any metadata except super-blocks.

Which is fine with me, as redundancy is provided on the block device level
by DRBD.

 > While the compression and filesystem shrinking may not be needed in
 > your use case, the data integrity features are almost certainly an advantage.

Btrfs sure has some nifty features, and I understand that for some,
stuff like "subvolumes" or "deduplication" is important.

But a hundred great features cannot make up for a lack of stability,
therefore I would love to see those ENOSPC-related issues
resolved rather than more fancy features being built :-)

Regards,

Lutz Vieweg


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-04 16:53                                 ` Lutz Vieweg
  2016-08-04 20:30                                   ` Chris Murphy
@ 2016-08-05 20:03                                   ` Gabriel C
  2016-08-25 15:48                                     ` Lutz Vieweg
  1 sibling, 1 reply; 55+ messages in thread
From: Gabriel C @ 2016-08-05 20:03 UTC (permalink / raw)
  To: Lutz Vieweg, cheater00 .,
	Austin S. Hemmelgarn, Hugo Mills, Chris Murphy, Btrfs BTRFS,
	russell


On 04.08.2016 18:53, Lutz Vieweg wrote:
> 
> I was today hit by what I think is probably the same bug:
> A btrfs on a close-to-4TB sized block device, only half filled
> to almost exactly 2 TB, suddenly says "no space left on device"
> upon any attempt to write to it. The filesystem was NOT automatically
> switched to read-only by the kernel, I should mention.
> 
> Re-mounting (which is a pain as this filesystem is used for
> $HOMEs of a multitude of active users who I have to kick from
> the server for doing things like re-mounting) removed the symptom
> for now, but from what I can read in linux-btrfs mailing list
> archives, it pretty likely the symptom will re-appear.
> 
> Here are some more details:
> 
> Software versions:
>> linux-4.6.1 (vanilla from kernel.org)
...
> 
> dmesg output from the time the "no space left on device"-symptom
> appeared:
> 
>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]

....
> ...
>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]


Sounds like the bug I hit as well...

To fix this you'll need:


crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf
commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
Author: Chris Mason <clm@fb.com>
Date:   Tue Jul 19 05:52:36 2016 -0700

    Btrfs: fix delalloc accounting after copy_from_user faults

    Commit 56244ef151c3cd11 was almost but not quite enough to fix the
    reservation math after btrfs_copy_from_user returned partial copies.

    Some users are still seeing warnings in btrfs_destroy_inode, and with a
    long enough test run I'm able to trigger them as well.

    This patch fixes the accounting math again, bringing it much closer to
    the way it was before the sectorsize conversion Chandan did.  The
    problem is accounting for the offset into the page/sector when we do a
    partial copy.  This one just uses the dirty_sectors variable which
    should already be updated properly.

    Signed-off-by: Chris Mason <clm@fb.com>
    cc: stable@vger.kernel.org # v4.6+

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f3f61d1..bcfb4a2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@ again:
                 * managed to copy.
                 */
                if (num_sectors > dirty_sectors) {
-                       /*
-                        * we round down because we don't want to count
-                        * any partial blocks actually sent through the
-                        * IO machines
-                        */
-                       release_bytes = round_down(release_bytes - copied,
-                                     root->sectorsize);
+
+                       /* release everything except the sectors we dirtied */
+                       release_bytes -= dirty_sectors <<
+                               root->fs_info->sb->s_blocksize_bits;
+
                        if (copied > 0) {
                                spin_lock(&BTRFS_I(inode)->lock);
                                BTRFS_I(inode)->outstanding_extents++;
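
A quick way to check whether a given kernel tree already contains this fix:

  git tag --contains 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
  # run inside a kernel git checkout; lists the release tags that include it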

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem
  2016-08-05 20:03                                   ` Gabriel C
@ 2016-08-25 15:48                                     ` Lutz Vieweg
  0 siblings, 0 replies; 55+ messages in thread
From: Lutz Vieweg @ 2016-08-25 15:48 UTC (permalink / raw)
  To: linux-btrfs

On 08/05/2016 10:03 PM, Gabriel C wrote:
> On 04.08.2016 18:53, Lutz Vieweg wrote:
>>
>> I was today hit by what I think is probably the same bug:
>> A btrfs on a close-to-4TB sized block device, only half filled
>> to almost exactly 2 TB, suddenly says "no space left on device"
>> upon any attempt to write to it. The filesystem was NOT automatically
>> switched to read-only by the kernel, I should mention.
>>
>> Re-mounting (which is a pain as this filesystem is used for
>> $HOMEs of a multitude of active users who I have to kick from
>> the server for doing things like re-mounting) removed the symptom
>> for now, but from what I can read in linux-btrfs mailing list
>> archives, it pretty likely the symptom will re-appear.
>>
>> Here are some more details:
>>
>> Software versions:
>>> linux-4.6.1 (vanilla from kernel.org)
> ...
>>
>> dmesg output from the time the "no space left on device"-symptom
>> appeared:
>>
>>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]
>
> ....
>> ...
>>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]
>
> Sounds like the bug I hit too also ..
>
> To fix this you'll need :
>
> crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf
> commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
> Author: Chris Mason <clm@fb.com>
> Date:   Tue Jul 19 05:52:36 2016 -0700
>
>      Btrfs: fix delalloc accounting after copy_from_user faults

Thanks for this hint!

Yesterday (20 days after the first time this bug struck us, and after re-mounting
the filesystem) we were hit by the same bug again - twice! - once in the morning,
and again in the evening.

That called for immediate action, and short of reverting the whole setup to XFS,
installing a new kernel with the above (and other) btrfs fix(es) was the one thing
I could try.

The system is now running linux-4.7.2, which does contain those patches.
If that doesn't fix it, we're really running out of options.

Regards,

Lutz Vieweg



^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2016-08-25 16:22 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-30 21:44 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem cheater00 .
2015-12-30 22:13 ` Chris Murphy
2016-01-02  2:09   ` cheater00 .
2016-01-02  2:10     ` cheater00 .
     [not found]       ` <CA+9GZUiWQ2tAotFuq2Svkjnk+2Quz5B8UwZSSpm4SJfhqfoStQ@mail.gmail.com>
2016-01-07 21:55         ` Chris Murphy
     [not found]           ` <CA+9GZUjLcRnRX_mwO-McXWFd+G4o3jtBENMLnszg-rJTn6vL1w@mail.gmail.com>
     [not found]             ` <CAJCQCtRhYZi9nqWP_LYmZeg1yRQVkpnmUDQ-P5o1-gc-3w+Pdg@mail.gmail.com>
2016-01-09 20:00               ` cheater00 .
2016-01-09 20:26                 ` Hugo Mills
2016-01-09 20:59                   ` cheater00 .
2016-01-09 21:04                     ` Hugo Mills
2016-01-09 21:07                       ` cheater00 .
2016-01-09 21:15                         ` Hugo Mills
2016-01-10  3:59                           ` cheater00 .
2016-01-10  6:16                         ` Russell Coker
2016-01-10 22:24                           ` cheater00 .
2016-01-10 22:32                             ` Lionel Bouton
2016-01-11 13:05                         ` Austin S. Hemmelgarn
2016-01-11 13:11                           ` cheater00 .
2016-01-11 13:30                             ` cheater00 .
2016-01-11 13:45                               ` cheater00 .
2016-01-11 14:04                                 ` cheater00 .
2016-01-12  2:18                                   ` Duncan
2016-08-04 16:53                                 ` Lutz Vieweg
2016-08-04 20:30                                   ` Chris Murphy
2016-08-05 10:56                                     ` Lutz Vieweg
2016-08-05 12:12                                       ` Austin S. Hemmelgarn
2016-08-05 13:14                                         ` Lutz Vieweg
2016-08-05 20:03                                   ` Gabriel C
2016-08-25 15:48                                     ` Lutz Vieweg
2016-01-11 14:10                             ` Austin S. Hemmelgarn
2016-01-11 16:02                               ` cheater00 .
2016-01-11 16:33                                 ` cheater00 .
2016-01-11 20:29                                   ` Henk Slager
2016-01-12  1:16                                     ` Duncan
2016-01-11  0:13                       ` Chris Murphy
2016-01-11  9:03                         ` Hugo Mills
2016-01-11 13:04                           ` cheater00 .
2016-01-11 21:31                           ` Chris Murphy
2016-01-11 22:10                             ` Hugo Mills
2016-01-11 22:20                               ` Chris Murphy
2016-01-11 22:30                                 ` Hugo Mills
2016-01-11 22:39                                   ` Chris Murphy
2016-01-11 23:07                                     ` Hugo Mills
2016-01-11 23:12                                       ` cheater00 .
2016-01-11 23:05                                   ` cheater00 .
2016-01-12  2:05                                   ` Duncan
2016-01-11 22:57                             ` cheater00 .
2016-01-10 14:14                   ` Henk Slager
2016-01-10 23:47                     ` cheater00 .
2016-01-11  0:24                       ` Chris Murphy
2016-01-11  6:07                         ` cheater00 .
2016-01-11  6:24                           ` cheater00 .
2016-01-11  7:54                             ` cheater00 .
2016-01-12  0:35                               ` Duncan
2016-01-11 19:50                       ` Henk Slager
2016-01-11 23:03                         ` cheater00 .
