* fallocate does not prevent ENOSPC on write
@ 2019-04-22 21:09 Jakob Unterwurzacher
  2019-04-23  2:16 ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Jakob Unterwurzacher @ 2019-04-22 21:09 UTC (permalink / raw)
  To: linux-btrfs

I have a user who is reporting ENOSPC errors when running gocryptfs on
top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).

What is interesting is that the error gets thrown at write time. This
is not supposed to happen, because gocryptfs does

    fallocate(..., FALLOC_FL_KEEP_SIZE, ...)

before writing.
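
A minimal sketch of that allocate-then-write pattern, in Python rather
than the C of the reproducer. As far as I know Python's os module does
not expose fallocate(2) with FALLOC_FL_KEEP_SIZE, so os.posix_fallocate()
stands in for it here (an assumption; unlike KEEP_SIZE it also extends
the file size). The function name is made up for illustration:

```python
import os

BLOCK_SIZE = 132096  # block size printed by the reproducer


def fallocate_then_write(path, n_blocks, block_size=BLOCK_SIZE):
    """Reserve each block before writing it; return the total bytes written."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        offset = 0
        for _ in range(n_blocks):
            # Stand-in for fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, block_size).
            # Raises OSError (e.g. ENOSPC) if the space cannot be reserved.
            os.posix_fallocate(fd, offset, block_size)
            data = os.urandom(block_size)
            # The expectation discussed here: after a successful reservation,
            # this write should not fail with ENOSPC.
            written = os.pwrite(fd, data, offset)
            if written != block_size:
                raise OSError("short write at offset %d" % offset)
            offset += block_size
        return offset
    finally:
        os.close(fd)
```

With this pattern, running out of space is expected to surface in the
reservation call, not in the write that follows it.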

I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
This is what it looks like on ext4:

    $ ../fallocate_write/fallocate_write
    reading from /dev/urandom
    writing to ./blob.379Q8P
    writing blocks of 132096 bytes each
    [...]
    fallocate failed: No space left on device

On btrfs, it will instead look like this:

    [...]
    pwrite failed: No space left on device

Is this a bug in btrfs' fallocate implementation or am I reading the
guarantees that fallocate gives me wrong?

Thanks!
Jakob


* Re: fallocate does not prevent ENOSPC on write
  2019-04-22 21:09 fallocate does not prevent ENOSPC on write Jakob Unterwurzacher
@ 2019-04-23  2:16 ` Qu Wenruo
  2019-04-23 11:33   ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2019-04-23  2:16 UTC (permalink / raw)
  To: Jakob Unterwurzacher, linux-btrfs





On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> I have a user who is reporting ENOSPC errors when running gocryptfs on
> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> 
> What is interesting is that the error gets thrown at write time. This
> is not supposed to happen, because gocryptfs does
> 
>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> 
> before writing.
> 
> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> This is what it looks like on ext4:
> 
>     $ ../fallocate_write/fallocate_write
>     reading from /dev/urandom
>     writing to ./blob.379Q8P
>     writing blocks of 132096 bytes each
>     [...]
>     fallocate failed: No space left on device
> 
> On btrfs, it will instead look like this:
> 
>     [...]
>     pwrite failed: No space left on device
> 
> Is this a bug in btrfs' fallocate implementation or am I reading the
> guarantees that fallocate gives me wrong?

This commit, present since v4.7, changed how btrfs does the nodatacow
check:
c6887cd11149 ("Btrfs: don't do nocow check unless we have to").

Before that commit, btrfs always checked whether it needed to reserve
space for COW. After it, btrfs only does the check when there is no
space left.

However, this breaks other nodatacow space checks.
Because the commit is old and many later changes build on it, it is
pretty hard to fix. I have tried several times, but each attempt only
caused more problems.

So I'm afraid it's a known problem and not something we can fix very soon.

Thanks,
Qu





* Re: fallocate does not prevent ENOSPC on write
  2019-04-23  2:16 ` Qu Wenruo
@ 2019-04-23 11:33   ` David Sterba
  2019-04-23 12:12     ` Qu Wenruo
  2019-04-25  5:49     ` Qu Wenruo
  0 siblings, 2 replies; 22+ messages in thread
From: David Sterba @ 2019-04-23 11:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Jakob Unterwurzacher, linux-btrfs

On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> > I have a user who is reporting ENOSPC errors when running gocryptfs on
> > top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> > 
> > What is interesting is that the error gets thrown at write time. This
> > is not supposed to happen, because gocryptfs does
> > 
> >     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> > 
> > before writing.
> > 
> > I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> > This is what it looks like on ext4:
> > 
> >     $ ../fallocate_write/fallocate_write
> >     reading from /dev/urandom
> >     writing to ./blob.379Q8P
> >     writing blocks of 132096 bytes each
> >     [...]
> >     fallocate failed: No space left on device
> > 
> > On btrfs, it will instead look like this:
> > 
> >     [...]
> >     pwrite failed: No space left on device
> > 
> > Is this a bug in btrfs' fallocate implementation or am I reading the
> > guarantees that fallocate gives me wrong?
> 
> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> 
> Before that commit, btrfs always check if they need to reserve space for
> COW, while after that patch, btrfs never checks unless we have no space.
> 
> However this screws up other nodatacow space check.
> And due to its age and deep changeset, it's pretty hard to fix it.
> I have tried several times, but it will only cause more problems.

What about reverting the commit, if the problem is otherwise hard to fix?
This seems to break the semantics of fallocate, so performance should
not be the main concern here.


* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 11:33   ` David Sterba
@ 2019-04-23 12:12     ` Qu Wenruo
  2019-04-23 14:50       ` Filipe Manana
  2019-04-25  5:49     ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2019-04-23 12:12 UTC (permalink / raw)
  To: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/23 7:33 PM, David Sterba wrote:
> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>
>>> What is interesting is that the error gets thrown at write time. This
>>> is not supposed to happen, because gocryptfs does
>>>
>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>
>>> before writing.
>>>
>>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
>>> This is what it looks like on ext4:
>>>
>>>     $ ../fallocate_write/fallocate_write
>>>     reading from /dev/urandom
>>>     writing to ./blob.379Q8P
>>>     writing blocks of 132096 bytes each
>>>     [...]
>>>     fallocate failed: No space left on device
>>>
>>> On btrfs, it will instead look like this:
>>>
>>>     [...]
>>>     pwrite failed: No space left on device
>>>
>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>> guarantees that fallocate gives me wrong?
>>
>> Since v4.7, this commit changed the how btrfs do NodataCOW check:
>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>
>> Before that commit, btrfs always check if they need to reserve space for
>> COW, while after that patch, btrfs never checks unless we have no space.
>>
>> However this screws up other nodatacow space check.
>> And due to its age and deep changeset, it's pretty hard to fix it.
>> I have tried several times, but it will only cause more problems.
> 
> What if the commit is reverted, if the problem is otherwise hard to fix?

I tried reverting it, but then other problems came up,
e.g. reserved space underflow.

I'll find the old thread and try again.

Thanks,
Qu





* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 12:12     ` Qu Wenruo
@ 2019-04-23 14:50       ` Filipe Manana
  2019-04-23 19:21         ` Jakob Unterwurzacher
  2019-04-23 23:49         ` Qu Wenruo
  0 siblings, 2 replies; 22+ messages in thread
From: Filipe Manana @ 2019-04-23 14:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs

On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/4/23 7:33 PM, David Sterba wrote:
> > On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>
> >>> What is interesting is that the error gets thrown at write time. This
> >>> is not supposed to happen, because gocryptfs does
> >>>
> >>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>
> >>> before writing.
> >>>
> >>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> >>> This is what it looks like on ext4:
> >>>
> >>>     $ ../fallocate_write/fallocate_write
> >>>     reading from /dev/urandom
> >>>     writing to ./blob.379Q8P
> >>>     writing blocks of 132096 bytes each
> >>>     [...]
> >>>     fallocate failed: No space left on device
> >>>
> >>> On btrfs, it will instead look like this:
> >>>
> >>>     [...]
> >>>     pwrite failed: No space left on device
> >>>
> >>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>> guarantees that fallocate gives me wrong?
> >>
> >> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> >> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>
> >> Before that commit, btrfs always check if they need to reserve space for
> >> COW, while after that patch, btrfs never checks unless we have no space.
> >>
> >> However this screws up other nodatacow space check.
> >> And due to its age and deep changeset, it's pretty hard to fix it.
> >> I have tried several times, but it will only cause more problems.
> >
> > What if the commit is reverted, if the problem is otherwise hard to fix?
>
> Tried reverted, but all other problems came up.

I haven't seen an explanation of why that patch causes ENOSPC, or of
which nodatacow space checks it breaks.

It seems fine to me. This is what we currently do:

1) For any buffered write, check if there's enough free data space;
2) If not try to allocate a new data chunk;
3) If that fails check if the file has the "have prealloc extents"
flag or has the nodatacow flag set
4) If any of those conditions is true, check if we can write to the
existing extent - if it's not shared or no checksums exist in its
range, meaning it's an unwritten (prealloc) extent, return success to
userspace
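
The four steps above can be sketched as a small decision function. This
is a toy model in Python, not the kernel code: the function name, the
parameters and the returned labels are all hypothetical, and whether the
existing extent can be written in place (step 4) is passed in as a
boolean rather than derived from extent state:

```python
def buffered_write_check(nbytes, free_data_space, can_allocate_chunk,
                         has_prealloc_or_nodatacow, can_write_in_place):
    """Toy model of the checks listed above; returns a label, not an errno."""
    # 1) Enough free data space: reserve it and go ahead.
    if free_data_space >= nbytes:
        return "reserve"
    # 2) Otherwise try to allocate a new data chunk.
    if can_allocate_chunk:
        return "reserve"
    # 3) No space: only files with prealloc extents or nodatacow
    #    get another chance.
    if not has_prealloc_or_nodatacow:
        return "ENOSPC"
    # 4) The write succeeds without new space if the existing extent
    #    can be overwritten in place.
    if can_write_in_place:
        return "nocow"
    return "ENOSPC"
```

In this model a write into an unshared prealloc extent on a full
filesystem returns "nocow", which is the outcome fallocate is expected
to guarantee.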

So what's wrong with it? And how does it cause the ENOSPC?

Running the reproducer, at least on a 5.0 kernel, never fails on a
pwrite for me, but always on fallocate:

$ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
$ mount /dev/sdi /mnt/sdi
$ cd /mnt/sdi
$ /path/to/reproducer
reading from /dev/urandom
writing to ./blob.IIa6tH
writing blocks of 132096 bytes each
total    125 MiB,  65.52 MiB/s
total    251 MiB,  44.59 MiB/s
total    377 MiB,  55.23 MiB/s
total    503 MiB,  66.21 MiB/s
total    629 MiB,  59.97 MiB/s
total    755 MiB,   3.70 MiB/s
total    881 MiB,  50.24 MiB/s
total   1007 MiB,  64.51 MiB/s
total   1133 MiB,  50.70 MiB/s
total   1259 MiB,  49.29 MiB/s
total   1385 MiB,  47.93 MiB/s
total   1511 MiB,   4.00 MiB/s
total   1637 MiB,  49.85 MiB/s
total   1763 MiB,  48.11 MiB/s
total   1889 MiB,  66.62 MiB/s
total   2015 MiB,   5.60 MiB/s
total   2141 MiB,  19.58 MiB/s
total   2267 MiB,  64.80 MiB/s
total   2393 MiB,  13.23 MiB/s
total   2519 MiB,  14.95 MiB/s
fallocate failed: No space left on device

So either that was tested on a rather old kernel or:

1) we had snapshotting happening between a fallocate and a pwrite (or
at the same time as the pwrite)
2) before the pwrite (or during) the unwritten/prealloc extent was
reflinked (cp --reflink, clone or dedupe ioctls)

What did I miss here?

Thanks.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 14:50       ` Filipe Manana
@ 2019-04-23 19:21         ` Jakob Unterwurzacher
  2019-04-23 23:56           ` Zygo Blaxell
  2019-04-23 23:49         ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: Jakob Unterwurzacher @ 2019-04-23 19:21 UTC (permalink / raw)
  To: fdmanana; +Cc: Qu Wenruo, dsterba, linux-btrfs

> Trying the reproducer, at least on a 5.0 kernel, does never fail on a
> pwrite for me, but always on fallocate:
[...]
> So either that was tested on a rather old kernel or:
>
> 1) we had snapshotting happening between a fallocate and a pwrite (or
> at the same time as the pwrite)
> 2) before the pwrite (or during) the unwritten/prealloc extent was
> reflinked (cp --reflink, clone or dedupe ioctls)

I am on Linux 5.0.4-200.fc29.x86_64, the user in the github ticket is
on Linux 5.0.7-arch1-1-ARCH, so both are pretty recent.
There should be no snapshot or reflink or really any other activity on
the test filesystem.

Maybe the difference is that I am testing on a file and you on a raw
block device?
This is how things look at 4GB size:

$ dd if=/dev/zero of=img bs=1M count=5000
$ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) img
$ mkdir mnt
$ sudo mount img mnt
$ sudo chmod 777 mnt
$ cd mnt
$ ../fallocate_write/fallocate_write
reading from /dev/urandom
writing to ./blob.qEaSZl
writing blocks of 132096 bytes each
total    125 MiB, 162.06 MiB/s
total    251 MiB, 162.92 MiB/s
pwrite failed: No space left on device

Is your /dev/sdi an SSD? I noticed that mkfs.btrfs does NOT think that
the disk image file is an SSD,
despite the file residing on an SSD.

Thanks,
Jakob


* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 14:50       ` Filipe Manana
  2019-04-23 19:21         ` Jakob Unterwurzacher
@ 2019-04-23 23:49         ` Qu Wenruo
  2019-04-24  9:28           ` Filipe Manana
  1 sibling, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2019-04-23 23:49 UTC (permalink / raw)
  To: fdmanana; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/23 10:50 PM, Filipe Manana wrote:
> On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/4/23 7:33 PM, David Sterba wrote:
>>> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>>>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>>>
>>>>> What is interesting is that the error gets thrown at write time. This
>>>>> is not supposed to happen, because gocryptfs does
>>>>>
>>>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>>>
>>>>> before writing.
>>>>>
>>>>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
>>>>> This is what it looks like on ext4:
>>>>>
>>>>>     $ ../fallocate_write/fallocate_write
>>>>>     reading from /dev/urandom
>>>>>     writing to ./blob.379Q8P
>>>>>     writing blocks of 132096 bytes each
>>>>>     [...]
>>>>>     fallocate failed: No space left on device
>>>>>
>>>>> On btrfs, it will instead look like this:
>>>>>
>>>>>     [...]
>>>>>     pwrite failed: No space left on device
>>>>>
>>>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>>>> guarantees that fallocate gives me wrong?
>>>>
>>>> Since v4.7, this commit changed the how btrfs do NodataCOW check:
>>>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>>>
>>>> Before that commit, btrfs always check if they need to reserve space for
>>>> COW, while after that patch, btrfs never checks unless we have no space.
>>>>
>>>> However this screws up other nodatacow space check.
>>>> And due to its age and deep changeset, it's pretty hard to fix it.
>>>> I have tried several times, but it will only cause more problems.
>>>
>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>
>> Tried reverted, but all other problems came up.
> 
> I haven't seen an explanation on why that patch causes ENOSPC or what
> nodatacow space check screw ups it causes.
> 
> It seems fine to me, and what we currently do:
> 
> 1) For any buffered write, check if there's enough free data space;
> 2) If not try to allocate a new data chunk;
> 3) If that fails check if the file has the "have prealloc extents"
> flag or has the nodatacow flag set
> 4) If any of those conditions is true, check if we can write to the
> existing extent - if it's not shared or no checksums exist in its
> range, meaning it's an unwritten (prealloc) extent, return success to
> userspace
> 
> So what's wrong with it? And how does it cause the ENOSPC?

E.g.

We have a 128M preallocated file extent.
And assume the fs only has 128M of free data space, meaning no remaining
space at all.

Then we try a buffered write, which will just fail because it needs
data space.

The idea is always here for fallocate/pwrite, just the timing where the
ENOSPC happens.


btrfs/153 has been failing for a long time for the same reason. There
the ENOSPC comes from quota, but the cause is exactly the same.

Thanks,
Qu





* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 19:21         ` Jakob Unterwurzacher
@ 2019-04-23 23:56           ` Zygo Blaxell
  2019-04-27 11:25             ` Jakob Unterwurzacher
  0 siblings, 1 reply; 22+ messages in thread
From: Zygo Blaxell @ 2019-04-23 23:56 UTC (permalink / raw)
  To: Jakob Unterwurzacher; +Cc: fdmanana, Qu Wenruo, dsterba, linux-btrfs


On Tue, Apr 23, 2019 at 09:21:09PM +0200, Jakob Unterwurzacher wrote:

> > Trying the reproducer, at least on a 5.0 kernel, does never fail on a
> > pwrite for me, but always on fallocate:
> [...]
> > So either that was tested on a rather old kernel or:
> >
> > 1) we had snapshotting happening between a fallocate and a pwrite (or
> > at the same time as the pwrite)
> > 2) before the pwrite (or during) the unwritten/prealloc extent was
> > reflinked (cp --reflink, clone or dedupe ioctls)
> 
> I am at Linux 5.0.4-200.fc29.x86_64, the user in the github ticket is
> at Linux 5.0.7-arch1-1-ARCH, so pretty recent.
> There should be no snapshot or reflink or really any other activity on
> the test filesystem.
> 
> Maybe the difference is that I am testing on a file and you on a raw
> block device?
> This is how things look at 4GB size:
> 
> $ dd if=/dev/zero of=img bs=1M count=5000
> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) img
> $ mkdir mnt
> $ sudo mount img mnt
> $ sudo chmod 777 mnt
> $ cd mnt
> $ ../fallocate_write/fallocate_write
> reading from /dev/urandom
> writing to ./blob.qEaSZl
> writing blocks of 132096 bytes each

132096 is 129 * 1024, which is not a multiple of 4K.  There will be a CoW
operation in cases where one 4K block from each pwrite is written twice
in separate transactions (or with fsync between).
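
The arithmetic can be checked with a few lines of Python. This is only
a sketch: the helper name is made up, and the writes are assumed to be
back to back starting at offset 0, which the reproducer output suggests
but does not prove:

```python
FS_BLOCK = 4096      # 4K filesystem block size
WRITE_SIZE = 132096  # size of each pwrite in the reproducer


def shared_blocks(n_writes, write_size=WRITE_SIZE, fs_block=FS_BLOCK):
    """Return indices of fs blocks touched by two consecutive writes."""
    shared = []
    for k in range(1, n_writes):
        boundary = k * write_size  # end of write k-1, start of write k
        if boundary % fs_block != 0:
            shared.append(boundary // fs_block)
    return shared


print(WRITE_SIZE == 129 * 1024, WRITE_SIZE % FS_BLOCK)  # True 1024
print(shared_blocks(5))  # [32, 64, 96]
```

Each write ends 1024 bytes further into a 4K block than the previous
one, so only every fourth boundary is aligned and three out of four
writes share a block with their successor.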

Also, fallocate only works _once_ on btrfs.  After the first write,
prealloc extents are replaced with ordinary CoW extent (ref)s, and the
fallocate no-ENOSPC guarantee is gone:

	# fallocate -l 1m foo
	# sync
	# fiewalk foo 
	File: foo
	Extent { begin = 0x0, end = 0x100000, physical = 0x4aedc01000, flags = Extent::PREALLOC|FIEMAP_EXTENT_LAST, physical_len = 0x100000, logical_len = 0x100000 }
	# head -c 128k /dev/urandom | dd conv=notrunc of=foo 
	256+0 records in
	256+0 records out
	131072 bytes (131 kB, 128 KiB) copied, 0.00201152 s, 65.2 MB/s
	# sync
	# fiewalk foo 
	File: foo
	Extent { begin = 0x0, end = 0x20000, physical = 0x4aedc01000, flags = 0, physical_len = 0x100000, logical_len = 0x20000 }
	Extent { begin = 0x20000, end = 0x100000, physical = 0x4aedc21000, flags = Extent::PREALLOC|FIEMAP_EXTENT_LAST, physical_len = 0x100000, logical_len = 0xe0000, offset = 0x20000 }

Here we see the first block is overwriting the same physical address,
but it loses the PREALLOC attribute.  A second write will trigger CoW,
and a new data extent will be allocated:

	# head -c 128k /dev/urandom | dd conv=notrunc of=foo 
	256+0 records in
	256+0 records out
	131072 bytes (131 kB, 128 KiB) copied, 0.00187461 s, 69.9 MB/s
	# sync
	# fiewalk foo 
	File: foo
	Extent { begin = 0x0, end = 0x20000, physical = 0x4ae5f00000, flags = 0, physical_len = 0x20000, logical_len = 0x20000 }
	Extent { begin = 0x20000, end = 0x100000, physical = 0x4aedc21000, flags = Extent::PREALLOC|FIEMAP_EXTENT_LAST, physical_len = 0x100000, logical_len = 0xe0000, offset = 0x20000 }

Note that the physical address of the first extent changed, indicating
CoW.  Also, all of the space allocated to the PREALLOC extent remains
allocated until the entire PREALLOC extent is overwritten (i.e. this
uses 128K of _additional_ space, the partial overwrite doesn't free the
first 128K of prealloc space).
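
The space accounting described above can be modelled roughly as follows.
This is a toy Python model, not btrfs code: the class and attribute
names are invented, writes are tracked per 4K block, and the model
frees neither the prealloc extent nor old CoW extents, so it is only
meant to illustrate the first and second write to a range:

```python
FS_BLOCK = 4096


class PreallocModel:
    """Toy space accounting for one preallocated extent on a datacow file."""

    def __init__(self, prealloc_bytes):
        self.prealloc_bytes = prealloc_bytes  # stays allocated in this model
        self.cow_bytes = 0                    # extra space taken by CoW
        self.written = set()                  # fs blocks written at least once

    def write(self, offset, length):
        """Record a write; return the bytes of new space it needed."""
        first = offset // FS_BLOCK
        last = (offset + length - 1) // FS_BLOCK
        blocks = set(range(first, last + 1))
        # The first write to a block lands in the prealloc extent,
        # a rewrite of the same block is CoW and needs new space.
        rewritten = blocks & self.written
        self.written |= blocks
        extra = len(rewritten) * FS_BLOCK
        self.cow_bytes += extra
        return extra

    @property
    def allocated(self):
        return self.prealloc_bytes + self.cow_bytes


f = PreallocModel(1 << 20)
print(f.write(0, 128 << 10))  # 0
print(f.write(0, 128 << 10))  # 131072
print(f.allocated)            # 1179648
```

The second write is the one that can hit ENOSPC on a full filesystem,
because it needs 128K beyond what fallocate reserved.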

> total    125 MiB, 162.06 MiB/s
> total    251 MiB, 162.92 MiB/s
> pwrite failed: No space left on device
> 
> Is your /dev/sdi an SSD? I noticed that mkfs.btrfs does NOT think that
> the disk image file is an SSD,
> despite the file residing on an SSD.

fallocate is only going to behave the way posix_fallocate specifies on
files with datacow turned off.

> Thanks,
> Jakob



* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 23:49         ` Qu Wenruo
@ 2019-04-24  9:28           ` Filipe Manana
  2019-04-24  9:50             ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2019-04-24  9:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs

On Wed, Apr 24, 2019 at 12:49 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/4/23 10:50 PM, Filipe Manana wrote:
> > On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2019/4/23 7:33 PM, David Sterba wrote:
> >>> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >>>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>>>
> >>>>> What is interesting is that the error gets thrown at write time. This
> >>>>> is not supposed to happen, because gocryptfs does
> >>>>>
> >>>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>>>
> >>>>> before writing.
> >>>>>
> >>>>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> >>>>> This is what it looks like on ext4:
> >>>>>
> >>>>>     $ ../fallocate_write/fallocate_write
> >>>>>     reading from /dev/urandom
> >>>>>     writing to ./blob.379Q8P
> >>>>>     writing blocks of 132096 bytes each
> >>>>>     [...]
> >>>>>     fallocate failed: No space left on device
> >>>>>
> >>>>> On btrfs, it will instead look like this:
> >>>>>
> >>>>>     [...]
> >>>>>     pwrite failed: No space left on device
> >>>>>
> >>>>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>>>> guarantees that fallocate gives me wrong?
> >>>>
> >>>> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> >>>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>>>
> >>>> Before that commit, btrfs always check if they need to reserve space for
> >>>> COW, while after that patch, btrfs never checks unless we have no space.
> >>>>
> >>>> However this screws up other nodatacow space check.
> >>>> And due to its age and deep changeset, it's pretty hard to fix it.
> >>>> I have tried several times, but it will only cause more problems.
> >>>
> >>> What if the commit is reverted, if the problem is otherwise hard to fix?
> >>
> >> Tried reverted, but all other problems came up.
> >
> > I haven't seen an explanation on why that patch causes ENOSPC or what
> > nodatacow space check screw ups it causes.
> >
> > It seems fine to me, and what we currently do:
> >
> > 1) For any buffered write, check if there's enough free data space;
> > 2) If not try to allocate a new data chunk;
> > 3) If that fails check if the file has the "have prealloc extents"
> > flag or has the nodatacow flag set
> > 4) If any of those conditions is true, check if we can write to the
> > existing extent - if it's not shared or no checksums exist in its
> > range, meaning it's an unwritten (prealloc) extent, return success to
> > userspace
> >
> > So what's wrong with it? And how does it cause the ENOSPC?
>
> E.g.
>
> We have a 128Mb preallocated file extent.
> And assume the fs only have 128M free data space, meaning 0 remaining
> space at all.

That sentence contradicts itself...

>
> Then we try to buffer write, which means buffered will just fail as it
> will need data space.
>
> The idea is always here for fallocate/pwrite, just the timing where the
> ENOSPC happens.

I can't make sense of that sentence either.

So I suppose what you are trying to say is that a write into an
unwritten extent causes space allocation, and that this can prevent
some other write (one that is not into an unwritten extent) from
allocating space, so that it fails.

That's a valid problem, but it should be temporary.

However, when allocating space for a write into an unwritten extent (or
for any nodatacow write) we increment the data space info's
bytes_may_use counter. If, when writeback starts, we don't need to fall
back to CoW, we end up never decrementing bytes_may_use (even after
writeback completes), so the reservation is leaked.
I'm not sure whether this is the problem you were referring to, or
whether you only meant that other writes temporarily fail.
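
To make the suspected leak concrete, here is a toy Python model of the
counter. Only the name bytes_may_use comes from btrfs; the class, the
methods and the release_on_nocow switch are hypothetical and exist just
to contrast the leaking path with the expected one:

```python
class DataSpaceInfo:
    """Toy counter standing in for the data space info's bytes_may_use."""

    def __init__(self):
        self.bytes_may_use = 0

    def reserve(self, nbytes):
        # Done at write time, before we know whether CoW will be needed.
        self.bytes_may_use += nbytes

    def writeback(self, nbytes, needs_cow, release_on_nocow):
        if needs_cow:
            # The reservation is consumed by the newly allocated extent.
            self.bytes_may_use -= nbytes
        elif release_on_nocow:
            # Expected behaviour: nothing was allocated, so give it back.
            self.bytes_may_use -= nbytes
        # Otherwise the reservation stays counted: the leak described above.


leaky = DataSpaceInfo()
leaky.reserve(4096)
leaky.writeback(4096, needs_cow=False, release_on_nocow=False)
print(leaky.bytes_may_use)  # 4096
```

In the model every in-place write on the leaking path makes the
filesystem look 4096 bytes fuller than it is, which would eventually
turn into ENOSPC for other writers.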

thanks




-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: fallocate does not prevent ENOSPC on write
  2019-04-24  9:28           ` Filipe Manana
@ 2019-04-24  9:50             ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2019-04-24  9:50 UTC (permalink / raw)
  To: fdmanana; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/24 5:28 PM, Filipe Manana wrote:
[snip]
>>> So what's wrong with it? And how does it cause the ENOSPC?
>>
>> E.g.
>>
>> We have a 128Mb preallocated file extent.
>> And assume the fs only have 128M free data space, meaning 0 remaining
>> space at all.
> 
> That's a contradicting sentence...
> 
>>
>> Then we try to buffer write, which means buffered will just fail as it
>> will need data space.
>>
>> The idea is always here for fallocate/pwrite, just the timing where the
>> ENOSPC happens.
> 
> Can't make sense of that sentence as well.

My bad, that change is already in buffered_write(), so that sentence
makes no sense.
> 
> So I suppose what you are trying to say is that a write into an
> unwritten extent causes space allocation,
> and that can prevent some other write (which is not into an unwritten
> extent) from being able to allocate space and therefore fail.

That's one case.

> 
> That's a valid problem that should be temporary.

I just tried a basic script:
---
#!/bin/bash

dev=/dev/test/test
mnt=/mnt/btrfs

mkfs.btrfs -f $dev -b 512M

mount $dev $mnt

fallocate -l 384M $mnt/file1
echo "fallocate success"
dd if=/dev/zero bs=512K  conv=notrunc count=768 of=$mnt/file2

umount $mnt
---

This fails just like the error report.


At least in current form, if we're writing into the preallocated space,
it indeed skips the data space reservation so it shouldn't cause problem
at that buffered write in theory.


However we have other locations which can reserve data space:
- btrfs_page_mkwrite()
- btrfs_truncate_block()
- btrfs_direct_IO()

I haven't looked into why the above script fails, but it should have
something to do with one of these data space reservations.

Thanks,
Qu
> 
> However when allocating space for a write into an unwritten extent (or
> any nodatacow write) we increment the data space info's bytes_may_use
> counter,
> but then if when writeback starts if we don't need to fallback into
> CoW, we end up never decrementing the bytes_may_use counter (even
> after writeback completes), leaking it.
> Not sure if this is the problem you were mentioning or just causing
> other writes to temporarily fail.
> 
> thanks
> 
> 
>>
>>
>> We have btrfs/153 for the same reason to fail for a long time, although
>> it's from quota, but the reason the completely the same.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Trying the reproducer, at least on a 5.0 kernel, does never fail on a
>>> pwrite for me, but always on fallocate:
>>>
>>> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
>>> $ mount /dev/sdi /mnt/sdi
>>> $ cd /mnt/sdi
>>> $ /path/to/reproducer
>>> reading from /dev/urandom
>>> writing to ./blob.IIa6tH
>>> writing blocks of 132096 bytes each
>>> total    125 MiB,  65.52 MiB/s
>>> total    251 MiB,  44.59 MiB/s
>>> total    377 MiB,  55.23 MiB/s
>>> total    503 MiB,  66.21 MiB/s
>>> total    629 MiB,  59.97 MiB/s
>>> total    755 MiB,   3.70 MiB/s
>>> total    881 MiB,  50.24 MiB/s
>>> total   1007 MiB,  64.51 MiB/s
>>> total   1133 MiB,  50.70 MiB/s
>>> total   1259 MiB,  49.29 MiB/s
>>> total   1385 MiB,  47.93 MiB/s
>>> total   1511 MiB,   4.00 MiB/s
>>> total   1637 MiB,  49.85 MiB/s
>>> total   1763 MiB,  48.11 MiB/s
>>> total   1889 MiB,  66.62 MiB/s
>>> total   2015 MiB,   5.60 MiB/s
>>> total   2141 MiB,  19.58 MiB/s
>>> total   2267 MiB,  64.80 MiB/s
>>> total   2393 MiB,  13.23 MiB/s
>>> total   2519 MiB,  14.95 MiB/s
>>> fallocate failed: No space left on device
>>>
>>> So either that was tested on a rather old kernel or:
>>>
>>> 1) we had snapshotting happening between a fallocate and a pwrite (or
>>> at the same time as the pwrite)
>>> 2) before the pwrite (or during) the unwritten/prealloc extent was
>>> reflinked (cp --reflink, clone or dedupe ioctls)
>>>
>>> What did I miss here?
>>>
>>> Thanks.
>>>
>>>>
>>>> E.g. reserved space underflow.
>>>>
>>>> I'll find the old thread and retry again.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>> This seems to break the semantics of fallocate so the performance should
>>>>> not the main concern here.
>>>>>
>>>>
>>>
>>>
>>
> 
> 



* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 11:33   ` David Sterba
  2019-04-23 12:12     ` Qu Wenruo
@ 2019-04-25  5:49     ` Qu Wenruo
  2019-04-25 13:25       ` Josef Bacik
  2019-04-25 14:39       ` Filipe Manana
  1 sibling, 2 replies; 22+ messages in thread
From: Qu Wenruo @ 2019-04-25  5:49 UTC (permalink / raw)
  To: dsterba, Jakob Unterwurzacher, linux-btrfs, Josef Bacik





On 2019/4/23 7:33 PM, David Sterba wrote:
> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>
>>> What is interesting is that the error gets thrown at write time. This
>>> is not supposed to happen, because gocryptfs does
>>>
>>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>
>>> before writing.
>>>
>>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
>>> This is what it looks like on ext4:
>>>
>>>     $ ../fallocate_write/fallocate_write
>>>     reading from /dev/urandom
>>>     writing to ./blob.379Q8P
>>>     writing blocks of 132096 bytes each
>>>     [...]
>>>     fallocate failed: No space left on device
>>>
>>> On btrfs, it will instead look like this:
>>>
>>>     [...]
>>>     pwrite failed: No space left on device
>>>
>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>> guarantees that fallocate gives me wrong?
>>
>> Since v4.7, this commit changed the how btrfs do NodataCOW check:
>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>
>> Before that commit, btrfs always check if they need to reserve space for
>> COW, while after that patch, btrfs never checks unless we have no space.
>>
>> However this screws up other nodatacow space check.
>> And due to its age and deep changeset, it's pretty hard to fix it.
>> I have tried several times, but it will only cause more problems.
> 
> What if the commit is reverted, if the problem is otherwise hard to fix?
> This seems to break the semantics of fallocate so the performance should
> not the main concern here.

My blurry memory of the underflow case is something like the following
(I failed to locate the old thread):

- fallocate
- pwrite into the preallocated range
  At this point we can do nocow, thus no data space is reserved.

- Something happens to make that preallocated extent shared, without
  writing back dirty pages.
  Some possible causes are snapshot and reflink.
  However nowadays, snapshots will write all dirty inodes, and reflink
  will write the source range to disk.

  Maybe it's a small window inside create_snapshot() between the
  btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?

- dirty pages get written back
  We create an ordered extent, but at this point we can't do nocow any
  more and need to fall back to cow.
  However, at buffered write time we didn't reserve data space.
  Now we will underflow the data space reservation.

However, nowadays there are some new mechanisms to handle this case more
gracefully, like btrfs_root::will_be_snapshotted.

I'll double check whether reverting that patch in the latest kernel still
causes problems.
Any ideas on the possible problem are welcome.
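
To make the sequence above easier to follow, here is a toy userspace model
in C. It is only a sketch under stated assumptions: struct toy_space_info
and the toy_* helpers are made-up names for illustration, not kernel code,
and the counter is signed here so that an underflow shows up as a negative
value (the real counter is, as far as I know, unsigned and would wrap).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the data space accounting; NOT the kernel structure. */
struct toy_space_info {
    int64_t bytes_may_use; /* signed so an underflow is visible as < 0 */
};

/*
 * Buffered write into a preallocated range: nocow looks possible at this
 * point, so no data space is reserved and the counter is left untouched.
 */
static void toy_buffered_write_nocow(struct toy_space_info *si, int64_t len)
{
    (void)si;
    (void)len;
}

/*
 * Writeback: if the extent became shared in the meantime we must fall
 * back to cow, which consumes a reservation that was never made.
 */
static void toy_writeback(struct toy_space_info *si, int64_t len,
                          bool extent_shared)
{
    if (extent_shared)
        si->bytes_may_use -= len;
}

/* Run fallocate -> pwrite -> (maybe share) -> writeback; return the counter. */
static int64_t toy_run_sequence(int64_t len, bool extent_shared)
{
    struct toy_space_info si = { .bytes_may_use = 0 };

    toy_buffered_write_nocow(&si, len);
    toy_writeback(&si, len, extent_shared);
    return si.bytes_may_use;
}
```

In this model, toy_run_sequence(4096, false) leaves the counter at 0, while
toy_run_sequence(4096, true) leaves it at -4096, which is the underflow
described in the last step.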

Thanks,
Qu



* Re: fallocate does not prevent ENOSPC on write
  2019-04-25  5:49     ` Qu Wenruo
@ 2019-04-25 13:25       ` Josef Bacik
  2019-04-25 13:50         ` Qu Wenruo
  2019-04-25 14:39       ` Filipe Manana
  1 sibling, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2019-04-25 13:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs, Josef Bacik

On Thu, Apr 25, 2019 at 01:49:23PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/4/23 7:33 PM, David Sterba wrote:
> > On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>
> >>> What is interesting is that the error gets thrown at write time. This
> >>> is not supposed to happen, because gocryptfs does
> >>>
> >>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>
> >>> before writing.
> >>>
> >>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> >>> This is what it looks like on ext4:
> >>>
> >>>     $ ../fallocate_write/fallocate_write
> >>>     reading from /dev/urandom
> >>>     writing to ./blob.379Q8P
> >>>     writing blocks of 132096 bytes each
> >>>     [...]
> >>>     fallocate failed: No space left on device
> >>>
> >>> On btrfs, it will instead look like this:
> >>>
> >>>     [...]
> >>>     pwrite failed: No space left on device
> >>>
> >>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>> guarantees that fallocate gives me wrong?
> >>
> >> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> >> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>
> >> Before that commit, btrfs always check if they need to reserve space for
> >> COW, while after that patch, btrfs never checks unless we have no space.
> >>
> >> However this screws up other nodatacow space check.
> >> And due to its age and deep changeset, it's pretty hard to fix it.
> >> I have tried several times, but it will only cause more problems.
> > 
> > What if the commit is reverted, if the problem is otherwise hard to fix?
> > This seems to break the semantics of fallocate so the performance should
> > not the main concern here.
> 

Are we sure the ENOSPC is coming from the data reservation?  That change makes
us fall back on the old behavior, which means we should still succeed at making
the data reservation.

However, fallocate() _does not_ guarantee you won't fail the metadata
reservation; I suspect that may be what you are running into.

> My blur memory of the underflow case is something like below: (failed to
> locate the old thread)
> 
> - fallocate
> - pwrite in to the reallocated range
>   At this timing, we can do nocow, thus no data space is reserved.
> 
> - Something happened to make that preallocated extent shared, without
>   writing back dirty pages.
>   Some possible causes are snapshot and reflink.
>   However nowadays, snapshots will write all dirty inodes, and reflink
>   will write the source range to disk.
> 
>   Maybe it's a small window inside create_snapshot() between
>   btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?
> 
> - dirty pages get written back
>   We created ordered extent, but at this timing, we can't do nocow any
>   more, we need to fallback to cow.
>   However at the buffered write timing, we didn't reserved data space.
>   Now we will underflow data space reservation.
> 
> However nowadays there are some new mechanism to handle this case more
> gracefully, like btrfs_root::will_be_snapshotted.
> 
> I'll double check if reverting that patch in latest kernel still cause
> problem.
> But any idea on the possible problem is welcomed.
> 

Reading the code, there are two scenarios that can happen.  All of our
downstream stuff assumes that we've updated ->bytes_may_use for our data
write.  So if we fail our reservation and do the nocow thing of skipping
our reservation, we can underflow if we

1) Need to allocate an extent anyway because of reflink/snapshot.
btrfs_add_reserved_space() expects that space_info->bytes_may_use has our region
in it, so in this case it doesn't and we underflow here.  I think you are right
in that we do all dirty writeback nowadays so this is less of an issue, buuuut

2) In run_delalloc_nocow we do EXTENT_CLEAR_DATA_RESV unconditionally if we did
manage to do a nocow.  If we fell back on the no reserve case then this would
underflow our ->bytes_may_use counter here.

Off the top of my head I say we just add our write_bytes to ->bytes_may_use if
we use the nocow path.  If we're already failing to reserve data space as it is
then there's no harm in making it appear like we have less space by inflating
->bytes_may_use.  This is the straightforward fix for the underflow, and we
could come up with something more crafty later, like setting the range with
EXTENT_NO_DATA_RESERVE and doing magic later with ->bytes_may_use.  Thanks,

Josef
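
A rough userspace sketch of the straightforward fix suggested above, in C.
Everything here is a made-up model for illustration (struct model_space_info
and the model_* helpers are not kernel names), and the counter is signed so
the underflow is visible as a negative value. The charge_on_nocow flag
stands for "add write_bytes to ->bytes_may_use if we use the nocow path".

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the data space counters; NOT the kernel structure. */
struct model_space_info {
    int64_t total_bytes;
    int64_t bytes_used;
    int64_t bytes_may_use; /* signed so an underflow is visible as < 0 */
};

/* Try to reserve data space; true on success, false when out of space. */
static bool model_reserve(struct model_space_info *si, int64_t len)
{
    if (si->bytes_used + si->bytes_may_use + len > si->total_bytes)
        return false;
    si->bytes_may_use += len;
    return true;
}

/*
 * Write path: reserve if possible; otherwise fall back to nocow.
 * With charge_on_nocow set, the nocow fallback still inflates
 * bytes_may_use, which is the proposed fix.
 */
static bool model_write(struct model_space_info *si, int64_t len,
                        bool can_nocow, bool charge_on_nocow)
{
    if (model_reserve(si, len))
        return true;
    if (!can_nocow)
        return false; /* would be ENOSPC */
    if (charge_on_nocow)
        si->bytes_may_use += len;
    return true;
}

/* Nocow writeback: the reservation is cleared unconditionally. */
static void model_writeback_nocow(struct model_space_info *si, int64_t len)
{
    si->bytes_may_use -= len;
}

/* Full filesystem, one 128-byte nocow write; returns the final counter. */
static int64_t model_nocow_round_trip(bool charge_on_nocow)
{
    struct model_space_info si = {
        .total_bytes = 1024, .bytes_used = 1024, .bytes_may_use = 0,
    };

    if (!model_write(&si, 128, true, charge_on_nocow))
        return INT64_MIN;
    model_writeback_nocow(&si, 128);
    return si.bytes_may_use;
}
```

Without the charge, model_nocow_round_trip(false) ends at -128; with it,
model_nocow_round_trip(true) ends back at 0. The cost is that the
filesystem briefly appears to have less free space than it really does,
which is harmless when the reservation was already failing.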


* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 13:25       ` Josef Bacik
@ 2019-04-25 13:50         ` Qu Wenruo
  2019-04-25 14:09           ` Josef Bacik
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2019-04-25 13:50 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/25 9:25 PM, Josef Bacik wrote:
[snip]
>>>
>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>> This seems to break the semantics of fallocate so the performance should
>>> not the main concern here.
>>
> 
> Are we sure the ENOSPC is coming from the data reservation?  That change makes
> us fall back on the old behavior, which means we should still succeed at making
> the data reservation.
> 
> However it fallocate() _does not_ guarantee you won't fail the metadata
> reservation, I suspect that may be what you are running into.

For this script, we only need 4 file extents at most.
Even the initial 8M of metadata should be plenty, thus I don't think
it's metadata causing the problem.
---
#!/bin/bash

dev=/dev/test/test
mnt=/mnt/btrfs

mkfs.btrfs -f $dev -b 512M

mount $dev $mnt

fallocate -l 384M $mnt/file1
echo "fallocate success"
sync
dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2

umount $mnt
---

> 
>> My blur memory of the underflow case is something like below: (failed to
>> locate the old thread)
>>
>> - fallocate
>> - pwrite in to the reallocated range
>>   At this timing, we can do nocow, thus no data space is reserved.
>>
>> - Something happened to make that preallocated extent shared, without
>>   writing back dirty pages.
>>   Some possible causes are snapshot and reflink.
>>   However nowadays, snapshots will write all dirty inodes, and reflink
>>   will write the source range to disk.
>>
>>   Maybe it's a small window inside create_snapshot() between
>>   btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?
>>
>> - dirty pages get written back
>>   We created ordered extent, but at this timing, we can't do nocow any
>>   more, we need to fallback to cow.
>>   However at the buffered write timing, we didn't reserved data space.
>>   Now we will underflow data space reservation.
>>
>> However nowadays there are some new mechanism to handle this case more
>> gracefully, like btrfs_root::will_be_snapshotted.
>>
>> I'll double check if reverting that patch in latest kernel still cause
>> problem.
>> But any idea on the possible problem is welcomed.
>>
> 
> Reading the code there's two scenarios that happen.  All of our down stream
> stuff assumes that we've updated ->bytes_may_use for our data write.  So if we
> fail our reservation and do the nocow thing of skipping our reservation we can
> overflow if we
> 
> 1) Need to allocate an extent anyway because of reflink/snapshot.
> btrfs_add_reserved_space() expects that space_info->bytes_may_use has our region
> in it, so in this case it doesn't and we underflow here.  I think you are right
> in that we do all dirty writeback nowadays so this is less of an issue, buuuut
> 
> 2) In run_delalloc_nocow we do EXTENT_CLEAR_DATA_RESV unconditionally if we did
> manage to do a nocow.  If we fell back on the no reserve case then this would
> underflow our ->bytes_may_use counter here.

Right, I missed this case. Thanks for pointing this out.

> 
> Off the top of my head I say we just add our write_bytes to ->bytes_may_use if
> we use the nocow path.  If we're already failing to reserve data space as it is
> then there's no harm in making it appear like we have less space by inflating
> ->bytes_may_use.  This is the straightforward fix for the underflow, and we
> could come up with something more crafty later, like setting the range with
> EXTENT_NO_DATA_RESERVE and doing magic later with ->bytes_may_use.  Thanks,

Sounds pretty valid to me.

Thanks for the idea,
Qu

> 
> Josef
> 



* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 13:50         ` Qu Wenruo
@ 2019-04-25 14:09           ` Josef Bacik
  2019-04-25 14:11             ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: Josef Bacik @ 2019-04-25 14:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, dsterba, Jakob Unterwurzacher, linux-btrfs

On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/4/25 9:25 PM, Josef Bacik wrote:
> [snip]
> >>>
> >>> What if the commit is reverted, if the problem is otherwise hard to fix?
> >>> This seems to break the semantics of fallocate so the performance should
> >>> not the main concern here.
> >>
> > 
> > Are we sure the ENOSPC is coming from the data reservation?  That change makes
> > us fall back on the old behavior, which means we should still succeed at making
> > the data reservation.
> > 
> > However it fallocate() _does not_ guarantee you won't fail the metadata
> > reservation, I suspect that may be what you are running into.
> 
> For this script, we only needs 4 file extents at most.
> Even the initial 8M metadata should be pretty enough, thus I don't think
> it's metadata causing the problem.
> ---
> #!/bin/bash
> 
> dev=/dev/test/test
> mnt=/mnt/btrfs
> 
> mkfs.btrfs -f $dev -b 512M
> 
> mount $dev $mnt
> 
> fallocate -l 384M $mnt/file1
> echo "fallocate success"
> sync
> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
> 

Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
there.

Josef


* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:09           ` Josef Bacik
@ 2019-04-25 14:11             ` Qu Wenruo
  2019-04-25 14:13               ` Josef Bacik
  2019-04-25 14:43               ` Filipe Manana
  0 siblings, 2 replies; 22+ messages in thread
From: Qu Wenruo @ 2019-04-25 14:11 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/25 10:09 PM, Josef Bacik wrote:
> On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/4/25 9:25 PM, Josef Bacik wrote:
>> [snip]
>>>>>
>>>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>>>> This seems to break the semantics of fallocate so the performance should
>>>>> not the main concern here.
>>>>
>>>
>>> Are we sure the ENOSPC is coming from the data reservation?  That change makes
>>> us fall back on the old behavior, which means we should still succeed at making
>>> the data reservation.
>>>
>>> However it fallocate() _does not_ guarantee you won't fail the metadata
>>> reservation, I suspect that may be what you are running into.
>>
>> For this script, we only needs 4 file extents at most.
>> Even the initial 8M metadata should be pretty enough, thus I don't think
>> it's metadata causing the problem.
>> ---
>> #!/bin/bash
>>
>> dev=/dev/test/test
>> mnt=/mnt/btrfs
>>
>> mkfs.btrfs -f $dev -b 512M
>>
>> mount $dev $mnt
>>
>> fallocate -l 384M $mnt/file1
>> echo "fallocate success"
>> sync
>> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
>>
> 
> Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
> there.

Oh, wrong flag. Remove that oflag and we still get the same problem.

fallocate success
dd: error writing '/mnt/btrfs/file2': No space left on device
95+0 records in
94+0 records out
49283072 bytes (49 MB, 47 MiB) copied, 0.0807034 s, 611 MB/s

Thanks,
Qu

> 
> Josef
> 



* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:11             ` Qu Wenruo
@ 2019-04-25 14:13               ` Josef Bacik
  2019-04-25 14:16                 ` Qu Wenruo
  2019-04-26 12:47                 ` David Sterba
  2019-04-25 14:43               ` Filipe Manana
  1 sibling, 2 replies; 22+ messages in thread
From: Josef Bacik @ 2019-04-25 14:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, dsterba, Jakob Unterwurzacher, linux-btrfs

On Thu, Apr 25, 2019 at 10:11:41PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/4/25 10:09 PM, Josef Bacik wrote:
> > On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2019/4/25 9:25 PM, Josef Bacik wrote:
> >> [snip]
> >>>>>
> >>>>> What if the commit is reverted, if the problem is otherwise hard to fix?
> >>>>> This seems to break the semantics of fallocate so the performance should
> >>>>> not the main concern here.
> >>>>
> >>>
> >>> Are we sure the ENOSPC is coming from the data reservation?  That change makes
> >>> us fall back on the old behavior, which means we should still succeed at making
> >>> the data reservation.
> >>>
> >>> However it fallocate() _does not_ guarantee you won't fail the metadata
> >>> reservation, I suspect that may be what you are running into.
> >>
> >> For this script, we only needs 4 file extents at most.
> >> Even the initial 8M metadata should be pretty enough, thus I don't think
> >> it's metadata causing the problem.
> >> ---
> >> #!/bin/bash
> >>
> >> dev=/dev/test/test
> >> mnt=/mnt/btrfs
> >>
> >> mkfs.btrfs -f $dev -b 512M
> >>
> >> mount $dev $mnt
> >>
> >> fallocate -l 384M $mnt/file1
> >> echo "fallocate success"
> >> sync
> >> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
> >>
> > 
> > Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
> > there.
> 
> Oh, wrong flag, remove that oflag and we still get the same problem.
> 
> fallocate success
> dd: error writing '/mnt/btrfs/file2': No space left on device
> 95+0 records in
> 94+0 records out
> 49283072 bytes (49 MB, 47 MiB) copied, 0.0807034 s, 611 MB/s
> 

Hmph, then I'm not sure, and I've already exceeded my allowed btrfs/things I
enjoy time for this month.  If we really are getting enospc from the data
reservation then it must mean that the nocow check is failing when it shouldn't.
It shouldn't be hard for you to narrow down what's going wrong.  Thanks,

Josef


* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:13               ` Josef Bacik
@ 2019-04-25 14:16                 ` Qu Wenruo
  2019-04-26 12:47                 ` David Sterba
  1 sibling, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2019-04-25 14:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/25 10:13 PM, Josef Bacik wrote:
> On Thu, Apr 25, 2019 at 10:11:41PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/4/25 10:09 PM, Josef Bacik wrote:
>>> On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/4/25 9:25 PM, Josef Bacik wrote:
>>>> [snip]
>>>>>>>
>>>>>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>>>>>> This seems to break the semantics of fallocate so the performance should
>>>>>>> not the main concern here.
>>>>>>
>>>>>
>>>>> Are we sure the ENOSPC is coming from the data reservation?  That change makes
>>>>> us fall back on the old behavior, which means we should still succeed at making
>>>>> the data reservation.
>>>>>
>>>>> However it fallocate() _does not_ guarantee you won't fail the metadata
>>>>> reservation, I suspect that may be what you are running into.
>>>>
>>>> For this script, we only needs 4 file extents at most.
>>>> Even the initial 8M metadata should be pretty enough, thus I don't think
>>>> it's metadata causing the problem.
>>>> ---
>>>> #!/bin/bash
>>>>
>>>> dev=/dev/test/test
>>>> mnt=/mnt/btrfs
>>>>
>>>> mkfs.btrfs -f $dev -b 512M
>>>>
>>>> mount $dev $mnt
>>>>
>>>> fallocate -l 384M $mnt/file1
>>>> echo "fallocate success"
>>>> sync
>>>> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
>>>>
>>>
>>> Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
>>> there.
>>
>> Oh, wrong flag, remove that oflag and we still get the same problem.
>>
>> fallocate success
>> dd: error writing '/mnt/btrfs/file2': No space left on device
>> 95+0 records in
>> 94+0 records out
>> 49283072 bytes (49 MB, 47 MiB) copied, 0.0807034 s, 611 MB/s
>>
> 
> Hmph, then I'm not sure, and I've already exceeded my allowed btrfs/things I
> enjoy time for this month.  If we really are getting enospc from the data
> reservation then it must mean that the nocow check is failing when it shouldn't.
> It shouldn't be hard for you to narrow down what's going wrong.  Thanks,

Yep, no need to bother, I'll pin down the problem.

And thanks again for your idea on the bytes_may_use fix.

Thanks,
Qu

> 
> Josef
> 



* Re: fallocate does not prevent ENOSPC on write
  2019-04-25  5:49     ` Qu Wenruo
  2019-04-25 13:25       ` Josef Bacik
@ 2019-04-25 14:39       ` Filipe Manana
  1 sibling, 0 replies; 22+ messages in thread
From: Filipe Manana @ 2019-04-25 14:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Jakob Unterwurzacher, linux-btrfs, Josef Bacik

On Thu, Apr 25, 2019 at 9:17 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/4/23 7:33 PM, David Sterba wrote:
> > On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
> >> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
> >>> I have a user who is reporting ENOSPC errors when running gocryptfs on
> >>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
> >>>
> >>> What is interesting is that the error gets thrown at write time. This
> >>> is not supposed to happen, because gocryptfs does
> >>>
> >>>     fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
> >>>
> >>> before writing.
> >>>
> >>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
> >>> This is what it looks like on ext4:
> >>>
> >>>     $ ../fallocate_write/fallocate_write
> >>>     reading from /dev/urandom
> >>>     writing to ./blob.379Q8P
> >>>     writing blocks of 132096 bytes each
> >>>     [...]
> >>>     fallocate failed: No space left on device
> >>>
> >>> On btrfs, it will instead look like this:
> >>>
> >>>     [...]
> >>>     pwrite failed: No space left on device
> >>>
> >>> Is this a bug in btrfs' fallocate implementation or am I reading the
> >>> guarantees that fallocate gives me wrong?
> >>
> >> Since v4.7, this commit changed the how btrfs do NodataCOW check:
> >> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
> >>
> >> Before that commit, btrfs always check if they need to reserve space for
> >> COW, while after that patch, btrfs never checks unless we have no space.
> >>
> >> However this screws up other nodatacow space check.
> >> And due to its age and deep changeset, it's pretty hard to fix it.
> >> I have tried several times, but it will only cause more problems.
> >
> > What if the commit is reverted, if the problem is otherwise hard to fix?
> > This seems to break the semantics of fallocate so the performance should
> > not the main concern here.
>
> My blur memory of the underflow case is something like below: (failed to
> locate the old thread)
>
> - fallocate
> - pwrite in to the reallocated range
>   At this timing, we can do nocow, thus no data space is reserved.
>
> - Something happened to make that preallocated extent shared, without
>   writing back dirty pages.
>   Some possible causes are snapshot and reflink.
>   However nowadays, snapshots will write all dirty inodes, and reflink
>   will write the source range to disk.

Nowadays? It's been like that for 11 years now.
It's been like that in clone since it was introduced (2008, commit
f2eb0a241f0e5c135d93243b0236cb1f14c305e0)
and in the snapshot creation ioctl since 2008 as well (commit
dc17ff8f11d129db9e83ab7244769e4eae05e14d).

>
>   Maybe it's a small window inside create_snapshot() between
>   btrfs_start_delalloc_snapshot() and btrfs_commit_transaction() calls?
>
> - dirty pages get written back
>   We created ordered extent, but at this timing, we can't do nocow any
>   more, we need to fallback to cow.
>   However at the buffered write timing, we didn't reserved data space.
>   Now we will underflow data space reservation.
>
> However nowadays there are some new mechanism to handle this case more
> gracefully, like btrfs_root::will_be_snapshotted.
>
> I'll double check if reverting that patch in latest kernel still cause
> problem.
> But any idea on the possible problem is welcomed.

To me it seems the problem is not yet well formulated, therefore it's
hard to give ideas/suggestions.

The one you pointed out isn't related to the issue reported by Jakob,
since Jakob's case involves only a single file (I couldn't reproduce it
anyway).
So what's your explanation for Jakob's test case, which happens for
him on a fresh filesystem with a single file?

I could only see the potential bytes_may_use counter leak issue I
mentioned previously.

Perhaps creating a test case for fstests will make it clear and avoid
so many replies back and forth in this thread and others.

Thanks

>
> Thanks,
> Qu




>
--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:11             ` Qu Wenruo
  2019-04-25 14:13               ` Josef Bacik
@ 2019-04-25 14:43               ` Filipe Manana
  2019-04-25 23:16                 ` Qu Wenruo
  1 sibling, 1 reply; 22+ messages in thread
From: Filipe Manana @ 2019-04-25 14:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, dsterba, Jakob Unterwurzacher, linux-btrfs

On Thu, Apr 25, 2019 at 3:28 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2019/4/25 10:09 PM, Josef Bacik wrote:
> > On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2019/4/25 9:25 PM, Josef Bacik wrote:
> >> [snip]
> >>>>>
> >>>>> What if the commit is reverted, if the problem is otherwise hard to fix?
> >>>>> This seems to break the semantics of fallocate so the performance should
> >>>>> not the main concern here.
> >>>>
> >>>
> >>> Are we sure the ENOSPC is coming from the data reservation?  That change makes
> >>> us fall back on the old behavior, which means we should still succeed at making
> >>> the data reservation.
> >>>
> >>> However it fallocate() _does not_ guarantee you won't fail the metadata
> >>> reservation, I suspect that may be what you are running into.
> >>
> >> For this script, we only needs 4 file extents at most.
> >> Even the initial 8M metadata should be pretty enough, thus I don't think
> >> it's metadata causing the problem.
> >> ---
> >> #!/bin/bash
> >>
> >> dev=/dev/test/test
> >> mnt=/mnt/btrfs
> >>
> >> mkfs.btrfs -f $dev -b 512M
> >>
> >> mount $dev $mnt
> >>
> >> fallocate -l 384M $mnt/file1
> >> echo "fallocate success"
> >> sync
> >> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
> >>
> >
> > Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
> > there.
>
> Oh, wrong flag, remove that oflag and we still get the same problem.
>
> fallocate success
> dd: error writing '/mnt/btrfs/file2': No space left on device

I don't get it. Why is this an unexpected error?
You created a 512 MiB fs, fallocated 384 MiB for a file named file1,
and then tried to write 384 MiB (512K * 768) to a file named file2 (i.e. a
different file).
Wasn't the test supposed to write to file1 instead?
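
The arithmetic can be checked with a small C sketch. The helper names
(dd_bytes, fits_in_fs) are made up for illustration, and the check is
deliberately optimistic: it ignores metadata and chunk allocation overhead,
so it only gives an upper bound on what could fit.

```c
#include <stdbool.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* Total bytes that dd attempts to write for a given bs= and count=. */
static uint64_t dd_bytes(uint64_t block_size, uint64_t count)
{
    return block_size * count;
}

/*
 * Optimistic check: can `extra` bytes of new data fit next to
 * `preallocated` bytes on a filesystem of `fs_size` bytes?
 * Metadata and chunk overhead are ignored on purpose.
 */
static bool fits_in_fs(uint64_t fs_size, uint64_t preallocated, uint64_t extra)
{
    return preallocated + extra <= fs_size;
}
```

Here dd_bytes(512 * KIB, 768) is 384 MiB, and
fits_in_fs(512 * MIB, 384 * MIB, 384 * MIB) is false: file1's preallocation
plus 384 MiB of new data for file2 needs 768 MiB, so ENOSPC on file2 is the
expected outcome. The dd output is consistent with that: 94 full records of
512K are 49283072 bytes.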

> 95+0 records in
> 94+0 records out
> 49283072 bytes (49 MB, 47 MiB) copied, 0.0807034 s, 611 MB/s
>
> Thanks,
> Qu
>
> >
> > Josef
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:43               ` Filipe Manana
@ 2019-04-25 23:16                 ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2019-04-25 23:16 UTC (permalink / raw)
  To: fdmanana; +Cc: Josef Bacik, dsterba, Jakob Unterwurzacher, linux-btrfs





On 2019/4/25 下午10:43, Filipe Manana wrote:
> On Thu, Apr 25, 2019 at 3:28 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/4/25 10:09 PM, Josef Bacik wrote:
>>> On Thu, Apr 25, 2019 at 09:50:25PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/4/25 9:25 PM, Josef Bacik wrote:
>>>> [snip]
>>>>>>>
>>>>>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>>>>>> This seems to break the semantics of fallocate so the performance should
>>>>>>> not be the main concern here.
>>>>>>
>>>>>
>>>>> Are we sure the ENOSPC is coming from the data reservation?  That change makes
>>>>> us fall back on the old behavior, which means we should still succeed at making
>>>>> the data reservation.
>>>>>
>>>>> However, fallocate() _does not_ guarantee you won't fail the metadata
>>>>> reservation, I suspect that may be what you are running into.
>>>>
>>>> For this script, we only need 4 file extents at most.
>>>> Even the initial 8M of metadata should be plenty, thus I don't think
>>>> it's metadata causing the problem.
>>>> ---
>>>> #!/bin/bash
>>>>
>>>> dev=/dev/test/test
>>>> mnt=/mnt/btrfs
>>>>
>>>> mkfs.btrfs -f $dev -b 512M
>>>>
>>>> mount $dev $mnt
>>>>
>>>> fallocate -l 384M $mnt/file1
>>>> echo "fallocate success"
>>>> sync
>>>> dd if=/dev/zero bs=512K  oflag=direct conv=notrunc count=768 of=$mnt/file2
>>>>
>>>
>>> Wellll we don't do the nocow check _at all_ for O_DIRECT, so mystery solved
>>> there.
>>
>> Oh, wrong flag, remove that oflag and we still get the same problem.
>>
>> fallocate success
>> dd: error writing '/mnt/btrfs/file2': No space left on device
> 
> I don't get it. Why is this an unexpected error?
> You created a fs of 512 MiB, fallocated 384 MiB for a file named file1,
> and then tried to write 384 MiB (512K * 768) to a file named file2 (i.e. a
> different file).
> Wasn't the test supposed to write to file1 instead?

My brain wasn't working last night. :(

It should write into file1, and it passes in that case.

Thanks,
Qu

> 
>> 95+0 records in
>> 94+0 records out
>> 49283072 bytes (49 MB, 47 MiB) copied, 0.0807034 s, 611 MB/s
>>
>> Thanks,
>> Qu
>>
>>>
>>> Josef
>>>
>>
> 
> 




* Re: fallocate does not prevent ENOSPC on write
  2019-04-25 14:13               ` Josef Bacik
  2019-04-25 14:16                 ` Qu Wenruo
@ 2019-04-26 12:47                 ` David Sterba
  1 sibling, 0 replies; 22+ messages in thread
From: David Sterba @ 2019-04-26 12:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, Jakob Unterwurzacher, linux-btrfs

On Thu, Apr 25, 2019 at 10:13:58AM -0400, Josef Bacik wrote:
> Hmph, then I'm not sure, and I've already exceeded my allowed btrfs/things I
> enjoy time for this month.

Josef, if you don't have time or btrfs has become a side project for
you, then please step down from the maintainer role.


* Re: fallocate does not prevent ENOSPC on write
  2019-04-23 23:56           ` Zygo Blaxell
@ 2019-04-27 11:25             ` Jakob Unterwurzacher
  0 siblings, 0 replies; 22+ messages in thread
From: Jakob Unterwurzacher @ 2019-04-27 11:25 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: fdmanana, Qu Wenruo, dsterba, linux-btrfs

On Wed, Apr 24, 2019 at 1:56 AM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> 132096 is 129 * 1024, which is not a multiple of 4K.  There will be a CoW
> operation in cases where one 4K block from each pwrite is written twice
> in separate transactions (or with fsync between).

Yes, the writes have odd sizes. This is unfortunate, but it's due to
gocryptfs' encryption overhead.

> fallocate is only going to behave the way posix_fallocate specifies on
> files with datacow turned off.

I see. Thank you!
Jakob
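
For reference, the block arithmetic behind Zygo's remark can be sketched
as below. It assumes the writes are issued back to back starting at
offset 0 and that the block size is 4096 bytes; the helper names are made
up for illustration and are not taken from the reproducer.

```shell
#!/bin/sh
# Which 4K blocks does each 132096-byte write touch?
# Assumption: write number i covers bytes [i * write_size, (i + 1) * write_size).
write_size=132096
block_size=4096

# Index of the first block touched by write number $1.
first_block() {
    echo $(( $1 * write_size / block_size ))
}

# Index of the last block touched by write number $1.
last_block() {
    echo $(( (($1 + 1) * write_size - 1) / block_size ))
}

# Succeeds if write $1 and the following write touch a common block.
shares_block() {
    [ "$(last_block "$1")" -eq "$(first_block $(( $1 + 1 )))" ]
}

for i in 0 1 2 3 4; do
    if shares_block "$i"; then
        note="last block is touched again by the next write"
    else
        note="ends exactly on a block boundary"
    fi
    echo "write $i: blocks $(first_block "$i")..$(last_block "$i"), $note"
done
```

Since 132096 = 32 * 4096 + 1024, three out of every four consecutive
writes end in the middle of a 4K block that the next write then touches
again. Those are the blocks that can end up written twice.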


end of thread (last message: 2019-04-27 11:25 UTC)

Thread overview: 22+ messages
2019-04-22 21:09 fallocate does not prevent ENOSPC on write Jakob Unterwurzacher
2019-04-23  2:16 ` Qu Wenruo
2019-04-23 11:33   ` David Sterba
2019-04-23 12:12     ` Qu Wenruo
2019-04-23 14:50       ` Filipe Manana
2019-04-23 19:21         ` Jakob Unterwurzacher
2019-04-23 23:56           ` Zygo Blaxell
2019-04-27 11:25             ` Jakob Unterwurzacher
2019-04-23 23:49         ` Qu Wenruo
2019-04-24  9:28           ` Filipe Manana
2019-04-24  9:50             ` Qu Wenruo
2019-04-25  5:49     ` Qu Wenruo
2019-04-25 13:25       ` Josef Bacik
2019-04-25 13:50         ` Qu Wenruo
2019-04-25 14:09           ` Josef Bacik
2019-04-25 14:11             ` Qu Wenruo
2019-04-25 14:13               ` Josef Bacik
2019-04-25 14:16                 ` Qu Wenruo
2019-04-26 12:47                 ` David Sterba
2019-04-25 14:43               ` Filipe Manana
2019-04-25 23:16                 ` Qu Wenruo
2019-04-25 14:39       ` Filipe Manana
