From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@gmail.com
Cc: dsterba@suse.cz, Jakob Unterwurzacher <jakobunt@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: fallocate does not prevent ENOSPC on write
Date: Wed, 24 Apr 2019 17:50:34 +0800
Message-ID: <a5519a4b-be00-95b1-0371-8a62f9c19ca9@gmx.com>
In-Reply-To: <CAL3q7H4cjWJgoNg5C4+KsKFHg66KFmj7CDJiTMM9yBGU-fnjzA@mail.gmail.com>


On 2019/4/24 5:28 PM, Filipe Manana wrote:
[snip]
>>> So what's wrong with it? And how does it cause the ENOSPC?
>>
>> E.g.
>>
>> We have a 128Mb preallocated file extent.
>> And assume the fs only have 128M free data space, meaning 0 remaining
>> space at all.
> 
> That's a contradicting sentence...
> 
>>
>> Then we try to buffer write, which means buffered will just fail as it
>> will need data space.
>>
>> The idea is always here for fallocate/pwrite, just the timing where the
>> ENOSPC happens.
> 
> Can't make sense of that sentence as well.

My bad, that change is already in buffered_write(), so that sentence
makes no sense.
> 
> So I suppose what you are trying to say is that a write into an
> unwritten extent causes space allocation,
> and that can prevent some other write (which is not into an unwritten
> extent) from being able to allocate space and therefore fail.

That's one case.

> 
> That's a valid problem that should be temporary.

I just tried a basic script:
---
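#!/bin/bash

dev=/dev/test/test
mnt=/mnt/btrfs

# Create a 512M filesystem and mount it
mkfs.btrfs -f -b 512M "$dev"
mount "$dev" "$mnt"

# Preallocate 384M in file1
fallocate -l 384M "$mnt/file1"
echo "fallocate success"

# Buffered write of 768 x 512K = 384M into file2
dd if=/dev/zero bs=512K conv=notrunc count=768 of="$mnt/file2"

umount "$mnt"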
---

This fails just like in the original error report.
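
For reference, the sizes involved in that script can be worked out with
a few lines of shell. This is only a back-of-the-envelope sketch: the
variable names are illustrative, and it ignores metadata and system
chunk overhead, which takes a further share of the 512M device. Note
that dd targets file2 rather than the preallocated file1, so the two
requests add up.

```shell
#!/bin/bash
# Back-of-the-envelope sizes for the script above (all values in KiB).
# Variable names are illustrative; metadata/system overhead is ignored.
fs_kib=$((512 * 1024))        # mkfs.btrfs -b 512M
prealloc_kib=$((384 * 1024))  # fallocate -l 384M file1
dd_kib=$((768 * 512))         # dd bs=512K count=768 into file2

echo "dd writes        $((dd_kib / 1024)) MiB"
echo "prealloc + dd    $(( (prealloc_kib + dd_kib) / 1024 )) MiB"
echo "filesystem size  $((fs_kib / 1024)) MiB"
```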


At least in its current form, when we write into preallocated space the
buffered write path does skip the data space reservation, so in theory
that buffered write shouldn't cause a problem.


However, we have other places that can reserve data space:
- btrfs_page_mkwrite()
- btrfs_truncate_block()
- btrfs_direct_IO()

I haven't looked into why the above script fails, but it should have
something to do with one of these data space reservations.
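
One way to narrow this down, as a sketch only: sample the data space
counters that btrfs exports through sysfs before and after each step of
the script. The helper name dump_data_space is hypothetical, and the
sysfs layout (/sys/fs/btrfs/<fsid>/allocation/data/) is assumed to match
the running kernel.

```shell
#!/bin/bash
# Hypothetical helper: print every regular file in a directory as
# name=value. Intended for /sys/fs/btrfs/<fsid>/allocation/data
# (layout assumed; check it on your kernel).
dump_data_space() {
    local dir=$1 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example usage (assumes the filesystem is mounted at /mnt/btrfs):
# fsid=$(findmnt -n -o UUID /mnt/btrfs)
# dump_data_space "/sys/fs/btrfs/$fsid/allocation/data"
```

Comparing counters such as bytes_may_use and bytes_used around the
fallocate and dd steps should show which reservation runs out first.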

Thanks,
Qu
> 
> However when allocating space for a write into an unwritten extent (or
> any nodatacow write) we increment the data space info's bytes_may_use
> counter,
> but then when writeback starts, if we don't need to fall back to
> CoW, we end up never decrementing the bytes_may_use counter (even
> after writeback completes), leaking it.
> Not sure if this is the problem you were mentioning or just causing
> other writes to temporarily fail.
> 
> thanks
> 
> 
>>
>>
>> btrfs/153 has been failing for a long time for the same reason.
>> There it comes from quota, but the cause is exactly the same.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Trying the reproducer, at least on a 5.0 kernel, never fails on a
>>> pwrite for me, but always on fallocate:
>>>
>>> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
>>> $ mount /dev/sdi /mnt/sdi
>>> $ cd /mnt/sdi
>>> $ /path/to/reproducer
>>> reading from /dev/urandom
>>> writing to ./blob.IIa6tH
>>> writing blocks of 132096 bytes each
>>> total    125 MiB,  65.52 MiB/s
>>> total    251 MiB,  44.59 MiB/s
>>> total    377 MiB,  55.23 MiB/s
>>> total    503 MiB,  66.21 MiB/s
>>> total    629 MiB,  59.97 MiB/s
>>> total    755 MiB,   3.70 MiB/s
>>> total    881 MiB,  50.24 MiB/s
>>> total   1007 MiB,  64.51 MiB/s
>>> total   1133 MiB,  50.70 MiB/s
>>> total   1259 MiB,  49.29 MiB/s
>>> total   1385 MiB,  47.93 MiB/s
>>> total   1511 MiB,   4.00 MiB/s
>>> total   1637 MiB,  49.85 MiB/s
>>> total   1763 MiB,  48.11 MiB/s
>>> total   1889 MiB,  66.62 MiB/s
>>> total   2015 MiB,   5.60 MiB/s
>>> total   2141 MiB,  19.58 MiB/s
>>> total   2267 MiB,  64.80 MiB/s
>>> total   2393 MiB,  13.23 MiB/s
>>> total   2519 MiB,  14.95 MiB/s
>>> fallocate failed: No space left on device
>>>
>>> So either that was tested on a rather old kernel or:
>>>
>>> 1) we had snapshotting happening between a fallocate and a pwrite (or
>>> at the same time as the pwrite)
>>> 2) before the pwrite (or during) the unwritten/prealloc extent was
>>> reflinked (cp --reflink, clone or dedupe ioctls)
>>>
>>> What did I miss here?
>>>
>>> Thanks.
>>>
>>>>
>>>> E.g. reserved space underflow.
>>>>
>>>> I'll find the old thread and retry again.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>> This seems to break the semantics of fallocate so the performance should
>>>>> not be the main concern here.
>>>>>
>>>>
>>>
>>>
>>
> 
> 



