From: Chris Dunlop <chris@onthe.net.au>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, linux-xfs@vger.kernel.org
Subject: Re: XFS fallocate implementation incorrectly reports ENOSPC
Date: Fri, 27 Aug 2021 16:53:47 +1000
Message-ID: <20210827065347.GA3594069@onthe.net.au>
In-Reply-To: <20210827054956.GP3657114@dread.disaster.area>

G'day Dave,

On Fri, Aug 27, 2021 at 03:49:56PM +1000, Dave Chinner wrote:
> On Fri, Aug 27, 2021 at 12:55:39PM +1000, Chris Dunlop wrote:
>> On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:
>>> On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
>>>> On 8/25/21 9:06 PM, Chris Dunlop wrote:
>>>>>
>>>>> fallocate -l 1GB image.img
>>>>> mkfs.xfs -f image.img
>>>>> mkdir mnt
>>>>> mount -o loop ./image.img mnt
>>>>> fallocate -o 0 -l 700mb mnt/image.img
>>>>> fallocate -o 0 -l 700mb mnt/image.img
>>>>>
>>>>> Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
>>>>
>>>> Interesting.  Off the top of my head, I assume that xfs is not looking at
>>>> current file space usage when deciding how much is needed to satisfy the
>>>> fallocate request.  While filesystems can return ENOSPC at any time for
>>>> any reason, this does seem a bit suboptimal.
>>>
>>> Yes, I would have thought the second fallocate should be a noop.
>>
>> On further reflection, "filesystems can return ENOSPC at any time" is
>> certainly something apps need to be prepared for (and in this case, it's
>> doing the right thing, by logging the error and aborting), but it's not
>> really a "not a bug" excuse for the filesystem in all circumstances (or this
>> one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh
>> filesystem would be considered a bug, no?
>
> Sure, but the fallocate case here is different. You're asking to
> preallocate up to 700MB of space on a filesystem that only has 300MB
> of space free. Up front, without knowing anything about the layout
> of the file we might need to allocate 700MB of space into, there's a
> very good chance that we'll get ENOSPC partially through the
> operation.

But I'm not asking for more space - the space is already there:

$ filefrag -v mnt/image.img 
Filesystem type is: ef53
File size of mnt/image.img is 700000000 (170899 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..   30719:      34816..     65535:  30720:             unwritten
    1:    30720..   59391:      69632..     98303:  28672:      65536: unwritten
    2:    59392..  122879:     100352..    163839:  63488:      98304: unwritten
    3:   122880..  170898:     165888..    213906:  48019:     163840: last,unwritten,eof
mnt/image.img: 4 extents found

I.e. the fallocate /could/ potentially look at the existing file and 
say "nothing for me to do here".

Of course, that check should be pretty easy and quick in this case - but 
for a file with hundreds of thousands of extents and potential holes 
scattered through the middle it would be somewhat less quick and easy. 
So that's probably a good reason for it to fail. Sigh. On the other hand 
that might be a case of "play stupid games, win stupid prizes". On the 
gripping hand I can imagine the emails to the mailing list from people 
like me asking why their "simple" fallocate is taking 20 minutes...

>>>>> Background: I'm chasing a mysterious ENOSPC error on an XFS
>>>>> filesystem with way more space than the app should be asking
>>>>> for. There are no quotas on the fs. Unfortunately it's a third
>>>>> party app and I can't tell what sequence is producing the error,
>>>>> but this fallocate issue is a possibility.
>
> More likely speculative preallocation is causing this than
> fallocate. However, we've had a background worker that cleans up
> speculative prealloc before reporting ENOSPC for a while now - what
> kernel version are you seeing this on?

5.10.60. How long is "a while now"? I vaguely recall seeing that change 
go through.
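
For anyone else chasing this, a rough userland check for speculative 
preallocation (somefile here is made up): the allocated block count runs 
well past what the file size needs, e.g.:

$ stat -c 'size=%s blocks=%b' somefile
  # blocks * 512 much larger than size suggests prealloc beyond EOF

and if it does turn out to be the culprit, it can be bounded with the 
allocsize mount option, at some cost in fragmentation.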

> Also, it might not even be data allocation that is the issue - if
> the filesystem is full and free space is fragmented, you could be
> getting ENOSPC because inodes cannot be allocated. In which case,
> the output of xfs-info would be useful so we can see if sparse inode
> clusters are enabled or not....

$ xfs_info /chroot
meta-data=/dev/mapper/vg00-chroot isize=512    agcount=32, agsize=244184192 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=1, rmapbt=1
          =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=7813893120, imaxpct=5
          =                       sunit=128    swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

It's currently fuller than I like:

$ df /chroot
Filesystem                1K-blocks        Used  Available Use% Mounted on
/dev/mapper/vg00-chroot 31253485568 24541378460 6712107108  79% /chroot

...so that's 6.3T free, but this problem was happening at 71% use (8.5T 
free). The /maximum/ the app could conceivably be asking for is around 
1.1T (to entirely duplicate an existing file), but it really shouldn't 
be doing anywhere near that: as far as I can see it does write-in-place 
on the existing file and should only be asking for modest amounts of 
extension (then again, userland developers, so who knows, right? ;-}).

Oh, one other possible factor: there is extensive reflinking happening 
on this filesystem. I don't know if that's relevant. You may remember my 
previous email relating to that:

Extreme fragmentation ho!
https://www.spinics.net/lists/linux-xfs/msg47707.html
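
While I'm at it, a couple of read-only checks that might narrow things 
down, assuming I've got the tooling right - free space fragmentation 
summarised via xfs_db, and inode usage via df:

$ xfs_db -r -c "freesp -s" /dev/mapper/vg00-chroot
$ df -i /chroot

(xfs_db -r against a mounted filesystem isn't guaranteed consistent, 
but it should be good enough for a rough histogram.)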

I'm excited by my new stracing script, prompted by Eric - at least that 
should tell us precisely what is failing. Shame I'm going to have to 
wait a while for it to trigger.
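
The script boils down to something like this (assuming a recent strace: 
-Z, failing-syscalls-only, arrived in 5.2 I believe, and $APP_PID stands 
in for the real pid):

$ strace -f -Z -e trace=write,pwrite64,fallocate,ftruncate \
    -p $APP_PID -o /tmp/app-enospc.trace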


Cheers,

Chris
