* XFS fallocate implementation incorrectly reports ENOSPC
@ 2021-08-26 2:06 Chris Dunlop
2021-08-26 15:05 ` Eric Sandeen
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-26 2:06 UTC (permalink / raw)
To: linux-xfs
Hi,
As reported by Charles Hathaway here (with no resolution):
XFS fallocate implementation incorrectly reports ENOSPC
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323
Given this sequence:
fallocate -l 1GB image.img
mkfs.xfs -f image.img
mkdir mnt
mount -o loop ./image.img mnt
fallocate -o 0 -l 700mb mnt/image.img
fallocate -o 0 -l 700mb mnt/image.img
Why does the second fallocate fail with ENOSPC, and is that considered an
XFS bug?
Ext4 is happy to do the second fallocate without error.
Tested on linux-5.10.60
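For reference, the failure looks like this (the exact message depends on the
util-linux version):
$ fallocate -o 0 -l 700mb mnt/image.img
fallocate: fallocate failed: No space left on device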
Background: I'm chasing a mysterious ENOSPC error on an XFS filesystem
with way more space than the app should be asking for. There are no quotas
on the fs. Unfortunately it's a third party app and I can't tell what
sequence is producing the error, but this fallocate issue is a
possibility.
Cheers,
Chris
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-26 2:06 XFS fallocate implementation incorrectly reports ENOSPC Chris Dunlop
@ 2021-08-26 15:05 ` Eric Sandeen
2021-08-26 20:56 ` Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2021-08-26 15:05 UTC (permalink / raw)
To: Chris Dunlop, linux-xfs
On 8/25/21 9:06 PM, Chris Dunlop wrote:
> Hi,
>
> As reported by Charles Hathaway here (with no resolution):
>
> XFS fallocate implementation incorrectly reports ENOSPC
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323
>
> Given this sequence:
>
> fallocate -l 1GB image.img
> mkfs.xfs -f image.img
> mkdir mnt
> mount -o loop ./image.img mnt
> fallocate -o 0 -l 700mb mnt/image.img
> fallocate -o 0 -l 700mb mnt/image.img
>
> Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
Interesting. Off the top of my head, I assume that xfs is not looking at
current file space usage when deciding how much is needed to satisfy the
fallocate request. While filesystems can return ENOSPC at any time for
any reason, this does seem a bit suboptimal.
> Ext4 is happy to do the second fallocate without error.
>
> Tested on linux-5.10.60
>
> Background: I'm chasing a mysterious ENOSPC error on an XFS filesystem with way more space than the app should be asking for. There are no quotas on the fs. Unfortunately it's a third party app and I can't tell what sequence is producing the error, but this fallocate issue is a possibility.
Presumably you've tried stracing it and looking for ENOSPC returns from
syscalls?
-Eric
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-26 15:05 ` Eric Sandeen
@ 2021-08-26 20:56 ` Chris Dunlop
2021-08-27 2:55 ` Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-26 20:56 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
> On 8/25/21 9:06 PM, Chris Dunlop wrote:
>>
>> fallocate -l 1GB image.img
>> mkfs.xfs -f image.img
>> mkdir mnt
>> mount -o loop ./image.img mnt
>> fallocate -o 0 -l 700mb mnt/image.img
>> fallocate -o 0 -l 700mb mnt/image.img
>>
>> Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
>
> Interesting. Off the top of my head, I assume that xfs is not looking at
> current file space usage when deciding how much is needed to satisfy the
> fallocate request. While filesystems can return ENOSPC at any time for
> any reason, this does seem a bit suboptimal.
Yes, I would have thought the second fallocate should be a noop.
>> Background: I'm chasing a mysterious ENOSPC error on an XFS filesystem
>> with way more space than the app should be asking for. There are no
>> quotas on the fs. Unfortunately it's a third party app and I can't tell
>> what sequence is producing the error, but this fallocate issue is a
>> possibility.
>
> Presumably you've tried stracing it and looking for ENOSPC returns from
> syscalls?
That would be an obvious approach. Unfortunately it's not that easy. The
problem is associated with one specific client which is out of my control
so I can't experiment in a controlled environment. The app runs for
several hours in multiple phases, each with multiple threads, and the
problem typically occurs in the early hours of the morning after several
hours of running, so attaching to the correct instance is fraught, and the
strace output will be voluminous.
Cheers,
Chris
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-26 20:56 ` Chris Dunlop
@ 2021-08-27 2:55 ` Chris Dunlop
2021-08-27 5:49 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-27 2:55 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:
> On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
>> On 8/25/21 9:06 PM, Chris Dunlop wrote:
>>>
>>> fallocate -l 1GB image.img
>>> mkfs.xfs -f image.img
>>> mkdir mnt
>>> mount -o loop ./image.img mnt
>>> fallocate -o 0 -l 700mb mnt/image.img
>>> fallocate -o 0 -l 700mb mnt/image.img
>>>
>>> Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
>>
>> Interesting. Off the top of my head, I assume that xfs is not looking at
>> current file space usage when deciding how much is needed to satisfy the
>> fallocate request. While filesystems can return ENOSPC at any time for
>> any reason, this does seem a bit suboptimal.
>
> Yes, I would have thought the second fallocate should be a noop.
On further reflection, "filesystems can return ENOSPC at any time" is
certainly something apps need to be prepared for (and in this case, it's
doing the right thing, by logging the error and aborting), but it's not
really a "not a bug" excuse for the filesystem in all circumstances (or
this one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh
filesystem would be considered a bug, no?
...or maybe your "suboptimal" was entirely tongue in cheek?
>>> Background: I'm chasing a mysterious ENOSPC error on an XFS
>>> filesystem with way more space than the app should be asking for.
>>> There are no quotas on the fs. Unfortunately it's a third party
>>> app and I can't tell what sequence is producing the error, but
>>> this fallocate issue is a possibility.
>>
>> Presumably you've tried stracing it and looking for ENOSPC returns from
>> syscalls?
>
> That would be an obvious approach. Unfortunately it's not that easy.
> The problem is associated with one specific client which is out of my
> control so I can't experiment in a controlled environment. The app
> runs for several hours in multiple phases, each with multiple threads,
> and the problem typically occurs in the early hours of the morning
> after several hours of running, so attaching to the correct instance
> is fraught, and the strace output will be voluminous.
I decided to stop being lazy and look into taking the strace option
further. I can script looking for the right process as it starts up, and
with judicious use of "-Z" for failed calls only, and filtering out
commonly failing syscalls (futex, stat etc.), the output volume is reduced
to just about nothing. This could be the solution - but it'll probably
take a week or so for it to fail again and see if I can catch what's going
on.
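Something along these lines (a sketch; the exclusion list is a guess at this
app's noisy syscalls):
# follow forks, timestamp each call, log only failing syscalls, drop the noise:
strace -f -Z -tt -o /tmp/app.trace \
    -e trace='!futex,epoll_wait,stat,lstat,access' \
    -p "$PID"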
Thanks for the inspiration / kick in the pants to get this going.
Strace has grown more options since the last time I looked at the man
page: "-Z" is fantastic!
Cheers,
Chris
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-27 2:55 ` Chris Dunlop
@ 2021-08-27 5:49 ` Dave Chinner
2021-08-27 6:53 ` Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2021-08-27 5:49 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Eric Sandeen, linux-xfs
On Fri, Aug 27, 2021 at 12:55:39PM +1000, Chris Dunlop wrote:
> On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:
> > On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
> > > On 8/25/21 9:06 PM, Chris Dunlop wrote:
> > > >
> > > > fallocate -l 1GB image.img
> > > > mkfs.xfs -f image.img
> > > > mkdir mnt
> > > > mount -o loop ./image.img mnt
> > > > fallocate -o 0 -l 700mb mnt/image.img
> > > > fallocate -o 0 -l 700mb mnt/image.img
> > > >
> > > > Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
> > >
> > > Interesting. Off the top of my head, I assume that xfs is not looking at
> > > current file space usage when deciding how much is needed to satisfy the
> > > fallocate request. While filesystems can return ENOSPC at any time for
> > > any reason, this does seem a bit suboptimal.
> >
> > Yes, I would have thought the second fallocate should be a noop.
>
> On further reflection, "filesystems can return ENOSPC at any time" is
> certainly something apps need to be prepared for (and in this case, it's
> doing the right thing, by logging the error and aborting), but it's not
> really a "not a bug" excuse for the filesystem in all circumstances (or this
> one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh
> filesystem would be considered a bug, no?
Sure, but the fallocate case here is different. You're asking to
preallocate up to 700MB of space on a filesystem that only has 300MB
of space free. Up front, without knowing anything about the layout
of the file we might need to allocate 700MB of space into, there's a
very good chance that we'll get ENOSPC partially through the
operation.
The real problem with preallocation failing part way through due to
overcommit of space is that we can't go back and undo the
allocation(s) made by fallocate because when we get ENOSPC we have
lost all the state of the previous allocations made. If fallocate is
filling holes between unwritten extents already in the file, then we
have no way of knowing where the holes we filled were and hence
cannot reliably free the space we've allocated before ENOSPC was
hit.
Hence if we allow the fallocate to go ahead and preallocate space
until we hit ENOSPC, we still end up returning to userspace with
ENOSPC, but we've also consumed all the remaining space in the
filesystem.
So there's a very good argument for simply rejecting any attempt to
preallocate space that has the possibility of over-committing space
and hence hitting ENOSPC part way through. Given that we spend a lot
of effort in XFS to avoid over-committing resources so that ENOSPC
is reliable and not prone to deadlocks, the choice to make fallocate
avoid a potential over-commit is at least internally consistent with
the XFS ENOSPC architecture.
IOWs, either behaviour could be considered a "bug" because it is
sub-optimal behaviour, but at some point you've got to choose what
is the least worst behaviour and run with it.
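(You can see that the up-front rejection has no side effects in the
reproduction above; a sketch with standard tools:)
# free space should be unchanged after the failed second preallocation:
df mnt
stat -f -c 'free blocks: %f' mnt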
> ...or maybe your "suboptimal" was entirely tongue in cheek?
>
> > > > Background: I'm chasing a mysterious ENOSPC error on an XFS
> > > > filesystem with way more space than the app should be asking
> > > > for. There are no quotas on the fs. Unfortunately it's a third
> > > > party app and I can't tell what sequence is producing the error,
> > > > but this fallocate issue is a possibility.
More likely speculative preallocation is causing this than
fallocate. However, we've had a background worker that cleans up
speculative prealloc before reporting ENOSPC for a while now - what
kernel version are you seeing this on?
Also, it might not even be data allocation that is the issue - if
the filesystem is full and free space is fragmented, you could be
getting ENOSPC because inodes cannot be allocated. In which case,
the output of xfs_info would be useful so we can see if sparse inode
clusters are enabled or not....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-27 5:49 ` Dave Chinner
@ 2021-08-27 6:53 ` Chris Dunlop
2021-08-27 22:03 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-27 6:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs
G'day Dave,
On Fri, Aug 27, 2021 at 03:49:56PM +1000, Dave Chinner wrote:
> On Fri, Aug 27, 2021 at 12:55:39PM +1000, Chris Dunlop wrote:
>> On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:
>>> On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
>>>> On 8/25/21 9:06 PM, Chris Dunlop wrote:
>>>>>
>>>>> fallocate -l 1GB image.img
>>>>> mkfs.xfs -f image.img
>>>>> mkdir mnt
>>>>> mount -o loop ./image.img mnt
>>>>> fallocate -o 0 -l 700mb mnt/image.img
>>>>> fallocate -o 0 -l 700mb mnt/image.img
>>>>>
>>>>> Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
>>>>
>>>> Interesting. Off the top of my head, I assume that xfs is not looking at
>>>> current file space usage when deciding how much is needed to satisfy the
>>>> fallocate request. While filesystems can return ENOSPC at any time for
>>>> any reason, this does seem a bit suboptimal.
>>>
>>> Yes, I would have thought the second fallocate should be a noop.
>>
>> On further reflection, "filesystems can return ENOSPC at any time" is
>> certainly something apps need to be prepared for (and in this case, it's
>> doing the right thing, by logging the error and aborting), but it's not
>> really a "not a bug" excuse for the filesystem in all circumstances (or this
>> one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh
>> filesystem would be considered a bug, no?
>
> Sure, but the fallocate case here is different. You're asking to
> preallocate up to 700MB of space on a filesystem that only has 300MB
> of space free. Up front, without knowing anything about the layout
> of the file we might need to allocate 700MB of space into, there's a
> very good chance that we'll get ENOSPC partially through the
> operation.
But I'm not asking for more space - the space is already there:
$ filefrag -v mnt/image.img
Filesystem type is: ef53
File size of mnt/image.img is 700000000 (170899 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 30719: 34816.. 65535: 30720: unwritten
1: 30720.. 59391: 69632.. 98303: 28672: 65536: unwritten
2: 59392.. 122879: 100352.. 163839: 63488: 98304: unwritten
3: 122880.. 170898: 165888.. 213906: 48019: 163840: last,unwritten,eof
mnt/image.img: 4 extents found
I.e. the fallocate /could/ potentially look at the existing file and
say "nothing for me do to here".
Of course, that should be pretty easy and quick in this case - but for
a file with hundreds of thousands of extents and potential holes in
the midst it would be somewhat less quick and easy. So that's probably
a good reason for it to fail. Sigh. On the other hand that might be a
case of "play stupid games, win stupid prizes". On the gripping hand I
can imagine the emails to the mailing list from people like me asking
why their "simple" fallocate is taking 20 minutes...
>>>>> Background: I'm chasing a mysterious ENOSPC error on an XFS
>>>>> filesystem with way more space than the app should be asking
>>>>> for. There are no quotas on the fs. Unfortunately it's a third
>>>>> party app and I can't tell what sequence is producing the error,
>>>>> but this fallocate issue is a possibility.
>
> More likely speculative preallocation is causing this than
> fallocate. However, we've had a background worker that cleans up
> speculative prealloc before reporting ENOSPC for a while now - what
> kernel version are you seeing this on?
5.10.60. How long is "a while now"? I vaguely recall something about
that going through.
> Also, it might not even be data allocation that is the issue - if
> the filesystem is full and free space is fragmented, you could be
> getting ENOSPC because inodes cannot be allocated. In which case,
> the output of xfs_info would be useful so we can see if sparse inode
> clusters are enabled or not....
$ xfs_info /chroot
meta-data=/dev/mapper/vg00-chroot isize=512 agcount=32, agsize=244184192 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=7813893120, imaxpct=5
= sunit=128 swidth=512 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
It's currently fuller than I like:
$ df /chroot
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg00-chroot 31253485568 24541378460 6712107108 79% /chroot
...so that's 6.3T free, but this problem was happening with 71% (8.5T)
free. The /maximum/ the app could conceivably be asking for is around
1.1T (to entirely duplicate an existing file), but it really shouldn't
be doing anywhere near that: I can see it doing write-in-place on the
existing file and should be asking for modest amounts of extension
(then again, userland developers, so who knows, right? ;-}).
Oh, another reference: there is extensive reflinking happening on this
filesystem. I don't know if that's a factor. You may remember my
previous email relating to that:
Extreme fragmentation ho!
https://www.spinics.net/lists/linux-xfs/msg47707.html
I'm excited by my new stracing script prompted by Eric - at least that
should tell us what precisely is failing. Shame I'm going to have to
wait a while for it to trigger.
Cheers,
Chris
* Re: XFS fallocate implementation incorrectly reports ENOSPC
2021-08-27 6:53 ` Chris Dunlop
@ 2021-08-27 22:03 ` Dave Chinner
2021-08-28 0:21 ` Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC] Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2021-08-27 22:03 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Eric Sandeen, linux-xfs
On Fri, Aug 27, 2021 at 04:53:47PM +1000, Chris Dunlop wrote:
> G'day Dave,
>
> On Fri, Aug 27, 2021 at 03:49:56PM +1000, Dave Chinner wrote:
> > On Fri, Aug 27, 2021 at 12:55:39PM +1000, Chris Dunlop wrote:
> > > On Fri, Aug 27, 2021 at 06:56:35AM +1000, Chris Dunlop wrote:
> > > > On Thu, Aug 26, 2021 at 10:05:00AM -0500, Eric Sandeen wrote:
> > > > > On 8/25/21 9:06 PM, Chris Dunlop wrote:
> > > > > >
> > > > > > fallocate -l 1GB image.img
> > > > > > mkfs.xfs -f image.img
> > > > > > mkdir mnt
> > > > > > mount -o loop ./image.img mnt
> > > > > > fallocate -o 0 -l 700mb mnt/image.img
> > > > > > fallocate -o 0 -l 700mb mnt/image.img
> > > > > >
> > > > > > Why does the second fallocate fail with ENOSPC, and is that considered an XFS bug?
> > > > >
> > > > > Interesting. Off the top of my head, I assume that xfs is not looking at
> > > > > current file space usage when deciding how much is needed to satisfy the
> > > > > fallocate request. While filesystems can return ENOSPC at any time for
> > > > > any reason, this does seem a bit suboptimal.
> > > >
> > > > Yes, I would have thought the second fallocate should be a noop.
> > >
> > > On further reflection, "filesystems can return ENOSPC at any time" is
> > > certainly something apps need to be prepared for (and in this case, it's
> > > doing the right thing, by logging the error and aborting), but it's not
> > > really a "not a bug" excuse for the filesystem in all circumstances (or this
> > > one?), is it? E.g. a write(fd, buf, 1) returning ENOSPC on a fresh
> > > filesystem would be considered a bug, no?
> >
> > Sure, but the fallocate case here is different. You're asking to
> > preallocate up to 700MB of space on a filesystem that only has 300MB
> > of space free. Up front, without knowing anything about the layout
> > of the file we might need to allocate 700MB of space into, there's a
> > very good chance that we'll get ENOSPC partially through the
> > operation.
>
> But I'm not asking for more space - the space is already there:
"Up front, without knowing anything about the layout of the file..."
[....]
> Sigh. On the other hand that might be a case of "play stupid
> games, win stupid prizes". On the gripping hand I can imagine the emails to
> the mailing list from people like me asking why their "simple" fallocate is
> taking 20 minutes...
Yup, we have to choose between behaviours people will complain about.
We chose the behaviour that doesn't happen except on really small
filesystems because, in practice, we almost never see production
workloads asking to fallocate() more than half the entire filesystem
capacity at a time.....
> > > > > > Background: I'm chasing a mysterious ENOSPC error on an XFS
> > > > > > filesystem with way more space than the app should be asking
> > > > > > for. There are no quotas on the fs. Unfortunately it's a third
> > > > > > party app and I can't tell what sequence is producing the error,
> > > > > > but this fallocate issue is a possibility.
> >
> > More likely speculative preallocation is causing this than
> > fallocate. However, we've had a background worker that cleans up
> > speculative prealloc before reporting ENOSPC for a while now - what
> > kernel version are you seeing this on?
>
> 5.10.60. How long is "a while now"? I vaguely recall something about that
> going through.
Longer than that.
> > Also, it might not even be data allocation that is the issue - if
> > the filesystem is full and free space is fragmented, you could be
> > getting ENOSPC because inodes cannot be allocated. In which case,
> > the output of xfs_info would be useful so we can see if sparse inode
> > clusters are enabled or not....
>
> $ xfs_info /chroot
> meta-data=/dev/mapper/vg00-chroot isize=512 agcount=32, agsize=244184192 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=1
> = reflink=1 bigtime=0 inobtcount=0
> data = bsize=4096 blocks=7813893120, imaxpct=5
> = sunit=128 swidth=512 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> It's currently fuller than I like:
>
> $ df /chroot
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/vg00-chroot 31253485568 24541378460 6712107108 79% /chroot
>
> ...so that's 6.3T free, but this problem was happening with 71% (8.5T) free.
> The /maximum/ the app could conceivably be asking for is around 1.1T (to
> entirely duplicate an existing file), but it really shouldn't be doing
> anywhere near that: I can see it doing write-in-place on the existing file
> and should be asking for modest amounts of extension (then again, userland
> developers, so who knows, right? ;-}).
>
> Oh, another reference: there is extensive reflinking happening on this
> filesystem. I don't know if that's a factor. You may remember my previous
> email relating to that:
>
> Extreme fragmentation ho!
> https://www.spinics.net/lists/linux-xfs/msg47707.html
Ah. Details that are likely extremely important. The workload,
layout problems and ephemeral ENOSPC symptoms match the description
of the problem that was fixed by the series of commits that went
into 5.13 that ended in this one:
commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
Author: Brian Foster <bfoster@redhat.com>
Date: Wed Apr 28 15:06:05 2021 -0700
xfs: set aside allocation btree blocks from block reservation
The blocks used for allocation btrees (bnobt and countbt) are
technically considered free space. This is because as free space is
used, allocbt blocks are removed and naturally become available for
traditional allocation. However, this means that a significant
portion of free space may consist of in-use btree blocks if free
space is severely fragmented.
On large filesystems with large perag reservations, this can lead to
a rare but nasty condition where a significant amount of physical
free space is available, but the majority of actual usable blocks
consist of in-use allocbt blocks. We have a record of a (~12TB, 32
AG) filesystem with multiple AGs in a state with ~2.5GB or so free
blocks tracked across ~300 total allocbt blocks, but effectively at
100% full because the free space is entirely consumed by
refcountbt perag reservation.
Such a large perag reservation is by design on large filesystems.
The problem is that because the free space is so fragmented, this AG
contributes the 300 or so allocbt blocks to the global counters as
free space. If this pattern repeats across enough AGs, the
filesystem lands in a state where global block reservation can
outrun physical block availability. For example, a streaming
buffered write on the affected filesystem continues to allow delayed
allocation beyond the point where writeback starts to fail due to
physical block allocation failures. The expected behavior is for the
delalloc block reservation to fail gracefully with -ENOSPC before
physical block allocation failure is a possibility.
To address this problem, set aside in-use allocbt blocks at
reservation time and thus ensure they cannot be reserved until truly
available for physical allocation. This allows alloc btree metadata
to continue to reside in free space, but dynamically adjusts
reservation availability based on internal state. Note that the
logic requires that the allocbt counter is fully populated at
reservation time before it is fully effective. We currently rely on
the mount time AGF scan in the perag reservation initialization code
for this dependency on filesystems where it's most important (i.e.
with active perag reservations).
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC]
2021-08-27 22:03 ` Dave Chinner
@ 2021-08-28 0:21 ` Chris Dunlop
2021-08-28 3:58 ` Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-28 0:21 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs
On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
> On Fri, Aug 27, 2021 at 04:53:47PM +1000, Chris Dunlop wrote:
>> On 8/25/21 9:06 PM, Chris Dunlop wrote:
>>> Background: I'm chasing a mysterious ENOSPC error on an XFS
>>> filesystem with way more space than the app should be asking
>>> for. There are no quotas on the fs. Unfortunately it's a third
>>> party app and I can't tell what sequence is producing the error,
>>> but this fallocate issue is a possibility.
>>
>> Oh, another reference: this is extensive reflinking happening on this
>> filesystem.
>
> Ah. Details that are likely extremely important. The workload,
> layout problems and ephemeral ENOSPC symptoms match the description
> of the problem that was fixed by the series of commits that went
> into 5.13 that ended in this one:
>
> commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
> Author: Brian Foster <bfoster@redhat.com>
> Date: Wed Apr 28 15:06:05 2021 -0700
>
> xfs: set aside allocation btree blocks from block reservation
Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
seeing if this fs is likely to be suffering from this particular issue or
is it a matter of installing an appropriate kernel and seeing if the
problem goes away?
The job getting this ENOSPC error is one of 45 similar jobs, and it's the
only one getting the error. There doesn't seem to be anything special
about this job: its main file where the writes are going is the 9th
largest (up to 1.8T), and it has a lot of extents (842G split into 750M
extents) but not as many as some others (e.g. 809G split into 1G extents).
That said, the app works in mysterious ways so this particular job may be
a special snowflake in some unobvious manner.
Cheers,
Chris
* Re: Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC]
2021-08-28 0:21 ` Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC] Chris Dunlop
@ 2021-08-28 3:58 ` Chris Dunlop
2021-08-29 22:04 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-28 3:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs
On Sat, Aug 28, 2021 at 10:21:37AM +1000, Chris Dunlop wrote:
> On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
>> commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
>> Author: Brian Foster <bfoster@redhat.com>
>> Date: Wed Apr 28 15:06:05 2021 -0700
>>
>> xfs: set aside allocation btree blocks from block reservation
>
> Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
> seeing if this fs is likely to be suffering from this particular issue
> or is it a matter of installing an appropriate kernel and seeing if
> the problem goes away?
Is this sufficient to tell us that this filesystem probably isn't suffering
from that issue?
$ sudo xfs_db -r -c 'freesp -s' /dev/mapper/vg00-chroot
from to extents blocks pct
1 1 74943 74943 0.00
2 3 71266 179032 0.01
4 7 155670 855072 0.04
8 15 304838 3512336 0.17
16 31 613606 14459417 0.72
32 63 1043230 47413004 2.35
64 127 1130921 106646418 5.29
128 255 1043683 188291054 9.34
256 511 576818 200011819 9.93
512 1023 328790 230908212 11.46
1024 2047 194784 276975084 13.75
2048 4095 119242 341977975 16.97
4096 8191 72903 406955899 20.20
8192 16383 5991 67763286 3.36
16384 32767 1431 31354803 1.56
32768 65535 310 14366959 0.71
65536 131071 122 10838153 0.54
131072 262143 87 15901152 0.79
262144 524287 44 17822179 0.88
524288 1048575 16 12482310 0.62
1048576 2097151 14 20897049 1.04
4194304 8388607 1 5213142 0.26
total free extents 5738710
total free blocks 2014899298
average free extent size 351.107
Or from:
How to tell how fragmented the free space is on an XFS filesystem?
https://www.suse.com/support/kb/doc/?id=000018219
Based on xfs_info "agcount=32":
$ {
for AGNO in {0..31}; do
sudo /usr/sbin/xfs_db -r -c "freesp -s -a $AGNO" /dev/mapper/vg00-chroot > /tmp/ag${AGNO}.txt
done
grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | head -n5
echo --
grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | tail -n5
}
average free extent size 66.7806
average free extent size 79.201
average free extent size 80.221
average free extent size 87.595
average free extent size 103.079
--
average free extent size 898.962
average free extent size 906.709
average free extent size 1001.18
average free extent size 1849.23
average free extent size 2782.75
Even those AGs with the lowest average free extent size are higher than what
the web page suggests is "an AG in fairly good shape".
Cheers,
Chris
* Re: Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC]
2021-08-28 3:58 ` Chris Dunlop
@ 2021-08-29 22:04 ` Dave Chinner
2021-08-30 4:21 ` Darrick J. Wong
2021-08-30 7:37 ` Mysterious ENOSPC Chris Dunlop
0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2021-08-29 22:04 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Eric Sandeen, linux-xfs
On Sat, Aug 28, 2021 at 01:58:24PM +1000, Chris Dunlop wrote:
> On Sat, Aug 28, 2021 at 10:21:37AM +1000, Chris Dunlop wrote:
> > On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
> > > commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
> > > Author: Brian Foster <bfoster@redhat.com>
> > > Date: Wed Apr 28 15:06:05 2021 -0700
> > >
> > > xfs: set aside allocation btree blocks from block reservation
> >
> > Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
> > seeing if this fs is likely to be suffering from this particular issue
> > or is it a matter of installing an appropriate kernel and seeing if the
> > problem goes away?
>
> Is this sufficient to tell us that this filesystem probably isn't suffering
> from that issue?
IIRC, it's the per-ag histograms that are more important here
because we are running out of space in an AG because of
overcommitting the per-ag space. If there is an AG that is much more
fragmented than others, then it will be consuming much more in the way
of freespace btree blocks than others...
FWIW, if you are using reflink heavily and you have rmap enabled (as
you have), there's every chance that an AG has completely run out of
space and so new rmap records for shared extents can't be allocated
- that can give you spurious ENOSPC errors before the filesystem is
100% full, too.
i.e. every shared extent in the filesystem has a rmap record
pointing back to each owner of the shared extent. That means for an
extent shared 1000 times, there are 1000 rmap records for that
shared extent. If you share it again, a new rmap record needs to be
inserted into the rmapbt, and if the AG is completely out of space
this can fail w/ ENOSPC. Hence you can get ENOSPC errors attempting
to share or unshare extents because there isn't space in the AG for
the tracking metadata for the new extent record....
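(A quick way to check for that is to read the AGF headers directly; a
sketch, assuming the 32 AGs and device from your xfs_info:)
# per-AG free blocks and longest free extent, straight from each AGF:
for ag in $(seq 0 31); do
    xfs_db -r -c "agf $ag" -c 'print freeblks longest' /dev/mapper/vg00-chroot
done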
> $ sudo xfs_db -r -c 'freesp -s' /dev/mapper/vg00-chroot
> from to extents blocks pct
> 1 1 74943 74943 0.00
> 2 3 71266 179032 0.01
> 4 7 155670 855072 0.04
> 8 15 304838 3512336 0.17
> 16 31 613606 14459417 0.72
> 32 63 1043230 47413004 2.35
> 64 127 1130921 106646418 5.29
> 128 255 1043683 188291054 9.34
> 256 511 576818 200011819 9.93
> 512 1023 328790 230908212 11.46
> 1024 2047 194784 276975084 13.75
> 2048 4095 119242 341977975 16.97
> 4096 8191 72903 406955899 20.20
> 8192 16383 5991 67763286 3.36
> 16384 32767 1431 31354803 1.56
> 32768 65535 310 14366959 0.71
> 65536 131071 122 10838153 0.54
> 131072 262143 87 15901152 0.79
> 262144 524287 44 17822179 0.88
> 524288 1048575 16 12482310 0.62
> 1048576 2097151 14 20897049 1.04
> 4194304 8388607 1 5213142 0.26
> total free extents 5738710
> total free blocks 2014899298
> average free extent size 351.107
So 5.7M freespace records. Assume perfect packing and that's roughly
500 records to a btree block, so at least 10,000 freespace btree
blocks in the filesystem. But we really need to see the per-ag
histograms to be able to make any meaningful analysis of the free
space layout in the filesystem....
> Or from:
>
> How to tell how fragmented the free space is on an XFS filesystem?
> https://www.suse.com/support/kb/doc/?id=000018219
>
> Based on xfs_info "agcount=32":
>
> $ {
> for AGNO in {0..31}; do
> sudo /usr/sbin/xfs_db -r -c "freesp -s -a $AGNO" /dev/mapper/vg00-chroot > /tmp/ag${AGNO}.txt
> done
> grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | head -n5
> echo --
> grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | tail -n5
> }
> average free extent size 66.7806
Average size by itself isn't actually useful for analysis. The
histogram is what gives us all the necessary information. e.g. this
could be a thousand single block extents and one 65000 block extent
or it could be a million 64k extents. The former is pretty good, the
latter is awful (indicates likely worst case 64kB extent
fragmentation behaviour), because ....
Even those AGs with the lowest average free extent size are higher than what
> the web page suggests is "an AG in fairly good shape".
... the KB article completely glosses over the fact that we really
have to consider the histogram those averages are derived from
before making a judgement on the state of the AG. It equates
"average extent size" with "fragmented AG", when in reality there's
a whole lot more to consider such as number of free extents, the
size of the AG, the amount of free space being indexed, the nature
of the workload and the allocations it requires, etc.
e.g. I'd consider the "AG greatly fragmented" case given in that KB
article to be perfectly fine if the workload is random 4KB writes
and hole punching to manage space in sparse files (perhaps, say,
lots of raw VM image files and guests have -o discard enabled). In
those cases, there's a huge number of viable allocation candidates
in the free space that can be found quickly and efficiently as
there's no possibility of large contiguous extents being formed for
user data because the IO patterns are small random writes into
sparse files...
Context is very important when trying to determine if free space
fragmentation is an issue or not. Most of the time, it isn't an
issue at all but people have generally been trained to think "all
fragmentation is bad" rather than "only worry about fragmentation if
there is a problem that is directly related to physical allocation
patterns"...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC]
2021-08-29 22:04 ` Dave Chinner
@ 2021-08-30 4:21 ` Darrick J. Wong
2021-08-30 7:40 ` Chris Dunlop
2021-08-30 7:37 ` Mysterious ENOSPC Chris Dunlop
1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2021-08-30 4:21 UTC (permalink / raw)
To: Dave Chinner; +Cc: Chris Dunlop, Eric Sandeen, linux-xfs
On Mon, Aug 30, 2021 at 08:04:57AM +1000, Dave Chinner wrote:
> On Sat, Aug 28, 2021 at 01:58:24PM +1000, Chris Dunlop wrote:
> > On Sat, Aug 28, 2021 at 10:21:37AM +1000, Chris Dunlop wrote:
> > > On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
> > > > commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
> > > > Author: Brian Foster <bfoster@redhat.com>
> > > > Date: Wed Apr 28 15:06:05 2021 -0700
> > > >
> > > > xfs: set aside allocation btree blocks from block reservation
> > >
> > > Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
> > > seeing if this fs is likely to be suffering from this particular issue
> > > or is it a matter of installing an appropriate kernel and seeing if the
> > > problem goes away?
> >
> > Is this sufficient to tell us that this filesystem probably isn't suffering
> > from that issue?
Since you've formatted with rmapbt enabled, you probably have a new
enough xfsprogs that you can /also/ use this on a live fs:
$ xfs_spaceman -c 'freesp -g' /
AG extents blocks
0 2225 1426437
1 2201 1716114
2 2635 1196409
3 2307 1567751
And if you really want the per-AG histogram...
$ xfs_spaceman -c 'freesp -s -a 2' /
from to extents blocks pct
1 1 262 262 0.02
2 3 240 551 0.05
4 7 306 1740 0.15
8 15 370 4194 0.35
16 31 563 13286 1.11
32 63 362 16926 1.41
64 127 271 22729 1.90
128 255 112 20234 1.69
256 511 82 30446 2.54
512 1023 36 26021 2.17
1024 2047 20 29074 2.43
2048 4095 5 13499 1.13
4096 8191 2 9550 0.80
8192 16383 1 14484 1.21
16384 32767 2 50101 4.19
65536 131071 1 68649 5.74
524288 1048575 1 874663 73.11
total free extents 2636
total free blocks 1196409
average free extent size 453.873
--D
> IIRC, it's the per-ag histograms that are more important here
> because we are running out of space in an AG because of
> overcommitting the per-ag space. If there is an AG that is much more
> fragmented than others, then it will be consuming much more in the way
> of freespace btree blocks than others...
>
> FWIW, if you are using reflink heavily and you have rmap enabled (as
> you have), there's every chance that an AG has completely run out of
> space and so new rmap records for shared extents can't be allocated
> - that can give you spurious ENOSPC errors before the filesystem is
> 100% full, too.
>
> i.e. every shared extent in the filesystem has a rmap record
> pointing back to each owner of the shared extent. That means for an
> extent shared 1000 times, there are 1000 rmap records for that
> shared extent. If you share it again, a new rmap record needs to be
> inserted into the rmapbt, and if the AG is completely out of space
> this can fail w/ ENOSPC. Hence you can get ENOSPC errors attempting
> to share or unshare extents because there isn't space in the AG for
> the tracking metadata for the new extent record....
>
> > $ sudo xfs_db -r -c 'freesp -s' /dev/mapper/vg00-chroot
> > from to extents blocks pct
> > 1 1 74943 74943 0.00
> > 2 3 71266 179032 0.01
> > 4 7 155670 855072 0.04
> > 8 15 304838 3512336 0.17
> > 16 31 613606 14459417 0.72
> > 32 63 1043230 47413004 2.35
> > 64 127 1130921 106646418 5.29
> > 128 255 1043683 188291054 9.34
> > 256 511 576818 200011819 9.93
> > 512 1023 328790 230908212 11.46
> > 1024 2047 194784 276975084 13.75
> > 2048 4095 119242 341977975 16.97
> > 4096 8191 72903 406955899 20.20
> > 8192 16383 5991 67763286 3.36
> > 16384 32767 1431 31354803 1.56
> > 32768 65535 310 14366959 0.71
> > 65536 131071 122 10838153 0.54
> > 131072 262143 87 15901152 0.79
> > 262144 524287 44 17822179 0.88
> > 524288 1048575 16 12482310 0.62
> > 1048576 2097151 14 20897049 1.04
> > 4194304 8388607 1 5213142 0.26
> > total free extents 5738710
> > total free blocks 2014899298
> > average free extent size 351.107
>
> So 5.7M freespace records. Assume perfect packing and that's roughly
> 500 records to a btree block, so at least 10,000 freespace btree
> blocks in the filesystem. But we really need to see the per-ag
> histograms to be able to make any meaningful analysis of the free
> space layout in the filesystem....
>
> > Or from:
> >
> > How to tell how fragmented the free space is on an XFS filesystem?
> > https://www.suse.com/support/kb/doc/?id=000018219
> >
> > Based on xfs_info "agcount=32":
> >
> > $ {
> > for AGNO in {0..31}; do
> > sudo /usr/sbin/xfs_db -r -c "freesp -s -a $AGNO" /dev/mapper/vg00-chroot > /tmp/ag${AGNO}.txt
> > done
> > grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | head -n5
> > echo --
> > grep -h '^average free extent size' /tmp/ag*.txt | sort -k5n | tail -n5
> > }
> > average free extent size 66.7806
>
> Average size by itself isn't actually useful for analysis. The
> histogram is what gives us all the necessary information. e.g. this
> could be a thousand single block extents and one 65000 block extent
> or it could be a million 64k extents. The former is pretty good, the
> latter is awful (indicates likely worst case 64kB extent
> fragmentation behaviour), because ....
>
> > Even those AGs with the lowest average free extent size are higher than what
> > the web page suggests is "an AG in fairly good shape".
>
> ... the KB article completely glosses over the fact that we really
> have to consider the histogram those averages are derived from
> before making a judgement on the state of the AG. It equates
> "average extent size" with "fragmented AG", when in reality there's
> a whole lot more to consider such as number of free extents, the
> size of the AG, the amount of free space being indexed, the nature
> of the workload and the allocations it requires, etc.
>
> e.g. I'd consider the "AG greatly fragmented" case given in that KB
> article to be perfectly fine if the workload is random 4KB writes
> and hole punching to manage space in sparse files (perhaps, say,
> lots of raw VM image files and guests have -o discard enabled). In
> those cases, there's a huge number of viable allocation candidates
> in the free space that can be found quickly and efficiently as
> there's no possibility of large contiguous extents being formed for
> user data because the IO patterns are small random writes into
> sparse files...
>
> Context is very important when trying to determine if free space
> fragmentation is an issue or not. Most of the time, it isn't an
> issue at all but people have generally been trained to think "all
> fragmentation is bad" rather than "only worry about fragmentation if
> there is a problem that is directly related to physical allocation
> patterns"...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: Mysterious ENOSPC
2021-08-29 22:04 ` Dave Chinner
2021-08-30 4:21 ` Darrick J. Wong
@ 2021-08-30 7:37 ` Chris Dunlop
2021-09-02 1:42 ` Dave Chinner
1 sibling, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2021-08-30 7:37 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs
[-- Attachment #1: Type: text/plain, Size: 6503 bytes --]
On Mon, Aug 30, 2021 at 08:04:57AM +1000, Dave Chinner wrote:
> On Sat, Aug 28, 2021 at 01:58:24PM +1000, Chris Dunlop wrote:
>> On Sat, Aug 28, 2021 at 10:21:37AM +1000, Chris Dunlop wrote:
>>> On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
>>>> commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
>>>> Author: Brian Foster <bfoster@redhat.com>
>>>> Date: Wed Apr 28 15:06:05 2021 -0700
>>>>
>>>> xfs: set aside allocation btree blocks from block reservation
>>>
>>> Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
>>> seeing if this fs is likely to be suffering from this particular issue
>>> or is it a matter of installing an appropriate kernel and seeing if the
>>> problem goes away?
>>
>> Is this sufficient to tell us that this filesystem probably isn't suffering
>> from that issue?
>
> IIRC, it's the per-ag histograms that are more important here
> because we are running out of space in an AG because of
> overcommitting the per-ag space. If there is an AG that is much more
> fragmented than others, then it will be consuming much more in the way
> of freespace btree blocks than others...
Per-ag histograms attached.
Do the blocks used by the allocation btrees show up in the AG
histograms? E.g. with an AG like this:
AG 18
from to extents blocks pct
1 1 1961 1961 0.01
2 3 17129 42602 0.11
4 7 33374 183312 0.48
8 15 68076 783020 2.06
16 31 146868 3469398 9.14
32 63 248690 10614558 27.96
64 127 32088 2798748 7.37
128 255 8654 1492521 3.93
256 511 4227 1431586 3.77
512 1023 2531 1824377 4.81
1024 2047 2125 3076304 8.10
2048 4095 1615 4691302 12.36
4096 8191 1070 6062351 15.97
8192 16383 139 1454627 3.83
16384 32767 2 41359 0.11
total free extents 568549
total free blocks 37968026
average free extent size 66.7806
...it looks like it's significantly fragmented, but if the allocation
btrees aren't part of this, it seems there's still sufficient free
space that it shouldn't be getting to ENOSPC?
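(For what it's worth, a sketch of checking that directly, assuming the AGF
btreeblks counter accounts for the allocation btree blocks:)
# blocks consumed by the per-AG btrees, as recorded in each AGF:
for ag in $(seq 0 31); do
    sudo xfs_db -r -c "agf $ag" -c 'print btreeblks' /dev/mapper/vg00-chroot
done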
> FWIW, if you are using reflink heavily and you have rmap enabled (as
> you have), there's every chance that an AG has completely run out of
> space and so new rmap records for shared extents can't be allocated
> - that can give you spurious ENOSPC errors before the filesystem is
> 100% full, too.
This doesn't seem to be the case for this fs as we seem to have "free"
space in all the AGs, IFF the allocation btrees aren't included in the
per-AG reported free space.
> i.e. every shared extent in the filesystem has a rmap record
> pointing back to each owner of the shared extent. That means for an
> extent shared 1000 times, there are 1000 rmap records for that
> shared extent. If you share it again, a new rmap record needs to be
> inserted into the rmapbt, and if the AG is completely out of space
> this can fail w/ ENOSPC. Hence you can get ENOSPC errors attempting
> to share or unshare extents because there isn't space in the AG for
> the tracking metadata for the new extent record....
FYI, in this particular fs the reflinks have low owner counts: I think
most of the extents are single owner, and the vast majority (and
perhaps all of) of the multi-owner extents have only 2 owners. I don't
think there would be any with more than, say, 3 owners.
Out of interest: if a multi-reflinked extent is reduced down to one
owner is that extent then removed from the reflink btree?
>> $ sudo xfs_db -r -c 'freesp -s' /dev/mapper/vg00-chroot
>> from to extents blocks pct
>> 1 1 74943 74943 0.00
>> 2 3 71266 179032 0.01
>> 4 7 155670 855072 0.04
>> 8 15 304838 3512336 0.17
>> 16 31 613606 14459417 0.72
>> 32 63 1043230 47413004 2.35
>> 64 127 1130921 106646418 5.29
>> 128 255 1043683 188291054 9.34
>> 256 511 576818 200011819 9.93
>> 512 1023 328790 230908212 11.46
>> 1024 2047 194784 276975084 13.75
>> 2048 4095 119242 341977975 16.97
>> 4096 8191 72903 406955899 20.20
>> 8192 16383 5991 67763286 3.36
>> 16384 32767 1431 31354803 1.56
>> 32768 65535 310 14366959 0.71
>> 65536 131071 122 10838153 0.54
>> 131072 262143 87 15901152 0.79
>> 262144 524287 44 17822179 0.88
>> 524288 1048575 16 12482310 0.62
>> 1048576 2097151 14 20897049 1.04
>> 4194304 8388607 1 5213142 0.26
>> total free extents 5738710
>> total free blocks 2014899298
>> average free extent size 351.107
>
> So 5.7M freespace records. Assume perfect packing and that's roughly
> 500 records to a btree block, so at least 10,000 freespace btree
> blocks in the filesystem. But we really need to see the per-ag
> histograms to be able to make any meaningful analysis of the free
> space layout in the filesystem....
See attached for per-ag histograms.
> Context is very important when trying to determine if free space
> fragmentation is an issue or not. Most of the time, it isn't an
> issue at all but people have generally been trained to think "all
> fragmentation is bad" rather than "only worry about fragmentation if
> there is a problem that is directly related to physical allocation
> patterns"...
In this case it's a typical backup application: it uploads regular
incremental files and those are later merged into a full backup file,
either by extending or overwriting or reflinking depending on whether the
app decides to use reflinks or not. The uploads are sequential and mostly
large-ish writes (132K+), then the merge is small to medium size randomish
writes or reflinks (4K-???). So the smaller writes/reflinks are going to
create a significant amount of fragmentation. The incremental files are
removed entirely at some later time (no discard involved).
I guess if it's determined this pattern is critically suboptimal and
causing this errant ENOSPC issue, and the changes in 5.13 don't help,
there's nothing to stop me from occasionally doing a full (non-reflink)
copy of the large full backup files into another file to get them nicely
sequential. I'd lose any reflinks along the way of course, but they don't
last a long time anyway (days to a few weeks) depending on how long the
smaller incremental files are kept.
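(Something like this, assuming GNU cp; the filename is just a stand-in:)
# rewrite the file without sharing extents, then swap it into place:
cp --reflink=never full-backup.img full-backup.img.new &&
    mv full-backup.img.new full-backup.img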
Cheers,
Chris
[-- Attachment #2: freesp-per-ag.txt --]
[-- Type: text/plain, Size: 24313 bytes --]
AG 0
from to extents blocks pct
1 1 5215 5215 0.00
2 3 2095 4778 0.00
4 7 1870 10786 0.01
8 15 4696 53963 0.05
16 31 8157 192210 0.18
32 63 14912 707014 0.65
64 127 31799 3052928 2.79
128 255 62759 11177040 10.22
256 511 51477 17851082 16.33
512 1023 31651 22157880 20.26
1024 2047 15353 21439837 19.61
2048 4095 6229 17536785 16.04
4096 8191 2416 13086492 11.97
8192 16383 144 1500294 1.37
16384 32767 21 461837 0.42
32768 65535 3 108715 0.10
total free extents 238797
total free blocks 109346856
average free extent size 457.907
AG 1
from to extents blocks pct
1 1 2395 2395 0.00
2 3 988 2299 0.00
4 7 2101 12169 0.01
8 15 4150 46624 0.04
16 31 9433 228624 0.19
32 63 16784 784775 0.67
64 127 28022 2665137 2.26
128 255 47302 8960792 7.60
256 511 36831 13355167 11.33
512 1023 29405 21185337 17.98
1024 2047 18508 26248652 22.27
2048 4095 8667 24290355 20.61
4096 8191 3239 17611659 14.95
8192 16383 187 1969888 1.67
16384 32767 16 314583 0.27
32768 65535 4 161187 0.14
total free extents 208032
total free blocks 117839643
average free extent size 566.45
AG 2
from to extents blocks pct
1 1 792 792 0.00
2 3 1391 3490 0.01
4 7 10670 57044 0.09
8 15 13407 156686 0.24
16 31 20931 491161 0.74
32 63 37588 1774854 2.69
64 127 75778 7268636 11.01
128 255 79386 13883934 21.04
256 511 44863 15288638 23.17
512 1023 17691 12086170 18.31
1024 2047 5478 7497222 11.36
2048 4095 1470 4030441 6.11
4096 8191 400 2179114 3.30
8192 16383 80 907597 1.38
16384 32767 15 309075 0.47
32768 65535 1 61508 0.09
total free extents 309941
total free blocks 65996362
average free extent size 212.932
AG 3
from to extents blocks pct
1 1 29 29 0.00
2 3 546 1392 0.00
4 7 3695 19778 0.04
8 15 5115 59644 0.11
16 31 8650 203188 0.39
32 63 19446 931805 1.78
64 127 33918 3206707 6.12
128 255 41472 7109466 13.56
256 511 17943 6233330 11.89
512 1023 9813 6800951 12.97
1024 2047 5172 7319826 13.96
2048 4095 3156 8975995 17.12
4096 8191 1841 10198723 19.46
8192 16383 94 1092849 2.08
16384 32767 13 267832 0.51
total free extents 150903
total free blocks 52421515
average free extent size 347.386
AG 4
from to extents blocks pct
1 1 3456 3456 0.00
2 3 500 1139 0.00
4 7 569 3705 0.00
8 15 4786 55972 0.07
16 31 9612 224801 0.28
32 63 19371 919993 1.14
64 127 34259 3236421 3.99
128 255 62935 10890988 13.44
256 511 41997 14096666 17.39
512 1023 21973 14775110 18.23
1024 2047 6946 9350039 11.54
2048 4095 2680 8009016 9.88
4096 8191 3185 17772132 21.93
8192 16383 119 1383976 1.71
16384 32767 14 283682 0.35
32768 65535 1 37998 0.05
total free extents 212403
total free blocks 81045094
average free extent size 381.563
AG 5
from to extents blocks pct
1 1 3724 3724 0.00
2 3 700 1556 0.00
4 7 117 604 0.00
8 15 2299 27842 0.03
16 31 5116 118536 0.14
32 63 10153 479910 0.55
64 127 17573 1659991 1.91
128 255 48504 8454216 9.74
256 511 33549 11365292 13.10
512 1023 16890 11524528 13.28
1024 2047 8227 11463511 13.21
2048 4095 4975 14744962 16.99
4096 8191 4363 24030370 27.69
8192 16383 235 2623084 3.02
16384 32767 14 281041 0.32
total free extents 156439
total free blocks 86779167
average free extent size 554.716
AG 6
from to extents blocks pct
1 1 1674 1674 0.00
2 3 355 813 0.00
4 7 2715 16326 0.03
8 15 7931 88289 0.19
16 31 13045 305893 0.65
32 63 24945 1181407 2.52
64 127 41067 3842176 8.21
128 255 32329 6311725 13.49
256 511 18891 6828744 14.59
512 1023 9908 6967594 14.89
1024 2047 4350 6045470 12.92
2048 4095 2193 6439871 13.76
4096 8191 1461 7911939 16.90
8192 16383 68 771093 1.65
16384 32767 4 92454 0.20
total free extents 160936
total free blocks 46805468
average free extent size 290.833
AG 7
from to extents blocks pct
1 1 2619 2619 0.01
2 3 8411 22859 0.05
4 7 24615 135628 0.29
8 15 52117 602928 1.27
16 31 113925 2689338 5.68
32 63 130210 5632031 11.89
64 127 70649 6543643 13.81
128 255 34479 6648866 14.03
256 511 12599 4479515 9.45
512 1023 4811 3368174 7.11
1024 2047 2103 3005162 6.34
2048 4095 1634 5032217 10.62
4096 8191 1350 7678961 16.21
8192 16383 122 1317350 2.78
16384 32767 10 221553 0.47
total free extents 459654
total free blocks 47380844
average free extent size 103.079
AG 8
from to extents blocks pct
1 1 3356 3356 0.00
2 3 1201 2823 0.00
4 7 3239 17470 0.02
8 15 5367 61388 0.08
16 31 10163 239796 0.31
32 63 16501 762075 0.97
64 127 23574 2217722 2.84
128 255 25811 4591149 5.87
256 511 17518 6205779 7.94
512 1023 11425 8119641 10.38
1024 2047 8511 12248970 15.67
2048 4095 6239 17951935 22.96
4096 8191 3798 21258331 27.19
8192 16383 298 3316518 4.24
16384 32767 47 1032837 1.32
32768 65535 4 159307 0.20
total free extents 137052
total free blocks 78189097
average free extent size 570.507
AG 9
from to extents blocks pct
1 1 1658 1658 0.00
2 3 671 1583 0.00
4 7 3038 17804 0.02
8 15 7393 84680 0.08
16 31 14733 346835 0.34
32 63 26665 1247101 1.23
64 127 42681 4044159 3.99
128 255 27529 5020861 4.95
256 511 17799 6408209 6.32
512 1023 13490 9703531 9.57
1024 2047 10811 15690391 15.48
2048 4095 8127 23294001 22.98
4096 8191 4988 27939154 27.56
8192 16383 408 4660404 4.60
16384 32767 96 2050928 2.02
32768 65535 18 852224 0.84
total free extents 180105
total free blocks 101363523
average free extent size 562.802
AG 10
from to extents blocks pct
1 1 966 966 0.00
2 3 746 2040 0.00
4 7 6188 34186 0.04
8 15 13586 157397 0.18
16 31 33591 800147 0.93
32 63 47124 2046100 2.38
64 127 17498 1627896 1.90
128 255 22847 4329426 5.04
256 511 17337 6308302 7.35
512 1023 12951 9341029 10.88
1024 2047 9293 13400500 15.61
2048 4095 6398 18424400 21.47
4096 8191 3745 20831876 24.27
8192 16383 316 3677143 4.28
16384 32767 120 2682190 3.13
32768 65535 25 1167323 1.36
65536 131071 11 992269 1.16
total free extents 192742
total free blocks 85823190
average free extent size 445.275
AG 11
from to extents blocks pct
1 1 1299 1299 0.00
2 3 323 708 0.00
4 7 3575 19776 0.03
8 15 7735 89139 0.13
16 31 17713 420384 0.62
32 63 26712 1166375 1.73
64 127 14584 1334644 1.97
128 255 16415 3058179 4.52
256 511 13009 4603114 6.81
512 1023 9239 6592614 9.75
1024 2047 7659 11116956 16.45
2048 4095 5350 15410257 22.80
4096 8191 3331 18748075 27.74
8192 16383 208 2429051 3.59
16384 32767 55 1244321 1.84
32768 65535 12 505639 0.75
65536 131071 10 855948 1.27
total free extents 127229
total free blocks 67596479
average free extent size 531.298
AG 12
from to extents blocks pct
1 1 3005 3005 0.00
2 3 714 1597 0.00
4 7 187 945 0.00
8 15 1305 16178 0.02
16 31 2951 69749 0.08
32 63 7235 345651 0.38
64 127 12451 1169316 1.29
128 255 17704 3131500 3.47
256 511 17974 6108480 6.76
512 1023 12145 8411406 9.31
1024 2047 10674 15329545 16.97
2048 4095 8003 22792418 25.23
4096 8191 4873 27235851 30.14
8192 16383 362 4212767 4.66
16384 32767 64 1314405 1.45
32768 65535 5 212528 0.24
total free extents 99652
total free blocks 90355341
average free extent size 906.709
AG 13
from to extents blocks pct
1 1 2160 2160 0.00
2 3 519 1163 0.00
4 7 484 3058 0.00
8 15 2409 27772 0.03
16 31 4278 100415 0.11
32 63 9579 457724 0.51
64 127 16929 1587860 1.76
128 255 17847 3225558 3.57
256 511 16287 5626056 6.23
512 1023 11821 8380892 9.28
1024 2047 10335 14995088 16.60
2048 4095 7862 22594531 25.01
4096 8191 4758 26754739 29.61
8192 16383 420 4805199 5.32
16384 32767 71 1495249 1.65
32768 65535 7 291647 0.32
total free extents 105766
total free blocks 90349111
average free extent size 854.236
AG 14
from to extents blocks pct
1 1 1381 1381 0.00
2 3 245 547 0.00
4 7 283 1808 0.00
8 15 2387 27720 0.03
16 31 4178 97402 0.10
32 63 8758 415855 0.42
64 127 15409 1452075 1.46
128 255 21170 4098561 4.12
256 511 17815 6425768 6.46
512 1023 12923 9287495 9.34
1024 2047 11645 16908465 17.00
2048 4095 8668 25031077 25.16
4096 8191 5335 29855758 30.01
8192 16383 380 4260079 4.28
16384 32767 72 1495166 1.50
32768 65535 3 112759 0.11
total free extents 110652
total free blocks 99471916
average free extent size 898.962
AG 15
from to extents blocks pct
1 1 207 207 0.00
2 3 519 1471 0.02
4 7 1978 10867 0.13
8 15 3736 42434 0.50
16 31 6604 154719 1.83
32 63 13689 653865 7.73
64 127 24824 2356818 27.86
128 255 21639 3771966 44.59
256 511 1990 611208 7.23
512 1023 157 105129 1.24
1024 2047 74 107559 1.27
2048 4095 153 377991 4.47
4096 8191 27 163987 1.94
8192 16383 9 101213 1.20
total free extents 75606
total free blocks 8459434
average free extent size 111.888
AG 16
from to extents blocks pct
1 1 1140 1140 0.00
2 3 758 1750 0.00
4 7 1018 5639 0.01
8 15 1759 19882 0.02
16 31 3172 75580 0.08
32 63 8026 384525 0.40
64 127 13894 1294275 1.36
128 255 16508 3126070 3.27
256 511 13907 5090005 5.33
512 1023 11460 8377061 8.77
1024 2047 9811 14258785 14.93
2048 4095 7805 22582010 23.65
4096 8191 5407 30571003 32.02
8192 16383 551 6374167 6.68
16384 32767 156 3255515 3.41
32768 65535 2 69263 0.07
total free extents 95374
total free blocks 95486670
average free extent size 1001.18
AG 17
from to extents blocks pct
1 1 3555 3555 0.02
2 3 3384 8530 0.04
4 7 6883 37356 0.20
8 15 11694 133428 0.70
16 31 33349 776859 4.10
32 63 69828 3159981 16.67
64 127 78168 7397203 39.01
128 255 24793 4100867 21.63
256 511 3287 1064507 5.61
512 1023 816 550030 2.90
1024 2047 321 453787 2.39
2048 4095 167 457272 2.41
4096 8191 77 412315 2.17
8192 16383 28 309618 1.63
16384 32767 4 95252 0.50
total free extents 236354
total free blocks 18960560
average free extent size 80.221
AG 18
from to extents blocks pct
1 1 1961 1961 0.01
2 3 17129 42602 0.11
4 7 33374 183312 0.48
8 15 68076 783020 2.06
16 31 146868 3469398 9.14
32 63 248690 10614558 27.96
64 127 32088 2798748 7.37
128 255 8654 1492521 3.93
256 511 4227 1431586 3.77
512 1023 2531 1824377 4.81
1024 2047 2125 3076304 8.10
2048 4095 1615 4691302 12.36
4096 8191 1070 6062351 15.97
8192 16383 139 1454627 3.83
16384 32767 2 41359 0.11
total free extents 568549
total free blocks 37968026
average free extent size 66.7806
AG 19
from to extents blocks pct
1 1 107 107 0.00
2 3 1571 4670 0.01
4 7 4283 22880 0.06
8 15 6402 73460 0.19
16 31 13908 333208 0.88
32 63 30597 1431708 3.78
64 127 55680 5363294 14.16
128 255 47437 8647119 22.83
256 511 17362 5680534 15.00
512 1023 5096 3578314 9.45
1024 2047 2397 3364622 8.89
2048 4095 1309 3712093 9.80
4096 8191 715 3963154 10.47
8192 16383 143 1501527 3.97
16384 32767 9 191537 0.51
total free extents 187016
total free blocks 37868227
average free extent size 202.487
AG 20
from to extents blocks pct
1 1 598 598 0.00
2 3 827 2187 0.01
4 7 3608 19728 0.05
8 15 6881 79046 0.21
16 31 13022 305537 0.80
32 63 25486 1210959 3.17
64 127 49017 4712712 12.34
128 255 30156 5748421 15.05
256 511 10155 3495836 9.15
512 1023 5674 4149553 10.86
1024 2047 3253 4652429 12.18
2048 4095 1900 5476465 14.34
4096 8191 1208 6732503 17.63
8192 16383 105 1198304 3.14
16384 32767 16 338049 0.89
32768 65535 2 74999 0.20
total free extents 151908
total free blocks 38197326
average free extent size 251.45
AG 21
from to extents blocks pct
1 1 20 20 0.00
2 3 28 63 0.00
4 7 115 649 0.01
8 15 244 2809 0.03
16 31 453 10626 0.11
32 63 1265 61288 0.63
64 127 2405 228896 2.35
128 255 3165 577983 5.93
256 511 1913 647654 6.64
512 1023 1038 714076 7.32
1024 2047 700 1028270 10.55
2048 4095 698 2035971 20.88
4096 8191 514 2981234 30.58
8192 16383 74 834862 8.56
16384 32767 15 325690 3.34
32768 65535 7 299123 3.07
total free extents 12654
total free blocks 9749214
average free extent size 770.445
AG 22
from to extents blocks pct
1 1 181 181 0.00
2 3 181 485 0.00
4 7 471 2569 0.00
8 15 1133 12622 0.02
16 31 1660 38976 0.06
32 63 2968 138446 0.23
64 127 5624 534816 0.88
128 255 6238 1013851 1.67
256 511 4898 1598605 2.63
512 1023 3271 2265332 3.73
1024 2047 2349 3367324 5.54
2048 4095 1707 4927265 8.10
4096 8191 1258 7233661 11.90
8192 16383 416 4681490 7.70
16384 32767 240 5417597 8.91
32768 65535 142 6879800 11.31
65536 131071 73 6599297 10.85
131072 262143 55 9907812 16.30
262144 524287 15 6182398 10.17
total free extents 32880
total free blocks 60802527
average free extent size 1849.23
AG 23
from to extents blocks pct
1 1 90 90 0.00
2 3 73 174 0.00
4 7 260 1427 0.00
8 15 639 7159 0.01
16 31 1103 26041 0.03
32 63 2083 99039 0.12
64 127 4417 423805 0.52
128 255 6331 1033005 1.28
256 511 5124 1670066 2.07
512 1023 3618 2499924 3.09
1024 2047 2480 3492362 4.32
2048 4095 1509 4278094 5.30
4096 8191 996 5688231 7.04
8192 16383 124 1403821 1.74
16384 32767 35 805104 1.00
32768 65535 29 1301420 1.61
65536 131071 26 2259171 2.80
131072 262143 31 5862140 7.26
262144 524287 30 11971903 14.82
524288 1048575 15 11844396 14.66
1048576 2097151 14 20897049 25.87
4194304 8388607 1 5213142 6.45
total free extents 29028
total free blocks 80777563
average free extent size 2782.75
AG 24
from to extents blocks pct
1 1 670 670 0.00
2 3 1385 3465 0.01
4 7 4049 22284 0.04
8 15 7957 91548 0.15
16 31 18216 435852 0.72
32 63 50655 2550157 4.20
64 127 82684 7564495 12.46
128 255 48304 8493950 14.00
256 511 21102 7373519 12.15
512 1023 10541 7401799 12.20
1024 2047 5096 7179159 11.83
2048 4095 2525 7157989 11.79
4096 8191 1378 7677669 12.65
8192 16383 119 1376478 2.27
16384 32767 103 2424330 3.99
32768 65535 15 672757 1.11
65536 131071 2 131468 0.22
131072 262143 1 131200 0.22
total free extents 254802
total free blocks 60688789
average free extent size 238.18
AG 25
from to extents blocks pct
1 1 405 405 0.00
2 3 165 378 0.00
4 7 743 4356 0.01
8 15 1868 21572 0.03
16 31 3909 93678 0.12
32 63 8919 425029 0.55
64 127 15412 1455339 1.90
128 255 24492 4427828 5.77
256 511 19279 6661076 8.68
512 1023 13237 9346109 12.17
1024 2047 9003 12812201 16.69
2048 4095 5778 16383579 21.34
4096 8191 3442 19072421 24.84
8192 16383 248 2911859 3.79
16384 32767 99 2365870 3.08
32768 65535 18 795680 1.04
total free extents 107017
total free blocks 76777380
average free extent size 717.432
AG 26
from to extents blocks pct
1 1 99 99 0.00
2 3 298 817 0.00
4 7 1677 8992 0.01
8 15 2826 32643 0.04
16 31 5761 136023 0.15
32 63 12636 615792 0.66
64 127 18809 1774258 1.91
128 255 28196 4707461 5.06
256 511 24529 8159340 8.77
512 1023 16408 11363082 12.21
1024 2047 10999 15601527 16.76
2048 4095 7504 21234282 22.81
4096 8191 4545 25548241 27.45
8192 16383 254 2849284 3.06
16384 32767 34 733619 0.79
32768 65535 6 307024 0.33
total free extents 134581
total free blocks 93072484
average free extent size 691.572
AG 27
from to extents blocks pct
1 1 10879 10879 0.02
2 3 11703 30182 0.06
4 7 15365 82109 0.16
8 15 21594 249932 0.50
16 31 28409 655518 1.31
32 63 43497 2049403 4.08
64 127 72680 6904421 13.76
128 255 102300 19405736 38.67
256 511 26206 9273138 18.48
512 1023 8191 5686275 11.33
1024 2047 2047 2745613 5.47
2048 4095 489 1365775 2.72
4096 8191 180 993540 1.98
8192 16383 30 320798 0.64
16384 32767 16 368624 0.73
32768 65535 1 34557 0.07
total free extents 343587
total free blocks 50176500
average free extent size 146.037
AG 28
from to extents blocks pct
1 1 7552 7552 0.01
2 3 3098 7292 0.01
4 7 2771 15821 0.02
8 15 7104 82312 0.10
16 31 10221 237417 0.29
32 63 20206 967751 1.16
64 127 35069 3342138 4.02
128 255 72234 13584381 16.35
256 511 37385 13205348 15.89
512 1023 19299 13455515 16.19
1024 2047 8191 11466542 13.80
2048 4095 3650 10478167 12.61
4096 8191 2434 13579353 16.34
8192 16383 162 1867524 2.25
16384 32767 27 595097 0.72
32768 65535 4 207899 0.25
total free extents 229407
total free blocks 83100109
average free extent size 362.239
AG 29
from to extents blocks pct
1 1 7059 7059 0.04
2 3 6270 15470 0.09
4 7 9456 50893 0.30
8 15 15264 175419 1.03
16 31 26440 618832 3.64
32 63 44154 2089707 12.28
64 127 80897 7696115 45.23
128 255 20391 3384375 19.89
256 511 3974 1229352 7.22
512 1023 626 410718 2.41
1024 2047 141 193862 1.14
2048 4095 59 170185 1.00
4096 8191 85 473185 2.78
8192 16383 21 239248 1.41
16384 32767 10 208379 1.22
32768 65535 1 53376 0.31
total free extents 214848
total free blocks 17016175
average free extent size 79.201
AG 30
from to extents blocks pct
1 1 1672 1672 0.03
2 3 1073 2577 0.05
4 7 1202 6461 0.13
8 15 1751 19741 0.39
16 31 2830 65939 1.29
32 63 4589 216879 4.25
64 127 8443 801744 15.71
128 255 5988 1023450 20.05
256 511 2230 737877 14.46
512 1023 714 495411 9.71
1024 2047 377 536218 10.51
2048 4095 212 611170 11.98
4096 8191 85 478388 9.37
8192 16383 7 86683 1.70
16384 32767 1 19328 0.38
total free extents 31174
total free blocks 5103538
average free extent size 163.711
AG 31
from to extents blocks pct
1 1 4225 4225 0.02
2 3 3496 8376 0.05
4 7 5808 32657 0.19
8 15 12158 139830 0.80
16 31 22809 534239 3.07
32 63 42627 2017708 11.60
64 127 79484 7548752 43.41
128 255 22140 3540716 20.36
256 511 4862 1476486 8.49
512 1023 568 364917 2.10
1024 2047 151 204945 1.18
2048 4095 58 168429 0.97
4096 8191 74 435946 2.51
8192 16383 30 379043 2.18
16384 32767 27 532823 3.06
total free extents 198517
total free blocks 17389092
average free extent size 87.595
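A quick way to digest a dump like the above - a sketch only, assuming
the histograms have been saved to freesp.txt (hypothetical filename) -
is to rank the AGs by average free extent size, smallest (i.e. most
fragmented) first:

  # Print "AG <n> <average free extent size>" for each AG, worst first
  awk '/^AG /{ag=$2} /^average free extent size/{print "AG", ag, $5}' freesp.txt |
      sort -k3 -n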
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC]
2021-08-30 4:21 ` Darrick J. Wong
@ 2021-08-30 7:40 ` Chris Dunlop
0 siblings, 0 replies; 15+ messages in thread
From: Chris Dunlop @ 2021-08-30 7:40 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Dave Chinner, Eric Sandeen, linux-xfs
On Sun, Aug 29, 2021 at 09:21:18PM -0700, Darrick J. Wong wrote:
> Since you've formatted with rmapbt enabled, you probably have a new
> enough xfsprogs that you can /also/ use this on a live fs:
Yep, I put on xfsprogs 5.12.0 to look into all of this.
> $ xfs_spaceman -c 'freesp -g' /
...
> $ xfs_spaceman -c 'freesp -s -a 2' /
Tks, that's useful.
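For completeness, a quick loop to get that summary for every AG - a
sketch only, assuming the 32 AGs on this fs, with the mount point being
illustrative:

  for ag in $(seq 0 31); do
      echo "=== AG $ag ==="
      xfs_spaceman -c "freesp -s -a $ag" /mnt/fs
  done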
Cheers,
Chris
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Mysterious ENOSPC
2021-08-30 7:37 ` Mysterious ENOSPC Chris Dunlop
@ 2021-09-02 1:42 ` Dave Chinner
2021-09-17 6:07 ` Chris Dunlop
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2021-09-02 1:42 UTC (permalink / raw)
To: Chris Dunlop; +Cc: Eric Sandeen, linux-xfs
On Mon, Aug 30, 2021 at 05:37:20PM +1000, Chris Dunlop wrote:
> On Mon, Aug 30, 2021 at 08:04:57AM +1000, Dave Chinner wrote:
> > On Sat, Aug 28, 2021 at 01:58:24PM +1000, Chris Dunlop wrote:
> > > On Sat, Aug 28, 2021 at 10:21:37AM +1000, Chris Dunlop wrote:
> > > > On Sat, Aug 28, 2021 at 08:03:43AM +1000, Dave Chinner wrote:
> > > > > commit fd43cf600cf61c66ae0a1021aca2f636115c7fcb
> > > > > Author: Brian Foster <bfoster@redhat.com>
> > > > > Date: Wed Apr 28 15:06:05 2021 -0700
> > > > >
> > > > > xfs: set aside allocation btree blocks from block reservation
> > > >
> > > > Oh wow. Yes, sounds like a candidate. Is there some easy(-ish?) way of
> > > > seeing if this fs is likely to be suffering from this particular issue
> > > > or is it a matter of installing an appropriate kernel and seeing if the
> > > > problem goes away?
> > >
> > > Is this sufficient to tell us that this filesystem probably isn't suffering
> > > from that issue?
> >
> > IIRC, it's the per-ag histograms that are more important here
> > because we are running out of space in an AG because of
> > overcommitting the per-ag space. If there is an AG that is much more
> > fragmented than others, then it will be consuming much more in way
> > of freespace btree blocks than others...
>
> Per-ag histograms attached.
>
> Do the blocks used by the allocation btrees show up in the AG histograms?
> E.g. with an AG like this:
>
> AG 18
> from to extents blocks pct
> 1 1 1961 1961 0.01
> 2 3 17129 42602 0.11
> 4 7 33374 183312 0.48
> 8 15 68076 783020 2.06
> 16 31 146868 3469398 9.14
> 32 63 248690 10614558 27.96
> 64 127 32088 2798748 7.37
> 128 255 8654 1492521 3.93
> 256 511 4227 1431586 3.77
> 512 1023 2531 1824377 4.81
> 1024 2047 2125 3076304 8.10
> 2048 4095 1615 4691302 12.36
> 4096 8191 1070 6062351 15.97
> 8192 16383 139 1454627 3.83
> 16384 32767 2 41359 0.11
> total free extents 568549
> total free blocks 37968026
> average free extent size 66.7806
>
> ...it looks like it's significantly fragmented, but, if the allocation
> btrees aren't part of this, it seems there's still sufficient free space
> that it shouldn't be getting to ENOSPC?
Unless something asks for ~120GB of space to be allocated from that
AG - then it would be left with only a small amount of free space
and could trigger such issues.
As you said, this is difficult to reproduce, so the FS is unlikely
to be in the exact state that triggers the problem right now. What
I'm looking at is whether the underlying conditions are present that
could potentially lead to that sort of problem occurring.
> > Context is very important when trying to determine if free space
> > fragmentation is an issue or not. Most of the time, it isn't an
> > issue at all but people have generally been trained to think "all
> > fragmentation is bad" rather than "only worry about fragmentation if
> > there is a problem that is directly related to physical allocation
> > patterns"...
>
> In this case it's a typical backup application: it uploads regular
> incremental files that are later merged into a full backup file, by
> extending, overwriting or reflinking, depending on whether the app
> decides to use reflinks or not. The uploads are sequential and mostly
> large-ish writes (132K+); the merge is small- to medium-sized randomish
> writes or reflinks (4K-???). So the smaller writes/reflinks are going to
> create a significant amount of fragmentation. The incremental files are
> removed entirely at some later time (no discard involved).
IOWs, sets of data with different layouts and temporal
characteristics. Yup, that will cause fragmentation over time and
slowly prevent recovery of large free spaces as files are deleted.
The AG histograms largely reflect this.
> I guess if it's determined this pattern is critically suboptimal and causing
> this errant ENOSPC issue, and the changes in 5.13 don't help, there's
> nothing to stop me from occasionally doing a full (non-reflink) copy of the
> large full backup files into another file to get them nicely sequential. I'd
> lose any reflinks along the way of course, but they don't last a long time
> anyway (days to a few weeks) depending on how long the smaller incremental
> files are kept.
IOWs, you suggest defragmenting the file data. You could do that
transparently with xfs_fsr, but defragmenting data doesn't fix free
space fragmentation - if anything it makes it worse. This is
inherent in the defragmentation algorithm: small used spaces get
turned into small free spaces, and large free spaces get turned into
large used spaces.
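E.g. the per-file options would look something like this - a sketch
only, with the paths being illustrative:

  # Look at the file's current extent layout first
  xfs_bmap -v /mnt/backup/full.img

  # Option 1: in-place defragmentation of just that file
  xfs_fsr -v /mnt/backup/full.img

  # Option 2: the occasional full rewrite described above,
  # explicitly avoiding reflink sharing and preserving sparseness
  cp --reflink=never --sparse=always /mnt/backup/full.img /mnt/backup/full.img.new
  mv /mnt/backup/full.img.new /mnt/backup/full.img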
Defragmenting free space is a whole lot harder: it involves
identifying where free space is interleaved with data, then moving
that data elsewhere so the small free spaces can be reconnected into
large ones. Defragmenting data is easy; defragmenting free space is
much harder...
> AG 15
> from to extents blocks pct
> 1 1 207 207 0.00
> 2 3 519 1471 0.02
> 4 7 1978 10867 0.13
> 8 15 3736 42434 0.50
> 16 31 6604 154719 1.83
> 32 63 13689 653865 7.73
> 64 127 24824 2356818 27.86
> 128 255 21639 3771966 44.59
> 256 511 1990 611208 7.23
> 512 1023 157 105129 1.24
> 1024 2047 74 107559 1.27
> 2048 4095 153 377991 4.47
> 4096 8191 27 163987 1.94
> 8192 16383 9 101213 1.20
> total free extents 75606
> total free blocks 8459434
> average free extent size 111.888
This AG is a candidate - it's only got ~35GB of free space in it and
has significant free space fragmentation: at least 160 freespace
btree blocks per btree in this AG (back of the envelope: ~75,600
free extents at roughly 500 records per 4KB leaf block is 150+ leaf
blocks in each of the by-block and by-size btrees, plus interior
nodes).
> AG 30
> from to extents blocks pct
> 1 1 1672 1672 0.03
> 2 3 1073 2577 0.05
> 4 7 1202 6461 0.13
> 8 15 1751 19741 0.39
> 16 31 2830 65939 1.29
> 32 63 4589 216879 4.25
> 64 127 8443 801744 15.71
> 128 255 5988 1023450 20.05
> 256 511 2230 737877 14.46
> 512 1023 714 495411 9.71
> 1024 2047 377 536218 10.51
> 2048 4095 212 611170 11.98
> 4096 8191 85 478388 9.37
> 8192 16383 7 86683 1.70
> 16384 32767 1 19328 0.38
> total free extents 31174
> total free blocks 5103538
> average free extent size 163.711
This one has the least free space of all, but also fewer free space
extents. It's still a potential candidate for triggering per-AG
ENOSPC conditions, though.
Ok, now that I've seen the filesystem layout, I can say that the
preconditions for per-AG ENOSPC do actually exist. Hence we now
really need to know what operation is reporting ENOSPC. I guess
we'll just have to wait for that to occur again and hope your
scripts capture it.
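Something as simple as wrapping the app in strace would do it - a
sketch, with the output path and app name being illustrative:

  # Log the syscalls most likely to hit ENOSPC
  # (reflink clones show up as ioctl calls)
  strace -f -tt -e trace=fallocate,ftruncate,write,pwrite64,copy_file_range,ioctl \
      -o /tmp/backup-app.trace /path/to/backup-app
  grep -w ENOSPC /tmp/backup-app.trace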
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Mysterious ENOSPC
2021-09-02 1:42 ` Dave Chinner
@ 2021-09-17 6:07 ` Chris Dunlop
0 siblings, 0 replies; 15+ messages in thread
From: Chris Dunlop @ 2021-09-17 6:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs
On Thu, Sep 02, 2021 at 11:42:06AM +1000, Dave Chinner wrote:
> On Mon, Aug 30, 2021 at 08:04:57AM +1000, Dave Chinner wrote:
>> FWIW, if you are using reflink heavily and you have rmap enabled (as
>> you have), there's every chance that an AG has completely run out of
>> space and so new rmap records for shared extents can't be allocated
>> - that can give you spurious ENOSPC errors before the filesystem is
>> 100% full, too.
>>
>> i.e. every shared extent in the filesystem has a rmap record
>> pointing back to each owner of the shared extent. That means for an
>> extent shared 1000 times, there are 1000 rmap records for that
>> shared extent. If you share it again, a new rmap record needs to be
>> inserted into the rmapbt, and if the AG is completely out of space
>> this can fail w/ ENOSPC. Hence you can get ENOSPC errors attempting
>> to share or unshare extents because there isn't space in the AG for
>> the tracking metadata for the new extent record....
...
> Ok, now I've seen the filesystem layout, I can say that the
> preconditions for per-ag ENOSPC conditions do actually exist. Hence
> we now really need to know what operation is reporting ENOSPC. I
> guess we'll just have to wait for that to occur again and hope your
> scripts capture it.
FYI, "something" seems to have changed without any particular prompting
and there haven't been any ENOSPC events in the last 3 weeks whereas
previously they were occurring 4-5 times a week. Sigh.
Cheers,
Chris
^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
-- links below jump to the message on this page --
2021-08-26 2:06 XFS fallocate implementation incorrectly reports ENOSPC Chris Dunlop
2021-08-26 15:05 ` Eric Sandeen
2021-08-26 20:56 ` Chris Dunlop
2021-08-27 2:55 ` Chris Dunlop
2021-08-27 5:49 ` Dave Chinner
2021-08-27 6:53 ` Chris Dunlop
2021-08-27 22:03 ` Dave Chinner
2021-08-28 0:21 ` Mysterious ENOSPC [was: XFS fallocate implementation incorrectly reports ENOSPC] Chris Dunlop
2021-08-28 3:58 ` Chris Dunlop
2021-08-29 22:04 ` Dave Chinner
2021-08-30 4:21 ` Darrick J. Wong
2021-08-30 7:40 ` Chris Dunlop
2021-08-30 7:37 ` Mysterious ENOSPC Chris Dunlop
2021-09-02 1:42 ` Dave Chinner
2021-09-17 6:07 ` Chris Dunlop