* xfs: Temporary extra disk space consumption?
@ 2022-03-23 11:21 Tetsuo Handa
2022-03-23 16:51 ` Darrick J. Wong
2022-03-23 19:16 ` Dave Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Tetsuo Handa @ 2022-03-23 11:21 UTC
To: linux-xfs
Hello.
I found that running the sample program shown below on an xfs filesystem
consumes extra disk space until close() is called.
Is this the expected result?
I wouldn't care if the temporarily consumed extra disk space were trivial. But
since the extra amount at the time fsync() returns is as large as the amount of
written data, I worry that there might be a bug.
---------- my_write_unlink.c ----------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	static char buffer[1048576]; /* 1 MiB of zeroes */
	const char *filename = "my_testfile";
	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	int i;

	if (fd == -1) /* open() failed */
		return 1;
	printf("Before write().\n");
	system("/bin/df -m .");
	for (i = 0; i < 1024; i++) /* 1024 x 1 MiB = 1 GiB, appended sequentially */
		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
			return 1;
	if (fsync(fd)) /* write back all data before sampling df */
		return 1;
	printf("Before close().\n");
	system("/bin/df -m .");
	if (close(fd))
		return 1;
	printf("Before unlink().\n");
	system("/bin/df -m .");
	if (unlink(filename))
		return 1;
	printf("After unlink().\n");
	system("/bin/df -m .");
	return 0;
}
---------- my_write_unlink.c ----------
----------
$ uname -r
5.17.0
$ ./my_write_unlink
Before write().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 130392 125483 51% /
Before close().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 132443 123432 52% /
Before unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 131416 124459 52% /
After unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 130392 125483 51% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
----------
----------
$ uname -r
4.18.0-365.el8.x86_64
$ ./my_write_unlink
Before write().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 2743 17727 14% /
Before close().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 4791 15679 24% /
Before unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 3767 16703 19% /
After unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 2743 17727 14% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
----------
----------
$ uname -r
3.10.0-1160.59.1.el7.x86_64
$ ./my_write_unlink
Before write().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 2310 18160 12% /
Before close().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 4358 16112 22% /
Before unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 3334 17136 17% /
After unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 20469 2310 18160 12% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
----------
* Re: xfs: Temporary extra disk space consumption?
From: Darrick J. Wong @ 2022-03-23 16:51 UTC
To: Tetsuo Handa; +Cc: linux-xfs
On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
>
> I found that running the sample program shown below on an xfs filesystem
> consumes extra disk space until close() is called.
> Is this the expected result?
>
> I wouldn't care if the temporarily consumed extra disk space were trivial. But
> since the extra amount at the time fsync() returns is as large as the amount of
> written data, I worry that there might be a bug.
>
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> static char buffer[1048576];
> const char *filename = "my_testfile";
> const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> int i;
>
> if (fd == EOF)
> return 1;
> printf("Before write().\n");
> system("/bin/df -m .");
> for (i = 0; i < 1024; i++)
> if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> return 1;
> if (fsync(fd))
> return 1;
> printf("Before close().\n");
If you run filefrag -v at this point and see blocks mapped into the file
after EOF, then the extra disk space consumption you see is most likely
speculative preallocation for extending writes.
--D
* Re: xfs: Temporary extra disk space consumption?
From: Dave Chinner @ 2022-03-23 19:16 UTC
To: Tetsuo Handa; +Cc: linux-xfs
On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
>
> I found that running the sample program shown below on an xfs filesystem
> consumes extra disk space until close() is called.
> Is this the expected result?
Yes. It's an anti-fragmentation mechanism that is intended to
prevent excessive fragmentation when many files are being written at
once.
> I wouldn't care if the temporarily consumed extra disk space were trivial. But
> since the extra amount at the time fsync() returns is as large as the amount of
> written data, I worry that there might be a bug.
>
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> static char buffer[1048576];
> const char *filename = "my_testfile";
> const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> int i;
Truncated to zero length, so all writes will be sequential and
extend EOF.
>
> if (fd == EOF)
> return 1;
> printf("Before write().\n");
> system("/bin/df -m .");
> for (i = 0; i < 1024; i++)
> if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> return 1;
And then wrote 1GB of sequential data. Without looking at your
results yet, I would expect that between about 1.5GB and 2GB of space
was allocated.
> if (fsync(fd))
> return 1;
This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it, converting that range to written.
Check your file size here - it will be 1GB. You can't read beyond
EOF, so the extra allocation is not accessible. It's also unwritten,
so even if you could read beyond EOF, you couldn't read any data from
the range, because reads of unwritten extents return zeros.
> printf("Before close().\n");
> system("/bin/df -m .");
> if (close(fd))
> return 1;
This will run ->release(), which removes any extra allocation made
at write() time, leaving just the written data up to EOF allocated
on disk.
> printf("Before unlink().\n");
> system("/bin/df -m .");
> if (unlink(filename))
> return 1;
> printf("After unlink().\n");
> system("/bin/df -m .");
> return 0;
> }
> ---------- my_write_unlink.c ----------
>
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 130392 125483 51% /
> Before close().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 132443 123432 52% /
Yup, 2GB of space allocated.
> Before unlink().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 131416 124459 52% /
And ->release() trims the extra allocation beyond EOF, so now you are
back to just the 1GB the file consumes.
> After unlink().
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sda1 255875 130392 125483 51% /
And now it's all gone.
> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
>
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64
Same.
> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64
Same.
Looks like speculative preallocation for sequential writes is
behaving exactly as designed....
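The arithmetic behind these numbers can be spelled out from the df -m "Used" column (MiB) of the three runs in the original report. A minimal sketch; struct df_samples and the helper names are illustrative:

```c
/* "Used" column (MiB) of the four df -m samples taken in one run. */
struct df_samples {
	long before_write;
	long before_close;   /* after fsync(), file still open */
	long before_unlink;  /* after close() */
	long after_unlink;
};

/* Growth at fsync() time: written data plus any post-EOF preallocation. */
long used_at_fsync(const struct df_samples *s)
{
	return s->before_close - s->before_write;
}

/* Growth that remains after close(): the file's data up to EOF. */
long used_after_close(const struct df_samples *s)
{
	return s->before_unlink - s->before_write;
}

/* Space released by close(), i.e. the trimmed preallocation. */
long trimmed_by_close(const struct df_samples *s)
{
	return s->before_close - s->before_unlink;
}

/* Space released by unlink(): the file's remaining data. */
long freed_by_unlink(const struct df_samples *s)
{
	return s->before_unlink - s->after_unlink;
}
```

For the 5.17.0 run ({130392, 132443, 131416, 130392}) this gives 2051 MiB in use at fsync() time, 1024 MiB after close(), 1027 MiB trimmed by close() and 1024 MiB freed by unlink(); the 4.18 and 3.10 runs give 2048, 1024, 1024 and 1024 MiB. df -m reports whole MiB on a live root filesystem, so a small difference such as 2051 versus 2048 should not be over-interpreted.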
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs: Temporary extra disk space consumption?
From: Tetsuo Handa @ 2022-03-23 23:28 UTC
To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs
On 2022/03/24 4:16, Dave Chinner wrote:
> On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
>> Hello.
>>
>> I found that running the sample program shown below on an xfs filesystem
>> consumes extra disk space until close() is called.
>> Is this the expected result?
>
> Yes. It's an anti-fragmentation mechanism that is intended to
> prevent excessive fragmentation when many files are being written at
> once.
OK, so this is XFS-specific behavior.
> Looks like speculative preallocation for sequential writes is
> behaving exactly as designed....
Here are the results of "filefrag -v my_testfile" before close(), from two runs.
Filesystem type is: 58465342
File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 65519: 33363497.. 33429016: 65520:
1: 65520.. 229915: 62724762.. 62889157: 164396: 33429017:
2: 229916.. 262143: 63132138.. 63164365: 32228: 62889158: eof
3: 262144.. 294895: 63164366.. 63197117: 32752: unwritten,eof
my_testfile: 3 extents found
Filesystem type is: 58465342
File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 131055: 62724762.. 62855817: 131056:
1: 131056.. 240361: 63813369.. 63922674: 109306: 62855818:
2: 240362.. 262143: 32448944.. 32470725: 21782: 63922675: eof
3: 262144.. 349194: 32470726.. 32557776: 87051: unwritten,eof
4: 349195.. 524271: 0.. 175076: 175077: 32557777: unknown,delalloc,eof
my_testfile: 4 extents found
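Converting the two listings to bytes makes them easier to compare with df. The file size is 262144 blocks of 4096 bytes, so everything from logical block 262144 onward lies past EOF. A minimal sketch of the arithmetic; the helper names are illustrative:

```c
#include <stdint.h>

/* Block size from "262144 blocks of 4096 bytes" in the listings. */
#define FS_BLOCK_SIZE 4096u

/* Number of blocks in an inclusive logical range "first.. last". */
uint64_t range_blocks(uint64_t first, uint64_t last)
{
	return last - first + 1;
}

/* Bytes covered by a given number of filesystem blocks. */
uint64_t blocks_to_bytes(uint64_t blocks)
{
	return blocks * FS_BLOCK_SIZE;
}
```

In the first listing, range_blocks(262144, 294895) is 32752 blocks, or 134152192 bytes (about 128 MiB) past EOF. In the second, the unwritten and delalloc extents together cover 87051 + 175077 = 262128 blocks, or 1073676288 bytes, just under 1 GiB, in line with the roughly 1 GiB differences seen in the df output.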
An interesting behavior I noticed: since "filefrag -v" opens this file for
reading and then closes that read-only file descriptor, injecting
close(open(filename, O_RDONLY)) as shown below makes the space consumed by
speculative preallocation go away; closing the file descriptor that was
opened for writing is not required.
----------
diff -u my_write_unlink.c my_write_unlink2.c
--- my_write_unlink.c
+++ my_write_unlink2.c
@@ -23,6 +23,8 @@
 		return 1;
 	printf("Before close().\n");
 	system("/bin/df -m .");
+	close(open(filename, O_RDONLY));
+	system("/bin/df -m .");
 	if (close(fd))
 		return 1;
 	printf("Before unlink().\n");
----------
----------
Before write().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 130396 125479 51% /
Before close().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 132447 123428 52% /
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 131420 124455 52% /
Before unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 131420 124455 52% /
After unlink().
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 255875 130396 125479 51% /
----------
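To see the effect on the test file itself rather than through filesystem-wide df numbers, here is a minimal sketch that stat()s the file around a read-only open()/close() pair, mirroring the injected line in the diff above. measure_readonly_close() and allocated_bytes() are hypothetical helpers; the sketch assumes Linux, where st_blocks is in 512-byte units, and is meant to be called while the writing descriptor is still open.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bytes allocated to the file at path, or -1 on error. */
static long long allocated_bytes(const char *path)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return -1;
	return (long long)st.st_blocks * 512;  /* Linux: 512-byte units */
}

/*
 * Open and immediately close a read-only descriptor on path, recording the
 * file's allocation before and after.  On XFS, per this thread, that
 * close() runs ->release(), which may trim post-EOF preallocation.
 * Returns 0 on success, -1 on error.
 */
int measure_readonly_close(const char *path, long long *before, long long *after)
{
	int fd;

	*before = allocated_bytes(path);
	if (*before == -1)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (close(fd) == -1)
		return -1;
	*after = allocated_bytes(path);
	return *after == -1 ? -1 : 0;
}
```

On XFS, given the behaviour reported above, before would be expected to include the preallocation and after not; on other filesystems the two values may simply be equal.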
* Re: xfs: Temporary extra disk space consumption?
From: Dave Chinner @ 2022-03-24 1:13 UTC
To: Tetsuo Handa; +Cc: Darrick J. Wong, linux-xfs
On Thu, Mar 24, 2022 at 08:28:30AM +0900, Tetsuo Handa wrote:
> On 2022/03/24 4:16, Dave Chinner wrote:
> > On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> >> Hello.
> >>
> >> I found that running the sample program shown below on an xfs filesystem
> >> consumes extra disk space until close() is called.
> >> Is this the expected result?
> >
> > Yes. It's an anti-fragmentation mechanism that is intended to
> > prevent excessive fragmentation when many files are being written at
> > once.
>
> OK, so this is XFS-specific behavior.
>
> > Looks like speculative preallocation for sequential writes is
> > behaving exactly as designed....
>
> Here is the result of "filefrag -v my_testfile" before close().
>
> Filesystem type is: 58465342
> File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 65519: 33363497.. 33429016: 65520:
> 1: 65520.. 229915: 62724762.. 62889157: 164396: 33429017:
> 2: 229916.. 262143: 63132138.. 63164365: 32228: 62889158: eof
> 3: 262144.. 294895: 63164366.. 63197117: 32752: unwritten,eof
> my_testfile: 3 extents found
>
> Filesystem type is: 58465342
> File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 131055: 62724762.. 62855817: 131056:
> 1: 131056.. 240361: 63813369.. 63922674: 109306: 62855818:
> 2: 240362.. 262143: 32448944.. 32470725: 21782: 63922675: eof
> 3: 262144.. 349194: 32470726.. 32557776: 87051: unwritten,eof
> 4: 349195.. 524271: 0.. 175076: 175077: 32557777: unknown,delalloc,eof
> my_testfile: 4 extents found
>
> An interesting behavior I noticed: since "filefrag -v" opens this file for
> reading and then closes that read-only file descriptor, injecting
> close(open(filename, O_RDONLY)) as shown below makes the space consumed by
> speculative preallocation go away; closing the file descriptor that was
> opened for writing is not required.
Yup. This has never had a measurable enough impact on real-world workloads
for us to need to optimise it away. Besides, ->release() cannot tell
whether the prealloc belongs to that fd or not, so even if we were to
gate it on O_RDONLY, closing any writeable fd on that inode would
trigger the same behaviour, regardless of whether it was the fd that
the data was written to....
-Dave.
--
Dave Chinner
david@fromorbit.com