* xfs: Temporary extra disk space consumption?
@ 2022-03-23 11:21 Tetsuo Handa
  2022-03-23 16:51 ` Darrick J. Wong
  2022-03-23 19:16 ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Tetsuo Handa @ 2022-03-23 11:21 UTC (permalink / raw)
  To: linux-xfs

Hello.

I found that running the sample program shown below on an xfs filesystem
results in extra disk space being consumed until close() is called.
Is this the expected result?

I would not mind if the temporarily consumed extra disk space were trivial. But
since the extra amount at the time fsync() returns is as large as the amount of
data written, I worry that there might be some bug.

---------- my_write_unlink.c ----------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	static char buffer[1048576];
	const char *filename = "my_testfile";
	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	int i;

	if (fd < 0)
		return 1;
	printf("Before write().\n");
	system("/bin/df -m .");
	for (i = 0; i < 1024; i++)
		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
			return 1;
	if (fsync(fd))
		return 1;
	printf("Before close().\n");
	system("/bin/df -m .");
	if (close(fd))
		return 1;
	printf("Before unlink().\n");
	system("/bin/df -m .");
	if (unlink(filename))
		return 1;
	printf("After unlink().\n");
	system("/bin/df -m .");
	return 0;
}
---------- my_write_unlink.c ----------

----------
$ uname -r
5.17.0
$ ./my_write_unlink
Before write().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 130392    125483  51% /
Before close().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 132443    123432  52% /
Before unlink().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 131416    124459  52% /
After unlink().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 130392    125483  51% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
----------

----------
$ uname -r
4.18.0-365.el8.x86_64
$ ./my_write_unlink
Before write().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  2743     17727  14% /
Before close().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  4791     15679  24% /
Before unlink().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  3767     16703  19% /
After unlink().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  2743     17727  14% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
----------

----------
$ uname -r
3.10.0-1160.59.1.el7.x86_64
$ ./my_write_unlink
Before write().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  2310     18160  12% /
Before close().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  4358     16112  22% /
Before unlink().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  3334     17136  17% /
After unlink().
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20469  2310     18160  12% /
$ grep sda /proc/mounts
/dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
----------


* Re: xfs: Temporary extra disk space consumption?
  2022-03-23 11:21 xfs: Temporary extra disk space consumption? Tetsuo Handa
@ 2022-03-23 16:51 ` Darrick J. Wong
  2022-03-23 19:16 ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2022-03-23 16:51 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-xfs

On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I found that running the sample program shown below on an xfs filesystem
> results in extra disk space being consumed until close() is called.
> Is this the expected result?
> 
> I would not mind if the temporarily consumed extra disk space were trivial. But
> since the extra amount at the time fsync() returns is as large as the amount of
> data written, I worry that there might be some bug.
> 
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
> 	static char buffer[1048576];
> 	const char *filename = "my_testfile";
> 	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> 	int i;
> 
> 	if (fd < 0)
> 		return 1;
> 	printf("Before write().\n");
> 	system("/bin/df -m .");
> 	for (i = 0; i < 1024; i++)
> 		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> 			return 1;
> 	if (fsync(fd))
> 		return 1;
> 	printf("Before close().\n");

If you run filefrag -v at this point and see blocks mapped into the file
after EOF, then the extra disk space consumption you see is most likely
speculative preallocation for extending writes.

--D

> 	system("/bin/df -m .");
> 	if (close(fd))
> 		return 1;
> 	printf("Before unlink().\n");
> 	system("/bin/df -m .");
> 	if (unlink(filename))
> 		return 1;
> 	printf("After unlink().\n");
> 	system("/bin/df -m .");
> 	return 0;
> }
> ---------- my_write_unlink.c ----------
> 
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> Before close().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 132443    123432  52% /
> Before unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 131416    124459  52% /
> After unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  2743     17727  14% /
> Before close().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  4791     15679  24% /
> Before unlink().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  3767     16703  19% /
> After unlink().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  2743     17727  14% /
> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  2310     18160  12% /
> Before close().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  4358     16112  22% /
> Before unlink().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  3334     17136  17% /
> After unlink().
> Filesystem     1M-blocks  Used Available Use% Mounted on
> /dev/sda1          20469  2310     18160  12% /
> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
> ----------


* Re: xfs: Temporary extra disk space consumption?
  2022-03-23 11:21 xfs: Temporary extra disk space consumption? Tetsuo Handa
  2022-03-23 16:51 ` Darrick J. Wong
@ 2022-03-23 19:16 ` Dave Chinner
  2022-03-23 23:28   ` Tetsuo Handa
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2022-03-23 19:16 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-xfs

On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I found that running the sample program shown below on an xfs filesystem
> results in extra disk space being consumed until close() is called.
> Is this the expected result?

Yes. It's an anti-fragmentation mechanism that is intended to
prevent excessive fragmentation when many files are being written at
once.

> I would not mind if the temporarily consumed extra disk space were trivial. But
> since the extra amount at the time fsync() returns is as large as the amount of
> data written, I worry that there might be some bug.
> 
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
> 	static char buffer[1048576];
> 	const char *filename = "my_testfile";
> 	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> 	int i;

Truncate to zero length - all writes will be sequential, extending
EOF.

> 
> 	if (fd < 0)
> 		return 1;
> 	printf("Before write().\n");
> 	system("/bin/df -m .");
> 	for (i = 0; i < 1024; i++)
> 		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> 			return 1;

And then you wrote 1GB of sequential data. Without looking yet at your
results, I would expect between about 1.5 and 2GB of space to be
allocated.

> 	if (fsync(fd))
> 		return 1;

This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it, converting that range to written.

Check your file size here - it will be 1GB. You can't read beyond
EOF, so the extra allocation is not accessible. It's also unwritten,
so even if you could read beyond EOF, you can't read any data from
the range because reads of unwritten extents return zeros.

> 	printf("Before close().\n");
> 	system("/bin/df -m .");
> 	if (close(fd))
> 		return 1;

This will run ->release(), which removes any extra allocation made
at write() time, leaving just the written data up to EOF allocated
on disk.

> 	printf("Before unlink().\n");
> 	system("/bin/df -m .");
> 	if (unlink(filename))
> 		return 1;
> 	printf("After unlink().\n");
> 	system("/bin/df -m .");
> 	return 0;
> }
> ---------- my_write_unlink.c ----------
> 
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> Before close().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 132443    123432  52% /

Yup, 2GB of space allocated.

> Before unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 131416    124459  52% /

and ->release() trims the extra allocation beyond EOF, so you are
back to just the 1GB the file consumes.

> After unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /

And now it's all gone.

> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64

Same.

> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64

Same.

Looks like speculative preallocation for sequential writes is
behaving exactly as designed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs: Temporary extra disk space consumption?
  2022-03-23 19:16 ` Dave Chinner
@ 2022-03-23 23:28   ` Tetsuo Handa
  2022-03-24  1:13     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2022-03-23 23:28 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs

On 2022/03/24 4:16, Dave Chinner wrote:
> On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
>> Hello.
>>
>> I found that running the sample program shown below on an xfs filesystem
>> results in extra disk space being consumed until close() is called.
>> Is this the expected result?
> 
> Yes. It's an anti-fragmentation mechanism that is intended to
> prevent excessive fragmentation when many files are being written at
> once.

OK, so this is an xfs-specific behavior.

> Looks like speculative preallocation for sequential writes is
> behaving exactly as designed....

Here is the result of "filefrag -v my_testfile" before close().

Filesystem type is: 58465342
File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   65519:   33363497..  33429016:  65520:
   1:    65520..  229915:   62724762..  62889157: 164396:   33429017:
   2:   229916..  262143:   63132138..  63164365:  32228:   62889158: eof
   3:   262144..  294895:   63164366..  63197117:  32752:             unwritten,eof
my_testfile: 3 extents found

Filesystem type is: 58465342
File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  131055:   62724762..  62855817: 131056:
   1:   131056..  240361:   63813369..  63922674: 109306:   62855818:
   2:   240362..  262143:   32448944..  32470725:  21782:   63922675: eof
   3:   262144..  349194:   32470726..  32557776:  87051:             unwritten,eof
   4:   349195..  524271:          0..    175076: 175077:   32557777: unknown,delalloc,eof
my_testfile: 4 extents found



An interesting behavior I noticed: since "filefrag -v" opens this file
for reading and then closes that read-only file descriptor, injecting
close(open(filename, O_RDONLY)) as shown below makes the space consumed
by speculative preallocation go away; closing a file descriptor opened
for writing is not required.

----------
diff -u my_write_unlink.c my_write_unlink2.c
--- my_write_unlink.c
+++ my_write_unlink2.c
@@ -23,6 +23,8 @@
                return 1;
        printf("Before close().\n");
        system("/bin/df -m .");
+       close(open(filename, O_RDONLY));
+       system("/bin/df -m .");
        if (close(fd))
                return 1;
        printf("Before unlink().\n");
----------

----------
Before write().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 130396    125479  51% /
Before close().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 132447    123428  52% /
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 131420    124455  52% /
Before unlink().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 131420    124455  52% /
After unlink().
Filesystem     1M-blocks   Used Available Use% Mounted on
/dev/sda1         255875 130396    125479  51% /
----------



* Re: xfs: Temporary extra disk space consumption?
  2022-03-23 23:28   ` Tetsuo Handa
@ 2022-03-24  1:13     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2022-03-24  1:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 24, 2022 at 08:28:30AM +0900, Tetsuo Handa wrote:
> On 2022/03/24 4:16, Dave Chinner wrote:
> > On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> >> Hello.
> >>
> >> I found that running the sample program shown below on an xfs filesystem
> >> results in extra disk space being consumed until close() is called.
> >> Is this the expected result?
> > 
> > Yes. It's an anti-fragmentation mechanism that is intended to
> > prevent excessive fragmentation when many files are being written at
> > once.
> 
> OK, so this is an xfs-specific behavior.
> 
> > Looks like speculative preallocation for sequential writes is
> > behaving exactly as designed....
> 
> Here is the result of "filefrag -v my_testfile" before close().
> 
> Filesystem type is: 58465342
> File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..   65519:   33363497..  33429016:  65520:
>    1:    65520..  229915:   62724762..  62889157: 164396:   33429017:
>    2:   229916..  262143:   63132138..  63164365:  32228:   62889158: eof
>    3:   262144..  294895:   63164366..  63197117:  32752:             unwritten,eof
> my_testfile: 3 extents found
> 
> Filesystem type is: 58465342
> File size of my_testfile is 1073741824 (262144 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..  131055:   62724762..  62855817: 131056:
>    1:   131056..  240361:   63813369..  63922674: 109306:   62855818:
>    2:   240362..  262143:   32448944..  32470725:  21782:   63922675: eof
>    3:   262144..  349194:   32470726..  32557776:  87051:             unwritten,eof
>    4:   349195..  524271:          0..    175076: 175077:   32557777: unknown,delalloc,eof
> my_testfile: 4 extents found
> 
> An interesting behavior I noticed: since "filefrag -v" opens this file
> for reading and then closes that read-only file descriptor, injecting
> close(open(filename, O_RDONLY)) as shown below makes the space consumed
> by speculative preallocation go away; closing a file descriptor opened
> for writing is not required.

Yup. This has never had enough measurable impact on real world
workloads for us to need to optimise it away. Besides, ->release()
cannot tell whether the prealloc belongs to that fd or not, so even
if we were to gate it on O_RDONLY, closing any writeable fd on that
inode would trigger the same behaviour regardless of whether that
was the fd that the data was written to....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread
