* [FAQ] XFS speculative preallocation
@ 2014-03-21 16:29 Brian Foster
  2014-03-21 16:54 ` Shaun Gosse
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Brian Foster @ 2014-03-21 16:29 UTC (permalink / raw)
  To: xfs

Hi all,

Eric had suggested we add an FAQ entry for speculative preallocation
since it seems to be a common question, so I offered to write something
up. I started with a single entry but split it into a couple Q's when it
turned into TL;DR fodder. ;)

The text is embedded below for review. Thoughts on the questions or
content are appreciated. Also, once folks are OK with this... how does
one gain edit access to the wiki?

Brian

---

Q: Why do files on XFS use more data blocks than expected?

A:

The XFS speculative preallocation algorithm allocates extra blocks
beyond end of file (EOF) to combat fragmentation under parallel
sequential write workloads. This post-EOF block allocation is included
in 'st_blocks' counts via stat() system calls and is accounted as
globally allocated space by the filesystem. This is reported by various
userspace utilities (stat, du, df, ls) and thus provides a common source
of confusion for administrators. Post-EOF blocks are temporary in most
situations and are usually reclaimed via several possible mechanisms in
XFS.

See the FAQ entry on speculative preallocation for details.
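
As a rough illustration of the discrepancy described in the answer
above, the following Python sketch compares the allocated size reported
by stat() with the apparent file size. It assumes the usual convention
that 'st_blocks' counts 512-byte units; the helper name excess_bytes is
made up for this example. The result is only an approximation of
post-EOF space: block rounding also contributes, and a sparse file can
report fewer allocated bytes than its size.

```python
import os

def excess_bytes(path):
    """Return allocated bytes minus apparent size (negative for sparse files)."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # assumption: st_blocks is in 512-byte units
    return allocated - st.st_size
```

A call such as excess_bytes("/var/log/some.log") (a hypothetical path)
returns how many more bytes are allocated than the file size suggests.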

Q: What is speculative preallocation? How can I manage it?

A:

XFS speculatively preallocates post-EOF blocks on file-extending writes
in anticipation of future extending writes. The size of a preallocation
is dynamic and depends on the size of the previous extent in the file
(starting from 0 again if the write extends past a hole). As files grow
larger, so does the size of preallocations. Speculative preallocation is
not enabled for files smaller than a minimum size (64k by default, but
can vary depending on filesystem geometry and/or mount options).
Preallocations are capped at a maximum of 8GB on 4k block filesystems.
Preallocation is throttled automatically as the filesystem approaches
low free space conditions or other allocation limits on a file (such as
a quota).
 
In most cases, speculative preallocation is automatically reclaimed when
a file is closed. The preallocation may persist after file close if an
open, write, close pattern is repeated on a file. In this scenario,
post-EOF preallocation is trimmed once the inode is reclaimed from cache
or the filesystem unmounted.

Linux 3.8 (and later) includes a scanner to perform background trimming
of files with lingering post-EOF preallocations. The scanner bypasses
files that have been recently modified to not interfere with ongoing
writes. A 5 minute scan interval is used by default and can be adjusted
via the following file (value in seconds):

	/proc/sys/fs/xfs/speculative_prealloc_lifetime
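
For scripted checks, the value can be read like any other procfs file.
This is a minimal Python sketch; the function name is hypothetical, the
path is the one given above, and the file only exists on kernels that
ship the scanner. Changing the value means writing to the same file
and normally requires root.

```python
from pathlib import Path

# Path taken from the text above; absent on kernels without the scanner.
LIFETIME_SYSCTL = "/proc/sys/fs/xfs/speculative_prealloc_lifetime"

def read_prealloc_lifetime(path=LIFETIME_SYSCTL):
    """Return the configured value in seconds, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())
```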

Although speculative preallocation can lead to reports of excess space
usage, the preallocated space is not permanent unless explicitly made so
via fallocate or a similar interface. Preallocated space can also be
encoded permanently in situations where file size is extended beyond a
range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
blocks are reclaimed on file close, inode reclaim, unmount or in the
background once file write activity subsides.

Finally, the XFS block allocation algorithm can be configured to use a
fixed allocation size with the 'allocsize=' mount option. Note that
speculative preallocation does not occur when a fixed allocation size is
set and thus increases the potential for fragmentation via parallel
writes.
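
To check whether a mount already uses a fixed size, one option is to
look for 'allocsize=' in its option string. The Python sketch below
assumes the option appears as allocsize=<number> with an optional k, m
or g suffix in a comma-separated list (for example the options field of
a /proc/mounts line); how a given kernel prints the option has not been
verified here, and the function name is made up for illustration.

```python
import re

# Multipliers for the optional size suffix (assumed: k, m, g in powers of 1024).
_SUFFIX = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_allocsize(options):
    """Return the allocsize value in bytes from an option string, or None."""
    for opt in options.split(","):
        m = re.fullmatch(r"allocsize=(\d+)([kmg]?)", opt.strip(), re.IGNORECASE)
        if m:
            return int(m.group(1)) * _SUFFIX[m.group(2).lower()]
    return None
```

A return value of None means no fixed size was found in the string, in
which case the dynamic behaviour described earlier applies.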

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* RE: [FAQ] XFS speculative preallocation
  2014-03-21 16:29 [FAQ] XFS speculative preallocation Brian Foster
@ 2014-03-21 16:54 ` Shaun Gosse
  2014-03-21 17:09 ` Arkadiusz Miśkiewicz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Shaun Gosse @ 2014-03-21 16:54 UTC (permalink / raw)
  To: Brian Foster, xfs

Brian,

FWIW, from my perspective as a newcomer to XFS that is quite clear and understandable and informative. Looks like a valuable addition.

I've got no idea how to get write access on the wiki personally, but hopefully that answer will arrive for you 'soon(tm)'.

Cheers,
-Shaun

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 16:29 [FAQ] XFS speculative preallocation Brian Foster
  2014-03-21 16:54 ` Shaun Gosse
@ 2014-03-21 17:09 ` Arkadiusz Miśkiewicz
  2014-03-21 18:02   ` Brian Foster
  2014-03-21 23:16   ` Dave Chinner
  2014-03-21 20:11 ` Florian Weimer
  2014-03-21 23:05 ` Dave Chinner
  3 siblings, 2 replies; 11+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-03-21 17:09 UTC (permalink / raw)
  To: xfs

On Friday 21 of March 2014, Brian Foster wrote:
> Hi all,
> 
> Eric had suggested we add an FAQ entry for speculative preallocation
> since it seems to be a common question, so I offered to write something
> up. I started with a single entry but split it into a couple Q's when it
> turned into TL;DR fodder. ;)
> 
> The text is embedded below for review. Thoughts on the questions or
> content is appreciated. Also, once folks are Ok with this... how does
> one gain edit access to the wiki?

More questions or topics that can be converted to questions from me:

1) Before preallocation kernel did things differently. AFAIK it wasn't the 
same as allocsize=64k, was it? Is there a way to get old behaviour or 
something similar to old behaviour?

2) Is there a way to see which file got some preallocation and how big that 
preallocation is? Scenario - something ate free space due to preallocation and 
from an admin's point of view it would be useful to know which app did that and 
how many MB were due to preallocation (vs real, written data).

> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> files that have been recently 

What time is "recently"? Is "modified" equal to "file data modified" or "file 
data or metadata modified"?

> modified to not interfere with ongoing
> writes.

In case of some app that constantly writes to files (apache web server 
writing to its logs for example) that background trimming will never do 
anything for these files, right?

> A 5 minute scan interval is used by default and can be adjusted
> via the following file (value in seconds):
> 
> 	/proc/sys/fs/xfs/speculative_prealloc_lifetime
> 
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.

So there is no mechanism that would shrink preallocations when free 
space is (almost or) gone on a fs? Case: apache causes xfs to preallocate 
several GB for its /var/..../{access,error}_log (common problem here) and then 
free space ends on that fs causing problems for every app that writes to /var.

Thanks!

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 17:09 ` Arkadiusz Miśkiewicz
@ 2014-03-21 18:02   ` Brian Foster
  2014-03-21 23:16   ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Brian Foster @ 2014-03-21 18:02 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: xfs

On Fri, Mar 21, 2014 at 06:09:03PM +0100, Arkadiusz Miśkiewicz wrote:
> On Friday 21 of March 2014, Brian Foster wrote:
> > Hi all,
> > 
> > Eric had suggested we add an FAQ entry for speculative preallocation
> > since it seems to be a common question, so I offered to write something
> > up. I started with a single entry but split it into a couple Q's when it
> > turned into TL;DR fodder. ;)
> > 
> > The text is embedded below for review. Thoughts on the questions or
> > content is appreciated. Also, once folks are Ok with this... how does
> > one gain edit access to the wiki?
> 
> More questions or topics that can be converted to questions from me:
> 
> 1) Before preallocation kernel did things differently. AFAIK it wasn't the 
> same as allocsize=64k, was it? Is there a way to get old behaviour or 
> something similar to old behaviour?
> 

Going from the commit log that introduced speculative preallocation, it
appears that the behavior was effectively allocsize=64k. For reference:

	055388a3 xfs: dynamic speculative EOF preallocation

> 2) Is there a way to see which file got some preallocation and how big that 
> preallocation is? Scenario - something ate free space due to preallocation and 
> from an admin's point of view it would be useful to know which app did that and 
> how many MB were due to preallocation (vs real, written data).
> 

The common scenario is when du/stat reports a larger block usage than
file size, so the question of how much extra space is allocated is just
the difference between the two. I suppose we could include a simple
example of that in the first Q.

This isn't necessarily true in the case of sparse files. xfs_bmap prints
the extent information for a file, so it should be possible to determine
how much post-EOF space exists from looking at the extent that covers
EOF. That said, this strikes me as more "user guide" material than FAQ.
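
A rough Python sketch of that calculation follows. It assumes extent
lines shaped like "<n>: [<start>..<end>]: <blocks or 'hole'>" with
offsets in 512-byte units, which is how I understand plain xfs_bmap
output to look; treat both the format and the helper name as
assumptions. The result also counts the unused tail of the block that
contains EOF, so it slightly overstates the preallocation.

```python
import re

# Assumed line shape: "<index>: [<start>..<end>]: <block range or 'hole'>"
_EXTENT = re.compile(r"^\s*\d+:\s*\[(\d+)\.\.(\d+)\]:\s*(\S+)")

def post_eof_bytes(bmap_text, file_size):
    """Sum allocated bytes lying past EOF, given xfs_bmap-style text."""
    eof_unit = -(-file_size // 512)  # first 512-byte unit at or past EOF
    total_units = 0
    for line in bmap_text.splitlines():
        m = _EXTENT.match(line)
        if not m or m.group(3) == "hole":
            continue  # skip the filename line, unparsed lines and holes
        start, end = int(m.group(1)), int(m.group(2))
        first = max(start, eof_unit)
        if end >= first:
            total_units += end - first + 1
    return total_units * 512
```

The text would come from running xfs_bmap on the file and the size from
stat(); neither step is shown here.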

> > Linux 3.8 (and later) includes a scanner to perform background trimming
> > of files with lingering post-EOF preallocations. The scanner bypasses
> > files that have been recently 
> 
> What time is "recently" ? Is "modified" equal to "file data modified" or "file 
> data or metadata modified" ?
> 

I originally had something like "files that have been modified since
they were last flushed to disk," which is the heuristic as I understand
it. That seemed too verbose and technical for an FAQ. I could replace
"recently modified" with "... bypasses files that are dirty ..." if that
is more useful..?

> > modified to not interfere with ongoing
> > writes.
> 
> In case of some app that constantly writes to files (apache web server 
> writing to its logs for example) that background trimming will never do 
> anything for these files, right?
> 

Most likely true. Though by the same logic, those files will eventually
use the preallocated space.

> > A 5 minute scan interval is used by default and can be adjusted
> > via the following file (value in seconds):
> > 
> > 	/proc/sys/fs/xfs/speculative_prealloc_lifetime
> > 
> > Although speculative preallocation can lead to reports of excess space
> > usage, the preallocated space is not permanent unless explicitly made so
> > via fallocate or a similar interface. Preallocated space can also be
> > encoded permanently in situations where file size is extended beyond a
> > range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> > blocks are reclaimed on file close, inode reclaim, unmount or in the
> > background once file write activity subsides.
> 
> So there is no mechanism that would shrink preallocations when free 
> space is (almost or) gone on a fs? Case: apache causes xfs to preallocate 
> several GB for its /var/..../{access,error}_log (common problem here) and then 
> free space ends on that fs causing problems for every app that writes to /var.
> 

I noted in the second answer that the preallocation is throttled as we
near allocation limits such as no free space or quota. I think that
should cover most cases. I still have some code lying around somewhere
that forces a scan and retry in EDQUOT scenarios though. I should dust
that off...

Thanks for the reviews!

Brian

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 16:29 [FAQ] XFS speculative preallocation Brian Foster
  2014-03-21 16:54 ` Shaun Gosse
  2014-03-21 17:09 ` Arkadiusz Miśkiewicz
@ 2014-03-21 20:11 ` Florian Weimer
  2014-03-21 23:10   ` Dave Chinner
  2014-03-22 13:32   ` Christoph Hellwig
  2014-03-21 23:05 ` Dave Chinner
  3 siblings, 2 replies; 11+ messages in thread
From: Florian Weimer @ 2014-03-21 20:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

* Brian Foster:

> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface.

How does an explicit allocation with posix_fallocate interact with
speculative preallocation?  Does it disable it?

I see rather dramatic fragmentation of the systemd journal when it is
stored on XFS, and it calls posix_fallocate before writing data to the
file.

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 16:29 [FAQ] XFS speculative preallocation Brian Foster
                   ` (2 preceding siblings ...)
  2014-03-21 20:11 ` Florian Weimer
@ 2014-03-21 23:05 ` Dave Chinner
  3 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-03-21 23:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Fri, Mar 21, 2014 at 12:29:20PM -0400, Brian Foster wrote:
> Hi all,
> 
> Eric had suggested we add an FAQ entry for speculative preallocation
> since it seems to be a common question, so I offered to write something
> up. I started with a single entry but split it into a couple Q's when it
> turned into TL;DR fodder. ;)
> 
> The text is embedded below for review. Thoughts on the questions or
> content is appreciated. Also, once folks are Ok with this... how does
> one gain edit access to the wiki?

Request an account and wait for one of us admins to ack it.

FWIW, what I'd really like is for the FAQ to be converted to an
asciidoc document in the xfs-documentation tree. The current FAQ has
lots of stuff that could do with updating, but editing a wiki
document that long in a browser is, well, painful. We can then
publish the built HTML version of the FAQ on the wiki...

> Brian
> 
> ---
> 
> Q: Why do files on XFS use more data blocks than expected?
> 
> A:
> 
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to combat fragmentation under parallel
> sequential write workloads.

"minimise file fragmentation during buffered write workloads.
Workloads that benefit from this behaviour include slowly growing
files, concurrent writers and mixed reader/writer workloads. It
also provides fragmentation resistance in situations where memory
pressure prevents adequate buffering of dirty data to allow large
contiguous regions of dirty data to be formed in memory."

> This post-EOF block allocation is included

"is accounted identically to blocks within EOF. It is visible..."

> in 'st_blocks' counts via stat() system calls and is accounted as
> globally allocated space by the filesystem. This is reported by various
> userspace utilities (stat, du, df, ls) and thus provides a common source
> of confusion for administrators. Post-EOF blocks are temporary in most
> situations and are usually reclaimed via several possible mechanisms in
> XFS.

Also accounted for in quotas.

> See the FAQ entry on speculative preallocation for details.
> 
> Q: What is speculative preallocation? How can I manage it?
> 
> A:
> 
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes. The size of a preallocation
> is dynamic and depends on the size of the previous extent in the file
> (starting from 0 again if the write extends past a hole).

I'd keep specific heuristics out of the description. Heuristics
change....

> As files grow
> larger, so do the size of preallocations. Speculative preallocation is
> not enabled for files smaller than a minimum size (64k by default, but
> can vary depending on filesystem geometry and/or mount options).

Again, actual numbers should probably be avoided, because we can
change that at will...

> Preallocations are capped at a maximum of 8GB on 4k block filesystems.

"capped at a single extent of the maximum supported size of the filesystem"

> Preallocation is throttled automatically as the filesystem approaches
> low free space conditions or other allocation limits on a file (such as
> a quota).

"Preallocation size is throttled..."

> In most cases, speculative preallocation is automatically reclaimed when
> a file is closed. The preallocation may persist after file close if an
> open, write, close pattern is repeated on a file. In this scenario,
> post-EOF preallocation is trimmed once the inode is reclaimed from cache
> or the filesystem unmounted.

I'd rewrite this slightly differently, saying that preallocation "may
persist beyond the lifecycle of any given file descriptor." And then
describe the reason for this - that certain application behaviours
(like slowly growing files, or file servers) can cause fragmentation
if we remove the preallocation on fd close. These behaviours are
automatically detected, and result in "delayed removal" of the
preallocation.

Q: How can I speed up or avoid delayed removal of speculative preallocation?

A. Removing the inode from the VFS cache or unmounting the
filesystem will remove speculative preallocations associated with an
inode.

> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> files that have been recently modified to not interfere with ongoing
> writes. A 5 minute scan interval is used by default and can be adjusted
> via the following file (value in seconds):
> 
> 	/proc/sys/fs/xfs/speculative_prealloc_lifetime
> 

Q: Is speculative preallocation permanent?

A:
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.
> 

Q: My workload has known characteristics - can I tune speculative
preallocation to be an optimal fixed size?

A.
> Finally, the XFS block allocation algorithm can be configured to use a
> fixed allocation size with the 'allocsize=' mount option. Note that
> speculative preallocation does not occur when a fixed allocation size is
> set and thus increases the potential for fragmentation via parallel
> writes.

This should say "dynamic resizing of speculative preallocation does
not occur" rather than "speculative preallocation does not occur",
because allocsize only determines the size of the speculative
preallocation beyond EOF that is done - it doesn't turn it off...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 20:11 ` Florian Weimer
@ 2014-03-21 23:10   ` Dave Chinner
  2014-03-21 23:13     ` Eric Sandeen
  2014-03-22 13:32   ` Christoph Hellwig
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-03-21 23:10 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Brian Foster, xfs

On Fri, Mar 21, 2014 at 09:11:29PM +0100, Florian Weimer wrote:
> * Brian Foster:
> 
> > Although speculative preallocation can lead to reports of excess space
> > usage, the preallocated space is not permanent unless explicitly made so
> > via fallocate or a similar interface.
> 
> How does an explicit allocation with posix_fallocate interact with
> speculative preallocation?  Does it disable it?

fallocate is permanent preallocation using unwritten extents.
Speculative preallocation is an extension of delayed allocation that
is done when extending the file and the EOF falls into a hole. If
there are unwritten extents beyond EOF, speculative preallocation is
not performed.

> I see rather dramatic fragmentation of the systemd journal when it is
> stored on XFS, and it calls posix_fallocate before writing data to the
> file.

There's your problem - systemd is preventing delayed allocation, and
so it is fragmenting the file itself with its write pattern.
Basically, that's a bug in systemd, and not something the filesystem
can avoid because userspace is directly controlling block
allocation.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 23:10   ` Dave Chinner
@ 2014-03-21 23:13     ` Eric Sandeen
  2014-03-21 23:18       ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2014-03-21 23:13 UTC (permalink / raw)
  To: Dave Chinner, Florian Weimer; +Cc: Brian Foster, xfs

On 3/21/14, 6:10 PM, Dave Chinner wrote:
> On Fri, Mar 21, 2014 at 09:11:29PM +0100, Florian Weimer wrote:
>> * Brian Foster:
>>
>>> Although speculative preallocation can lead to reports of excess space
>>> usage, the preallocated space is not permanent unless explicitly made so
>>> via fallocate or a similar interface.
>>
>> How does an explicit allocation with posix_fallocate interact with
>> speculative preallocation?  Does it disable it?
> 
> fallocate is permanent preallocation using unwritten extents.
> Speculative preallocation is an extension of delayed allocation that
> is done when extending the file and the EOF falls into a hole. If
> there are unwritten extents beyond EOF, speculative preallocation is
> not performed.
> 
>> I see rather dramatic fragmentation of the systemd journal when it is
>> stored on XFS, and it calls posix_fallocate before writing data to the
>> file.
> 
> There's your problem - systemd is preventing delayed allocation, and
> so it is fragmenting the file itself with its write pattern.
> Basically, that's a bug in systemd, and not something the filesystem
> can avoid because userspace is directly controlling block
> allocation.

hohum, I guess we should look into this.

OTOH: nothing wrong with calling posix_fallocate() if you need the space
guarantees it provides for proper operation...

-Eric

> Cheers,
> 
> Dave.
> 

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 17:09 ` Arkadiusz Miśkiewicz
  2014-03-21 18:02   ` Brian Foster
@ 2014-03-21 23:16   ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-03-21 23:16 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: xfs

On Fri, Mar 21, 2014 at 06:09:03PM +0100, Arkadiusz Miśkiewicz wrote:
> On Friday 21 of March 2014, Brian Foster wrote:
> > Hi all,
> > 
> > Eric had suggested we add an FAQ entry for speculative preallocation
> > since it seems to be a common question, so I offered to write something
> > up. I started with a single entry but split it into a couple Q's when it
> > turned into TL;DR fodder. ;)
> > 
> > The text is embedded below for review. Thoughts on the questions or
> > content is appreciated. Also, once folks are Ok with this... how does
> > one gain edit access to the wiki?
> 
> More questions or topics that can be converted to questions from me:
> 
> 1) Before preallocation kernel did things differently. AFAIK it wasn't the 
> same as allocsize=64k, was it? Is there a way to get old behaviour or 
> something similar to old behaviour?

The old behaviour is exactly that of allocsize=64k.

> > modified to not interfere with ongoing
> > writes.
> 
> In case of some app that constantly writes to files (apache web server 
> writing to its logs for example) that background trimming will never do 
> anything for these files, right?

If the inode is being constantly dirtied, then the speculative
prealloc will not be removed by the background scanner. It only
removes prealloc from clean inodes.

> > A 5 minute scan interval is used by default and can be adjusted
> > via the following file (value in seconds):
> > 
> > 	/proc/sys/fs/xfs/speculative_prealloc_lifetime
> > 
> > Although speculative preallocation can lead to reports of excess space
> > usage, the preallocated space is not permanent unless explicitly made so
> > via fallocate or a similar interface. Preallocated space can also be
> > encoded permanently in situations where file size is extended beyond a
> > range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
> > blocks are reclaimed on file close, inode reclaim, unmount or in the
> > background once file write activity subsides.
> 
> So there is no mechanism that would shrink preallocations when free 
> space is (almost or) gone on a fs?

Background space trimmer takes care of that. We could probably also
trigger it on ENOSPC, but once you are already at ENOSPC it's too
late....

> Case: apache causes xfs to preallocate 
> several GB for its /var/..../{access,error}_log (common problem here) and then 
> free space ends on that fs causing problems for every app that writes to /var.

Your log files would already have to be GBs in size for apache
to preallocate that much. If your log files are that
big, then /var needs to be much, much larger than anything the
speculative prealloc for a handful of files could easily exhaust.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 23:13     ` Eric Sandeen
@ 2014-03-21 23:18       ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-03-21 23:18 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Brian Foster, Florian Weimer, xfs

On Fri, Mar 21, 2014 at 06:13:30PM -0500, Eric Sandeen wrote:
> On 3/21/14, 6:10 PM, Dave Chinner wrote:
> > On Fri, Mar 21, 2014 at 09:11:29PM +0100, Florian Weimer wrote:
> >> * Brian Foster:
> >>
> >>> Although speculative preallocation can lead to reports of excess space
> >>> usage, the preallocated space is not permanent unless explicitly made so
> >>> via fallocate or a similar interface.
> >>
> >> How does an explicit allocation with posix_fallocate interact with
> >> speculative preallocation?  Does it disable it?
> > 
> > fallocate is permanent preallocation using unwritten extents.
> > Speculative preallocation is an extension of delayed allocation that
> > is done when extending the file and the EOF falls into a hole. If
> > there are unwritten extents beyond EOF, speculative preallocation is
> > not performed.
> > 
> >> I see rather dramatic fragmentation of the systemd journal when it is
> >> stored on XFS, and it calls posix_fallocate before writing data to the
> >> file.
> > 
> > There's your problem - systemd is preventing delayed allocation, and
> > so it is fragmenting the file itself with its write pattern.
> > Basically, that's a bug in systemd, and not something the filesystem
> > can avoid because userspace is directly controlling block
> > allocation.
> 
> hohum, I guess we should look into this.
> 
> OTOH: nothing wrong with calling posix_fallocate() if you need the space
> guarantees it provides for proper operation...

Right, but it's something that the filesystem has no real control
over. We've been asked to allocate blocks immediately by
fallocate(), and so we get what we get....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [FAQ] XFS speculative preallocation
  2014-03-21 20:11 ` Florian Weimer
  2014-03-21 23:10   ` Dave Chinner
@ 2014-03-22 13:32   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2014-03-22 13:32 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Brian Foster, xfs

On Fri, Mar 21, 2014 at 09:11:29PM +0100, Florian Weimer wrote:
> I see rather dramatic fragmentation of the systemd journal when it is
> stored on XFS, and it calls posix_fallocate before writing data to the
> file.

You mean it calls fallocate before each write?  That's not very useful
behaviour and should be fixed.  If it calls fallocate for the whole
expected file size (or large increments) it should not fragment the file,
and if it does there's a bug we'd need to look into.
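
To illustrate the "large increments" pattern (this is not a description
of what systemd does), here is a minimal Python sketch using
os.posix_fallocate. The helper name and the 8 MiB default chunk are
arbitrary choices for this example. Note that posix_fallocate also
extends the file size, so the application has to track its own logical
end of data.

```python
import os

def ensure_allocated(fd, needed, chunk=8 * 1024 * 1024):
    """Grow the file in multiples of `chunk` so at least `needed` bytes exist.

    Returns the file size after the call.
    """
    size = os.fstat(fd).st_size
    if needed <= size:
        return size  # already covered, so no allocation call for this write
    target = -(-needed // chunk) * chunk  # round up to a chunk multiple
    os.posix_fallocate(fd, size, target - size)  # allocate only the new range
    return target
```

Calling this before each write only reaches the allocation call once
per chunk, rather than once per write.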
