From: Dave Chinner <david@fromorbit.com>
To: Marc Lehmann <schmorp@schmorp.de>
Cc: xfs@oss.sgi.com
Subject: Re: drastic changes to allocsize semantics in or around 2.6.38?
Date: Fri, 20 May 2011 12:56:59 +1000
Message-ID: <20110520025659.GO32466@dastard>
In-Reply-To: <20110520005510.GA15348@schmorp.de>

On Fri, May 20, 2011 at 02:55:11AM +0200, Marc Lehmann wrote:
> Hi!
> 
> I have "allocsize=64m" (or similar sizes, such as 1m, 16m etc.) on many of my
> xfs filesystems, in an attempt to fight fragmentation on logfiles.
> 
> I am not sure about its effectiveness, but in 2.6.38 (but not in 2.6.32),
> this leads to very unexpected and weird behaviour, namely that files being
> written have semi-permanently allocated chunks of allocsize to them.

The change most likely causing this is in how the preallocation is
dropped. In normal use cases, the preallocation should be dropped
when the file descriptor is closed. The change in 2.6.38 was to make
this conditional on whether the inode had been closed multiple times
while dirty. If the inode is closed (.release is called) multiple
times while dirty, then the preallocation is not truncated away
until the inode is dropped from the caches, rather than immediately
on close.  This prevents writes on NFS servers from doing excessive
work and triggering excessive fragmentation, as the NFS server does
an "open-write-close" for every write that comes across the wire.
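
As a rough illustration of the close-time logic described above, here is a toy Python model. It is not the XFS code: the class name `ToyInode`, its fields, and the rule "truncate on the first dirty close, keep the preallocation on later dirty closes" are simplifying assumptions drawn only from the description in this mail.

```python
class ToyInode:
    """Toy model of an inode carrying speculative preallocation (hypothetical)."""

    def __init__(self, prealloc_blocks=0):
        self.prealloc_blocks = prealloc_blocks  # blocks reserved beyond EOF
        self.dirty = False                      # data not yet written back
        self.dirty_releases = 0                 # closes seen while dirty

    def write(self, new_prealloc_blocks):
        # A write dirties the inode and (re)creates speculative preallocation.
        self.dirty = True
        self.prealloc_blocks = new_prealloc_blocks

    def release(self):
        """Model of .release being called when the file is closed."""
        if self.dirty:
            self.dirty_releases += 1
        if self.dirty_releases > 1:
            # Closed repeatedly while dirty: looks like the NFS server
            # open-write-close pattern, so keep the preallocation.
            return
        self.prealloc_blocks = 0  # normal case: drop preallocation on close

    def evict(self):
        """Model of the inode being dropped from the caches."""
        self.prealloc_blocks = 0
        self.dirty = False
        self.dirty_releases = 0
```

In this model, a write followed by a close drops the preallocation, a second write-and-close while the inode is still dirty keeps it, and only `evict()` finally releases it.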

This was also coupled with a change to the default speculative
allocation behaviour to do more and larger speculative preallocation,
and so in most cases remove the need for ever using the allocsize
mount option. It dynamically increases the preallocation size as the
file size increases, so small file writes behave like pre-2.6.38
without the allocsize mount option, while large file writes behave as
if a large allocsize mount option were set, thereby preventing
most known delayed allocation fragmentation cases from occurring.
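
To make "preallocation size grows with file size" concrete, here is a minimal Python sketch. The exact formula is not given in this mail, so the heuristic below (round the file size down to a power of two, then clamp) and both limits `floor` and `cap` are assumptions chosen for illustration, not XFS constants.

```python
def speculative_prealloc_size(file_size, floor=64 * 1024, cap=8 * 1024**3):
    """Return a preallocation size in bytes that scales with file size.

    Assumed heuristic: round the current file size down to a power of
    two, then clamp it between `floor` and `cap` (illustrative values).
    """
    if file_size <= 0:
        return floor
    # Largest power of two that is <= file_size.
    pow2 = 1 << (file_size.bit_length() - 1)
    return max(floor, min(cap, pow2))
```

Under these assumed limits a 4 KiB file gets the 64 KiB floor, a 100 MiB file gets 64 MiB, and a multi-terabyte file is capped at 8 GiB.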

> I realised this when I did a make clean and a make in a buildroot directory,
> which cross-compiles uclibc, gcc, and lots of other packages, leading to a
> lot of mostly small files.

So the question there: how is your workload accessing the files? Is
it opening and closing them multiple times in quick succession after
writing them? I think it is triggering the "NFS server access
pattern" logic and so keeping speculative preallocation around for
longer.

> After I deleted some files to get some space and rebooted, I suddenly had
> 180GB of space again, so it seems an unmount "fixes" this issue.
> 
> I often do these kinds of builds, and I have had allocsize at these high values for
> a very long time, without ever having run into this kind of problem.
> 
> It seems that files get temporarily allocated much larger chunks (which is
> expected behaviour), but xfs doesn't free them until there is an unmount
> (which is unexpected).

"echo 3 > /proc/sys/vm/drop_caches" should free up the space as the
preallocation will be truncated as the inodes are removed from the
VFS inode cache.
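
To check whether a given file is still holding space beyond its length, before and after dropping caches, something like the following Python sketch may help. The function names are hypothetical, and it assumes `st_blocks` is counted in 512-byte units, which holds on Linux.

```python
import os


def allocation_report(path):
    """Return (apparent_size, allocated_bytes) for a file.

    st_blocks is counted in 512-byte units on Linux; treating it that
    way on other platforms is an assumption.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def excess_bytes(path):
    """Bytes allocated beyond the apparent size (never negative)."""
    apparent, allocated = allocation_report(path)
    return max(0, allocated - apparent)
```

A file that reports a large `excess_bytes()` after it has been closed is a candidate for lingering preallocation, though sparse or freshly written files can also show a mismatch between the two numbers.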

> Is this the desired behaviour? I would assume that any allocsize > 0 could
> lead to a lot of fragmentation if files that are closed and no longer being
> in-use always have extra space allocated for expansion for extremely long
> periods of time.

I'd suggest removing the allocsize mount option - you shouldn't need
it anymore because the new default behaviour resists fragmentation a
whole lot better than pre-2.6.38 kernels.
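
To find which filesystems still carry the option, a small Python sketch such as the one below can scan text in /proc/mounts format (device, mount point, type, options, dump, pass). The function name is hypothetical, and any sample input you feed it is your own data, not something from this thread.

```python
def xfs_mounts_with_allocsize(mounts_text):
    """Return {mount_point: allocsize_value} for XFS mounts that set allocsize.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not XFS.
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("allocsize="):
                found[fields[1]] = opt.split("=", 1)[1]
    return found
```

On a live system it could be called as `xfs_mounts_with_allocsize(open("/proc/mounts").read())`; an empty result means no mounted XFS filesystem sets allocsize.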

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

