All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/3]: Extreme fragmentation ahoy!
Date: Thu, 7 Feb 2019 16:39:41 +1100	[thread overview]
Message-ID: <20190207053941.GL14116@dastard> (raw)
In-Reply-To: <20190207052114.GA7991@magnolia>

On Wed, Feb 06, 2019 at 09:21:14PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 07, 2019 at 04:08:10PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > I've just finished analysing an IO trace from a application
> > generating an extreme filesystem fragmentation problem that started
> > with extent size hints and ended with spurious ENOSPC reports due to
> > massively fragmented files and free space. While the ENOSPC issue
> > looks to have previously been solved, I still wanted to understand
> > how the application had so comprehensively defeated extent size
> > hints as a method of avoiding file fragmentation.
> > 
> > The key behaviour that I discovered was that specific "append write
> > only" files that had extent size hints to prevent fragmentation
> > weren't actually write only.  The application didn't do a lot of
> > writes to the file, but it kept the file open and appended to the
> > file (from the traces I have) in chunks of between ~3000 bytes and
> > ~160000 bytes. This didn't explain the problem. I did notice that
> > the files were opened O_SYNC, however.
> > 
> > I then found was another process that, once every second, opened the
> > log file O_RDONLY, read 28 bytes from offset zero, then closed the
> > file. Every second. IOWs, between every appending write that would
> > allocate an extent size hint worth of space beyond EOF and then
> > write a small chunk of it, there were numerous open/read/close
> > cycles being done on the same file.
> > 
> > And what do we do on close()? We call xfs_release() and that can
> > truncate away blocks beyond EOF. For some reason the close wasn't
> > triggering the IDIRTY_RELEASE heuristic that preventd close from
> > removing EOF blocks prematurely. Then I realised that O_SYNC writes
> > don't leave delayed allocation blocks behind - they are always
> > converted in the context of the write. That's why it wasn't
> > triggering, and that meant that the open/read/close cycle was
> > removing the extent size hint allocation beyond EOF prematurely.
> > beyond EOF prematurely.
> 
> <urk>
> 
> > Then it occurred to me that extent size hints don't use delalloc
> > either, so they behave the same was as O_SYNC writes in this
> > situation.
> > 
> > Oh, and we remove EOF blocks on O_RDONLY file close, too. i.e. we
> > modify the file without having write permissions.
> 
> Yikes!
> 
> > I suspect there's more cases like this when combined with repeated
> > open/<do_something>/close operations on a file that is being
> > written, but the patches address just these ones I just talked
> > about. The test script to reproduce them is below. Fragmentation
> > reduction results are in the commit descriptions. It's running
> > through fstests for a couple of hours now, no issues have been
> > noticed yet.
> > 
> > FWIW, I suspect we need to have a good hard think about whether we
> > should be trimming EOF blocks on close by default, or whether we
> > should only be doing it in very limited situations....
> > 
> > Comments, thoughts, flames welcome.
> > 
> > -Dave.
> > 
> > 
> > #!/bin/bash
> > #
> > # Test 1
> 
> Can you please turn these into fstests to cause the maintainer maximal
> immediate pain^W^W^Wmake everyone pay attention^W^W^W^Westablish a basis
> for regression testing and finding whatever other problems we can find
> from digging deeper? :)

I will, but not today - I only understood the cause well enough to
write a prototype reproducer about 4 hours ago. The rest of the time
since then has been fixing the issues and running smoke tests. My
brain is about fried now....

FWIW, I think the scope of the problem is quite widespread -
anything that does open/something/close repeatedly on a file that is
being written to with O_DSYNC or O_DIRECT appending writes will kill
the post-eof extent size hint allocated space. That's why I suspect
we need to think about not trimming by default and trying to
enumerating only the cases that need to trim eof blocks.

e.g. I closed the O_RDONLY case, but O_RDWR/read/close in a loop
will still trigger removal of post EOF extent size hint
preallocation and hence severe fragmentation.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-02-07  5:39 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-07  5:08 [RFC PATCH 0/3]: Extreme fragmentation ahoy! Dave Chinner
2019-02-07  5:08 ` [PATCH 1/3] xfs: Don't free EOF blocks on sync write close Dave Chinner
2019-02-07  5:08 ` [PATCH 2/3] xfs: Don't free EOF blocks on close when extent size hints are set Dave Chinner
2019-02-07 15:51   ` Brian Foster
2019-02-07  5:08 ` [PATCH 3/3] xfs: Don't free EOF blocks on sync write close Dave Chinner
2019-02-07  5:19   ` Dave Chinner
2019-02-07  5:21 ` [RFC PATCH 0/3]: Extreme fragmentation ahoy! Darrick J. Wong
2019-02-07  5:39   ` Dave Chinner [this message]
2019-02-07 15:52     ` Brian Foster
2019-02-08  2:47       ` Dave Chinner
2019-02-08 12:34         ` Brian Foster
2019-02-12  1:13           ` Darrick J. Wong
2019-02-12 11:46             ` Brian Foster
2019-02-12 20:21               ` Dave Chinner
2019-02-13 13:50                 ` Brian Foster
2019-02-13 22:27                   ` Dave Chinner
2019-02-14 13:00                     ` Brian Foster
2019-02-14 21:51                       ` Dave Chinner
2019-02-15  2:35                         ` Brian Foster
2019-02-15  7:23                           ` Dave Chinner
2019-02-15 20:33                             ` Brian Foster
2019-02-08 16:29         ` Darrick J. Wong
2019-02-18  2:26 ` [PATCH 4/3] xfs: EOF blocks are not busy extents Dave Chinner
2019-02-20 15:12   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190207053941.GL14116@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.