From: Dilip Simha <nmdilipsimha@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: Request for information on bloated writes using Swift
Date: Wed, 3 Feb 2016 22:16:35 -0800
Message-ID: <CAFHL4X2QudU6d_i25R9JLFN5=V5r6_4EqPO9hoZYZ39AV1m8dQ@mail.gmail.com>
In-Reply-To: <20160203232834.GH459@dastard>

Hi Dave,

Thanks very much for the suggestions. Your advice not to mix preallocated
and non-preallocated writes on the same file makes sense to me.

Regards,
Dilip

On Wed, Feb 3, 2016 at 3:28 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Feb 03, 2016 at 02:43:27PM -0800, Dilip Simha wrote:
> > On Wed, Feb 3, 2016 at 1:51 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > > On Wed, Feb 03, 2016 at 09:02:40AM -0600, Eric Sandeen wrote:
> > > >
> > > >
> > > > On 2/3/16 2:30 AM, Dave Chinner wrote:
> > > > > On Tue, Feb 02, 2016 at 11:09:15PM -0800, Dilip Simha wrote:
> > > > >> Hi Dave,
> > > > >>
> > > > >> On Tue, Feb 2, 2016 at 10:37 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > > >>
> > > > >>> On Tue, Feb 02, 2016 at 07:40:34PM -0800, Dilip Simha wrote:
> > > > >>>> Hi Eric,
> > > > >>>>
> > > > >>>> Thank you for your quick reply.
> > > > >>>>
> > > > >>>> Using xfs_io as per your suggestion, I am able to reproduce the issue.
> > > > >>>> However, I need to falloc for 256K and write for 257K to see this issue.
> > > > >>>>
> > > > >>>> # xfs_io -f -c "falloc 0 256k" -c "pwrite 0 257k" /srv/node/r1/t1.txt
> > > > >>>> # stat /srv/node/r1/t4.txt | grep Blocks
> > > > >>>>   Size: 263168     Blocks: 1536       IO Block: 4096   regular file
> > > > >>>
> > > > >>> Fallocate sets the XFS_DIFLAG_PREALLOC flag on the inode.
> > > > >>>
> > > > >>> When you write *past the preallocated area* and do delayed
> > > > >>> allocation, the speculative preallocation beyond EOF is double the
> > > > >>> size of the extent at EOF, i.e. 512k, leading to 768k being
> > > > >>> allocated to the file (1536 blocks, exactly).
> > > > >>>
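
The block count quoted above can be checked with a little arithmetic. This is only a sketch of the single case described in the thread (256k preallocated, write runs past it, speculative preallocation of twice the extent at EOF); the function name is hypothetical and the real XFS heuristics take more into account than this.

```python
KIB = 1024
STAT_BLOCK = 512  # stat(1) reports "Blocks" in 512-byte units


def blocks_after_overrun(prealloc_bytes: int) -> int:
    """Blocks stat would report for the case described in the thread.

    Assumption: the extent at EOF is the preallocated extent, and the
    speculative preallocation beyond EOF is double its size.
    """
    speculative = 2 * prealloc_bytes          # 512k for a 256k extent
    total = prealloc_bytes + speculative      # 768k allocated to the file
    return total // STAT_BLOCK


print(257 * KIB)                        # file size after the 257k write: 263168
print(blocks_after_overrun(256 * KIB))  # 1536
```

Both numbers match the `Size:` and `Blocks:` fields in the stat output quoted above.
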
> > > > >>
> > > > >> Thank you for the details.
> > > > >> This is exactly where I am a bit perplexed. Since the reclamation logic
> > > > >> skips inodes that have the XFS_DIFLAG_PREALLOC flag set, why did the
> > > > >> allocation logic allot more blocks on such an inode?
> > > > >
> > > > > To store the data you wrote outside the preallocated region, of
> > > > > course.
> > > >
> > > > I think what Dilip meant was, why does it do preallocation, not
> > > > why does it allocate blocks for the data.  That part is obvious
> > > > of course.  ;)
> > > >
> > > > IOWS, if XFS_DIFLAG_PREALLOC prevents speculative preallocation
> > > > from being reclaimed, why is speculative preallocation added to files
> > > > with that flag set?
> > > >
> > > > Seems like a fair question, even if Swift's use of preallocation is
> > > > ill-advised.
> > > >
> > > > I don't have all the speculative preallocation heuristics in my
> > > > head like you do, Dave, but if I have it right, the sequence is:
> > > >
> > > > 1) preallocate 256k
> > > > 2) inode gets XFS_DIFLAG_PREALLOC
> > > > 3) write 257k
> > > > 4) inode gets speculative preallocation added due to write past EOF
> > > > 5) inode never gets preallocation trimmed due to XFS_DIFLAG_PREALLOC
> > > >
> > > > that seems suboptimal.
> > >
> > > So do things the other way around:
> > >
> > > 1) write 257k
> > > 2) preallocate 256k beyond EOF and speculative prealloc region
> > > 3) inode gets XFS_DIFLAG_PREALLOC
> > > 4) inode never gets preallocation trimmed due to XFS_DIFLAG_PREALLOC
> > >
> > > This is correct behaviour.
> > >
> >
> > I am sorry, but I don't agree with this. How can a user application know
> > about step 2?
>
> Step 2 is fallocate(keep size) to a range well beyond EOF, e.g. in
> preparation for a bunch of sparse writes that are about to take
> place. So userspace will most definitely know about it. It's the
> kernel that now doesn't have a clue what to do about the speculative
> preallocation it already has, because the application is mixing its
> IO models.
>
> Fundamentally, if you mix writes across persistent preallocation and
> adjacent holes, you are going to get a mess no matter what
> filesystem you do this to. If you don't like the way XFS handles it,
> either fix the application to not do this, or use the mount option
> to turn off speculative preallocation.
>
> Just like we say "don't mix direct IO and buffered IO on the same
> file", it's a really good idea not to mix preallocated and
> non-preallocated writes to the same file.
>
> > > But if we decide that we don't do speculative prealloc when
> > > XFS_DIFLAG_PREALLOC is set, then workloads that misuse fallocate
> > > (like swift), or use fallocate to fill sparse holes in files, are
> > > going to fragment the hell out of their files when they extend
> > > them.
> > >
> >
> > I don't understand why this would be the case. If XFS doesn't do
> > speculative preallocation, then the 256 byte write past EOF
> > will simply push the EOF ahead. So I see no harm if XFS
> > doesn't do speculative preallocation when XFS_DIFLAG_PREALLOC is set.
>
> I see *potential harm* in changing a long standing default
> behaviour.
>
> > > In reality, if swift is really just writing 1k past the prealloc'd
> > > range it creates, then that is clearly an application bug. Further,
> > > if swift is only ever preallocating the first 256k of each file it
> > > writes, regardless of size, then that is also an application bug.
> >
> > It's not a bug. Assume a use case like appending to a file. Would you say
> > append is a buggy operation?
>
> If the app is using preallocation to reduce append workload file
> fragmentation, and then doesn't use preallocation once it is used up,
> then the app is definitely buggy because it's not being consistent in
> its IO behaviour.  The app should always use fallocate() to control
> file layout, or it should never use fallocate and leave the
> filesystem to optimise the layout as it sees best.
>
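
As a rough illustration of the "always use fallocate()" half of that advice, an append path could preallocate exactly the range it is about to write, every time, so no write ever lands outside a preallocated range. This is a sketch only: `append_preallocated` is a hypothetical helper, not anything Swift does, and it uses Python's `os.posix_fallocate`, which (unlike a keep-size fallocate) extends the file size up front. Whether this is the right fix for a given application is not settled by the thread; simply never calling fallocate is the other consistent option.

```python
import os


def append_preallocated(fd: int, data: bytes) -> int:
    """Append data to fd, preallocating the exact target range first.

    Returns the new end-of-file offset. If a write fails part-way, the
    unwritten tail of the preallocated range stays in the file as zeroes.
    """
    offset = os.fstat(fd).st_size
    if not data:
        return offset  # posix_fallocate rejects a zero-length range

    # Reserve space for exactly the bytes about to be written.
    os.posix_fallocate(fd, offset, len(data))

    # pwrite may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    pos = offset
    while view:
        written = os.pwrite(fd, view, pos)
        view = view[written:]
        pos += written
    return pos
```

A caller would open the file once and route every append through the helper, instead of preallocating only the first 256k and then falling back to plain writes.
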
> In my experience, the filesystem will almost always do a better job
> of optimising allocation for best throughput and minimum seeks than
> applications using fallocate().
>
> IOWs, the default behaviour of XFS has been around for more than 15
> years and is sane for the majority of applications out there. Hence
> the solution here is to either fix the application that is doing
> stupid things with fallocate(), or use the allocsize mount option
> to minimise the impact of the stupid thing the buggy application is
> doing.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
