From: Dilip Simha <nmdilipsimha@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: Request for information on bloated writes using Swift
Date: Tue, 2 Feb 2016 23:09:15 -0800	[thread overview]
Message-ID: <CAFHL4X0m8Ov+zJxteUJJxzEHVXpJsfe=9mtapRmWkhT6VRkDxg@mail.gmail.com> (raw)
In-Reply-To: <20160203063705.GB459@dastard>



Hi Dave,

On Tue, Feb 2, 2016 at 10:37 PM, Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Feb 02, 2016 at 07:40:34PM -0800, Dilip Simha wrote:
> > Hi Eric,
> >
> > Thank you for your quick reply.
> >
> > Using xfs_io as per your suggestion, I am able to reproduce the issue.
> > However, I need to falloc for 256K and write for 257K to see this issue.
> >
> > # xfs_io -f -c "falloc 0 256k" -c "pwrite 0 257k" /srv/node/r1/t1.txt
> > # stat /srv/node/r1/t1.txt | grep Blocks
> >   Size: 263168     Blocks: 1536       IO Block: 4096   regular file
>
> Fallocate sets the XFS_DIFLAG_PREALLOC on the inode.
>
> When you write *past the preallocated area* and do delayed
> allocation, the speculative preallocation beyond EOF is double the
> size of the extent at EOF. i.e. 512k, leading to 768k being
> allocated to the file (1536 blocks, exactly).
>

Thank you for the details.
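
Just to make sure I follow the arithmetic (my own back-of-the-envelope
check, assuming stat's Blocks column counts 512-byte units):

  falloc extent at EOF             = 256k
  speculative prealloc beyond EOF  = 2 * 256k = 512k
  total allocation                 = 256k + 512k = 768k
  stat Blocks (512-byte units)     = 768k / 512 bytes = 1536
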
This is exactly where I am a bit perplexed. Since the reclamation logic
skips inodes that have the XFS_DIFLAG_PREALLOC flag set, why did the
allocation logic allot more blocks to such an inode in the first place?
My understanding is that the fallocate caller only requested 256K worth
of blocks, to be laid out sequentially if possible. On any subsequent
write beyond EOF, the caller is completely unaware of whether the
underlying file system stores that data adjacent to the first 256K.
Since XFS is speculatively allocating additional space (512K) adjacent
to the first 256K, I would expect XFS to either treat the two
allocations distinctly and NOT mark XFS_DIFLAG_PREALLOC on the
additional 512K (minus the 1K of it that is actually used), OR clear
the XFS_DIFLAG_PREALLOC flag on the entire inode.

Also, is there any way I can check for this flag?
The FLAGS column in the xfs_bmap output below doesn't show any flags
set. Am I not looking in the right place?

# xfs_bmap -lpv /srv/node/r16/objects/10/ff3/55517cd029bee36151a5098ce7cdeff3/1453771923.11401.data
/srv/node/r16/objects/10/ff3/55517cd029bee36151a5098ce7cdeff3/1453771923.11401.data:
 EXT: FILE-OFFSET      BLOCK-RANGE              AG AG-OFFSET          TOTAL FLAGS
   0: [0..1535]:       1465876416..1465877951    1 (744384..745919)    1536 00000
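
Or should I be checking with xfs_io's "stat" command instead? If I read
the xfs_io(8) manual right, it reports the extended inode flags
(fsxattr.xflags), and I would expect the prealloc bit (FS_XFLAG_PREALLOC,
0x2) to show up there. Something like (untested guess on my part):

# xfs_io -c "stat" /srv/node/r1/t1.txt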

Thanks & Regards,
Dilip


> This is expected behaviour.
>
> > # xfs_io -f -c "pwrite 0 257k" /srv/node/r1/t2.txt
> > # stat  /srv/node/r1/t2.txt | grep Blocks
> > Size: 263168    *Blocks*: 520        IO Block: 4096   regular file
>
> So with pure delayed allocation, speculative preallocation starts at 64k
> file size, so it would have been (((64k + 64K) + 128K) + 256k) =
> 768k.
>
>
> > I waited for around 15 mins before collecting the stat output to give the
> > background reclamation logic a fair chance to do its job. I also tried
> > changing the value of speculative_prealloc_lifetime from 300 to 10.
> > But it was of no use.
>
> The prealloc cleaner skips inodes with XFS_DIFLAG_PREALLOC set on
> them.
>
> Because the XFS_DIFLAG_PREALLOC flag is not set on the delayed
> allocation inode, the EOF blocks cleaner runs and truncates it to EOF,
> and 260k (520 blocks) remains allocated to the file.
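
(Checking my own understanding of the 520 figure, assuming the 4k block
size shown by stat's "IO Block": 257k of data rounds up to 65 * 4k =
260k, which is 520 of stat's 512-byte blocks.)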
>
> i.e. you are seeing behaviour exactly as designed and intended.
>
> The way swift is using fallocate is actively harmful. You do not
> want preallocation for write once files - this is exactly the
> workload that delayed allocation was designed to be optimal for as
> delayed allocation sequentialises the IO from multiple files.
>
> Using preallocation means writeback of the data cannot be optimised
> across files as the preallocation location will not be sequential to
> the IO that was just issued, hence writeback will seek the disks
> back and forth instead of seeing a nice sequential IO stream.
>
> <sigh>
>
> Yet another way that the swift storage back end tries to be smart
> but ends up just making things go slow....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>


Thread overview: 14+ messages
2016-02-02 22:32 Request for information on bloated writes using Swift Dilip Simha
2016-02-03  2:47 ` Eric Sandeen
2016-02-03  3:40   ` Dilip Simha
2016-02-03  3:42     ` Dilip Simha
2016-02-03  6:37     ` Dave Chinner
2016-02-03  7:09       ` Dilip Simha [this message]
2016-02-03  8:30         ` Dave Chinner
2016-02-03 15:02           ` Eric Sandeen
2016-02-03 21:51             ` Dave Chinner
2016-02-03 22:43               ` Dilip Simha
2016-02-03 23:28                 ` Dave Chinner
2016-02-04  6:16                   ` Dilip Simha
2016-02-03 16:10           ` Dilip Simha
2016-02-03 16:15             ` Dilip Simha
