All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org, hch@infradead.org
Subject: Re: [PATCH 3/3] xfs: measure all contiguous previous extents for prealloc size
Date: Wed, 20 May 2020 09:23:34 -0400	[thread overview]
Message-ID: <20200520132334.GA36554@bfoster> (raw)
In-Reply-To: <20200519124827.GC23387@bfoster>

On Tue, May 19, 2020 at 08:48:27AM -0400, Brian Foster wrote:
> On Mon, May 18, 2020 at 05:49:23PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we're estimating a new speculative preallocation length for an
> > extending write, we should walk backwards through the extent list to
> > determine the number of number of blocks that are physically and
> > logically contiguous with the write offset, and use that as an input to
> > the preallocation size computation.
> > 
> > This way, preallocation length is truly measured by the effectiveness of
> > the allocator in giving us contiguous allocations without being
> > influenced by the state of a given extent.  This fixes both the problem
> > where ZERO_RANGE within an EOF can reduce preallocation, and prevents
> > the unnecessary shrinkage of preallocation when delalloc extents are
> > turned into unwritten extents.
> > 
> > This was found as a regression in xfs/014 after changing delalloc writes
> > to create unwritten extents during writeback.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_iomap.c |   63 +++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 52 insertions(+), 11 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index ac970b13b1f8..2dffd56a433c 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
...
> > @@ -413,15 +453,16 @@ xfs_iomap_prealloc_size(
> >  	 * preallocation size.
> >  	 *
> >  	 * If the extent is a hole, then preallocation is essentially disabled.
> > -	 * Otherwise we take the size of the preceding data extent as the basis
> > -	 * for the preallocation size. If the size of the extent is greater than
> > -	 * half the maximum extent length, then use the current offset as the
> > -	 * basis. This ensures that for large files the preallocation size
> > -	 * always extends to MAXEXTLEN rather than falling short due to things
> > -	 * like stripe unit/width alignment of real extents.
> > +	 * Otherwise we take the size of the contiguous preceding data extents
> > +	 * as the basis for the preallocation size. If the size of the extent
> 
> I'd refer to it as the "size of the contiguous range" or some such since
> we're now talking about aggregating many extents to form the prealloc
> basis.
> 
> I am a little curious if there's any noticeable impact from having to
> perform the worst case extent walk in this path. For example, suppose we
> had a speculatively preallocated file where we started writing (i.e.
> converting) every other unwritten block such that this path had to walk
> an extent per block until hitting the 8g max (8g == 2097152 4k blocks).
> I guess the hope is that either most of those blocks wouldn't have been
> written back and converted by the time we'd trigger the next post-eof
> prealloc or that it would just be a wash in the stream of staggered
> writes falling into our max sized preallocations. Either way, I think
> it's more important to maintain the existing heuristic so this seems
> reasonable from that perspective.
> 

I had a system spinning yesterday to create this worst case condition
across a couple max sized extents. Testing out the preallocation path, I
see a hit from ~5ms to ~35ms between the baseline variant and the
updated calculation algorithm. Note that the baseline here is only doing
a 16 block prealloc vs. 8gb with the fix, but regardless a 30ms
difference for an 8gb allocation doesn't really seem noticeable enough
to matter. IOW, I think we can disgregard this particular concern...

> This scenario also makes me wonder if we should consider continuing to
> do write time file size extension zeroing in certain cases vs. leaving
> around unwritten extents. For example, the current post-eof prealloc
> behavior is that writes that jump over current EOF will zero (via
> buffered writes) any allocated blocks (delalloc or physical) between
> current EOF and the start of the write. That behavior doesn't change
> with delalloc -> unwritten if the prealloc is still delalloc at the time
> of the extending write, but we'd presumably skip those zeroing writes if
> the prealloc had been converted to real+unwritten first. Hmm.. that does
> seem a little random to me, particularly if somebody starts to see
> unwritten extents sprinkled throughout a file that never explicitly saw
> preallocation. Perhaps we should avoid converting post-eof blocks? OTOH,
> unwritten probably makes sense for large jumps in EOF vs zeroing GBs of
> blocks, so one could argue that we should convert such ranges (if still
> delalloc) rather than zero them at all. Thoughts? We should probably
> work this out one way or another before landing the delalloc ->
> unwritten behavior..
> 

Thinking about this one a little more.. it seems it's probably not worth
conditionally changing post-eof conversion behavior. Since there are
cases where we probably want post-eof prealloc to remain as unwritten if
it's carried within i_size, that would either be extra code for post-eof
blocks or would split up any kind of heuristic for manually zeroing such
blocks. ISTM that the right approach might be to convert all delalloc ->
unwritten (as this series already does), then perhaps consider a write
time heuristic that would perform manual zeroing of extents if certain
conditions are met (e.g., for unwritten extents under a certain size).

The remaining question to me is whether we should include something like
that before changing delalloc behavior or consider it separately as an
optimization...

Brian

P.S., BTW I forgot to mention on my last pass of this series that it
probably makes sense to reorder these behavioral fixup patches to
precede the patch that actually changes delalloc to allocate unwritten
extents.

> Brian
> 
> > +	 * is greater than half the maximum extent length, then use the current
> > +	 * offset as the basis. This ensures that for large files the
> > +	 * preallocation size always extends to MAXEXTLEN rather than falling
> > +	 * short due to things like stripe unit/width alignment of real
> > +	 * extents.
> >  	 */
> > -	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
> > -		alloc_blocks = prev.br_blockcount << 1;
> > +	if (plen <= (MAXEXTLEN >> 1))
> > +		alloc_blocks = plen << 1;
> >  	else
> >  		alloc_blocks = XFS_B_TO_FSB(mp, offset);
> >  	if (!alloc_blocks)
> > 
> 


  reply	other threads:[~2020-05-20 13:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-19  0:49 [PATCH 0/3] xfs: fix stale disk exposure after crash Darrick J. Wong
2020-05-19  0:49 ` [PATCH 1/3] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
2020-05-19 12:45   ` Brian Foster
2020-05-19  0:49 ` [PATCH 2/3] xfs: don't fail unwritten extent conversion on writeback due to edquot Darrick J. Wong
2020-05-19  7:13   ` Christoph Hellwig
2020-05-19 12:46   ` Brian Foster
2020-05-19  0:49 ` [PATCH 3/3] xfs: measure all contiguous previous extents for prealloc size Darrick J. Wong
2020-05-19 12:48   ` Brian Foster
2020-05-20 13:23     ` Brian Foster [this message]
2020-05-20 19:48       ` Darrick J. Wong
2020-05-21 12:24         ` Brian Foster
2020-05-19 12:54   ` Christoph Hellwig
2020-05-20 21:17     ` Darrick J. Wong
2020-05-21  9:31       ` Christoph Hellwig
2020-05-21 17:19         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200520132334.GA36554@bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.