linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 14/16] xfs: align writepages to large block sizes
Date: Mon, 19 Nov 2018 12:14:55 +1100	[thread overview]
Message-ID: <20181119011455.GI19305@dastard> (raw)
In-Reply-To: <20181116132922.GA31603@bfoster>

On Fri, Nov 16, 2018 at 08:29:23AM -0500, Brian Foster wrote:
> On Fri, Nov 16, 2018 at 05:19:36PM +1100, Dave Chinner wrote:
> > i.e. we round WB_SYNC_NONE to try to get whole blocks written so we
> > don't end up with partially written blocks on disk for extended
> > periods of time (e.g. between background writeback periods). It
> > doesn't matter for WB_SYNC_ALL, but it will reduce the potential for
> > stale data beign exposed when crashes occur and random pages in
> > blocks have't been written back. i.e. it's to help iprevent
> > re-exposing the problematic cases that we added the "NULL files
> > after crash" workarounds for.
> > 
> 
> Ok. I see that there are earlier patches to do zero-around on sub-block
> writes, so the idea makes a bit more sense with that in mind. That said,
> I still don't grok how messing with nr_to_write is effective.

Because background writeback is range_cyclic = 1, and that means we
always start at offset zero, and then if nr_to_write expires we
stash the page index we are up to in:

	mapping->writeback_index = done_index

And the next background writeback will start again from there.

Hence if nr_to_write is always rounding to the number of pages per
block, background writeback will /tend/ towards writing full blocks
because the writeback_index will always end up a multiple of pages
per block. Hence cyclic writeback will tend towards writing aligned,
full blocks when nr_to_write is rounded.

That's the fundamental concept here - write-in does "zero-around" to
initialise full blocks, writeback does "write-around" to push full
blocks to disk. WB_SYNC_ALL needs ranges to be rounded to do full
block writeback, WB_SYNC_NONE background writeback needs it's range
cyclic behaviour to round to writing full blocks (and that's what
rounding nr_to_write is doing in this patch).

> For background writeback (WB_SYNC_NONE), the range fields are clamped
> out (0-LONG_MAX) since the location of pages to write is not really a
> concern. In that case, ->nr_to_write is set based on some bandwidth
> heuristic and the only change we make here is to round it. If we
> consider the fact that any mapping itself may consist of some
> combination of zeroed-around delalloc blocks (covered by an aligned
> number of dirty pages) and already allocated/overwrite blocks (covered
> by any number of dirty pages), how does a rounded ->nr_to_write actually
> help us at all? Can't the magic ->nr_to_write value that prevents
> stopping at a partially written sub-block page be unaligned to the block
> size?

Yup, I never intended for this RFC prototype to deal with all these
problems. That doesn't mean I'm not aware of them, nor that I don't
have a plan to deal with them.

> Given the above, I don't see how tweaking ->nr_to_write really helps at
> all even as an optimization. Unless I'm missing something else in the
> earlier patches that facilitate this, ISTM that something more explicit
> is required if you want to increase the odds that zeroed-around blocks
> are written together.

Which I've always intended as future work. I've spent about 14 hours
on this patch set so far - it's a functional prototype, not a
finished, completed piece of work.

I'm fully aware that this going to need a lot more work before it is
ready for merging This is an early prototype I'm putting out there
for architectural/design review. i.e. don't bother nitpicking
unimplemented details or bugs, look for big picture things I've got
wrong. Look for showstoppers and fundamental problems, not things
that just require a little bit of time and coding to implement....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-11-19 11:52 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-07  6:31 [RFC PATCH 00/16] xfs: Block size > PAGE_SIZE support Dave Chinner
2018-11-07  6:31 ` [PATCH 01/16] xfs: drop ->writepage completely Dave Chinner
2018-11-09 15:12   ` Christoph Hellwig
2018-11-12 21:08     ` Dave Chinner
2021-02-02 20:51       ` Darrick J. Wong
2018-11-07  6:31 ` [PATCH 02/16] xfs: move writepage context warnings to writepages Dave Chinner
2018-11-07  6:31 ` [PATCH 03/16] xfs: finobt AG reserves don't consider last AG can be a runt Dave Chinner
2018-11-07 16:55   ` Darrick J. Wong
2018-11-09  0:21     ` Dave Chinner
2018-11-07  6:31 ` [PATCH 04/16] xfs: extent shifting doesn't fully invalidate page cache Dave Chinner
2018-11-07  6:31 ` [PATCH 05/16] iomap: sub-block dio needs to zeroout beyond EOF Dave Chinner
2018-11-09 15:15   ` Christoph Hellwig
2018-11-07  6:31 ` [PATCH 06/16] iomap: support block size > page size for direct IO Dave Chinner
2018-11-08 11:28   ` Nikolay Borisov
2018-11-09 15:18   ` Christoph Hellwig
2018-11-11  1:12     ` Dave Chinner
2018-11-07  6:31 ` [PATCH 07/16] iomap: prepare buffered IO paths for block size > page size Dave Chinner
2018-11-09 15:19   ` Christoph Hellwig
2018-11-11  1:15     ` Dave Chinner
2018-11-07  6:31 ` [PATCH 08/16] iomap: mode iomap_zero_range and friends Dave Chinner
2018-11-09 15:19   ` Christoph Hellwig
2018-11-07  6:31 ` [PATCH 09/16] iomap: introduce zero-around functionality Dave Chinner
2018-11-07  6:31 ` [PATCH 10/16] iomap: enable zero-around for iomap_zero_range() Dave Chinner
2018-11-07  6:31 ` [PATCH 11/16] iomap: Don't mark partial pages zeroing uptodate for zero-around Dave Chinner
2018-11-07  6:31 ` [PATCH 12/16] iomap: zero-around in iomap_page_mkwrite Dave Chinner
2018-11-07  6:31 ` [PATCH 13/16] xfs: add zero-around controls to iomap Dave Chinner
2018-11-07  6:31 ` [PATCH 14/16] xfs: align writepages to large block sizes Dave Chinner
2018-11-09 15:22   ` Christoph Hellwig
2018-11-11  1:20     ` Dave Chinner
2018-11-11 16:32       ` Christoph Hellwig
2018-11-14 14:19   ` Brian Foster
2018-11-14 21:18     ` Dave Chinner
2018-11-15 12:55       ` Brian Foster
2018-11-16  6:19         ` Dave Chinner
2018-11-16 13:29           ` Brian Foster
2018-11-19  1:14             ` Dave Chinner [this message]
2018-11-07  6:31 ` [PATCH 15/16] xfs: expose block size in stat Dave Chinner
2018-11-07  6:31 ` [PATCH 16/16] xfs: enable block size larger than page size support Dave Chinner
2018-11-07 17:14 ` [RFC PATCH 00/16] xfs: Block size > PAGE_SIZE support Darrick J. Wong
2018-11-07 22:04   ` Dave Chinner
2018-11-08  1:38     ` Darrick J. Wong
2018-11-08  9:04       ` Dave Chinner
2018-11-08 22:17         ` Darrick J. Wong
2018-11-08 22:22           ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181119011455.GI19305@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).