From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 13/24] xfs: make xfs_writepage_map extent map centric
Date: Mon, 18 Jun 2018 22:43:25 -0700
Message-ID: <20180619054325.GV8128@magnolia>
In-Reply-To: <20180615130209.1970-14-hch@lst.de>
On Fri, Jun 15, 2018 at 03:01:58PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_writepage_map() iterates over the bufferheads on a page to decide
> what sort of IO to do and what actions to take. However, when it comes
> to reflink and deciding when it needs to execute a COW operation, we no
> longer look at the bufferhead state but instead we ignore that and look
> up internal state held in the COW fork extent list.
>
> This means xfs_writepage_map() is somewhat confused. It does stuff, then
> ignores it, then tries to handle the impedance mismatch by shovelling the
> results inside the existing mapping code. It works, but it's a bit of a
> mess and it makes it hard to fix the cached map bug that the writepage
> code currently has.
>
> To unify the two different mechanisms, we first have to choose a direction.
> That's already been set - we're de-emphasising bufferheads so they are no
> longer a control structure as we need to do that to allow for eventual
> removal. Hence we need to move away from looking at bufferhead state to
> determine what operations we need to perform.
>
> We can't completely get rid of bufferheads yet - they do contain some
> state that is absolutely necessary, such as whether that part of the page
> contains valid data or not (buffer_uptodate()). Other state in the
> bufferhead is redundant:
>
>   BH_dirty      - the page is dirty, so we can ignore this and just
>                   write it
>   BH_delay      - we have delalloc extent info in the DATA fork extent
>                   tree
>   BH_unwritten  - same as BH_delay
>   BH_mapped     - indicates we've already used it once for IO and it is
>                   mapped to a disk address. Needs to be ignored for COW
>                   blocks.
>
> The BH_mapped flag is an interesting case - it's supposed to indicate that
> it's already mapped to disk and so we can just use it "as is". In theory,
> we don't even have to do an extent lookup to find where to write it to,
> but we have to do that anyway to determine we are actually writing over a
> valid extent. Hence it's not even serving the purpose of avoiding an
> extent lookup during writeback, and so we can pretty much ignore it.
> Especially as we have to ignore it for COW operations...
>
> Therefore, use the extent map as the source of information to tell us
> what actions we need to take and what sort of IO we should perform. The
> first step is to have xfs_map_blocks() set the io type according to what
> it looks up. This means it can easily handle both normal overwrite and
> COW cases. The only thing we also need to add is the ability to return
> hole mappings.
>
> We need to return and cache hole mappings now for the case of multiple
> blocks per page. We no longer use BH_mapped to indicate a block over
> a hole, so we have to get that info from xfs_map_blocks(). We cache it so
> that holes that span two pages don't need separate lookups. This allows us
> to avoid ever doing write IO over a hole, too.
>
> Now that we have xfs_map_blocks() returning both a cached map and the type
> of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
> the bufferhead control. It's also much simplified because it doesn't need
> to explicitly handle COW operations. Instead of iterating bufferheads, it
> iterates blocks within the page and then looks up what per-block state is
> required from the appropriate bufferhead. It then validates the cached
> map, and if it's not valid, we get a new map. If we don't get a valid map
> or it's over a hole, we skip the block.
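
The iteration described above can be sketched in userspace roughly as
follows. This is an illustration under stated assumptions, not the
xfs_writepage_map() implementation: struct sketch_extent,
find_extent() and count_writable_blocks() are hypothetical, the extent
"tree" is a flat array, and per-block uptodate state is a plain bool
array standing in for buffer_uptodate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, simplified extent: byte range plus a "this is a hole" marker. */
struct sketch_extent {
	uint64_t start;
	uint64_t len;
	bool hole;
};

static inline bool
extent_covers(const struct sketch_extent *e, uint64_t off)
{
	return off >= e->start && off - e->start < e->len;
}

/* Linear lookup standing in for the real extent lookup. */
static inline const struct sketch_extent *
find_extent(const struct sketch_extent *exts, size_t nr, uint64_t off)
{
	for (size_t i = 0; i < nr; i++)
		if (extent_covers(&exts[i], off))
			return &exts[i];
	return NULL;
}

/*
 * Walk the blocks of one page.  *cached survives across calls, so a
 * mapping (including a hole) that spans pages is looked up only once.
 * Returns the number of blocks that would be added to the IO.
 */
static inline unsigned
count_writable_blocks(uint64_t page_off, unsigned page_size,
		      unsigned block_size, uint64_t end_offset,
		      const bool *uptodate,
		      const struct sketch_extent *exts, size_t nr,
		      const struct sketch_extent **cached)
{
	unsigned count = 0;
	uint64_t off = page_off;

	assert(block_size != 0 && page_size % block_size == 0);

	for (unsigned i = 0; i < page_size / block_size;
	     i++, off += block_size) {
		if (off >= end_offset)
			break;		/* past the range being written */
		if (!uptodate[i])
			continue;	/* no valid data in this block */
		if (!*cached || !extent_covers(*cached, off))
			*cached = find_extent(exts, nr, off);
		if (!*cached || (*cached)->hole)
			continue;	/* no map, or over a hole: skip */
		count++;
	}
	return count;
}
```
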
>
> At this point, we have to remap the bufferhead via xfs_map_at_offset().
> As previously noted, we had to do this even if the buffer was already
> mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
> and XFS_IO_COW IO types. With xfs_map_blocks() now controlling the type,
> even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-
> written delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
> Bufferheads that span such regions still need their BH_Delay flags cleared
> and their block numbers calculated, so we now unconditionally map each
> bufferhead before submission.
>
> But wait! There's more - remember the old "treat unwritten extents as
> holes on read" hack? Yeah, that means we can have a dirty page with
> unmapped, unwritten bufferheads that contain data! What makes these so
> special is that the unwritten "hole" bufferheads do not have a valid block
> device pointer, so if we attempt to write them xfs_add_to_ioend() blows
> up. So we make xfs_map_at_offset() do the "realtime or data device"
> lookup from the inode and ignore what was or wasn't put into the
> bufferhead when the buffer was instantiated.
>
> The astute reader will have realised by now that this code treats
> unwritten extents in multiple-blocks-per-page situations differently.
> If we get any combination of unwritten blocks on a dirty page that contain
> valid data in the page, we're going to convert them to real extents. This
> can actually be a win, because it means that pages with interleaving
> unwritten and written blocks will get converted to a single written extent
> with zeros replacing the interspersed unwritten blocks. This is actually
> good for reducing extent list and conversion overhead, and it means we
> issue a contiguous IO instead of lots of little ones. The downside is
> that we use up a little extra IO bandwidth. Neither of these seem like a
> bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
> bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
> IOs will result in better performance on them...
>
> As a result of all this, the only state we actually care about from the
> bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
> pass some information to the bio via xfs_add_to_ioend(), but that is
> trivial to separate and pass explicitly. This means we really only need
> 1 bit of state per block per page from the buffered write path in the
> writeback path. Everything else we do with the bufferhead is purely to
> make the buffered IO front end continue to work correctly. i.e. we've
> pretty much marginalised bufferheads in the writeback path completely.
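
To make the "1 bit of state per block per page" point concrete, here is
one way such state could be tracked without bufferheads. This is only a
sketch under assumptions: struct sketch_page_state and both helpers are
hypothetical names, and the 64-block cap is an arbitrary choice so that
one word covers a page. The patch itself does not introduce any such
structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_BLOCKS 64u	/* hypothetical cap: one 64-bit word per page */

/* One uptodate bit per block in the page, nothing else. */
struct sketch_page_state {
	uint64_t uptodate;
};

/* Record that a block holds valid data (the buffered write side). */
static inline void
sketch_set_block_uptodate(struct sketch_page_state *ps, unsigned block)
{
	assert(block < SKETCH_MAX_BLOCKS);
	ps->uptodate |= UINT64_C(1) << block;
}

/* Query the bit (the writeback side). */
static inline bool
sketch_block_uptodate(const struct sketch_page_state *ps, unsigned block)
{
	assert(block < SKETCH_MAX_BLOCKS);
	return (ps->uptodate >> block) & 1;
}
```
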
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> [hch: forward port + slight refactoring]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_aops.c | 89 ++++++++++++++++++++---------------------------
> 1 file changed, 37 insertions(+), 52 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8c1a28f39197..165891ecb4ba 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -454,19 +454,19 @@ xfs_map_blocks(
> } else if (imap.br_startblock == HOLESTARTBLOCK) {
> /* landed in a hole */
> wpc->io_type = XFS_IO_HOLE;
> - }
> -
> - if (wpc->io_type == XFS_IO_DELALLOC &&
> - (!nimaps || isnullstartblock(imap.br_startblock)))
> - goto allocate_blocks;
> + } else {
> + if (isnullstartblock(imap.br_startblock)) {
> + /* got a delalloc extent */
> + wpc->io_type = XFS_IO_DELALLOC;
> + goto allocate_blocks;
> + }
>
> -#ifdef DEBUG
> - if (wpc->io_type == XFS_IO_UNWRITTEN) {
> - ASSERT(nimaps);
> - ASSERT(imap.br_startblock != HOLESTARTBLOCK);
> - ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
> + if (imap.br_state == XFS_EXT_UNWRITTEN)
> + wpc->io_type = XFS_IO_UNWRITTEN;
> + else
> + wpc->io_type = XFS_IO_OVERWRITE;
> }
> -#endif
> +
> wpc->imap = imap;
> trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
> return 0;
> @@ -736,6 +736,14 @@ xfs_map_at_offset(
> set_buffer_mapped(bh);
> clear_buffer_delay(bh);
> clear_buffer_unwritten(bh);
> +
> + /*
> + * If this is a realtime file, data may be on a different device.
> + * to that pointed to from the buffer_head b_bdev currently. We can't
> + * trust that the bufferhead has a already been mapped correctly, so
> + * set the bdev now.
> + */
> + bh->b_bdev = xfs_find_bdev_for_inode(inode);
> }
>
> STATIC void
> @@ -822,58 +830,36 @@ xfs_writepage_map(
> {
> LIST_HEAD(submit_list);
> struct xfs_ioend *ioend, *next;
> - struct buffer_head *bh, *head;
> + struct buffer_head *bh;
> ssize_t len = i_blocksize(inode);
> uint64_t file_offset; /* file offset of page */
> + unsigned poffset; /* offset into page */
> int error = 0;
> int count = 0;
> - unsigned int new_type;
>
> - bh = head = page_buffers(page);
> + /*
> + * Walk the blocks on the page, and we we run off then end of the
> + * current map or find the current map invalid, grab a new one.

Will rework this comment to read:

"...and if we run off the end of the current map or..."

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> + * We only use bufferheads here to check per-block state - they no
> + * longer control the iteration through the page. This allows us to
> + * replace the bufferhead with some other state tracking mechanism in
> + * future.
> + */
> file_offset = page_offset(page);
> - do {
> + bh = page_buffers(page);
> + for (poffset = 0;
> + poffset < PAGE_SIZE;
> + poffset += len, file_offset += len, bh = bh->b_this_page) {
> + /* past the range we are writing, so nothing more to write. */
> if (file_offset >= end_offset)
> break;
>
> - /*
> - * set_page_dirty dirties all buffers in a page, independent
> - * of their state. The dirty state however is entirely
> - * meaningless for holes (!mapped && uptodate), so skip
> - * buffers covering holes here.
> - */
> - if (!buffer_mapped(bh) && buffer_uptodate(bh))
> - continue;
> -
> - if (buffer_unwritten(bh))
> - new_type = XFS_IO_UNWRITTEN;
> - else if (buffer_delay(bh))
> - new_type = XFS_IO_DELALLOC;
> - else if (buffer_uptodate(bh))
> - new_type = XFS_IO_OVERWRITE;
> - else {
> + if (!buffer_uptodate(bh)) {
> if (PageUptodate(page))
> ASSERT(buffer_mapped(bh));
> - /*
> - * This buffer is not uptodate and will not be
> - * written to disk.
> - */
> continue;
> }
>
> - /*
> - * If we already have a valid COW mapping keep using it.
> - */
> - if (wpc->io_type == XFS_IO_COW &&
> - xfs_imap_valid(inode, &wpc->imap, file_offset)) {
> - wpc->imap_valid = true;
> - new_type = XFS_IO_COW;
> - }
> -
> - if (wpc->io_type != new_type) {
> - wpc->io_type = new_type;
> - wpc->imap_valid = false;
> - }
> -
> if (wpc->imap_valid)
> wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> file_offset);
> @@ -891,11 +877,10 @@ xfs_writepage_map(
> continue;
>
> lock_buffer(bh);
> - if (wpc->io_type != XFS_IO_OVERWRITE)
> - xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> + xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> count++;
> - } while (file_offset += len, ((bh = bh->b_this_page) != head));
> + }
>
> ASSERT(wpc->ioend || list_empty(&submit_list));
>
> --
> 2.17.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html