* [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write()
@ 2015-08-25  5:05 Dave Chinner
  2015-08-25  5:05 ` [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Dave Chinner @ 2015-08-25  5:05 UTC (permalink / raw)
  To: xfs

Hi folks,

This is the second version of the patchset that removes
xfs_cluster_write() in favour of using state held across
->writepage calls to enable clustering of pages. The full
description from the v1 patchset is here:

http://oss.sgi.com/archives/xfs/2015-08/msg00224.html

The end result is effectively unchanged from version 1, just the
breakdown of the patches is different to the original posting, as
requested by Christoph.

-Dave.

Version 2:
- promote removal of non-blocking behaviour to be an initial cleanup
  patch
- collapse staged introduction of struct xfs_writepage_context into
  a single patch.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage
  2015-08-25  5:05 [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write() Dave Chinner
@ 2015-08-25  5:05 ` Dave Chinner
  2015-08-31 18:41   ` Christoph Hellwig
  2015-08-25  5:05 ` [PATCH 2/4] xfs: Introduce writeback context for writepages Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2015-08-25  5:05 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Remove the nonblocking optimisation done for mapping lookups during
writeback. It's not clear that leaving a hole in the writeback range
just because we couldn't get a lock is really a win, as it makes us
do another small random IO later on rather than a large sequential
IO now.

As this gets in the way of sane error handling later on, just remove
it for the moment; we can re-introduce an equivalent optimisation in
future if we see problems due to extent map lock contention.
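The pattern being removed is the classic trylock-with-fallback. A minimal standalone sketch of the before and after behaviour, with a hypothetical `ilock` API standing in for `xfs_ilock`/`XFS_ILOCK_SHARED` (this is a toy model, not the real XFS code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock standing in for the inode's XFS_ILOCK_SHARED. */
struct ilock { bool held; };

static bool ilock_nowait(struct ilock *l)
{
	if (l->held)
		return false;	/* contended */
	l->held = true;
	return true;
}

static void ilock(struct ilock *l)
{
	/* A real lock would sleep here; in this toy, contention is a bug. */
	assert(!l->held);
	l->held = true;
}

/* Before: WB_SYNC_NONE writeback bails out on contention, leaving a
 * hole in the writeback range that costs a small random IO later. */
static int map_blocks_old(struct ilock *l, bool nonblocking)
{
	if (!ilock_nowait(l)) {
		if (nonblocking)
			return -EAGAIN;
		ilock(l);
	}
	return 0;
}

/* After: always block, so the whole range goes out as sequential IO. */
static int map_blocks_new(struct ilock *l)
{
	ilock(l);
	return 0;
}
```

The -EAGAIN return is what forced the redirty path in xfs_vm_writepage(); once the lookup always blocks, that path goes away too.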

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3859f5e..89fad6b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -276,8 +276,7 @@ xfs_map_blocks(
 	struct inode		*inode,
 	loff_t			offset,
 	struct xfs_bmbt_irec	*imap,
-	int			type,
-	int			nonblocking)
+	int			type)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -293,12 +292,7 @@ xfs_map_blocks(
 	if (type == XFS_IO_UNWRITTEN)
 		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
-	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) {
-		if (nonblocking)
-			return -EAGAIN;
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
-	}
-
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
@@ -949,7 +943,6 @@ xfs_vm_writepage(
 	ssize_t			len;
 	int			err, imap_valid = 0, uptodate = 1;
 	int			count = 0;
-	int			nonblocking = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1049,9 +1042,6 @@ xfs_vm_writepage(
 	offset = page_offset(page);
 	type = XFS_IO_OVERWRITE;
 
-	if (wbc->sync_mode == WB_SYNC_NONE)
-		nonblocking = 1;
-
 	do {
 		int new_ioend = 0;
 
@@ -1111,8 +1101,7 @@ xfs_vm_writepage(
 			 * time.
 			 */
 			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type,
-					     nonblocking);
+			err = xfs_map_blocks(inode, offset, &imap, type);
 			if (err)
 				goto error;
 			imap_valid = xfs_imap_valid(inode, &imap, offset);
@@ -1182,9 +1171,6 @@ error:
 	if (iohead)
 		xfs_cancel_ioend(iohead);
 
-	if (err == -EAGAIN)
-		goto redirty;
-
 	xfs_aops_discard_page(page);
 	ClearPageUptodate(page);
 	unlock_page(page);
-- 
2.5.0



* [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-25  5:05 [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write() Dave Chinner
  2015-08-25  5:05 ` [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
@ 2015-08-25  5:05 ` Dave Chinner
  2015-08-31 18:02   ` Brian Foster
  2016-02-08  7:36   ` Christoph Hellwig
  2015-08-25  5:05 ` [PATCH 3/4] xfs: xfs_cluster_write is redundant Dave Chinner
  2015-08-25  5:05 ` [PATCH 4/4] xfs: factor mapping out of xfs_do_writepage Dave Chinner
  3 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2015-08-25  5:05 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_vm_writepages() calls generic_writepages() to write back a range
of a file, but then xfs_vm_writepage() clusters pages itself as it
does not have any context it can pass between ->writepage calls from
write_cache_pages().

Introduce a writeback context for xfs_vm_writepages() and call
write_cache_pages() directly with our own writepage callback so that
we can pass that context to each writepage invocation. This
encapsulates the current mapping, whether it is valid or not, the
current ioend and its IO type, and the ioend chain being built.

This requires us to move the ioend submission up to the level where
the writepage context is declared. This does mean we do not submit
IO until we have packaged the entire writeback range, but with the
block plugging in the writepages call this is the way IO is
submitted anyway.

It also means that we need to handle discontiguous page ranges.  If
the pages sent down by write_cache_pages to the writepage callback
are discontiguous, we need to detect this and put each discontiguous
page range into individual ioends. This is needed to ensure that the
ioend accurately represents the range of the file that it covers so
that file size updates during IO completion set the size correctly.
Failure to take into account the discontiguous ranges results in
files being too small when writeback patterns are non-sequential.
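The discontiguity handling described above can be sketched in isolation. This toy model (simplified fields, heap allocation in place of the ioend mempool, block numbers in place of buffer heads) shows how tracking the last block number splits non-adjacent blocks into separate ioends, so each ioend covers exactly one contiguous range:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the patch's xfs_ioend / xfs_writepage_ctx. */
struct ioend {
	long start;		/* first block in this IO */
	long nblocks;		/* contiguous length */
	struct ioend *next;	/* chain built across ->writepage calls */
};

struct wp_ctx {
	struct ioend *head;	/* first ioend, submitted at the end */
	struct ioend *tail;	/* ioend currently being appended to */
	long last_block;
};

/* Append one block, opening a new ioend on any discontiguity, as
 * xfs_add_to_ioend does via bh->b_blocknr != wpc->last_block + 1. */
static void add_block(struct wp_ctx *wpc, long blockno)
{
	if (!wpc->tail || blockno != wpc->last_block + 1) {
		struct ioend *new = calloc(1, sizeof(*new));
		new->start = blockno;
		new->nblocks = 1;
		if (wpc->tail)
			wpc->tail->next = new;
		else
			wpc->head = new;
		wpc->tail = new;
	} else {
		wpc->tail->nblocks++;
	}
	wpc->last_block = blockno;
}

static int count_ioends(const struct wp_ctx *wpc)
{
	int n = 0;
	for (const struct ioend *io = wpc->head; io; io = io->next)
		n++;
	return n;
}
```

Feeding blocks 10, 11, 12, 40, 41, 100 through this produces three ioends, which is the property that keeps the IO-completion file size updates correct for non-sequential writeback.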

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 277 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 146 insertions(+), 131 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 89fad6b..93bf13c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -36,6 +36,18 @@
 #include <linux/pagevec.h>
 #include <linux/writeback.h>
 
+/*
+ * structure owned by writepages passed to individual writepage calls
+ */
+struct xfs_writepage_ctx {
+	struct xfs_bmbt_irec    imap;
+	bool			imap_valid;
+	unsigned int		io_type;
+	struct xfs_ioend	*iohead;
+	struct xfs_ioend	*ioend;
+	sector_t		last_block;
+};
+
 void
 xfs_count_page_state(
 	struct page		*page,
@@ -328,7 +340,7 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC int
+STATIC bool
 xfs_imap_valid(
 	struct inode		*inode,
 	struct xfs_bmbt_irec	*imap,
@@ -516,38 +528,6 @@ xfs_submit_ioend(
 }
 
 /*
- * Cancel submission of all buffer_heads so far in this endio.
- * Toss the endio too.  Only ever called for the initial page
- * in a writepage request, so only ever one page.
- */
-STATIC void
-xfs_cancel_ioend(
-	xfs_ioend_t		*ioend)
-{
-	xfs_ioend_t		*next;
-	struct buffer_head	*bh, *next_bh;
-
-	do {
-		next = ioend->io_list;
-		bh = ioend->io_buffer_head;
-		do {
-			next_bh = bh->b_private;
-			clear_buffer_async_write(bh);
-			/*
-			 * The unwritten flag is cleared when added to the
-			 * ioend. We're not submitting for I/O so mark the
-			 * buffer unwritten again for next time around.
-			 */
-			if (ioend->io_type == XFS_IO_UNWRITTEN)
-				set_buffer_unwritten(bh);
-			unlock_buffer(bh);
-		} while ((bh = next_bh) != NULL);
-
-		mempool_free(ioend, xfs_ioend_pool);
-	} while ((ioend = next) != NULL);
-}
-
-/*
  * Test to see if we've been building up a completion structure for
  * earlier buffers -- if so, we try to append to this ioend if we
  * can, otherwise we finish off any current ioend and start another.
@@ -558,29 +538,27 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
-	unsigned int		type,
-	xfs_ioend_t		**result,
-	int			need_ioend)
+	struct xfs_writepage_ctx *wpc)
 {
-	xfs_ioend_t		*ioend = *result;
-
-	if (!ioend || need_ioend || type != ioend->io_type) {
-		xfs_ioend_t	*previous = *result;
-
-		ioend = xfs_alloc_ioend(inode, type);
-		ioend->io_offset = offset;
-		ioend->io_buffer_head = bh;
-		ioend->io_buffer_tail = bh;
-		if (previous)
-			previous->io_list = ioend;
-		*result = ioend;
+	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
+	    bh->b_blocknr != wpc->last_block + 1) {
+		struct xfs_ioend	*new;
+
+		new = xfs_alloc_ioend(inode, wpc->io_type);
+		new->io_offset = offset;
+		new->io_buffer_head = bh;
+		new->io_buffer_tail = bh;
+		if (wpc->ioend)
+			wpc->ioend->io_list = new;
+		wpc->ioend = new;
 	} else {
-		ioend->io_buffer_tail->b_private = bh;
-		ioend->io_buffer_tail = bh;
+		wpc->ioend->io_buffer_tail->b_private = bh;
+		wpc->ioend->io_buffer_tail = bh;
 	}
 
 	bh->b_private = NULL;
-	ioend->io_size += bh->b_size;
+	wpc->ioend->io_size += bh->b_size;
+	wpc->last_block = bh->b_blocknr;
 }
 
 STATIC void
@@ -677,17 +655,15 @@ xfs_convert_page(
 	struct inode		*inode,
 	struct page		*page,
 	loff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc)
 {
 	struct buffer_head	*bh, *head;
 	xfs_off_t		end_offset;
 	unsigned long		p_offset;
-	unsigned int		type;
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
- 	xfs_off_t		offset = page_offset(page);
+	xfs_off_t		offset = page_offset(page);
 
 	if (page->index != tindex)
 		goto fail;
@@ -697,7 +673,7 @@ xfs_convert_page(
 		goto fail_unlock_page;
 	if (page->mapping != inode->i_mapping)
 		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
+	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
 		goto fail_unlock_page;
 
 	/*
@@ -733,7 +709,7 @@ xfs_convert_page(
 	 * writeback.  Hence for more optimal IO patterns, we should always
 	 * avoid partial page writeback due to multiple mappings on a page here.
 	 */
-	if (!xfs_imap_valid(inode, imap, end_offset))
+	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
 		goto fail_unlock_page;
 
 	len = 1 << inode->i_blkbits;
@@ -765,23 +741,22 @@ xfs_convert_page(
 		if (buffer_unwritten(bh) || buffer_delay(bh) ||
 		    buffer_mapped(bh)) {
 			if (buffer_unwritten(bh))
-				type = XFS_IO_UNWRITTEN;
+				wpc->io_type = XFS_IO_UNWRITTEN;
 			else if (buffer_delay(bh))
-				type = XFS_IO_DELALLOC;
+				wpc->io_type = XFS_IO_DELALLOC;
 			else
-				type = XFS_IO_OVERWRITE;
+				wpc->io_type = XFS_IO_OVERWRITE;
 
 			/*
 			 * imap should always be valid because of the above
 			 * partial page end_offset check on the imap.
 			 */
-			ASSERT(xfs_imap_valid(inode, imap, offset));
+			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
 
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type,
-					 ioendp, done);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 
 			page_dirty--;
 			count++;
@@ -816,8 +791,7 @@ STATIC void
 xfs_cluster_write(
 	struct inode		*inode,
 	pgoff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	pgoff_t			tlast)
 {
@@ -833,7 +807,7 @@ xfs_cluster_write(
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-					imap, ioendp, wbc);
+						wpc, wbc);
 			if (done)
 				break;
 		}
@@ -919,6 +893,27 @@ out_invalidate:
 	return;
 }
 
+static int
+xfs_writepage_submit(
+	struct xfs_writepage_ctx *wpc,
+	struct writeback_control *wbc,
+	int			status)
+{
+	struct blk_plug		plug;
+
+	/* Reserve log space if we might write beyond the on-disk inode size. */
+	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(wpc->ioend))
+		status = xfs_setfilesize_trans_alloc(wpc->ioend);
+
+	if (wpc->iohead) {
+		blk_start_plug(&plug);
+		xfs_submit_ioend(wbc, wpc->iohead, status);
+		blk_finish_plug(&plug);
+	}
+	return status;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -928,20 +923,19 @@ out_invalidate:
  * For any other dirty buffer heads on the page we should flush them.
  */
 STATIC int
-xfs_vm_writepage(
+xfs_do_writepage(
 	struct page		*page,
-	struct writeback_control *wbc)
+	struct writeback_control *wbc,
+	void			*data)
 {
+	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
 	struct buffer_head	*bh, *head;
-	struct xfs_bmbt_irec	imap;
-	xfs_ioend_t		*ioend = NULL, *iohead = NULL;
 	loff_t			offset;
-	unsigned int		type;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index, last_index;
 	ssize_t			len;
-	int			err, imap_valid = 0, uptodate = 1;
+	int			err, uptodate = 1;
 	int			count = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
@@ -1040,11 +1034,8 @@ xfs_vm_writepage(
 
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
-	type = XFS_IO_OVERWRITE;
 
 	do {
-		int new_ioend = 0;
-
 		if (offset >= end_offset)
 			break;
 		if (!buffer_uptodate(bh))
@@ -1057,24 +1048,24 @@ xfs_vm_writepage(
 		 * buffers covering holes here.
 		 */
 		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			imap_valid = 0;
+			wpc->imap_valid = false;
 			continue;
 		}
 
 		if (buffer_unwritten(bh)) {
-			if (type != XFS_IO_UNWRITTEN) {
-				type = XFS_IO_UNWRITTEN;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_delay(bh)) {
-			if (type != XFS_IO_DELALLOC) {
-				type = XFS_IO_DELALLOC;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_uptodate(bh)) {
-			if (type != XFS_IO_OVERWRITE) {
-				type = XFS_IO_OVERWRITE;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
 			}
 		} else {
 			if (PageUptodate(page))
@@ -1085,38 +1076,31 @@ xfs_vm_writepage(
 			 * subsequent writeable buffers into a new
 			 * ioend.
 			 */
-			imap_valid = 0;
+			wpc->imap_valid = false;
 			continue;
 		}
 
-		if (imap_valid)
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
-		if (!imap_valid) {
-			/*
-			 * If we didn't have a valid mapping then we need to
-			 * put the new mapping into a separate ioend structure.
-			 * This ensures non-contiguous extents always have
-			 * separate ioends, which is particularly important
-			 * for unwritten extent conversion at I/O completion
-			 * time.
-			 */
-			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type);
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		if (!wpc->imap_valid) {
+			err = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
 			if (err)
 				goto error;
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
 		}
-		if (imap_valid) {
+		if (wpc->imap_valid) {
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type, &ioend,
-					 new_ioend);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 			count++;
 		}
 
-		if (!iohead)
-			iohead = ioend;
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
 
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
@@ -1126,20 +1110,20 @@ xfs_vm_writepage(
 	xfs_start_page_writeback(page, 1, count);
 
 	/* if there is no IO to be submitted for this page, we are done */
-	if (!ioend)
+	if (!count)
 		return 0;
 
-	ASSERT(iohead);
+	ASSERT(wpc->iohead);
 
 	/*
 	 * Any errors from this point onwards need tobe reported through the IO
 	 * completion path as we have marked the initial page as under writeback
 	 * and unlocked it.
 	 */
-	if (imap_valid) {
+	if (wpc->imap_valid) {
 		xfs_off_t		end_index;
 
-		end_index = imap.br_startoff + imap.br_blockcount;
+		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
 
 		/* to bytes */
 		end_index <<= inode->i_blkbits;
@@ -1151,29 +1135,36 @@ xfs_vm_writepage(
 		if (end_index > last_index)
 			end_index = last_index;
 
-		xfs_cluster_write(inode, page->index + 1, &imap, &ioend,
-				  wbc, end_index);
+		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
 	}
 
-
-	/*
-	 * Reserve log space if we might write beyond the on-disk inode size.
-	 */
-	err = 0;
-	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
-		err = xfs_setfilesize_trans_alloc(ioend);
-
-	xfs_submit_ioend(wbc, iohead, err);
-
 	return 0;
 
 error:
-	if (iohead)
-		xfs_cancel_ioend(iohead);
+	/*
+	 * We have to fail the iohead here because we have buffers locked in the
+	 * ioend chain. If we don't do this, we'll deadlock invalidating the
+	 * page as that tries to lock the buffers on the page. Also, because we
+	 * have set pages under writeback, we have to run IO completion to mark
+	 * the error state of the IO appropriately, so we can't cancel the ioend
+	 * directly here. That means we have to mark this page as under
+	 * writeback if we included any buffers from it in the ioend chain.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	xfs_writepage_submit(wpc, wbc, err);
 
-	xfs_aops_discard_page(page);
-	ClearPageUptodate(page);
-	unlock_page(page);
+	/*
+	 * We can only discard the page we had the IO error on if we haven't
+	 * included it in the ioend above. If it has already been errored out,
+	 * then it is unlocked and we can't touch it here.
+	 */
+	if (!count) {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	mapping_set_error(page->mapping, err);
 	return err;
 
 redirty:
@@ -1183,12 +1174,36 @@ redirty:
 }
 
 STATIC int
+xfs_vm_writepage(
+	struct page		*page,
+	struct writeback_control *wbc)
+{
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
+	ret = xfs_do_writepage(page, wbc, &wpc);
+	if (ret)
+		return ret;
+	return xfs_writepage_submit(&wpc, wbc, ret);
+}
+
+STATIC int
 xfs_vm_writepages(
 	struct address_space	*mapping,
 	struct writeback_control *wbc)
 {
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
-	return generic_writepages(mapping, wbc);
+	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
+	if (ret)
+		return ret;
+	return xfs_writepage_submit(&wpc, wbc, ret);
 }
 
 /*
-- 
2.5.0



* [PATCH 3/4] xfs: xfs_cluster_write is redundant
  2015-08-25  5:05 [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write() Dave Chinner
  2015-08-25  5:05 ` [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
  2015-08-25  5:05 ` [PATCH 2/4] xfs: Introduce writeback context for writepages Dave Chinner
@ 2015-08-25  5:05 ` Dave Chinner
  2015-08-25  5:05 ` [PATCH 4/4] xfs: factor mapping out of xfs_do_writepage Dave Chinner
  3 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2015-08-25  5:05 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_cluster_write() is not necessary now that xfs_vm_writepages()
aggregates writepage calls across a single mapping. This means we no
longer need to do page lookups in xfs_cluster_write, so writeback
only needs to look up the page cache once per page being written.
This also removes a large amount of mostly duplicate code between
xfs_do_writepage() and xfs_convert_page().
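The reason the clustering becomes redundant: write_cache_pages() already visits every dirty page in the range once, handing each callback the same context, so the per-page neighbour lookups in xfs_cluster_write() duplicate that walk. A toy model of the iteration (hypothetical simplified signatures, not the real mm API):

```c
#include <assert.h>

/* Per-writeback state shared across callback invocations, as the
 * xfs_writepage_ctx is in this series. */
struct wp_ctx { int pages_seen; };

typedef int (*writepage_t)(int page_index, void *data);

/* Minimal model of write_cache_pages(): walk every dirty page in the
 * range once, passing the same context to each callback. */
static int write_cache_pages_model(const int *dirty, int ndirty,
				   writepage_t writepage, void *data)
{
	for (int i = 0; i < ndirty; i++) {
		int ret = writepage(dirty[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

static int do_writepage_model(int page_index, void *data)
{
	struct wp_ctx *wpc = data;

	(void)page_index;
	/* Clustering state lives in wpc; no pagevec lookups of
	 * neighbouring pages are needed per call. */
	wpc->pages_seen++;
	return 0;
}
```

Because the outer loop guarantees each dirty page is seen exactly once, the xfs_convert_page()/pagevec_lookup() machinery has nothing left to do.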

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 215 ++----------------------------------------------------
 1 file changed, 6 insertions(+), 209 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 93bf13c..1fb1ec9 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -644,179 +644,6 @@ xfs_check_page_type(
 	return false;
 }
 
-/*
- * Allocate & map buffers for page given the extent map. Write it out.
- * except for the original page of a writepage, this is called on
- * delalloc/unwritten pages only, for the original page it is possible
- * that the page has no mapping at all.
- */
-STATIC int
-xfs_convert_page(
-	struct inode		*inode,
-	struct page		*page,
-	loff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc)
-{
-	struct buffer_head	*bh, *head;
-	xfs_off_t		end_offset;
-	unsigned long		p_offset;
-	int			len, page_dirty;
-	int			count = 0, done = 0, uptodate = 1;
-	xfs_off_t		offset = page_offset(page);
-
-	if (page->index != tindex)
-		goto fail;
-	if (!trylock_page(page))
-		goto fail;
-	if (PageWriteback(page))
-		goto fail_unlock_page;
-	if (page->mapping != inode->i_mapping)
-		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
-		goto fail_unlock_page;
-
-	/*
-	 * page_dirty is initially a count of buffers on the page before
-	 * EOF and is decremented as we move each into a cleanable state.
-	 *
-	 * Derivation:
-	 *
-	 * End offset is the highest offset that this page should represent.
-	 * If we are on the last page, (end_offset & (PAGE_CACHE_SIZE - 1))
-	 * will evaluate non-zero and be less than PAGE_CACHE_SIZE and
-	 * hence give us the correct page_dirty count. On any other page,
-	 * it will be zero and in that case we need page_dirty to be the
-	 * count of buffers on the page.
-	 */
-	end_offset = min_t(unsigned long long,
-			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
-			i_size_read(inode));
-
-	/*
-	 * If the current map does not span the entire page we are about to try
-	 * to write, then give up. The only way we can write a page that spans
-	 * multiple mappings in a single writeback iteration is via the
-	 * xfs_vm_writepage() function. Data integrity writeback requires the
-	 * entire page to be written in a single attempt, otherwise the part of
-	 * the page we don't write here doesn't get written as part of the data
-	 * integrity sync.
-	 *
-	 * For normal writeback, we also don't attempt to write partial pages
-	 * here as it simply means that write_cache_pages() will see it under
-	 * writeback and ignore the page until some point in the future, at
-	 * which time this will be the only page in the file that needs
-	 * writeback.  Hence for more optimal IO patterns, we should always
-	 * avoid partial page writeback due to multiple mappings on a page here.
-	 */
-	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
-		goto fail_unlock_page;
-
-	len = 1 << inode->i_blkbits;
-	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
-					PAGE_CACHE_SIZE);
-	p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
-	page_dirty = p_offset / len;
-
-	/*
-	 * The moment we find a buffer that doesn't match our current type
-	 * specification or can't be written, abort the loop and start
-	 * writeback. As per the above xfs_imap_valid() check, only
-	 * xfs_vm_writepage() can handle partial page writeback fully - we are
-	 * limited here to the buffers that are contiguous with the current
-	 * ioend, and hence a buffer we can't write breaks that contiguity and
-	 * we have to defer the rest of the IO to xfs_vm_writepage().
-	 */
-	bh = head = page_buffers(page);
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-		if (!(PageUptodate(page) || buffer_uptodate(bh))) {
-			done = 1;
-			break;
-		}
-
-		if (buffer_unwritten(bh) || buffer_delay(bh) ||
-		    buffer_mapped(bh)) {
-			if (buffer_unwritten(bh))
-				wpc->io_type = XFS_IO_UNWRITTEN;
-			else if (buffer_delay(bh))
-				wpc->io_type = XFS_IO_DELALLOC;
-			else
-				wpc->io_type = XFS_IO_OVERWRITE;
-
-			/*
-			 * imap should always be valid because of the above
-			 * partial page end_offset check on the imap.
-			 */
-			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
-
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-
-			page_dirty--;
-			count++;
-		} else {
-			done = 1;
-			break;
-		}
-	} while (offset += len, (bh = bh->b_this_page) != head);
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	if (count) {
-		if (--wbc->nr_to_write <= 0 &&
-		    wbc->sync_mode == WB_SYNC_NONE)
-			done = 1;
-	}
-	xfs_start_page_writeback(page, !page_dirty, count);
-
-	return done;
- fail_unlock_page:
-	unlock_page(page);
- fail:
-	return 1;
-}
-
-/*
- * Convert & write out a cluster of pages in the same extent as defined
- * by mp and following the start page.
- */
-STATIC void
-xfs_cluster_write(
-	struct inode		*inode,
-	pgoff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc,
-	pgoff_t			tlast)
-{
-	struct pagevec		pvec;
-	int			done = 0, i;
-
-	pagevec_init(&pvec, 0);
-	while (!done && tindex <= tlast) {
-		unsigned len = min_t(pgoff_t, PAGEVEC_SIZE, tlast - tindex + 1);
-
-		if (!pagevec_lookup(&pvec, inode->i_mapping, tindex, len))
-			break;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-						wpc, wbc);
-			if (done)
-				break;
-		}
-
-		pagevec_release(&pvec);
-		cond_resched();
-	}
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -933,7 +760,7 @@ xfs_do_writepage(
 	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
-	pgoff_t                 end_index, last_index;
+	pgoff_t                 end_index;
 	ssize_t			len;
 	int			err, uptodate = 1;
 	int			count = 0;
@@ -963,12 +790,9 @@ xfs_do_writepage(
 	if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
 		goto redirty;
 
-	/* Is this page beyond the end of the file? */
-	offset = i_size_read(inode);
-	end_index = offset >> PAGE_CACHE_SHIFT;
-	last_index = (offset - 1) >> PAGE_CACHE_SHIFT;
-
 	/*
+	 * Is this page beyond the end of the file?
+	 *
 	 * The page index is less than the end_index, adjust the end_offset
 	 * to the highest offset that this page should represent.
 	 * -----------------------------------------------------
@@ -979,6 +803,8 @@ xfs_do_writepage(
 	 * |     desired writeback range    |      see else    |
 	 * ---------------------------------^------------------|
 	 */
+	offset = i_size_read(inode);
+	end_index = offset >> PAGE_CACHE_SHIFT;
 	if (page->index < end_index)
 		end_offset = (xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT;
 	else {
@@ -1108,36 +934,7 @@ xfs_do_writepage(
 		SetPageUptodate(page);
 
 	xfs_start_page_writeback(page, 1, count);
-
-	/* if there is no IO to be submitted for this page, we are done */
-	if (!count)
-		return 0;
-
-	ASSERT(wpc->iohead);
-
-	/*
-	 * Any errors from this point onwards need tobe reported through the IO
-	 * completion path as we have marked the initial page as under writeback
-	 * and unlocked it.
-	 */
-	if (wpc->imap_valid) {
-		xfs_off_t		end_index;
-
-		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
-
-		/* to bytes */
-		end_index <<= inode->i_blkbits;
-
-		/* to pages */
-		end_index = (end_index - 1) >> PAGE_CACHE_SHIFT;
-
-		/* check against file size */
-		if (end_index > last_index)
-			end_index = last_index;
-
-		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
-	}
-
+	ASSERT(wpc->iohead || !count);
 	return 0;
 
 error:
-- 
2.5.0



* [PATCH 4/4] xfs: factor mapping out of xfs_do_writepage
  2015-08-25  5:05 [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write() Dave Chinner
                   ` (2 preceding siblings ...)
  2015-08-25  5:05 ` [PATCH 3/4] xfs: xfs_cluster_write is redundant Dave Chinner
@ 2015-08-25  5:05 ` Dave Chinner
  3 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2015-08-25  5:05 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Separate out the bufferhead-based mapping from the writepage code so
that we have a clear separation between the page operations and the
bufferhead state.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 221 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 119 insertions(+), 102 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1fb1ec9..08a0205 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -741,6 +741,116 @@ xfs_writepage_submit(
 	return status;
 }
 
+static int
+xfs_writepage_map(
+	struct xfs_writepage_ctx *wpc,
+	struct inode		*inode,
+	struct page		*page,
+	loff_t			offset,
+	__uint64_t              end_offset)
+{
+	struct buffer_head	*bh, *head;
+	ssize_t			len = 1 << inode->i_blkbits;
+	int			error = 0;
+	int			uptodate = 1;
+	int			count = 0;
+
+	bh = head = page_buffers(page);
+	offset = page_offset(page);
+
+	do {
+		if (offset >= end_offset)
+			break;
+		if (!buffer_uptodate(bh))
+			uptodate = 0;
+
+		/*
+		 * set_page_dirty dirties all buffers in a page, independent
+		 * of their state.  The dirty state however is entirely
+		 * meaningless for holes (!mapped && uptodate), so skip
+		 * buffers covering holes here.
+		 */
+		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (buffer_unwritten(bh)) {
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_delay(bh)) {
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_uptodate(bh)) {
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
+			}
+		} else {
+			if (PageUptodate(page))
+				ASSERT(buffer_mapped(bh));
+			/*
+			 * This buffer is not uptodate and will not be
+			 * written to disk.  Ensure that we will put any
+			 * subsequent writeable buffers into a new
+			 * ioend.
+			 */
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		if (!wpc->imap_valid) {
+			error = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
+			if (error)
+				goto out_error;
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		}
+		if (wpc->imap_valid) {
+			lock_buffer(bh);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
+			count++;
+		}
+
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
+
+	} while (offset += len, ((bh = bh->b_this_page) != head));
+
+	if (uptodate && bh == head)
+		SetPageUptodate(page);
+
+	xfs_start_page_writeback(page, 1, count);
+	ASSERT(wpc->iohead || !count);
+	return 0;
+
+out_error:
+	/*
+	 * We can only discard the page we had the IO error on if we haven't
+	 * included it in the ioend above. If it has already been added to the
+	 * ioend, then we can't touch it here and need to rely on IO submission
+	 * to unlock it.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	else {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	return error;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -757,13 +867,10 @@ xfs_do_writepage(
 {
 	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
-	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index;
-	ssize_t			len;
-	int			err, uptodate = 1;
-	int			count = 0;
+	int			error = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -856,113 +963,23 @@ xfs_do_writepage(
 		end_offset = offset;
 	}
 
-	len = 1 << inode->i_blkbits;
-
-	bh = head = page_buffers(page);
-	offset = page_offset(page);
-
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-
-		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
-		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (buffer_unwritten(bh)) {
-			if (wpc->io_type != XFS_IO_UNWRITTEN) {
-				wpc->io_type = XFS_IO_UNWRITTEN;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_delay(bh)) {
-			if (wpc->io_type != XFS_IO_DELALLOC) {
-				wpc->io_type = XFS_IO_DELALLOC;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_uptodate(bh)) {
-			if (wpc->io_type != XFS_IO_OVERWRITE) {
-				wpc->io_type = XFS_IO_OVERWRITE;
-				wpc->imap_valid = false;
-			}
-		} else {
-			if (PageUptodate(page))
-				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
-			 */
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
-		if (!wpc->imap_valid) {
-			err = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
-			if (err)
-				goto error;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
-		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-			count++;
-		}
-
-		if (!wpc->iohead)
-			wpc->iohead = wpc->ioend;
-
-	} while (offset += len, ((bh = bh->b_this_page) != head));
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	xfs_start_page_writeback(page, 1, count);
-	ASSERT(wpc->iohead || !count);
+	error = xfs_writepage_map(wpc, inode, page, offset, end_offset);
+	if (error)
+		goto out_error;
 	return 0;
 
-error:
+out_error:
 	/*
 	 * We have to fail the iohead here because we have buffers locked in the
 	 * ioend chain. If we don't do this, we'll deadlock invalidating the
 	 * page as that tries to lock the buffers on the page. Also, because we
 	 * have set pages under writeback, we have to run IO completion to mark
 	 * the error state of the IO appropriately, so we can't cancel the ioend
-	 * directly here. That means we have to mark this page as under
-	 * writeback if we included any buffers from it in the ioend chain.
+	 * directly here.
 	 */
-	if (count)
-		xfs_start_page_writeback(page, 0, count);
-	xfs_writepage_submit(wpc, wbc, err);
-
-	/*
-	 * We can only discard the page we had the IO error on if we haven't
-	 * included it in the ioend above. If it has already been errored out,
-	 * the it is unlocked and we can't touch it here.
-	 */
-	if (!count) {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
-	}
-	mapping_set_error(page->mapping, err);
-	return err;
+	xfs_writepage_submit(wpc, wbc, error);
+	mapping_set_error(page->mapping, error);
+	return error;
 
 redirty:
 	redirty_page_for_writepage(wbc, page);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-25  5:05 ` [PATCH 2/4] xfs: Introduce writeback context for writepages Dave Chinner
@ 2015-08-31 18:02   ` Brian Foster
  2015-08-31 18:56     ` Christoph Hellwig
  2016-02-08  7:36   ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Brian Foster @ 2015-08-31 18:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Aug 25, 2015 at 03:05:51PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_vm_writepages() calls generic_writepages to writeback a range of
> a file, but then xfs_vm_writepage() clusters pages itself as it does
> not have any context it can pass between ->writepage calls from
> __write_cache_pages().
> 
> Introduce a writeback context for xfs_vm_writepages() and call
> __write_cache_pages directly with our own writepage callback so that
> we can pass that context to each writepage invocation. This
> encapsulates the current mapping, whether it is valid or not, the
> current ioend and its IO type, and the ioend chain being built.
> 
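As a rough illustration of the context being described, here is a userspace Python sketch. The field names follow the commit message, but the types are simplified stand-ins for illustration, not the kernel's actual `struct xfs_writepage_ctx`:

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class WritepageCtx:
    """Sketch of the per-writepages state the commit message describes.

    One instance is created in ->writepages and threaded through every
    ->writepage invocation, so a later page can reuse the mapping and
    ioend built up by an earlier page instead of starting from scratch.
    """
    imap: Optional[Tuple[int, int]] = None  # current extent map (offset, len)
    imap_valid: bool = False                # does imap still cover this offset?
    io_type: str = "OVERWRITE"              # or "UNWRITTEN" / "DELALLOC"
    ioend: Optional[Any] = None             # ioend currently being built
    iohead: Optional[Any] = None            # head of the chain, submitted last

ctx = WritepageCtx()
print(ctx.io_type)  # → OVERWRITE
```

Each callback invalidates `imap_valid` whenever the buffer state changes, which is exactly what the per-buffer loop in the diff below does.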
> This requires us to move the ioend submission up to the level where
> the writepage context is declared. This does mean we do not submit
> IO until we have packaged the entire writeback range, but with the
> block plugging in the writepages call this is the way IO is submitted
> anyway.
> 

Ok, but the comment for blk_start_plug() mentions some kind of flush on
task sleep mechanism. I could be wrong, but I take this to mean there
are cases where I/O can initiate before the plug is stopped. Does
deferring the I/O submission across writepages defeat that heuristic in
any way? My (preliminary) understanding is that while the I/O submission
would still be deferred by the plug in the same way in most cases, we're
potentially holding back I/Os from the block infrastructure until the
entire writepages sequence is complete.

> It also means that we need to handle discontiguous page ranges.  If
> the pages sent down by write_cache_pages to the writepage callback
> are discontiguous, we need to detect this and put each discontiguous
> page range into individual ioends. This is needed to ensure that the
> ioend accurately represents the range of the file that it covers so
> that file size updates during IO completion set the size correctly.
> Failure to take into account the discontiguous ranges results in
> files being too small when writeback patterns are non-sequential.
> 
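The discontiguity rule above reduces to a simple grouping problem: adjacent dirty pages extend the current ioend, and any gap closes it and starts a new one. A hedged Python sketch (the function name and page size are invented for illustration):

```python
def build_ioends(page_indices, page_size=4096):
    """Group sorted dirty page indices into contiguous byte ranges.

    Mirrors the rule described above: when write_cache_pages hands the
    writepage callback a page that is not adjacent to the previous one,
    the current ioend is closed and a new one started, so each ioend
    covers exactly one contiguous range of the file.
    """
    runs = []
    for idx in page_indices:
        if runs and idx == runs[-1][1]:       # extends the open range
            runs[-1] = (runs[-1][0], idx + 1)
        else:                                 # discontiguous: new ioend
            runs.append((idx, idx + 1))
    # convert page runs to byte ranges of (offset, length)
    return [(s * page_size, (e - s) * page_size) for s, e in runs]

# Non-sequential writeback: pages 0-1, a hole, then pages 5-6.
print(build_ioends([0, 1, 5, 6]))  # → [(0, 8192), (20480, 8192)]
```

With one ioend per contiguous run, the end offset of each ioend is meaningful, so the file-size update at IO completion cannot overshoot across the hole.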
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 277 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 146 insertions(+), 131 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 89fad6b..93bf13c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -36,6 +36,18 @@
>  #include <linux/pagevec.h>
>  #include <linux/writeback.h>
>  
...
> @@ -1151,29 +1135,36 @@ xfs_vm_writepage(
>  		if (end_index > last_index)
>  			end_index = last_index;
>  
> -		xfs_cluster_write(inode, page->index + 1, &imap, &ioend,
> -				  wbc, end_index);
> +		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
>  	}
>  
> -
> -	/*
> -	 * Reserve log space if we might write beyond the on-disk inode size.
> -	 */
> -	err = 0;
> -	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
> -		err = xfs_setfilesize_trans_alloc(ioend);
> -
> -	xfs_submit_ioend(wbc, iohead, err);
> -
>  	return 0;
>  
>  error:
> -	if (iohead)
> -		xfs_cancel_ioend(iohead);
> +	/*
> +	 * We have to fail the iohead here because we have buffers locked in the
> +	 * ioend chain. If we don't do this, we'll deadlock invalidating the
> +	 * page as that tries to lock the buffers on the page. Also, because we
> +	 * have set pages under writeback, we have to run IO completion to mark
> +	 * the error state of the IO appropriately, so we can't cancel the ioend
> +	 * directly here. That means we have to mark this page as under
> +	 * writeback if we included any buffers from it in the ioend chain.
> +	 */
> +	if (count)
> +		xfs_start_page_writeback(page, 0, count);
> +	xfs_writepage_submit(wpc, wbc, err);

What are the error handling ramifications here for the previous, pending
ioends? Previously, it looks like we would either fail in
xfs_map_blocks() or submit I/O for each extent mapping. In other words,
errors were not taken into consideration by the time we get into/past
xfs_cluster_write().

Now it looks as though writepages carries on chaining ioends until we're
done or hit an error, and then the entire ioend chain is subject to the
error. I suppose a mapping error here is indicative of a larger problem,
but do we really want to fail the entire writeback here? (If nothing
else, the comments above should probably touch on this case).

Brian

>  
> -	xfs_aops_discard_page(page);
> -	ClearPageUptodate(page);
> -	unlock_page(page);
> +	/*
> +	 * We can only discard the page we had the IO error on if we haven't
> +	 * included it in the ioend above. If it has already been errored out,
> +	 * then it is unlocked and we can't touch it here.
> +	 */
> +	if (!count) {
> +		xfs_aops_discard_page(page);
> +		ClearPageUptodate(page);
> +		unlock_page(page);
> +	}
> +	mapping_set_error(page->mapping, err);
>  	return err;
>  
>  redirty:
> @@ -1183,12 +1174,36 @@ redirty:
>  }
>  
>  STATIC int
> +xfs_vm_writepage(
> +	struct page		*page,
> +	struct writeback_control *wbc)
> +{
> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};
> +	int			ret;
> +
> +	ret = xfs_do_writepage(page, wbc, &wpc);
> +	if (ret)
> +		return ret;
> +	return xfs_writepage_submit(&wpc, wbc, ret);
> +}
> +
> +STATIC int
>  xfs_vm_writepages(
>  	struct address_space	*mapping,
>  	struct writeback_control *wbc)
>  {
> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};
> +	int			ret;
> +
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> -	return generic_writepages(mapping, wbc);
> +	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
> +	if (ret)
> +		return ret;
> +	return xfs_writepage_submit(&wpc, wbc, ret);
>  }
>  
>  /*
> -- 
> 2.5.0
> 

* Re: [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage
  2015-08-25  5:05 ` [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
@ 2015-08-31 18:41   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2015-08-31 18:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-31 18:02   ` Brian Foster
@ 2015-08-31 18:56     ` Christoph Hellwig
  2015-08-31 22:17       ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-08-31 18:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Aug 31, 2015 at 02:02:22PM -0400, Brian Foster wrote:
> Ok, but the comment for blk_start_plug() mentions some kind of flush on
> task sleep mechanism. I could be wrong, but I take this to mean there
> are cases where I/O can initiate before the plug is stopped. Does
> deferring the I/O submission across writepages defeat that heuristic in
> any way? My (preliminary) understanding is that while the I/O submission
> would still be deferred by the plug in the same way in most cases, we're
> potentially holding back I/Os from the block infrastructure until the
> entire writepages sequence is complete.

Yes, we do.  But the reason why the block layer needs to flush the
plug on context switch is because we only have a limited bio mempool,
and if processes that are not running consume it, we can't make
guaranteed progress.  The XFS writeback code doesn't allocate bios
until the end, so we are not affected by that particular issue,
although we still need to worry about our own ioend mempools.

Sidenote:  Jens now has the arbitrarily sized bio code in his queue
for 4.3.  With that we could replace our ioends with bios that have
a little private data and simplify the submission phase of the
writeback code significantly.


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-31 18:56     ` Christoph Hellwig
@ 2015-08-31 22:17       ` Dave Chinner
  2015-09-01  7:41         ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2015-08-31 22:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, xfs

On Mon, Aug 31, 2015 at 11:56:12AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 31, 2015 at 02:02:22PM -0400, Brian Foster wrote:
> > Ok, but the comment for blk_start_plug() mentions some kind of flush on
> > task sleep mechanism. I could be wrong, but I take this to mean there
> > are cases where I/O can initiate before the plug is stopped. Does
> > deferring the I/O submission across writepages defeat that heuristic in
> > any way? My (preliminary) understanding is that while the I/O submission
> > would still be deferred by the plug in the same way in most cases, we're
> > potentially holding back I/Os from the block infrastructure until the
> > entire writepages sequence is complete.
> 
> Yes, we do.  But the reason why the block layer needs to flush the
> plug on context switch is because we only have a limited bio mempool,
> and if processes that are not running consume that we can't make
> guaranteed progress.  The XFS writeback code doesn't allocate bios
> until the end we are not affected by that particular issues, although
> we still need to worry about our own ioend mempools.

The patch changes the bio allocation patterns - it allocates them on
the fly and holds them, so we could potentially exhaust the bio
mempool with this submission technique. The ioend allocation pattern
is different, too, because we only used to have 1 per buffer on a
writepage call and the last one was used for the write clustering.

> Sidenote:  Jens now has the arbitrarily sized bio code in his queue
> for 4.3.  With that we could replace our ioends with bios that have
> a little private data and simplify the submission phase of the
> writeback code significantly.

Yes, I've been keeping an eye on that and waiting for it to land as
it solves many of the above problems. I still wanted to get the write
cluster infrastructure changes underway, though, so I can work
on making the writeback path not use bufferheads at all...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-31 22:17       ` Dave Chinner
@ 2015-09-01  7:41         ` Christoph Hellwig
  2015-11-10 23:25           ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2015-09-01  7:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Brian Foster, xfs

On Tue, Sep 01, 2015 at 08:17:43AM +1000, Dave Chinner wrote:
> The patch changes the bio allocation patterns - it allocates them on
> the fly and holds them, so we could potentially exhaust the bio
> mempool with this submission technique.

I've spent time to look over the patch again, and still don't see
a change.  Both in the old and new code we walk over the ioends
and build bios on the fly in xfs_submit_ioend(), which is always
called near the end of the writeback code; at the end of
xfs_vm_writepage in the old version, and from the end of
xfs_vm_writepage/xfs_vm_writepages through xfs_writepage_submit in
the new code (not taking the error case into account, which probably
should move there, too).

The only big difference is..

> The ioend allocation pattern
> is different, too, because we only used to have 1 per buffer on a
> writepage call and the last one was used for the write clustering.

.. that we now build up way bigger ioend chains.


So back to Brians concern:  we can now have fairly large piles of
ioends built up while potentially getting scheduled out, and this
does look like a potential real issue to me.  I wonder if we should
(ab-)use the blk_plug_cb infrastructure so that we can flush the
pending ioends out on a context switch?


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-09-01  7:41         ` Christoph Hellwig
@ 2015-11-10 23:25           ` Dave Chinner
  2015-11-11 11:32             ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2015-11-10 23:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, xfs

[ finally getting back to this :/ ]

On Tue, Sep 01, 2015 at 12:41:03AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 01, 2015 at 08:17:43AM +1000, Dave Chinner wrote:
> > The patch changes the bio allocation patterns - it allocates them on
> > the fly and holds them, so we could potentially exhaust the bio
> > mempool with this submission technique.
> 
> I've spend time to look over the patch again, and still don't see
> a change. 

Ah, it's in the next patch that I haven't posted that "goes straight
to bios". Ignore it for now...

> > The ioend allocation pattern
> > is different, too, because we only used to have 1 per buffer on a
> > writepage call and the last one was used for the write clustering.
> 
> .. that we now build up way bigger ioend chains.
> 
> So back to Brians concern:  we can now have fairly large piles of
> ioends built up while potentially getting scheduled out, and this
> does look like a potential real issue to me.  I wonder if we should
> (ab-)use the blk_plug_cb infrastructure so that we can flush the
> pending ioends out on a context switch?

Possibly, but I'm thinking that we should just end up building bios
directly and submitting them once they are reach size limits of
boundaries rather than building ioend chains for later submission.

Did the work for arbitrarily sized bios ever get merged? I can't
find any evidence that it did....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-11-10 23:25           ` Dave Chinner
@ 2015-11-11 11:32             ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2015-11-11 11:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs

On Wed, Nov 11, 2015 at 10:25:43AM +1100, Dave Chinner wrote:
> Did the work for arbitrarily sized bios ever get merged? I can't
> find any evidence that it did....

Yes, it landed in 4.3.


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2015-08-25  5:05 ` [PATCH 2/4] xfs: Introduce writeback context for writepages Dave Chinner
  2015-08-31 18:02   ` Brian Foster
@ 2016-02-08  7:36   ` Christoph Hellwig
  2016-02-08  7:54     ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2016-02-08  7:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

FYI, I recently started testing this series again after a trivial
rebase to 4.5-rc2, and this patch seems to cause reproducible failures
of generic/311 (fsync tester)


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2016-02-08  7:36   ` Christoph Hellwig
@ 2016-02-08  7:54     ` Christoph Hellwig
  2016-02-08 20:21       ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2016-02-08  7:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Sun, Feb 07, 2016 at 11:36:31PM -0800, Christoph Hellwig wrote:
> FYI, I recently started testing this series again after a trivial
> rebase to 4.5-rc2, and this patch seems to cause reproducible failures
> of generic/311 (fsync tester)

And by coincidence you posted a new version the night before I posted
this mail.  It still shows that same failure.


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2016-02-08  7:54     ` Christoph Hellwig
@ 2016-02-08 20:21       ` Dave Chinner
  2016-02-09  9:11         ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2016-02-08 20:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Feb 07, 2016 at 11:54:03PM -0800, Christoph Hellwig wrote:
> On Sun, Feb 07, 2016 at 11:36:31PM -0800, Christoph Hellwig wrote:
> > FYI, I recently started testing this series again after a trivial
> > rebase to 4.5-rc2, and this patch seems to cause reproducible failures
> > of generic/311 (fsync tester)
> 
> And by coincidence you posted a new version the night before I posted
> this mail.  It still shows that same failure.

What is your test config? The only failures I've had recently of
generic/311 were caused by a bug in the fsync-tester application.
This was fixed by commit 9165a84 ("fsync-tester: reopen files with
correct flags") that was pushed out in the fstests update I did
yesterday.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/4] xfs: Introduce writeback context for writepages
  2016-02-08 20:21       ` Dave Chinner
@ 2016-02-09  9:11         ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2016-02-09  9:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Feb 09, 2016 at 07:21:23AM +1100, Dave Chinner wrote:
> What is your test config? The only failures I've had recently of
> generic/311 were caused by a bug in the fsync-tester application.
> This was fixed by commit 9165a84 ("fsync-tester: reopen files with
> correct flags") that was pushed out in the fstests update I did
> yesterday.

Yeah, that seems to fix it.


end of thread, other threads:[~2016-02-09  9:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-25  5:05 [PATCH 0/4 v2] xfs: get rid of xfs_cluster_write() Dave Chinner
2015-08-25  5:05 ` [PATCH 1/4] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
2015-08-31 18:41   ` Christoph Hellwig
2015-08-25  5:05 ` [PATCH 2/4] xfs: Introduce writeback context for writepages Dave Chinner
2015-08-31 18:02   ` Brian Foster
2015-08-31 18:56     ` Christoph Hellwig
2015-08-31 22:17       ` Dave Chinner
2015-09-01  7:41         ` Christoph Hellwig
2015-11-10 23:25           ` Dave Chinner
2015-11-11 11:32             ` Christoph Hellwig
2016-02-08  7:36   ` Christoph Hellwig
2016-02-08  7:54     ` Christoph Hellwig
2016-02-08 20:21       ` Dave Chinner
2016-02-09  9:11         ` Christoph Hellwig
2015-08-25  5:05 ` [PATCH 3/4] xfs: xfs_cluster_write is redundant Dave Chinner
2015-08-25  5:05 ` [PATCH 4/4] xfs: factor mapping out of xfs_do_writepage Dave Chinner
