* [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write()
@ 2016-02-08  5:44 Dave Chinner
  2016-02-08  5:44 ` [PATCH 1/5] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

Hi folks,

Still clearing out my pending patches. This one was last posted
almost 6 months ago:

http://oss.sgi.com/archives/xfs/2015-08/msg00523.html

The main issue at the time was that the ioends were added
to the writepage context and only submitted once
write_cache_pages() was done, resulting in a large number of
writeback pages and allocated ioends building up before IO
submission occurred. Hence we could potentially stall writeback
progress due to mempool starvation, as we weren't following the
rules required for the ioend mempool.

This is addressed in the new patch 5. Instead of chaining ioends
onto the writepage context, we submit a completed ioend the moment
we are done processing the page that triggered allocation of a new
ioend. Hence we issue ioends as they are built, rather than as a
batch once page processing is done. This ensures that writeback
follows the rules required for mempools to work effectively, so we
won't deadlock due to mempool starvation during low-memory
writeback.

This modified patch set has been sitting in my queue now for almost
4 months, so it's time to get it moved onwards...

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] xfs: remove nonblocking mode from xfs_vm_writepage
  2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
@ 2016-02-08  5:44 ` Dave Chinner
  2016-02-08  5:44 ` [PATCH 2/5] xfs: Introduce writeback context for writepages Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Remove the nonblocking optimisation done for mapping lookups during
writeback. It's not clear that leaving a hole in the writeback range
just because we couldn't get a lock is really a win, as it makes us
do another small random IO later on rather than a large sequential
IO now.

As this gets in the way of sane error handling later on, just
remove it for the moment; we can re-introduce an equivalent
optimisation in future if we see problems due to extent map lock
contention.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 14ac982..00452cb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -289,8 +289,7 @@ xfs_map_blocks(
 	struct inode		*inode,
 	loff_t			offset,
 	struct xfs_bmbt_irec	*imap,
-	int			type,
-	int			nonblocking)
+	int			type)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -306,12 +305,7 @@ xfs_map_blocks(
 	if (type == XFS_IO_UNWRITTEN)
 		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
-	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) {
-		if (nonblocking)
-			return -EAGAIN;
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
-	}
-
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
@@ -961,7 +955,6 @@ xfs_vm_writepage(
 	ssize_t			len;
 	int			err, imap_valid = 0, uptodate = 1;
 	int			count = 0;
-	int			nonblocking = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1061,9 +1054,6 @@ xfs_vm_writepage(
 	offset = page_offset(page);
 	type = XFS_IO_OVERWRITE;
 
-	if (wbc->sync_mode == WB_SYNC_NONE)
-		nonblocking = 1;
-
 	do {
 		int new_ioend = 0;
 
@@ -1123,8 +1113,7 @@ xfs_vm_writepage(
 			 * time.
 			 */
 			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type,
-					     nonblocking);
+			err = xfs_map_blocks(inode, offset, &imap, type);
 			if (err)
 				goto error;
 			imap_valid = xfs_imap_valid(inode, &imap, offset);
@@ -1194,9 +1183,6 @@ error:
 	if (iohead)
 		xfs_cancel_ioend(iohead);
 
-	if (err == -EAGAIN)
-		goto redirty;
-
 	xfs_aops_discard_page(page);
 	ClearPageUptodate(page);
 	unlock_page(page);
-- 
2.5.0



* [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
  2016-02-08  5:44 ` [PATCH 1/5] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
@ 2016-02-08  5:44 ` Dave Chinner
  2016-02-09 13:39   ` Christoph Hellwig
  2016-02-09 14:22   ` Brian Foster
  2016-02-08  5:44 ` [PATCH 3/5] xfs: xfs_cluster_write is redundant Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_vm_writepages() calls generic_writepages() to write back a range
of a file, but then xfs_vm_writepage() clusters pages itself as it
does not have any context it can pass between ->writepage calls from
write_cache_pages().

Introduce a writeback context for xfs_vm_writepages() and call
write_cache_pages() directly with our own writepage callback so that
we can pass that context to each writepage invocation. The context
encapsulates the current mapping and whether it is valid, the
current ioend and its IO type, and the ioend chain being built.

This requires us to move the ioend submission up to the level where
the writepage context is declared. It does mean we do not submit
IO until we have packaged the entire writeback range, but with the
block plugging in the writepages call this is the way IO is
submitted anyway.

It also means that we need to handle discontiguous page ranges. If
the pages sent down by write_cache_pages() to the writepage callback
are discontiguous, we need to detect this and put each discontiguous
page range into an individual ioend. This is needed to ensure that
the ioend accurately represents the range of the file that it covers
so that file size updates during IO completion set the size
correctly. Failure to take the discontiguous ranges into account
results in files being too small when writeback patterns are
non-sequential.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 277 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 146 insertions(+), 131 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 00452cb..4453d1d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -40,6 +40,18 @@
 #define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
 #define XFS_DIO_FLAG_APPEND	(1 << 1)
 
+/*
+ * structure owned by writepages and passed to individual writepage calls
+ */
+struct xfs_writepage_ctx {
+	struct xfs_bmbt_irec    imap;
+	bool			imap_valid;
+	unsigned int		io_type;
+	struct xfs_ioend	*iohead;
+	struct xfs_ioend	*ioend;
+	sector_t		last_block;
+};
+
 void
 xfs_count_page_state(
 	struct page		*page,
@@ -341,7 +353,7 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC int
+STATIC bool
 xfs_imap_valid(
 	struct inode		*inode,
 	struct xfs_bmbt_irec	*imap,
@@ -528,38 +540,6 @@ xfs_submit_ioend(
 }
 
 /*
- * Cancel submission of all buffer_heads so far in this endio.
- * Toss the endio too.  Only ever called for the initial page
- * in a writepage request, so only ever one page.
- */
-STATIC void
-xfs_cancel_ioend(
-	xfs_ioend_t		*ioend)
-{
-	xfs_ioend_t		*next;
-	struct buffer_head	*bh, *next_bh;
-
-	do {
-		next = ioend->io_list;
-		bh = ioend->io_buffer_head;
-		do {
-			next_bh = bh->b_private;
-			clear_buffer_async_write(bh);
-			/*
-			 * The unwritten flag is cleared when added to the
-			 * ioend. We're not submitting for I/O so mark the
-			 * buffer unwritten again for next time around.
-			 */
-			if (ioend->io_type == XFS_IO_UNWRITTEN)
-				set_buffer_unwritten(bh);
-			unlock_buffer(bh);
-		} while ((bh = next_bh) != NULL);
-
-		mempool_free(ioend, xfs_ioend_pool);
-	} while ((ioend = next) != NULL);
-}
-
-/*
  * Test to see if we've been building up a completion structure for
  * earlier buffers -- if so, we try to append to this ioend if we
  * can, otherwise we finish off any current ioend and start another.
@@ -570,29 +550,27 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
-	unsigned int		type,
-	xfs_ioend_t		**result,
-	int			need_ioend)
+	struct xfs_writepage_ctx *wpc)
 {
-	xfs_ioend_t		*ioend = *result;
-
-	if (!ioend || need_ioend || type != ioend->io_type) {
-		xfs_ioend_t	*previous = *result;
-
-		ioend = xfs_alloc_ioend(inode, type);
-		ioend->io_offset = offset;
-		ioend->io_buffer_head = bh;
-		ioend->io_buffer_tail = bh;
-		if (previous)
-			previous->io_list = ioend;
-		*result = ioend;
+	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
+	    bh->b_blocknr != wpc->last_block + 1) {
+		struct xfs_ioend	*new;
+
+		new = xfs_alloc_ioend(inode, wpc->io_type);
+		new->io_offset = offset;
+		new->io_buffer_head = bh;
+		new->io_buffer_tail = bh;
+		if (wpc->ioend)
+			wpc->ioend->io_list = new;
+		wpc->ioend = new;
 	} else {
-		ioend->io_buffer_tail->b_private = bh;
-		ioend->io_buffer_tail = bh;
+		wpc->ioend->io_buffer_tail->b_private = bh;
+		wpc->ioend->io_buffer_tail = bh;
 	}
 
 	bh->b_private = NULL;
-	ioend->io_size += bh->b_size;
+	wpc->ioend->io_size += bh->b_size;
+	wpc->last_block = bh->b_blocknr;
 }
 
 STATIC void
@@ -689,17 +667,15 @@ xfs_convert_page(
 	struct inode		*inode,
 	struct page		*page,
 	loff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc)
 {
 	struct buffer_head	*bh, *head;
 	xfs_off_t		end_offset;
 	unsigned long		p_offset;
-	unsigned int		type;
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
- 	xfs_off_t		offset = page_offset(page);
+	xfs_off_t		offset = page_offset(page);
 
 	if (page->index != tindex)
 		goto fail;
@@ -709,7 +685,7 @@ xfs_convert_page(
 		goto fail_unlock_page;
 	if (page->mapping != inode->i_mapping)
 		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
+	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
 		goto fail_unlock_page;
 
 	/*
@@ -745,7 +721,7 @@ xfs_convert_page(
 	 * writeback.  Hence for more optimal IO patterns, we should always
 	 * avoid partial page writeback due to multiple mappings on a page here.
 	 */
-	if (!xfs_imap_valid(inode, imap, end_offset))
+	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
 		goto fail_unlock_page;
 
 	len = 1 << inode->i_blkbits;
@@ -777,23 +753,22 @@ xfs_convert_page(
 		if (buffer_unwritten(bh) || buffer_delay(bh) ||
 		    buffer_mapped(bh)) {
 			if (buffer_unwritten(bh))
-				type = XFS_IO_UNWRITTEN;
+				wpc->io_type = XFS_IO_UNWRITTEN;
 			else if (buffer_delay(bh))
-				type = XFS_IO_DELALLOC;
+				wpc->io_type = XFS_IO_DELALLOC;
 			else
-				type = XFS_IO_OVERWRITE;
+				wpc->io_type = XFS_IO_OVERWRITE;
 
 			/*
 			 * imap should always be valid because of the above
 			 * partial page end_offset check on the imap.
 			 */
-			ASSERT(xfs_imap_valid(inode, imap, offset));
+			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
 
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type,
-					 ioendp, done);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 
 			page_dirty--;
 			count++;
@@ -828,8 +803,7 @@ STATIC void
 xfs_cluster_write(
 	struct inode		*inode,
 	pgoff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	pgoff_t			tlast)
 {
@@ -845,7 +819,7 @@ xfs_cluster_write(
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-					imap, ioendp, wbc);
+						wpc, wbc);
 			if (done)
 				break;
 		}
@@ -931,6 +905,27 @@ out_invalidate:
 	return;
 }
 
+static int
+xfs_writepage_submit(
+	struct xfs_writepage_ctx *wpc,
+	struct writeback_control *wbc,
+	int			status)
+{
+	struct blk_plug		plug;
+
+	/* Reserve log space if we might write beyond the on-disk inode size. */
+	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(wpc->ioend))
+		status = xfs_setfilesize_trans_alloc(wpc->ioend);
+
+	if (wpc->iohead) {
+		blk_start_plug(&plug);
+		xfs_submit_ioend(wbc, wpc->iohead, status);
+		blk_finish_plug(&plug);
+	}
+	return status;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -940,20 +935,19 @@ out_invalidate:
  * For any other dirty buffer heads on the page we should flush them.
  */
 STATIC int
-xfs_vm_writepage(
+xfs_do_writepage(
 	struct page		*page,
-	struct writeback_control *wbc)
+	struct writeback_control *wbc,
+	void			*data)
 {
+	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
 	struct buffer_head	*bh, *head;
-	struct xfs_bmbt_irec	imap;
-	xfs_ioend_t		*ioend = NULL, *iohead = NULL;
 	loff_t			offset;
-	unsigned int		type;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index, last_index;
 	ssize_t			len;
-	int			err, imap_valid = 0, uptodate = 1;
+	int			err, uptodate = 1;
 	int			count = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
@@ -1052,11 +1046,8 @@ xfs_vm_writepage(
 
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
-	type = XFS_IO_OVERWRITE;
 
 	do {
-		int new_ioend = 0;
-
 		if (offset >= end_offset)
 			break;
 		if (!buffer_uptodate(bh))
@@ -1069,24 +1060,24 @@ xfs_vm_writepage(
 		 * buffers covering holes here.
 		 */
 		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			imap_valid = 0;
+			wpc->imap_valid = false;
 			continue;
 		}
 
 		if (buffer_unwritten(bh)) {
-			if (type != XFS_IO_UNWRITTEN) {
-				type = XFS_IO_UNWRITTEN;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_delay(bh)) {
-			if (type != XFS_IO_DELALLOC) {
-				type = XFS_IO_DELALLOC;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_uptodate(bh)) {
-			if (type != XFS_IO_OVERWRITE) {
-				type = XFS_IO_OVERWRITE;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
 			}
 		} else {
 			if (PageUptodate(page))
@@ -1097,38 +1088,31 @@ xfs_vm_writepage(
 			 * subsequent writeable buffers into a new
 			 * ioend.
 			 */
-			imap_valid = 0;
+			wpc->imap_valid = false;
 			continue;
 		}
 
-		if (imap_valid)
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
-		if (!imap_valid) {
-			/*
-			 * If we didn't have a valid mapping then we need to
-			 * put the new mapping into a separate ioend structure.
-			 * This ensures non-contiguous extents always have
-			 * separate ioends, which is particularly important
-			 * for unwritten extent conversion at I/O completion
-			 * time.
-			 */
-			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type);
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		if (!wpc->imap_valid) {
+			err = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
 			if (err)
 				goto error;
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
 		}
-		if (imap_valid) {
+		if (wpc->imap_valid) {
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type, &ioend,
-					 new_ioend);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 			count++;
 		}
 
-		if (!iohead)
-			iohead = ioend;
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
 
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
@@ -1138,20 +1122,20 @@ xfs_vm_writepage(
 	xfs_start_page_writeback(page, 1, count);
 
 	/* if there is no IO to be submitted for this page, we are done */
-	if (!ioend)
+	if (!count)
 		return 0;
 
-	ASSERT(iohead);
+	ASSERT(wpc->iohead);
 
 	/*
 	 * Any errors from this point onwards need tobe reported through the IO
 	 * completion path as we have marked the initial page as under writeback
 	 * and unlocked it.
 	 */
-	if (imap_valid) {
+	if (wpc->imap_valid) {
 		xfs_off_t		end_index;
 
-		end_index = imap.br_startoff + imap.br_blockcount;
+		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
 
 		/* to bytes */
 		end_index <<= inode->i_blkbits;
@@ -1163,29 +1147,36 @@ xfs_vm_writepage(
 		if (end_index > last_index)
 			end_index = last_index;
 
-		xfs_cluster_write(inode, page->index + 1, &imap, &ioend,
-				  wbc, end_index);
+		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
 	}
 
-
-	/*
-	 * Reserve log space if we might write beyond the on-disk inode size.
-	 */
-	err = 0;
-	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
-		err = xfs_setfilesize_trans_alloc(ioend);
-
-	xfs_submit_ioend(wbc, iohead, err);
-
 	return 0;
 
 error:
-	if (iohead)
-		xfs_cancel_ioend(iohead);
+	/*
+	 * We have to fail the iohead here because we have buffers locked in the
+	 * ioend chain. If we don't do this, we'll deadlock invalidating the
+	 * page as that tries to lock the buffers on the page. Also, because we
+	 * have set pages under writeback, we have to run IO completion to mark
+	 * the error state of the IO appropriately, so we can't cancel the ioend
+	 * directly here. That means we have to mark this page as under
+	 * writeback if we included any buffers from it in the ioend chain.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	xfs_writepage_submit(wpc, wbc, err);
 
-	xfs_aops_discard_page(page);
-	ClearPageUptodate(page);
-	unlock_page(page);
+	/*
+	 * We can only discard the page we had the IO error on if we haven't
+	 * included it in the ioend above. If it has already been errored out,
+	 * then it is unlocked and we can't touch it here.
+	 */
+	if (!count) {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	mapping_set_error(page->mapping, err);
 	return err;
 
 redirty:
@@ -1195,12 +1186,36 @@ redirty:
 }
 
 STATIC int
+xfs_vm_writepage(
+	struct page		*page,
+	struct writeback_control *wbc)
+{
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
+	ret = xfs_do_writepage(page, wbc, &wpc);
+	if (ret)
+		return ret;
+	return xfs_writepage_submit(&wpc, wbc, ret);
+}
+
+STATIC int
 xfs_vm_writepages(
 	struct address_space	*mapping,
 	struct writeback_control *wbc)
 {
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
-	return generic_writepages(mapping, wbc);
+	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
+	if (ret)
+		return ret;
+	return xfs_writepage_submit(&wpc, wbc, ret);
 }
 
 /*
-- 
2.5.0



* [PATCH 3/5] xfs: xfs_cluster_write is redundant
  2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
  2016-02-08  5:44 ` [PATCH 1/5] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
  2016-02-08  5:44 ` [PATCH 2/5] xfs: Introduce writeback context for writepages Dave Chinner
@ 2016-02-08  5:44 ` Dave Chinner
  2016-02-09 13:40   ` Christoph Hellwig
  2016-02-08  5:44 ` [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage Dave Chinner
  2016-02-08  5:44 ` [PATCH 5/5] xfs: don't chain ioends during writepage submission Dave Chinner
  4 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_cluster_write() is not necessary now that xfs_vm_writepages()
aggregates writepage calls across a single mapping. This means we no
longer need to do page lookups in xfs_cluster_write(), so writeback
only needs to look up the page cache once per page being written.
This also removes a large amount of mostly duplicate code between
xfs_do_writepage() and xfs_convert_page().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 215 ++----------------------------------------------------
 1 file changed, 6 insertions(+), 209 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4453d1d..e473f53 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -656,179 +656,6 @@ xfs_check_page_type(
 	return false;
 }
 
-/*
- * Allocate & map buffers for page given the extent map. Write it out.
- * except for the original page of a writepage, this is called on
- * delalloc/unwritten pages only, for the original page it is possible
- * that the page has no mapping at all.
- */
-STATIC int
-xfs_convert_page(
-	struct inode		*inode,
-	struct page		*page,
-	loff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc)
-{
-	struct buffer_head	*bh, *head;
-	xfs_off_t		end_offset;
-	unsigned long		p_offset;
-	int			len, page_dirty;
-	int			count = 0, done = 0, uptodate = 1;
-	xfs_off_t		offset = page_offset(page);
-
-	if (page->index != tindex)
-		goto fail;
-	if (!trylock_page(page))
-		goto fail;
-	if (PageWriteback(page))
-		goto fail_unlock_page;
-	if (page->mapping != inode->i_mapping)
-		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
-		goto fail_unlock_page;
-
-	/*
-	 * page_dirty is initially a count of buffers on the page before
-	 * EOF and is decremented as we move each into a cleanable state.
-	 *
-	 * Derivation:
-	 *
-	 * End offset is the highest offset that this page should represent.
-	 * If we are on the last page, (end_offset & (PAGE_CACHE_SIZE - 1))
-	 * will evaluate non-zero and be less than PAGE_CACHE_SIZE and
-	 * hence give us the correct page_dirty count. On any other page,
-	 * it will be zero and in that case we need page_dirty to be the
-	 * count of buffers on the page.
-	 */
-	end_offset = min_t(unsigned long long,
-			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
-			i_size_read(inode));
-
-	/*
-	 * If the current map does not span the entire page we are about to try
-	 * to write, then give up. The only way we can write a page that spans
-	 * multiple mappings in a single writeback iteration is via the
-	 * xfs_vm_writepage() function. Data integrity writeback requires the
-	 * entire page to be written in a single attempt, otherwise the part of
-	 * the page we don't write here doesn't get written as part of the data
-	 * integrity sync.
-	 *
-	 * For normal writeback, we also don't attempt to write partial pages
-	 * here as it simply means that write_cache_pages() will see it under
-	 * writeback and ignore the page until some point in the future, at
-	 * which time this will be the only page in the file that needs
-	 * writeback.  Hence for more optimal IO patterns, we should always
-	 * avoid partial page writeback due to multiple mappings on a page here.
-	 */
-	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
-		goto fail_unlock_page;
-
-	len = 1 << inode->i_blkbits;
-	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
-					PAGE_CACHE_SIZE);
-	p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
-	page_dirty = p_offset / len;
-
-	/*
-	 * The moment we find a buffer that doesn't match our current type
-	 * specification or can't be written, abort the loop and start
-	 * writeback. As per the above xfs_imap_valid() check, only
-	 * xfs_vm_writepage() can handle partial page writeback fully - we are
-	 * limited here to the buffers that are contiguous with the current
-	 * ioend, and hence a buffer we can't write breaks that contiguity and
-	 * we have to defer the rest of the IO to xfs_vm_writepage().
-	 */
-	bh = head = page_buffers(page);
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-		if (!(PageUptodate(page) || buffer_uptodate(bh))) {
-			done = 1;
-			break;
-		}
-
-		if (buffer_unwritten(bh) || buffer_delay(bh) ||
-		    buffer_mapped(bh)) {
-			if (buffer_unwritten(bh))
-				wpc->io_type = XFS_IO_UNWRITTEN;
-			else if (buffer_delay(bh))
-				wpc->io_type = XFS_IO_DELALLOC;
-			else
-				wpc->io_type = XFS_IO_OVERWRITE;
-
-			/*
-			 * imap should always be valid because of the above
-			 * partial page end_offset check on the imap.
-			 */
-			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
-
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-
-			page_dirty--;
-			count++;
-		} else {
-			done = 1;
-			break;
-		}
-	} while (offset += len, (bh = bh->b_this_page) != head);
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	if (count) {
-		if (--wbc->nr_to_write <= 0 &&
-		    wbc->sync_mode == WB_SYNC_NONE)
-			done = 1;
-	}
-	xfs_start_page_writeback(page, !page_dirty, count);
-
-	return done;
- fail_unlock_page:
-	unlock_page(page);
- fail:
-	return 1;
-}
-
-/*
- * Convert & write out a cluster of pages in the same extent as defined
- * by mp and following the start page.
- */
-STATIC void
-xfs_cluster_write(
-	struct inode		*inode,
-	pgoff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc,
-	pgoff_t			tlast)
-{
-	struct pagevec		pvec;
-	int			done = 0, i;
-
-	pagevec_init(&pvec, 0);
-	while (!done && tindex <= tlast) {
-		unsigned len = min_t(pgoff_t, PAGEVEC_SIZE, tlast - tindex + 1);
-
-		if (!pagevec_lookup(&pvec, inode->i_mapping, tindex, len))
-			break;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-						wpc, wbc);
-			if (done)
-				break;
-		}
-
-		pagevec_release(&pvec);
-		cond_resched();
-	}
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -945,7 +772,7 @@ xfs_do_writepage(
 	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
-	pgoff_t                 end_index, last_index;
+	pgoff_t                 end_index;
 	ssize_t			len;
 	int			err, uptodate = 1;
 	int			count = 0;
@@ -975,12 +802,9 @@ xfs_do_writepage(
 	if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
 		goto redirty;
 
-	/* Is this page beyond the end of the file? */
-	offset = i_size_read(inode);
-	end_index = offset >> PAGE_CACHE_SHIFT;
-	last_index = (offset - 1) >> PAGE_CACHE_SHIFT;
-
 	/*
+	 * Is this page beyond the end of the file?
+	 *
 	 * The page index is less than the end_index, adjust the end_offset
 	 * to the highest offset that this page should represent.
 	 * -----------------------------------------------------
@@ -991,6 +815,8 @@ xfs_do_writepage(
 	 * |     desired writeback range    |      see else    |
 	 * ---------------------------------^------------------|
 	 */
+	offset = i_size_read(inode);
+	end_index = offset >> PAGE_CACHE_SHIFT;
 	if (page->index < end_index)
 		end_offset = (xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT;
 	else {
@@ -1120,36 +946,7 @@ xfs_do_writepage(
 		SetPageUptodate(page);
 
 	xfs_start_page_writeback(page, 1, count);
-
-	/* if there is no IO to be submitted for this page, we are done */
-	if (!count)
-		return 0;
-
-	ASSERT(wpc->iohead);
-
-	/*
-	 * Any errors from this point onwards need tobe reported through the IO
-	 * completion path as we have marked the initial page as under writeback
-	 * and unlocked it.
-	 */
-	if (wpc->imap_valid) {
-		xfs_off_t		end_index;
-
-		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
-
-		/* to bytes */
-		end_index <<= inode->i_blkbits;
-
-		/* to pages */
-		end_index = (end_index - 1) >> PAGE_CACHE_SHIFT;
-
-		/* check against file size */
-		if (end_index > last_index)
-			end_index = last_index;
-
-		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
-	}
-
+	ASSERT(wpc->iohead || !count);
 	return 0;
 
 error:
-- 
2.5.0



* [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage
  2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
                   ` (2 preceding siblings ...)
  2016-02-08  5:44 ` [PATCH 3/5] xfs: xfs_cluster_write is redundant Dave Chinner
@ 2016-02-08  5:44 ` Dave Chinner
  2016-02-09 13:40   ` Christoph Hellwig
  2016-02-08  5:44 ` [PATCH 5/5] xfs: don't chain ioends during writepage submission Dave Chinner
  4 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Separate out the bufferhead-based mapping from the writepage code so
that we have a clear separation between the page operations and the
bufferhead state.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 221 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 119 insertions(+), 102 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e473f53..e5b8214 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -753,6 +753,116 @@ xfs_writepage_submit(
 	return status;
 }
 
+static int
+xfs_writepage_map(
+	struct xfs_writepage_ctx *wpc,
+	struct inode		*inode,
+	struct page		*page,
+	loff_t			offset,
+	__uint64_t              end_offset)
+{
+	struct buffer_head	*bh, *head;
+	ssize_t			len = 1 << inode->i_blkbits;
+	int			error = 0;
+	int			uptodate = 1;
+	int			count = 0;
+
+	bh = head = page_buffers(page);
+	offset = page_offset(page);
+
+	do {
+		if (offset >= end_offset)
+			break;
+		if (!buffer_uptodate(bh))
+			uptodate = 0;
+
+		/*
+		 * set_page_dirty dirties all buffers in a page, independent
+		 * of their state.  The dirty state however is entirely
+		 * meaningless for holes (!mapped && uptodate), so skip
+		 * buffers covering holes here.
+		 */
+		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (buffer_unwritten(bh)) {
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_delay(bh)) {
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_uptodate(bh)) {
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
+			}
+		} else {
+			if (PageUptodate(page))
+				ASSERT(buffer_mapped(bh));
+			/*
+			 * This buffer is not uptodate and will not be
+			 * written to disk.  Ensure that we will put any
+			 * subsequent writeable buffers into a new
+			 * ioend.
+			 */
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		if (!wpc->imap_valid) {
+			error = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
+			if (error)
+				goto out_error;
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		}
+		if (wpc->imap_valid) {
+			lock_buffer(bh);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
+			count++;
+		}
+
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
+
+	} while (offset += len, ((bh = bh->b_this_page) != head));
+
+	if (uptodate && bh == head)
+		SetPageUptodate(page);
+
+	xfs_start_page_writeback(page, 1, count);
+	ASSERT(wpc->iohead || !count);
+	return 0;
+
+out_error:
+	/*
+	 * We can only discard the page we had the IO error on if we haven't
+	 * included it in the ioend above. If it has already been added to the
+	 * ioend, then we can't touch it here and need to rely on IO submission
+	 * to unlock it.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	else {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	return error;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -769,13 +879,10 @@ xfs_do_writepage(
 {
 	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
-	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index;
-	ssize_t			len;
-	int			err, uptodate = 1;
-	int			count = 0;
+	int			error = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -868,113 +975,23 @@ xfs_do_writepage(
 		end_offset = offset;
 	}
 
-	len = 1 << inode->i_blkbits;
-
-	bh = head = page_buffers(page);
-	offset = page_offset(page);
-
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-
-		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
-		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (buffer_unwritten(bh)) {
-			if (wpc->io_type != XFS_IO_UNWRITTEN) {
-				wpc->io_type = XFS_IO_UNWRITTEN;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_delay(bh)) {
-			if (wpc->io_type != XFS_IO_DELALLOC) {
-				wpc->io_type = XFS_IO_DELALLOC;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_uptodate(bh)) {
-			if (wpc->io_type != XFS_IO_OVERWRITE) {
-				wpc->io_type = XFS_IO_OVERWRITE;
-				wpc->imap_valid = false;
-			}
-		} else {
-			if (PageUptodate(page))
-				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
-			 */
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
-		if (!wpc->imap_valid) {
-			err = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
-			if (err)
-				goto error;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
-		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-			count++;
-		}
-
-		if (!wpc->iohead)
-			wpc->iohead = wpc->ioend;
-
-	} while (offset += len, ((bh = bh->b_this_page) != head));
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	xfs_start_page_writeback(page, 1, count);
-	ASSERT(wpc->iohead || !count);
+	error = xfs_writepage_map(wpc, inode, page, offset, end_offset);
+	if (error)
+		goto out_error;
 	return 0;
 
-error:
+out_error:
 	/*
 	 * We have to fail the iohead here because we have buffers locked in the
 	 * ioend chain. If we don't do this, we'll deadlock invalidating the
 	 * page as that tries to lock the buffers on the page. Also, because we
 	 * have set pages under writeback, we have to run IO completion to mark
 	 * the error state of the IO appropriately, so we can't cancel the ioend
-	 * directly here. That means we have to mark this page as under
-	 * writeback if we included any buffers from it in the ioend chain.
+	 * directly here.
 	 */
-	if (count)
-		xfs_start_page_writeback(page, 0, count);
-	xfs_writepage_submit(wpc, wbc, err);
-
-	/*
-	 * We can only discard the page we had the IO error on if we haven't
-	 * included it in the ioend above. If it has already been errored out,
-	 * then it is unlocked and we can't touch it here.
-	 */
-	if (!count) {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
-	}
-	mapping_set_error(page->mapping, err);
-	return err;
+	xfs_writepage_submit(wpc, wbc, error);
+	mapping_set_error(page->mapping, error);
+	return error;
 
 redirty:
 	redirty_page_for_writepage(wbc, page);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
                   ` (3 preceding siblings ...)
  2016-02-08  5:44 ` [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage Dave Chinner
@ 2016-02-08  5:44 ` Dave Chinner
  2016-02-09 13:49   ` Christoph Hellwig
  2016-02-09 14:23   ` Brian Foster
  4 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-08  5:44 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently we can build a long ioend chain during ->writepages that
gets attached to the writepage context. IO submission only then
occurs when we finish all the writepage processing. This means we
can have many ioends allocated and pending, and this violates the
mempool guarantees that we need to give about forwards progress.
i.e. we really should only have one ioend being built at a time,
otherwise we may drain the mempool trying to allocate a new ioend
and that blocks submission, completion and freeing of ioends that
are already in progress.

To prevent this situation from happening, we need to submit ioends
for IO as soon as they are ready for dispatch rather than queuing
them for later submission. This means the ioends have bios built
immediately and they get queued on any plug that is currently active.
Hence if we schedule away from writeback, the ioends that have been
built will make forwards progress due to the plug flushing on
context switch. This will also prevent context switches from
creating unnecessary IO submission latency.

We can't completely avoid having nested IO allocation - when we have
a block size smaller than a page size, we still need to hold the
ioend submission until after we have marked the current page dirty.
Hence we may need multiple ioends to be held while the current page
is completely mapped and made ready for IO dispatch. We cannot avoid
this problem - the current code already has this ioend chaining
within a page so we can mostly ignore that it occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 94 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 48 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e5b8214..852ecf2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -47,7 +47,6 @@ struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
 	bool			imap_valid;
 	unsigned int		io_type;
-	struct xfs_ioend	*iohead;
 	struct xfs_ioend	*ioend;
 	sector_t		last_block;
 };
@@ -461,19 +460,6 @@ static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
  * Submit all of the bios for all of the ioends we have saved up, covering the
  * initial writepage page and also any probed pages.
  *
- * Because we may have multiple ioends spanning a page, we need to start
- * writeback on all the buffers before we submit them for I/O. If we mark the
- * buffers as we got, then we can end up with a page that only has buffers
- * marked async write and I/O complete on can occur before we mark the other
- * buffers async write.
- *
- * The end result of this is that we trip a bug in end_page_writeback() because
- * we call it twice for the one page as the code in end_buffer_async_write()
- * assumes that all buffers on the page are started at the same time.
- *
- * The fix is two passes across the ioend list - one to start writeback on the
- * buffer_heads, and then submit them for I/O on the second pass.
- *
  * If @fail is non-zero, it means that we have a situation where some part of
  * the submission process has failed after we have marked pages for writeback
  * and unlocked them. In this situation, we need to fail the ioend chain rather
@@ -491,14 +477,11 @@ xfs_submit_ioend(
 	struct bio		*bio;
 	sector_t		lastblock = 0;
 
-	/* Pass 1 - start writeback */
-	do {
-		next = ioend->io_list;
-		for (bh = ioend->io_buffer_head; bh; bh = bh->b_private)
-			xfs_start_buffer_writeback(bh);
-	} while ((ioend = next) != NULL);
+	/* Reserve log space if we might write beyond the on-disk inode size. */
+	if (!fail && ioend && ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(ioend))
+		fail = xfs_setfilesize_trans_alloc(ioend);
 
-	/* Pass 2 - submit I/O */
 	ioend = head;
 	do {
 		next = ioend->io_list;
@@ -543,25 +526,28 @@ xfs_submit_ioend(
  * Test to see if we've been building up a completion structure for
  * earlier buffers -- if so, we try to append to this ioend if we
  * can, otherwise we finish off any current ioend and start another.
- * Return true if we've finished the given ioend.
+ * Return the ioend we finished off so that the caller can submit it
+ * once it has finished processing the dirty page.
  */
-STATIC void
+STATIC struct xfs_ioend *
 xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
 	struct xfs_writepage_ctx *wpc)
 {
+	struct xfs_ioend	*ioend_to_submit = NULL;
+
 	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
 	    bh->b_blocknr != wpc->last_block + 1) {
 		struct xfs_ioend	*new;
 
+		ioend_to_submit = wpc->ioend;
+
 		new = xfs_alloc_ioend(inode, wpc->io_type);
 		new->io_offset = offset;
 		new->io_buffer_head = bh;
 		new->io_buffer_tail = bh;
-		if (wpc->ioend)
-			wpc->ioend->io_list = new;
 		wpc->ioend = new;
 	} else {
 		wpc->ioend->io_buffer_tail->b_private = bh;
@@ -571,6 +557,8 @@ xfs_add_to_ioend(
 	bh->b_private = NULL;
 	wpc->ioend->io_size += bh->b_size;
 	wpc->last_block = bh->b_blocknr;
+	xfs_start_buffer_writeback(bh);
+	return ioend_to_submit;
 }
 
 STATIC void
@@ -738,29 +726,22 @@ xfs_writepage_submit(
 	struct writeback_control *wbc,
 	int			status)
 {
-	struct blk_plug		plug;
-
-	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
-	    xfs_ioend_is_append(wpc->ioend))
-		status = xfs_setfilesize_trans_alloc(wpc->ioend);
-
-	if (wpc->iohead) {
-		blk_start_plug(&plug);
-		xfs_submit_ioend(wbc, wpc->iohead, status);
-		blk_finish_plug(&plug);
-	}
+	if (wpc->ioend)
+		xfs_submit_ioend(wbc, wpc->ioend, status);
 	return status;
 }
 
 static int
 xfs_writepage_map(
 	struct xfs_writepage_ctx *wpc,
+	struct writeback_control *wbc,
 	struct inode		*inode,
 	struct page		*page,
 	loff_t			offset,
 	__uint64_t              end_offset)
 {
+	struct xfs_ioend	*ioend_to_submit = NULL;
+	struct xfs_ioend	*ioend_tail = NULL;
 	struct buffer_head	*bh, *head;
 	ssize_t			len = 1 << inode->i_blkbits;
 	int			error = 0;
@@ -827,23 +808,37 @@ xfs_writepage_map(
 							 offset);
 		}
 		if (wpc->imap_valid) {
+			struct xfs_ioend *ioend;
+
 			lock_buffer(bh);
 			if (wpc->io_type != XFS_IO_OVERWRITE)
 				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
+			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
+			if (ioend) {
+				ioend->io_list = NULL;
+				if (!ioend_to_submit)
+					ioend_to_submit = ioend;
+				else
+					ioend_tail->io_list = ioend;
+				ioend_tail = ioend;
+			}
 			count++;
 		}
 
-		if (!wpc->iohead)
-			wpc->iohead = wpc->ioend;
-
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
 	if (uptodate && bh == head)
 		SetPageUptodate(page);
 
 	xfs_start_page_writeback(page, 1, count);
-	ASSERT(wpc->iohead || !count);
+	ASSERT(wpc->ioend || !count);
+	while (ioend_to_submit) {
+		struct xfs_ioend *next = ioend_to_submit->io_list;
+
+		ioend_to_submit->io_list = NULL;
+		xfs_submit_ioend(wbc, ioend_to_submit, 0);
+		ioend_to_submit = next;
+	}
 	return 0;
 
 out_error:
@@ -853,9 +848,16 @@ out_error:
 	 * ioend, then we can't touch it here and need to rely on IO submission
 	 * to unlock it.
 	 */
-	if (count)
+	if (count) {
 		xfs_start_page_writeback(page, 0, count);
-	else {
+		while (ioend_to_submit) {
+			struct xfs_ioend *next = ioend_to_submit->io_list;
+
+			ioend_to_submit->io_list = NULL;
+			xfs_submit_ioend(wbc, ioend_to_submit, 0);
+			ioend_to_submit = next;
+		}
+	} else {
 		xfs_aops_discard_page(page);
 		ClearPageUptodate(page);
 		unlock_page(page);
@@ -975,14 +977,14 @@ xfs_do_writepage(
 		end_offset = offset;
 	}
 
-	error = xfs_writepage_map(wpc, inode, page, offset, end_offset);
+	error = xfs_writepage_map(wpc, wbc, inode, page, offset, end_offset);
 	if (error)
 		goto out_error;
 	return 0;
 
 out_error:
 	/*
-	 * We have to fail the iohead here because we have buffers locked in the
+	 * We have to fail the ioend here because we have buffers locked in the
 	 * ioend chain. If we don't do this, we'll deadlock invalidating the
 	 * page as that tries to lock the buffers on the page. Also, because we
 	 * have set pages under writeback, we have to run IO completion to mark
-- 
2.5.0


* Re: [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-08  5:44 ` [PATCH 2/5] xfs: Introduce writeback context for writepages Dave Chinner
@ 2016-02-09 13:39   ` Christoph Hellwig
  2016-02-09 21:48     ` Dave Chinner
  2016-02-09 14:22   ` Brian Foster
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-02-09 13:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

This looks good in general and now passes testing for me.  A couple
comments below:

>  
>  /*
> - * Cancel submission of all buffer_heads so far in this endio.
> - * Toss the endio too.  Only ever called for the initial page
> - * in a writepage request, so only ever one page.
> - */
> -STATIC void
> -xfs_cancel_ioend(
> -	xfs_ioend_t		*ioend)
> -{
> -	xfs_ioend_t		*next;
> -	struct buffer_head	*bh, *next_bh;
> -
> -	do {
> -		next = ioend->io_list;
> -		bh = ioend->io_buffer_head;
> -		do {
> -			next_bh = bh->b_private;
> -			clear_buffer_async_write(bh);
> -			/*
> -			 * The unwritten flag is cleared when added to the
> -			 * ioend. We're not submitting for I/O so mark the
> -			 * buffer unwritten again for next time around.
> -			 */
> -			if (ioend->io_type == XFS_IO_UNWRITTEN)
> -				set_buffer_unwritten(bh);
> -			unlock_buffer(bh);
> -		} while ((bh = next_bh) != NULL);
> -
> -		mempool_free(ioend, xfs_ioend_pool);
> -	} while ((ioend = next) != NULL);
> -}

Removing xfs_cancel_ioend and replacing it with the start and cancel
writeback scheme that we currently only use for
xfs_setfilesize_trans_alloc failures actually seems to be the biggest
change in this patch and is entirely undocumented.  Any chance you
could split this into a prep patch and properly document it?

> -
> -	if (!ioend || need_ioend || type != ioend->io_type) {
> -		xfs_ioend_t	*previous = *result;
> -
> -		ioend = xfs_alloc_ioend(inode, type);
> -		ioend->io_offset = offset;
> -		ioend->io_buffer_head = bh;
> -		ioend->io_buffer_tail = bh;
> -		if (previous)
> -			previous->io_list = ioend;
> -		*result = ioend;
> +	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> +	    bh->b_blocknr != wpc->last_block + 1) {

We now start a new ioend if the blocks aren't contiguous, which seems
reasonable.  But this also means the similar check in xfs_submit_ioend
should be removed at the same time.


* Re: [PATCH 3/5] xfs: xfs_cluster_write is redundant
  2016-02-08  5:44 ` [PATCH 3/5] xfs: xfs_cluster_write is redundant Dave Chinner
@ 2016-02-09 13:40   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-02-09 13:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage
  2016-02-08  5:44 ` [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage Dave Chinner
@ 2016-02-09 13:40   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-02-09 13:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-08  5:44 ` [PATCH 5/5] xfs: don't chain ioends during writepage submission Dave Chinner
@ 2016-02-09 13:49   ` Christoph Hellwig
  2016-02-09 21:52     ` Dave Chinner
  2016-02-09 14:23   ` Brian Foster
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2016-02-09 13:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +STATIC struct xfs_ioend *
>  xfs_add_to_ioend(
>  	struct inode		*inode,
>  	struct buffer_head	*bh,
>  	xfs_off_t		offset,
>  	struct xfs_writepage_ctx *wpc)
>  {
> +	struct xfs_ioend	*ioend_to_submit = NULL;

Maybe just

	struct xfs_ioend	*prev = NULL;

to be a little less verbose?

> @@ -738,29 +726,22 @@ xfs_writepage_submit(
>  	struct writeback_control *wbc,
>  	int			status)
>  {
> -	struct blk_plug		plug;
> -
> -	/* Reserve log space if we might write beyond the on-disk inode size. */
> -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> -	    xfs_ioend_is_append(wpc->ioend))
> -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> -
> -	if (wpc->iohead) {
> -		blk_start_plug(&plug);
> -		xfs_submit_ioend(wbc, wpc->iohead, status);
> -		blk_finish_plug(&plug);
> -	}
> +	if (wpc->ioend)
> +		xfs_submit_ioend(wbc, wpc->ioend, status);
>  	return status;
>  }

With this change xfs_writepage_submit is rather pointless, I'd
rather open code it in the callers.

> +			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
> +			if (ioend) {
> +				ioend->io_list = NULL;
> +				if (!ioend_to_submit)
> +					ioend_to_submit = ioend;
> +				else
> +					ioend_tail->io_list = ioend;
> +				ioend_tail = ioend;
> +			}

Just using a list_head for this is a lot easier to read and less
error prone at the cost of a single additional pointer in the ioend.

> +	while (ioend_to_submit) {
> +		struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +		ioend_to_submit->io_list = NULL;
> +		xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +		ioend_to_submit = next;
> +	}
>  	return 0;
>  
>  out_error:
> @@ -853,9 +848,16 @@ out_error:
>  	 * ioend, then we can't touch it here and need to rely on IO submission
>  	 * to unlock it.
>  	 */
> -	if (count)
> +	if (count) {
>  		xfs_start_page_writeback(page, 0, count);
> -	else {
> +		while (ioend_to_submit) {
> +			struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +			ioend_to_submit->io_list = NULL;
> +			xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +			ioend_to_submit = next;
> +		}

I think this code could be consolidated:

	ASSERT(wpc->ioend || !count);
out:
	if (count) {
		xfs_start_page_writeback(page, !error, count);
		while (ioend_to_submit) {
			struct xfs_ioend *next = ioend_to_submit->io_list;

			ioend_to_submit->io_list = NULL;
			xfs_submit_ioend(wbc, ioend_to_submit, 0);
			ioend_to_submit = next;
		}
	} else {
  		xfs_aops_discard_page(page);
  		ClearPageUptodate(page);
  		unlock_page(page);
  	}

	return error;


* Re: [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-08  5:44 ` [PATCH 2/5] xfs: Introduce writeback context for writepages Dave Chinner
  2016-02-09 13:39   ` Christoph Hellwig
@ 2016-02-09 14:22   ` Brian Foster
  2016-02-09 21:51     ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Brian Foster @ 2016-02-09 14:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Feb 08, 2016 at 04:44:15PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_vm_writepages() calls generic_writepages to writeback a range of
> a file, but then xfs_vm_writepage() clusters pages itself as it does
> not have any context it can pass between ->writepage calls from
> __write_cache_pages().
> 
> Introduce a writeback context for xfs_vm_writepages() and call
> __write_cache_pages directly with our own writepage callback so that
> we can pass that context to each writepage invocation. This
> encapsulates the current mapping, whether it is valid or not, the
> current ioend and its IO type and the ioend chain being built.
> 
> This requires us to move the ioend submission up to the level where
> the writepage context is declared. This does mean we do not submit
> IO until we have packaged the entire writeback range, but with the block
> plugging in the writepages call this is the way IO is submitted,
> anyway.
> 
> It also means that we need to handle discontiguous page ranges.  If
> the pages sent down by write_cache_pages to the writepage callback
> are discontiguous, we need to detect this and put each discontiguous
> page range into individual ioends. This is needed to ensure that the
> ioend accurately represents the range of the file that it covers so
> that file size updates during IO completion set the size correctly.
> Failure to take into account the discontiguous ranges results in
> files being too small when writeback patterns are non-sequential.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 277 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 146 insertions(+), 131 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 00452cb..4453d1d 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -1163,29 +1147,36 @@ xfs_vm_writepage(
>  		if (end_index > last_index)
>  			end_index = last_index;
>  
> -		xfs_cluster_write(inode, page->index + 1, &imap, &ioend,
> -				  wbc, end_index);
> +		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
>  	}
>  
> -
> -	/*
> -	 * Reserve log space if we might write beyond the on-disk inode size.
> -	 */
> -	err = 0;
> -	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
> -		err = xfs_setfilesize_trans_alloc(ioend);
> -
> -	xfs_submit_ioend(wbc, iohead, err);
> -
>  	return 0;
>  
>  error:
> -	if (iohead)
> -		xfs_cancel_ioend(iohead);
> +	/*
> +	 * We have to fail the iohead here because we have buffers locked in the
> +	 * ioend chain. If we don't do this, we'll deadlock invalidating the
> +	 * page as that tries to lock the buffers on the page. Also, because we
> +	 * have set pages under writeback, we have to run IO completion to mark
> +	 * the error state of the IO appropriately, so we can't cancel the ioend
> +	 * directly here. That means we have to mark this page as under
> +	 * writeback if we included any buffers from it in the ioend chain.
> +	 */
> +	if (count)
> +		xfs_start_page_writeback(page, 0, count);
> +	xfs_writepage_submit(wpc, wbc, err);

We make the xfs_writepage_submit() error case call here because...

>  
> -	xfs_aops_discard_page(page);
> -	ClearPageUptodate(page);
> -	unlock_page(page);
> +	/*
> +	 * We can only discard the page we had the IO error on if we haven't
> +	 * included it in the ioend above. If it has already been errored out,
> +	 * then it is unlocked and we can't touch it here.
> +	 */
> +	if (!count) {
> +		xfs_aops_discard_page(page);
> +		ClearPageUptodate(page);
> +		unlock_page(page);
> +	}
> +	mapping_set_error(page->mapping, err);
>  	return err;
>  
>  redirty:
> @@ -1195,12 +1186,36 @@ redirty:
>  }
>  
>  STATIC int
> +xfs_vm_writepage(
> +	struct page		*page,
> +	struct writeback_control *wbc)
> +{
> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};
> +	int			ret;
> +
> +	ret = xfs_do_writepage(page, wbc, &wpc);
> +	if (ret)
> +		return ret;
> +	return xfs_writepage_submit(&wpc, wbc, ret);


... the callers only call it when ret == 0. Can we eliminate the error
call down in xfs_do_writepage() and just invoke this consistently from
the writepage(s) callers?

Brian

> +}
> +
> +STATIC int
>  xfs_vm_writepages(
>  	struct address_space	*mapping,
>  	struct writeback_control *wbc)
>  {
> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};
> +	int			ret;
> +
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> -	return generic_writepages(mapping, wbc);
> +	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
> +	if (ret)
> +		return ret;
> +	return xfs_writepage_submit(&wpc, wbc, ret);
>  }
>  
>  /*
> -- 
> 2.5.0
> 


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-08  5:44 ` [PATCH 5/5] xfs: don't chain ioends during writepage submission Dave Chinner
  2016-02-09 13:49   ` Christoph Hellwig
@ 2016-02-09 14:23   ` Brian Foster
  2016-02-09 21:59     ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Brian Foster @ 2016-02-09 14:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently we can build a long ioend chain during ->writepages that
> gets attached to the writepage context. IO submission only then
> occurs when we finish all the writepage processing. This means we
> can have many ioends allocated and pending, and this violates the
> mempool guarantees that we need to give about forwards progress.
> i.e. we really should only have one ioend being built at a time,
> otherwise we may drain the mempool trying to allocate a new ioend
> and that blocks submission, completion and freeing of ioends that
> are already in progress.
> 
> To prevent this situation from happening, we need to submit ioends
> for IO as soon as they are ready for dispatch rather than queuing
> them for later submission. This means the ioends have bios built
> immediately and they get queued on any plug that is currently active.
> Hence if we schedule away from writeback, the ioends that have been
> built will make forwards progress due to the plug flushing on
> context switch. This will also prevent context switches from
> creating unnecessary IO submission latency.
> 
> We can't completely avoid having nested IO allocation - when we have
> a block size smaller than a page size, we still need to hold the
> ioend submission until after we have marked the current page dirty.
> Hence we may need multiple ioends to be held while the current page
> is completely mapped and made ready for IO dispatch. We cannot avoid
> this problem - the current code already has this ioend chaining
> within a page so we can mostly ignore that it occurs.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 94 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 48 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e5b8214..852ecf2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -738,29 +726,22 @@ xfs_writepage_submit(
>  	struct writeback_control *wbc,
>  	int			status)
>  {
> -	struct blk_plug		plug;
> -
> -	/* Reserve log space if we might write beyond the on-disk inode size. */
> -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> -	    xfs_ioend_is_append(wpc->ioend))
> -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> -
> -	if (wpc->iohead) {
> -		blk_start_plug(&plug);
> -		xfs_submit_ioend(wbc, wpc->iohead, status);
> -		blk_finish_plug(&plug);
> -	}

We've dropped our plug here but I don't see anything added in
xfs_vm_writepages(). Shouldn't we have one there now that ioends are
submitted as we go? generic_writepages() uses one around its
write_cache_pages() call..

> +	if (wpc->ioend)
> +		xfs_submit_ioend(wbc, wpc->ioend, status);
>  	return status;
>  }
>  
>  static int
>  xfs_writepage_map(
>  	struct xfs_writepage_ctx *wpc,
> +	struct writeback_control *wbc,
>  	struct inode		*inode,
>  	struct page		*page,
>  	loff_t			offset,
>  	__uint64_t              end_offset)
>  {
> +	struct xfs_ioend	*ioend_to_submit = NULL;
> +	struct xfs_ioend	*ioend_tail = NULL;
>  	struct buffer_head	*bh, *head;
>  	ssize_t			len = 1 << inode->i_blkbits;
>  	int			error = 0;
> @@ -827,23 +808,37 @@ xfs_writepage_map(
>  							 offset);
>  		}
>  		if (wpc->imap_valid) {
> +			struct xfs_ioend *ioend;
> +
>  			lock_buffer(bh);
>  			if (wpc->io_type != XFS_IO_OVERWRITE)
>  				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc);

"Big picture" comment here please, i.e. something along the lines of
(feel free to rewrite/fix/enhance):

"This implements an immediate ioend submission policy. If a new ioend is
required, the old ioend is returned and slated for submission on
function exit. The purpose of this policy is to avoid creating and
holding large chains of ioend objects in memory. While ioends are
submitted immediately after they are completed, block plugging helps
provide batching."

> +			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
> +			if (ioend) {
> +				ioend->io_list = NULL;
> +				if (!ioend_to_submit)
> +					ioend_to_submit = ioend;
> +				else
> +					ioend_tail->io_list = ioend;
> +				ioend_tail = ioend;
> +			}
>  			count++;
>  		}
>  
> -		if (!wpc->iohead)
> -			wpc->iohead = wpc->ioend;
> -
>  	} while (offset += len, ((bh = bh->b_this_page) != head));
>  
>  	if (uptodate && bh == head)
>  		SetPageUptodate(page);
>  
>  	xfs_start_page_writeback(page, 1, count);
> -	ASSERT(wpc->iohead || !count);
> +	ASSERT(wpc->ioend || !count);
> +	while (ioend_to_submit) {
> +		struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +		ioend_to_submit->io_list = NULL;
> +		xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +		ioend_to_submit = next;
> +	}
>  	return 0;
>  
>  out_error:
> @@ -853,9 +848,16 @@ out_error:
>  	 * ioend, then we can't touch it here and need to rely on IO submission
>  	 * to unlock it.
>  	 */
> -	if (count)
> +	if (count) {
>  		xfs_start_page_writeback(page, 0, count);
> -	else {
> +		while (ioend_to_submit) {
> +			struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +			ioend_to_submit->io_list = NULL;
> +			xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +			ioend_to_submit = next;
> +		}
> +	} else {
>  		xfs_aops_discard_page(page);
>  		ClearPageUptodate(page);
>  		unlock_page(page);

If we have an error and count == 0, we know that ioend_to_submit is NULL
because that is only potentially set once the first buffer is added.
That said, this doesn't mean that we don't have an ioend waiting on the
wpc. If we do, we still return the error and the ioend is errored out.

I wonder if that is really necessary if we haven't added any buffers
from the page..? Could we submit the ioend properly in that case? OTOH,
that might complicate the error reporting and an error here might be
serious enough that it isn't worth it, as opposed to just making sure we
clean up everything appropriately.

Brian

> @@ -975,14 +977,14 @@ xfs_do_writepage(
>  		end_offset = offset;
>  	}
>  
> -	error = xfs_writepage_map(wpc, inode, page, offset, end_offset);
> +	error = xfs_writepage_map(wpc, wbc, inode, page, offset, end_offset);
>  	if (error)
>  		goto out_error;
>  	return 0;
>  
>  out_error:
>  	/*
> -	 * We have to fail the iohead here because we buffers locked in the
> +	 * We have to fail the ioend here because we buffers locked in the
>  	 * ioend chain. If we don't do this, we'll deadlock invalidating the
>  	 * page as that tries to lock the buffers on the page. Also, because we
>  	 * have set pages under writeback, we have to run IO completion to mark
> -- 
> 2.5.0
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-09 13:39   ` Christoph Hellwig
@ 2016-02-09 21:48     ` Dave Chinner
  2016-02-09 23:16       ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-02-09 21:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Feb 09, 2016 at 05:39:41AM -0800, Christoph Hellwig wrote:
> Removing xfs_cancel_ioend and replacing it with the start and cancel
> writeback scheme that we currently only use for
> xfs_setfilesize_trans_alloc failures actually seems to be the biggest
> change in this patch and is entirely undocumented.  Any chance you
> could split this into a prep patch and properly document it?

I can try.

> > -
> > -	if (!ioend || need_ioend || type != ioend->io_type) {
> > -		xfs_ioend_t	*previous = *result;
> > -
> > -		ioend = xfs_alloc_ioend(inode, type);
> > -		ioend->io_offset = offset;
> > -		ioend->io_buffer_head = bh;
> > -		ioend->io_buffer_tail = bh;
> > -		if (previous)
> > -			previous->io_list = ioend;
> > -		*result = ioend;
> > +	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> > +	    bh->b_blocknr != wpc->last_block + 1) {
> 
> We now start a new ioend if the blocks aren't contiguous, which seems
> reasonable.  But this also means the similar check in xfs_submit_ioend
> should be removed at the same time.

OK.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-09 14:22   ` Brian Foster
@ 2016-02-09 21:51     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-09 21:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Tue, Feb 09, 2016 at 09:22:26AM -0500, Brian Foster wrote:
> On Mon, Feb 08, 2016 at 04:44:15PM +1100, Dave Chinner wrote:
> > +	/*
> > +	 * We have to fail the iohead here because we buffers locked in the
> > +	 * ioend chain. If we don't do this, we'll deadlock invalidating the
> > +	 * page as that tries to lock the buffers on the page. Also, because we
> > +	 * have set pages under writeback, we have to run IO completion to mark
> > +	 * the error state of the IO appropriately, so we can't cancel the ioend
> > +	 * directly here. That means we have to mark this page as under
> > +	 * writeback if we included any buffers from it in the ioend chain.
> > +	 */
> > +	if (count)
> > +		xfs_start_page_writeback(page, 0, count);
> > +	xfs_writepage_submit(wpc, wbc, err);
> 
> We make the xfs_writepage_submit() error case call here because...
...
> >  STATIC int
> > +xfs_vm_writepage(
> > +	struct page		*page,
> > +	struct writeback_control *wbc)
> > +{
> > +	struct xfs_writepage_ctx wpc = {
> > +		.io_type = XFS_IO_OVERWRITE,
> > +	};
> > +	int			ret;
> > +
> > +	ret = xfs_do_writepage(page, wbc, &wpc);
> > +	if (ret)
> > +		return ret;
> > +	return xfs_writepage_submit(&wpc, wbc, ret);
> 
> 
> ... the callers only call it when ret == 0. Can we eliminate the error
> call down in xfs_do_writepage() and just invoke this consistently from
> the writepage(s) callers?

Probably - I think this is left over from an early concoction that
exploded badly when it was stirred too vigorously...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-09 13:49   ` Christoph Hellwig
@ 2016-02-09 21:52     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-09 21:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Feb 09, 2016 at 05:49:30AM -0800, Christoph Hellwig wrote:
> > +STATIC struct xfs_ioend *
> >  xfs_add_to_ioend(
> >  	struct inode		*inode,
> >  	struct buffer_head	*bh,
> >  	xfs_off_t		offset,
> >  	struct xfs_writepage_ctx *wpc)
> >  {
> > +	struct xfs_ioend	*ioend_to_submit = NULL;
> 
> Maybe just
> 
> 	struct xfs_ioend	*prev = NULL;
> 
> to be a little less verbose?

*nod*

> > +	if (wpc->ioend)
> > +		xfs_submit_ioend(wbc, wpc->ioend, status);
> >  	return status;
> >  }
> 
> With this change xfs_writepage_submit is rather pointless, I'd
> rather open code it in the callers.

Yup.

> 
> > +			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
> > +			if (ioend) {
> > +				ioend->io_list = NULL;
> > +				if (!ioend_to_submit)
> > +					ioend_to_submit = ioend;
> > +				else
> > +					ioend_tail->io_list = ioend;
> > +				ioend_tail = ioend;
> > +			}
> 
> Just using a list_head for this is a lot easier to read and less
> error prone at the cost of a single additional pointer in the ioend.

OK. I'll see what I can do here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-09 14:23   ` Brian Foster
@ 2016-02-09 21:59     ` Dave Chinner
  2016-02-10 13:18       ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-02-09 21:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Tue, Feb 09, 2016 at 09:23:55AM -0500, Brian Foster wrote:
> On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> > @@ -738,29 +726,22 @@ xfs_writepage_submit(
> >  	struct writeback_control *wbc,
> >  	int			status)
> >  {
> > -	struct blk_plug		plug;
> > -
> > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> > -	    xfs_ioend_is_append(wpc->ioend))
> > -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> > -
> > -	if (wpc->iohead) {
> > -		blk_start_plug(&plug);
> > -		xfs_submit_ioend(wbc, wpc->iohead, status);
> > -		blk_finish_plug(&plug);
> > -	}
> 
> We've dropped our plug here but I don't see anything added in
> xfs_vm_writepages(). Shouldn't we have one there now that ioends are
> submitted as we go? generic_writepages() uses one around its
> write_cache_pages() call..

It's not really necessary, as we now have higher level plugging in
the writeback path that will get flushed on context switch, and if we don't
have a high level plug (e.g. fsync triggered writeback), then we
submit the IO immediately, just like flushing the plug here would do
anyway....

> > @@ -827,23 +808,37 @@ xfs_writepage_map(
> >  							 offset);
> >  		}
> >  		if (wpc->imap_valid) {
> > +			struct xfs_ioend *ioend;
> > +
> >  			lock_buffer(bh);
> >  			if (wpc->io_type != XFS_IO_OVERWRITE)
> >  				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> > -			xfs_add_to_ioend(inode, bh, offset, wpc);
> 
> "Big picture" comment here please, i.e. something along the lines of
> (feel free to rewrite/fix/enhance):
> 
> "This implements an immediate ioend submission policy. If a new ioend is
> required, the old ioend is returned and slated for submission on
> function exit. The purpose of this policy is to avoid creating and
> holding large chains of ioend objects in memory. While ioends are
> submitted immediately after they are completed, block plugging helps
> provide batching."

I can add something like that to the function.

> > -	else {
> > +		while (ioend_to_submit) {
> > +			struct xfs_ioend *next = ioend_to_submit->io_list;
> > +
> > +			ioend_to_submit->io_list = NULL;
> > +			xfs_submit_ioend(wbc, ioend_to_submit, 0);
> > +			ioend_to_submit = next;
> > +		}
> > +	} else {
> >  		xfs_aops_discard_page(page);
> >  		ClearPageUptodate(page);
> >  		unlock_page(page);
> 
> If we have an error and count == 0, we know that ioend_to_submit is NULL
> because that is only potentially set once the first buffer is added.
> That said, this doesn't mean that we don't have an ioend waiting on the
> wpc. If we do, we still return the error and the ioend is errored out.
> 
> I wonder if that is really necessary if we haven't added any buffers
> from the page..? Could we submit the ioend properly in that case? OTOH,
> that might complicate the error reporting and an error here might be
> serious enough that it isn't worth it, as opposed to just making sure we
> clean up everything appropriately.

The way I've done it is the same as the existing code - on error the
entire ioend chain that has been built is errored out. I'd prefer to
keep it that way right now to minimise the potential behavioural
changes of the patch series. We can look to changing to partial
submission in a separate patch set if it makes sense to do so.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/5] xfs: Introduce writeback context for writepages
  2016-02-09 21:48     ` Dave Chinner
@ 2016-02-09 23:16       ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-02-09 23:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 08:48:50AM +1100, Dave Chinner wrote:
> On Tue, Feb 09, 2016 at 05:39:41AM -0800, Christoph Hellwig wrote:
> > > -	if (!ioend || need_ioend || type != ioend->io_type) {
> > > -		xfs_ioend_t	*previous = *result;
> > > -
> > > -		ioend = xfs_alloc_ioend(inode, type);
> > > -		ioend->io_offset = offset;
> > > -		ioend->io_buffer_head = bh;
> > > -		ioend->io_buffer_tail = bh;
> > > -		if (previous)
> > > -			previous->io_list = ioend;
> > > -		*result = ioend;
> > > +	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> > > +	    bh->b_blocknr != wpc->last_block + 1) {
> > 
> > We now start a new ioend if the blocks aren't contiguous, which seems
> > reasonable.  But this also means the similar check in xfs_submit_ioend
> > should be removed at the same time.
> 
> OK.

On second thoughts, I have another patch I haven't posted yet that
builds bios directly in xfs_add_to_ioend() that gets rid of the
buffer head chain walk during submission, so I'll leave changing the
submission code to that patch rather than changing here...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-09 21:59     ` Dave Chinner
@ 2016-02-10 13:18       ` Brian Foster
  2016-02-10 21:09         ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2016-02-10 13:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Feb 10, 2016 at 08:59:00AM +1100, Dave Chinner wrote:
> On Tue, Feb 09, 2016 at 09:23:55AM -0500, Brian Foster wrote:
> > On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> > > @@ -738,29 +726,22 @@ xfs_writepage_submit(
> > >  	struct writeback_control *wbc,
> > >  	int			status)
> > >  {
> > > -	struct blk_plug		plug;
> > > -
> > > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> > > -	    xfs_ioend_is_append(wpc->ioend))
> > > -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> > > -
> > > -	if (wpc->iohead) {
> > > -		blk_start_plug(&plug);
> > > -		xfs_submit_ioend(wbc, wpc->iohead, status);
> > > -		blk_finish_plug(&plug);
> > > -	}
> > 
> > We've dropped our plug here but I don't see anything added in
> > xfs_vm_writepages(). Shouldn't we have one there now that ioends are
> > submitted as we go? generic_writepages() uses one around its
> > write_cache_pages() call..
> 
> It's not really necessary, as we now have higher level plugging in
> the writeback path that will get flushed on context switch, and if we don't
> have a high level plug (e.g. fsync triggered writeback), then we
> submit the IO immediately, just like flushing the plug here would do
> anyway....
> 

Ok, I'm digging around the wb code a bit and I see plugs in/around
wb_writeback(), so I assume that's what you're referring to in the first
case. I'm not quite following the fsync case though...

In the current upstream code, fsync() leads to the following call chain:

  filemap_write_and_wait_range()
    __filemap_fdatawrite_range()
      do_writepages()
        xfs_vm_writepages()
          generic_writepages()
            blk_start_plug()
            write_cache_pages()
            blk_finish_plug()

After this series, we have the following:

  filemap_write_and_wait_range()
    __filemap_fdatawrite_range()
      do_writepages()
        xfs_vm_writepages() 
          write_cache_pages()

... with no plug that I can see. What am I missing?
      
...
> > > -	else {
> > > +		while (ioend_to_submit) {
> > > +			struct xfs_ioend *next = ioend_to_submit->io_list;
> > > +
> > > +			ioend_to_submit->io_list = NULL;
> > > +			xfs_submit_ioend(wbc, ioend_to_submit, 0);
> > > +			ioend_to_submit = next;
> > > +		}
> > > +	} else {
> > >  		xfs_aops_discard_page(page);
> > >  		ClearPageUptodate(page);
> > >  		unlock_page(page);
> > 
> > If we have an error and count == 0, we know that ioend_to_submit is NULL
> > because that is only potentially set once the first buffer is added.
> > That said, this doesn't mean that we don't have an ioend waiting on the
> > wpc. If we do, we still return the error and the ioend is errored out.
> > 
> > I wonder if that is really necessary if we haven't added any buffers
> > from the page..? Could we submit the ioend properly in that case? OTOH,
> > that might complicate the error reporting and an error here might be
> > serious enough that it isn't worth it, as opposed to just making sure we
> > clean up everything appropriately.
> 
> The way I've done it is the same as the existing code - on error the
> entire ioend chain that has been built is errored out. I'd prefer to
> keep it that way right now to minimise the potential behavioural
> changes of the patch series. We can look to changing to partial
> submission in a separate patch set if it makes sense to do so.
> 

I noticed that the existing code is roughly equivalent, so that's fair I
think. Thanks.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-10 13:18       ` Brian Foster
@ 2016-02-10 21:09         ` Dave Chinner
  2016-02-11 12:24           ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-02-10 21:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Feb 10, 2016 at 08:18:17AM -0500, Brian Foster wrote:
> On Wed, Feb 10, 2016 at 08:59:00AM +1100, Dave Chinner wrote:
> > On Tue, Feb 09, 2016 at 09:23:55AM -0500, Brian Foster wrote:
> > > On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> > > > @@ -738,29 +726,22 @@ xfs_writepage_submit(
> > > >  	struct writeback_control *wbc,
> > > >  	int			status)
> > > >  {
> > > > -	struct blk_plug		plug;
> > > > -
> > > > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > > -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> > > > -	    xfs_ioend_is_append(wpc->ioend))
> > > > -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> > > > -
> > > > -	if (wpc->iohead) {
> > > > -		blk_start_plug(&plug);
> > > > -		xfs_submit_ioend(wbc, wpc->iohead, status);
> > > > -		blk_finish_plug(&plug);
> > > > -	}
> > > 
> > > We've dropped our plug here but I don't see anything added in
> > > xfs_vm_writepages(). Shouldn't we have one there now that ioends are
> > > submitted as we go? generic_writepages() uses one around its
> > > write_cache_pages() call..
> > 
> > It's not really necessary, as we now have higher level plugging in
> > the writeback path that will get flushed on context switch, and if we don't
> > have a high level plug (e.g. fsync triggered writeback), then we
> > submit the IO immediately, just like flushing the plug here would do
> > anyway....
> > 
> 
> Ok, I'm digging around the wb code a bit and I see plugs in/around
> wb_writeback(), so I assume that's what you're referring to in the first
> case. I'm not quite following the fsync case though...
> 
> In the current upstream code, fsync() leads to the following call chain:
> 
>   filemap_write_and_wait_range()
>     __filemap_fdatawrite_range()
>       do_writepages()
>         xfs_vm_writepages()
>           generic_writepages()
>             blk_start_plug()
>             write_cache_pages()
>             blk_finish_plug()
> 
> After this series, we have the following:
> 
>   filemap_write_and_wait_range()
>     __filemap_fdatawrite_range()
>       do_writepages()
>         xfs_vm_writepages() 
>           write_cache_pages()
> 
> ... with no plug that I can see. What am I missing?

fsync tends to be a latency sensitive operation, not a bandwidth
maximising operation. Plugging trades off IO submission latency for
maximising IO bandwidth. For fsync and other single inode operations
that block waiting for the IO to complete, maximising bandwidth is
not necessarily the right thing to do.

For single inode IO commands (such as through
__filemap_fdatawrite_range), block plugging will only improve
performance if the filesystem does not form large bios to begin
with. XFS always builds maximally sized bios if it can, so plugging
cannot improve the IO throughput from such writeback behaviour
because the bios it builds cannot be further merged.  Such bios are
better served being pushed straight into the IO scheduler queues.

IOWs, plugging only makes a difference when the IO being formed is
small but is mergeable in the IO scheduler. This is what happens with
small file delayed allocation in writeback in XFS, and nowadays we
have a high level plug for this (i.e. in writeback_inodes_wb() and
wb_writeback()). Hence those one-bio-per-inode-but-all-sequential IOs
will be merged in the plug before dispatch, thereby improving write
bandwidth under such small file writeback workloads. (See the
numbers in commit d353d75 "writeback: plug writeback at a high
level".)

IOWs, block plugging is not a magical "make everything go faster"
knob. Different filesystems have different IO dispatch methods, and
so require different plugging strategies to optimise their IO
patterns.  It may be that plugging in xfs_vm_writepages is
advantageous in some workloads for fsync, but I haven't been able to
measure them.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: don't chain ioends during writepage submission
  2016-02-10 21:09         ` Dave Chinner
@ 2016-02-11 12:24           ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2016-02-11 12:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Feb 11, 2016 at 08:09:26AM +1100, Dave Chinner wrote:
> On Wed, Feb 10, 2016 at 08:18:17AM -0500, Brian Foster wrote:
> > On Wed, Feb 10, 2016 at 08:59:00AM +1100, Dave Chinner wrote:
> > > On Tue, Feb 09, 2016 at 09:23:55AM -0500, Brian Foster wrote:
> > > > On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> > > > > @@ -738,29 +726,22 @@ xfs_writepage_submit(
> > > > >  	struct writeback_control *wbc,
> > > > >  	int			status)
> > > > >  {
> > > > > -	struct blk_plug		plug;
> > > > > -
> > > > > -	/* Reserve log space if we might write beyond the on-disk inode size. */
> > > > > -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> > > > > -	    xfs_ioend_is_append(wpc->ioend))
> > > > > -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> > > > > -
> > > > > -	if (wpc->iohead) {
> > > > > -		blk_start_plug(&plug);
> > > > > -		xfs_submit_ioend(wbc, wpc->iohead, status);
> > > > > -		blk_finish_plug(&plug);
> > > > > -	}
> > > > 
> > > > We've dropped our plug here but I don't see anything added in
> > > > xfs_vm_writepages(). Shouldn't we have one there now that ioends are
> > > > submitted as we go? generic_writepages() uses one around its
> > > > write_cache_pages() call..
> > > 
> > > It's not really necessary, as we now have higher level plugging in
> > > the writeback path that will get flushed on context switch, and if we don't
> > > have a high level plug (e.g. fsync triggered writeback), then we
> > > submit the IO immediately, just like flushing the plug here would do
> > > anyway....
> > > 
> > 
> > Ok, I'm digging around the wb code a bit and I see plugs in/around
> > wb_writeback(), so I assume that's what you're referring to in the first
> > case. I'm not quite following the fsync case though...
> > 
> > In the current upstream code, fsync() leads to the following call chain:
> > 
> >   filemap_write_and_wait_range()
> >     __filemap_fdatawrite_range()
> >       do_writepages()
> >         xfs_vm_writepages()
> >           generic_writepages()
> >             blk_start_plug()
> >             write_cache_pages()
> >             blk_finish_plug()
> > 
> > After this series, we have the following:
> > 
> >   filemap_write_and_wait_range()
> >     __filemap_fdatawrite_range()
> >       do_writepages()
> >         xfs_vm_writepages() 
> >           write_cache_pages()
> > 
> > ... with no plug that I can see. What am I missing?
> 
> fsync tends to be a latency sensitive operation, not a bandwidth
> maximising operation. Plugging trades off IO submission latency for
> maximising IO bandwidth. For fsync and other single inode operations
> that block waiting for the IO to complete, maximising bandwidth is
> not necessarily the right thing to do.
> 

Ok.

> For single inode IO commands (such as through
> __filemap_fdatawrite_range), block plugging will only improve
> performance if the filesystem does not form large bios to begin
> with. XFS always builds maximally sized bios if it can, so plugging
> cannot improve the IO throughput from such writeback behaviour
> because the bios it builds cannot be further merged.  Such bios are
> better served being pushed straight into the IO scheduler queues.
> 
> IOWs, plugging only makes a difference when the IO being formed is
> small but is mergeable in the IO scheduler. This is what happens with
> small file delayed allocation in writeback in XFS, and nowadays we
> have a high level plug for this (i.e. in writeback_inodes_wb() and
> wb_writeback()). Hence those one-bio-per-inode-but-all-sequential IOs
> will be merged in the plug before dispatch, thereby improving write
> bandwidth under such small file writeback workloads. (See the
> numbers in commit d353d75 "writeback: plug writeback at a high
> level".)
> 

Makes sense.

> IOWs, block plugging is not a magical "make everything go faster"
> knob. Different filesystems have different IO dispatch methods, and
> so require different plugging strategies to optimise their IO
> patterns.  It may be that plugging in xfs_vm_writepages is
> advantageous in some workloads for fsync, but I haven't been able to
> measure them.
> 

I don't think I suggested it was magical in any way. ;) My initial
feedback was simply based on the fact that it looked like the behavior
changed without notice, so it wasn't clear if that was intentional. You
pointed out the higher level wb plug but at the same time implied not
having a plug in the fsync case, which we previously did have (granted,
only for the mapping). Perhaps I read into that wrong. It would be nice
if the commit log made a quick mention about the plug situation (one
context has a separate plug, for the other it is unnecessary), but
otherwise the explanation addresses my concerns. Thanks!

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


end of thread, other threads:[~2016-02-11 12:24 UTC | newest]

Thread overview: 20+ messages
2016-02-08  5:44 [PATCH 0/5 v3] xfs: get rid of xfs_cluster_write() Dave Chinner
2016-02-08  5:44 ` [PATCH 1/5] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
2016-02-08  5:44 ` [PATCH 2/5] xfs: Introduce writeback context for writepages Dave Chinner
2016-02-09 13:39   ` Christoph Hellwig
2016-02-09 21:48     ` Dave Chinner
2016-02-09 23:16       ` Dave Chinner
2016-02-09 14:22   ` Brian Foster
2016-02-09 21:51     ` Dave Chinner
2016-02-08  5:44 ` [PATCH 3/5] xfs: xfs_cluster_write is redundant Dave Chinner
2016-02-09 13:40   ` Christoph Hellwig
2016-02-08  5:44 ` [PATCH 4/5] xfs: factor mapping out of xfs_do_writepage Dave Chinner
2016-02-09 13:40   ` Christoph Hellwig
2016-02-08  5:44 ` [PATCH 5/5] xfs: don't chain ioends during writepage submission Dave Chinner
2016-02-09 13:49   ` Christoph Hellwig
2016-02-09 21:52     ` Dave Chinner
2016-02-09 14:23   ` Brian Foster
2016-02-09 21:59     ` Dave Chinner
2016-02-10 13:18       ` Brian Foster
2016-02-10 21:09         ` Dave Chinner
2016-02-11 12:24           ` Brian Foster
