* [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
@ 2016-02-10  8:47 Dave Chinner
  2016-02-10  8:47 ` [PATCH 1/8] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs


Hi folks,

This is the 4th version of this series, following up on review
comments from Brian and Christoph. There are three new patches in
this series. Patch 2/8 is split out from patch 3/8 to document the
removal of the ioend cancelling code, separate from the introduction
of the writepage context. Patches 7 and 8 are new patches (as in the
first time I've posted them) to demonstrate how to remove the IO
completion dependency on recording the bufferheads attached to the
ioend. This is the first step in removing bufferheads from the
writepage IO path - these are FYI patches, not patches I want to
have committed immediately.

The changes all run through xfstests on 4k and 1k block size
filesystems without issue, and I think I addressed all the review
comments. Patch 6/8 changed enough that I removed Brian's
reviewed-by from it, so it will need to be looked at completely
again.

-Dave.

Version 4:
- split xfs_ioend_cancel removal into its own patch (patch 2)
- cleaned up submission of ioends to be consistent w.r.t. success
  and error paths (patch 3)
- made variable names less verbose (patch 6)
- got rid of xfs_writepage_submit() (patch 6)
- use a struct list_head for ioend chaining (patch 6)
- consolidated ioend submission paths to use common code (patch 6)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 1/8] xfs: remove nonblocking mode from xfs_vm_writepage
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10  8:47 ` [PATCH 2/8] xfs: remove xfs_cancel_ioend Dave Chinner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Remove the nonblocking optimisation done for mapping lookups during
writeback. It's not clear that leaving a hole in the writeback range
just because we couldn't get a lock is really a win, as it makes us
do another small random IO later on rather than a large sequential
IO now.

As this gets in the way of sane error handling later on, just remove
it for the moment; we can re-introduce an equivalent optimisation in
future if we see problems due to extent map lock contention.
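For readers following along outside the kernel, the trylock-or-block pattern this patch removes can be sketched in userspace (hypothetical names; a pthread mutex stands in for the inode's ILOCK):

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userspace sketch of the pattern being removed. With the nonblocking
 * flag set, a contended lock made the caller bail out with -EAGAIN
 * (leaving a hole in the writeback range); without it, the caller
 * simply blocks until the lock is available.
 */
static int map_blocks_sketch(pthread_mutex_t *ilock, int nonblocking)
{
	if (pthread_mutex_trylock(ilock) != 0) {
		if (nonblocking)
			return -EAGAIN;		/* old behaviour: skip for now */
		pthread_mutex_lock(ilock);	/* new behaviour: just wait */
	}
	/* ... the extent lookup would happen here under the lock ... */
	pthread_mutex_unlock(ilock);
	return 0;
}
```

After the patch only the blocking path remains, so a transiently contended lock costs a short wait now rather than a small random IO later.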

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 14ac982..00452cb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -289,8 +289,7 @@ xfs_map_blocks(
 	struct inode		*inode,
 	loff_t			offset,
 	struct xfs_bmbt_irec	*imap,
-	int			type,
-	int			nonblocking)
+	int			type)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -306,12 +305,7 @@ xfs_map_blocks(
 	if (type == XFS_IO_UNWRITTEN)
 		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
-	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) {
-		if (nonblocking)
-			return -EAGAIN;
-		xfs_ilock(ip, XFS_ILOCK_SHARED);
-	}
-
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
@@ -961,7 +955,6 @@ xfs_vm_writepage(
 	ssize_t			len;
 	int			err, imap_valid = 0, uptodate = 1;
 	int			count = 0;
-	int			nonblocking = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1061,9 +1054,6 @@ xfs_vm_writepage(
 	offset = page_offset(page);
 	type = XFS_IO_OVERWRITE;
 
-	if (wbc->sync_mode == WB_SYNC_NONE)
-		nonblocking = 1;
-
 	do {
 		int new_ioend = 0;
 
@@ -1123,8 +1113,7 @@ xfs_vm_writepage(
 			 * time.
 			 */
 			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type,
-					     nonblocking);
+			err = xfs_map_blocks(inode, offset, &imap, type);
 			if (err)
 				goto error;
 			imap_valid = xfs_imap_valid(inode, &imap, offset);
@@ -1194,9 +1183,6 @@ error:
 	if (iohead)
 		xfs_cancel_ioend(iohead);
 
-	if (err == -EAGAIN)
-		goto redirty;
-
 	xfs_aops_discard_page(page);
 	ClearPageUptodate(page);
 	unlock_page(page);
-- 
2.5.0



* [PATCH 2/8] xfs: remove xfs_cancel_ioend
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
  2016-02-10  8:47 ` [PATCH 1/8] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10 11:28   ` Christoph Hellwig
  2016-02-10  8:47 ` [PATCH 3/8] xfs: Introduce writeback context for writepages Dave Chinner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

We currently have code to cancel ioends being built because we
change bufferhead state as we build the ioend. On error, this needs
to be unwound and so we have cancelling code that walks the buffers
on the ioend chain and undoes these state changes.

However, the IO submission path already handles state changes for
buffers when a submission error occurs, so we don't really need a
separate cancel function to do this - we can simply submit the
ioend chain with the specific error and it will be cancelled rather
than submitted.

Hence we can remove the explicit cancel code and just rely on
submission to deal with the error correctly.
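The "submit with an error is the same as cancel" idea can be illustrated with a toy chain walk (invented structures and field names, not the kernel's xfs_ioend): a non-zero status propagates the error through the chain and issues no IO at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an ioend chain; the fields are invented. */
struct toy_ioend {
	struct toy_ioend	*next;
	int			error;		/* 0 until failed */
	int			submitted;	/* 1 once "IO" is issued */
};

/*
 * Sketch of submission doubling as cancellation: a zero status issues
 * the IO for each chained entry, while a non-zero status records the
 * error on every entry and issues nothing, so the normal completion
 * path sees and reports the failure.
 */
static void toy_submit_chain(struct toy_ioend *head, int status)
{
	struct toy_ioend *io;

	for (io = head; io != NULL; io = io->next) {
		if (status) {
			io->error = status;	/* cancelled, error recorded */
			continue;
		}
		io->submitted = 1;		/* would build and send a bio */
	}
}
```

One error path through the existing submission code replaces a second, hand-rolled unwind routine that had to mirror its state changes.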

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 94 +++++++++++++++++++++++++++----------------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 00452cb..4fe74727 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -528,38 +528,6 @@ xfs_submit_ioend(
 }
 
 /*
- * Cancel submission of all buffer_heads so far in this endio.
- * Toss the endio too.  Only ever called for the initial page
- * in a writepage request, so only ever one page.
- */
-STATIC void
-xfs_cancel_ioend(
-	xfs_ioend_t		*ioend)
-{
-	xfs_ioend_t		*next;
-	struct buffer_head	*bh, *next_bh;
-
-	do {
-		next = ioend->io_list;
-		bh = ioend->io_buffer_head;
-		do {
-			next_bh = bh->b_private;
-			clear_buffer_async_write(bh);
-			/*
-			 * The unwritten flag is cleared when added to the
-			 * ioend. We're not submitting for I/O so mark the
-			 * buffer unwritten again for next time around.
-			 */
-			if (ioend->io_type == XFS_IO_UNWRITTEN)
-				set_buffer_unwritten(bh);
-			unlock_buffer(bh);
-		} while ((bh = next_bh) != NULL);
-
-		mempool_free(ioend, xfs_ioend_pool);
-	} while ((ioend = next) != NULL);
-}
-
-/*
  * Test to see if we've been building up a completion structure for
  * earlier buffers -- if so, we try to append to this ioend if we
  * can, otherwise we finish off any current ioend and start another.
@@ -931,6 +899,28 @@ out_invalidate:
 	return;
 }
 
+static int
+xfs_writepage_submit(
+	struct xfs_ioend	*ioend,
+	struct xfs_ioend	*iohead,
+	struct writeback_control *wbc,
+	int			status)
+{
+	struct blk_plug		plug;
+
+	/* Reserve log space if we might write beyond the on-disk inode size. */
+	if (!status && ioend && ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(ioend))
+		status = xfs_setfilesize_trans_alloc(ioend);
+
+	if (iohead) {
+		blk_start_plug(&plug);
+		xfs_submit_ioend(wbc, iohead, status);
+		blk_finish_plug(&plug);
+	}
+	return status;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -1142,6 +1132,7 @@ xfs_vm_writepage(
 		return 0;
 
 	ASSERT(iohead);
+	ASSERT(err == 0);
 
 	/*
 	 * Any errors from this point onwards need tobe reported through the IO
@@ -1167,25 +1158,34 @@ xfs_vm_writepage(
 				  wbc, end_index);
 	}
 
-
-	/*
-	 * Reserve log space if we might write beyond the on-disk inode size.
-	 */
-	err = 0;
-	if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
-		err = xfs_setfilesize_trans_alloc(ioend);
-
-	xfs_submit_ioend(wbc, iohead, err);
-
+	xfs_writepage_submit(ioend, iohead, wbc, err);
 	return 0;
 
 error:
-	if (iohead)
-		xfs_cancel_ioend(iohead);
+	/*
+	 * On error, we have to fail the iohead here because we have buffers locked
+	 * in the ioend chain. If we don't do this, we'll deadlock invalidating
+	 * the page as that tries to lock the buffers on the page. Also, because
+	 * we may have set pages under writeback, we have to run IO completion to
+	 * mark the error state of the IO appropriately, so we can't cancel the
+	 * ioend directly here. That means we have to mark this page as under
+	 * writeback if we included any buffers from it in the ioend chain.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	xfs_writepage_submit(ioend, iohead, wbc, err);
 
-	xfs_aops_discard_page(page);
-	ClearPageUptodate(page);
-	unlock_page(page);
+	/*
+	 * We can only discard the page we had the IO error on if we haven't
+	 * included it in the ioend above. If it has already been errored out,
+	 * then it is unlocked and we can't touch it here.
+	 */
+	if (!count) {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	mapping_set_error(page->mapping, err);
 	return err;
 
 redirty:
-- 
2.5.0



* [PATCH 3/8] xfs: Introduce writeback context for writepages
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
  2016-02-10  8:47 ` [PATCH 1/8] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
  2016-02-10  8:47 ` [PATCH 2/8] xfs: remove xfs_cancel_ioend Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10 11:31   ` Christoph Hellwig
  2016-02-10  8:47 ` [PATCH 4/8] xfs: xfs_cluster_write is redundant Dave Chinner
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_vm_writepages() calls generic_writepages to write back a range of
a file, but then xfs_vm_writepage() clusters pages itself as it does
not have any context it can pass between ->writepage calls from
write_cache_pages().

Introduce a writeback context for xfs_vm_writepages() and call
write_cache_pages directly with our own writepage callback so that
we can pass that context to each writepage invocation. This
encapsulates the current mapping, whether it is valid or not, the
current ioend and its IO type, and the ioend chain being built.

This requires us to move the ioend submission up to the level where
the writepage context is declared. This does mean we do not submit
IO until we have packaged the entire writeback range, but with the
block plugging in the writepages call this is the way IO is
submitted anyway.

It also means that we need to handle discontiguous page ranges.  If
the pages sent down by write_cache_pages to the writepage callback
are discontiguous, we need to detect this and put each discontiguous
page range into individual ioends. This is needed to ensure that the
ioend accurately represents the range of the file that it covers so
that file size updates during IO completion set the size correctly.
Failure to take into account the discontiguous ranges results in
files being too small when writeback patterns are non-sequential.
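The discontiguity rule described above reduces to an adjacency check on block numbers, mirroring the last_block comparison added to xfs_add_to_ioend() in this patch. A sketch (invented helper name) counting how many ioends a block sequence needs:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the adjacency rule: a new ioend is started whenever the
 * next block is not exactly one past the last block added, so each
 * ioend covers a single contiguous range of the file.
 */
static int count_ioends(const long *blocks, size_t n)
{
	size_t i;
	int ioends = 0;
	long last_block = 0;

	for (i = 0; i < n; i++) {
		if (ioends == 0 || blocks[i] != last_block + 1)
			ioends++;	/* discontiguous: start a new ioend */
		last_block = blocks[i];
	}
	return ioends;
}
```

Keeping non-adjacent ranges in separate ioends is what keeps each ioend's offset/size pair honest for the file size update at IO completion.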

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 219 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 114 insertions(+), 105 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4fe74727..c93de73 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -40,6 +40,18 @@
 #define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
 #define XFS_DIO_FLAG_APPEND	(1 << 1)
 
+/*
+ * structure owned by writepages passed to individual writepage calls
+ */
+struct xfs_writepage_ctx {
+	struct xfs_bmbt_irec    imap;
+	bool			imap_valid;
+	unsigned int		io_type;
+	struct xfs_ioend	*iohead;
+	struct xfs_ioend	*ioend;
+	sector_t		last_block;
+};
+
 void
 xfs_count_page_state(
 	struct page		*page,
@@ -341,7 +353,7 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC int
+STATIC bool
 xfs_imap_valid(
 	struct inode		*inode,
 	struct xfs_bmbt_irec	*imap,
@@ -538,29 +550,27 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
-	unsigned int		type,
-	xfs_ioend_t		**result,
-	int			need_ioend)
+	struct xfs_writepage_ctx *wpc)
 {
-	xfs_ioend_t		*ioend = *result;
-
-	if (!ioend || need_ioend || type != ioend->io_type) {
-		xfs_ioend_t	*previous = *result;
-
-		ioend = xfs_alloc_ioend(inode, type);
-		ioend->io_offset = offset;
-		ioend->io_buffer_head = bh;
-		ioend->io_buffer_tail = bh;
-		if (previous)
-			previous->io_list = ioend;
-		*result = ioend;
+	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
+	    bh->b_blocknr != wpc->last_block + 1) {
+		struct xfs_ioend	*new;
+
+		new = xfs_alloc_ioend(inode, wpc->io_type);
+		new->io_offset = offset;
+		new->io_buffer_head = bh;
+		new->io_buffer_tail = bh;
+		if (wpc->ioend)
+			wpc->ioend->io_list = new;
+		wpc->ioend = new;
 	} else {
-		ioend->io_buffer_tail->b_private = bh;
-		ioend->io_buffer_tail = bh;
+		wpc->ioend->io_buffer_tail->b_private = bh;
+		wpc->ioend->io_buffer_tail = bh;
 	}
 
 	bh->b_private = NULL;
-	ioend->io_size += bh->b_size;
+	wpc->ioend->io_size += bh->b_size;
+	wpc->last_block = bh->b_blocknr;
 }
 
 STATIC void
@@ -657,17 +667,15 @@ xfs_convert_page(
 	struct inode		*inode,
 	struct page		*page,
 	loff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc)
 {
 	struct buffer_head	*bh, *head;
 	xfs_off_t		end_offset;
 	unsigned long		p_offset;
-	unsigned int		type;
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
- 	xfs_off_t		offset = page_offset(page);
+	xfs_off_t		offset = page_offset(page);
 
 	if (page->index != tindex)
 		goto fail;
@@ -677,7 +685,7 @@ xfs_convert_page(
 		goto fail_unlock_page;
 	if (page->mapping != inode->i_mapping)
 		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
+	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
 		goto fail_unlock_page;
 
 	/*
@@ -713,7 +721,7 @@ xfs_convert_page(
 	 * writeback.  Hence for more optimal IO patterns, we should always
 	 * avoid partial page writeback due to multiple mappings on a page here.
 	 */
-	if (!xfs_imap_valid(inode, imap, end_offset))
+	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
 		goto fail_unlock_page;
 
 	len = 1 << inode->i_blkbits;
@@ -745,23 +753,22 @@ xfs_convert_page(
 		if (buffer_unwritten(bh) || buffer_delay(bh) ||
 		    buffer_mapped(bh)) {
 			if (buffer_unwritten(bh))
-				type = XFS_IO_UNWRITTEN;
+				wpc->io_type = XFS_IO_UNWRITTEN;
 			else if (buffer_delay(bh))
-				type = XFS_IO_DELALLOC;
+				wpc->io_type = XFS_IO_DELALLOC;
 			else
-				type = XFS_IO_OVERWRITE;
+				wpc->io_type = XFS_IO_OVERWRITE;
 
 			/*
 			 * imap should always be valid because of the above
 			 * partial page end_offset check on the imap.
 			 */
-			ASSERT(xfs_imap_valid(inode, imap, offset));
+			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
 
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type,
-					 ioendp, done);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 
 			page_dirty--;
 			count++;
@@ -796,8 +803,7 @@ STATIC void
 xfs_cluster_write(
 	struct inode		*inode,
 	pgoff_t			tindex,
-	struct xfs_bmbt_irec	*imap,
-	xfs_ioend_t		**ioendp,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	pgoff_t			tlast)
 {
@@ -813,7 +819,7 @@ xfs_cluster_write(
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-					imap, ioendp, wbc);
+						wpc, wbc);
 			if (done)
 				break;
 		}
@@ -901,21 +907,20 @@ out_invalidate:
 
 static int
 xfs_writepage_submit(
-	struct xfs_ioend	*ioend,
-	struct xfs_ioend	*iohead,
+	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	int			status)
 {
 	struct blk_plug		plug;
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status && ioend && ioend->io_type != XFS_IO_UNWRITTEN &&
-	    xfs_ioend_is_append(ioend))
-		status = xfs_setfilesize_trans_alloc(ioend);
+	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(wpc->ioend))
+		status = xfs_setfilesize_trans_alloc(wpc->ioend);
 
-	if (iohead) {
+	if (wpc->iohead) {
 		blk_start_plug(&plug);
-		xfs_submit_ioend(wbc, iohead, status);
+		xfs_submit_ioend(wbc, wpc->iohead, status);
 		blk_finish_plug(&plug);
 	}
 	return status;
@@ -930,20 +935,19 @@ xfs_writepage_submit(
  * For any other dirty buffer heads on the page we should flush them.
  */
 STATIC int
-xfs_vm_writepage(
+xfs_do_writepage(
 	struct page		*page,
-	struct writeback_control *wbc)
+	struct writeback_control *wbc,
+	void			*data)
 {
+	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
 	struct buffer_head	*bh, *head;
-	struct xfs_bmbt_irec	imap;
-	xfs_ioend_t		*ioend = NULL, *iohead = NULL;
 	loff_t			offset;
-	unsigned int		type;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index, last_index;
 	ssize_t			len;
-	int			err, imap_valid = 0, uptodate = 1;
+	int			err, uptodate = 1;
 	int			count = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
@@ -1042,11 +1046,8 @@ xfs_vm_writepage(
 
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
-	type = XFS_IO_OVERWRITE;
 
 	do {
-		int new_ioend = 0;
-
 		if (offset >= end_offset)
 			break;
 		if (!buffer_uptodate(bh))
@@ -1059,24 +1060,24 @@ xfs_vm_writepage(
 		 * buffers covering holes here.
 		 */
 		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			imap_valid = 0;
+			wpc->imap_valid = false;
 			continue;
 		}
 
 		if (buffer_unwritten(bh)) {
-			if (type != XFS_IO_UNWRITTEN) {
-				type = XFS_IO_UNWRITTEN;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_delay(bh)) {
-			if (type != XFS_IO_DELALLOC) {
-				type = XFS_IO_DELALLOC;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
 			}
 		} else if (buffer_uptodate(bh)) {
-			if (type != XFS_IO_OVERWRITE) {
-				type = XFS_IO_OVERWRITE;
-				imap_valid = 0;
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
 			}
 		} else {
 			if (PageUptodate(page))
@@ -1087,38 +1088,29 @@ xfs_vm_writepage(
 			 * subsequent writeable buffers into a new
 			 * ioend.
 			 */
-			imap_valid = 0;
+			wpc->imap_valid = 0;
 			continue;
 		}
 
-		if (imap_valid)
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
-		if (!imap_valid) {
-			/*
-			 * If we didn't have a valid mapping then we need to
-			 * put the new mapping into a separate ioend structure.
-			 * This ensures non-contiguous extents always have
-			 * separate ioends, which is particularly important
-			 * for unwritten extent conversion at I/O completion
-			 * time.
-			 */
-			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &imap, type);
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
+		if (!wpc->imap_valid) {
+			err = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
 			if (err)
 				goto error;
-			imap_valid = xfs_imap_valid(inode, &imap, offset);
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
 		}
-		if (imap_valid) {
+		if (wpc->imap_valid) {
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, type, &ioend,
-					 new_ioend);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
 			count++;
 		}
 
-		if (!iohead)
-			iohead = ioend;
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
 
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
@@ -1128,10 +1120,10 @@ xfs_vm_writepage(
 	xfs_start_page_writeback(page, 1, count);
 
 	/* if there is no IO to be submitted for this page, we are done */
-	if (!ioend)
+	if (!count)
 		return 0;
 
-	ASSERT(iohead);
+	ASSERT(wpc->iohead);
 	ASSERT(err == 0);
 
 	/*
@@ -1139,10 +1131,10 @@ xfs_vm_writepage(
 	 * completion path as we have marked the initial page as under writeback
 	 * and unlocked it.
 	 */
-	if (imap_valid) {
+	if (wpc->imap_valid) {
 		xfs_off_t		end_index;
 
-		end_index = imap.br_startoff + imap.br_blockcount;
+		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
 
 		/* to bytes */
 		end_index <<= inode->i_blkbits;
@@ -1154,11 +1146,8 @@ xfs_vm_writepage(
 		if (end_index > last_index)
 			end_index = last_index;
 
-		xfs_cluster_write(inode, page->index + 1, &imap, &ioend,
-				  wbc, end_index);
+		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
 	}
-
-	xfs_writepage_submit(ioend, iohead, wbc, err);
 	return 0;
 
 error:
@@ -1166,21 +1155,21 @@ error:
 	 * On error, we have to fail the iohead here because we have buffers locked
 	 * in the ioend chain. If we don't do this, we'll deadlock invalidating
 	 * the page as that tries to lock the buffers on the page. Also, because
-	 * we may have set pages under writeback, we have to run IO completion to
-	 * mark the error state of the IO appropriately, so we can't cancel the
-	 * ioend directly here. That means we have to mark this page as under
-	 * writeback if we included any buffers from it in the ioend chain.
+	 * we may have set pages under writeback, we have to make sure we run
+	 * IO completion to mark the error state of the IO appropriately, so we
+	 * can't cancel the ioend directly here. That means we have to mark this
+	 * page as under writeback if we included any buffers from it in the
+	 * ioend chain so that completion treats it correctly.
+	 *
+	 * If we didn't include the page in the ioend, then we can simply
+	 * discard and unlock it as there are no other users of the page or its
+	 * buffers right now. The caller will still need to trigger submission
+	 * of outstanding ioends on the writepage context so they are treated
+	 * correctly on error.
 	 */
 	if (count)
 		xfs_start_page_writeback(page, 0, count);
-	xfs_writepage_submit(ioend, iohead, wbc, err);
-
-	/*
-	 * We can only discard the page we had the IO error on if we haven't
-	 * included it in the ioend above. If it has already been errored out,
-	 * then it is unlocked and we can't touch it here.
-	 */
-	if (!count) {
+	else {
 		xfs_aops_discard_page(page);
 		ClearPageUptodate(page);
 		unlock_page(page);
@@ -1195,12 +1184,32 @@ redirty:
 }
 
 STATIC int
+xfs_vm_writepage(
+	struct page		*page,
+	struct writeback_control *wbc)
+{
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
+	ret = xfs_do_writepage(page, wbc, &wpc);
+	return xfs_writepage_submit(&wpc, wbc, ret);
+}
+
+STATIC int
 xfs_vm_writepages(
 	struct address_space	*mapping,
 	struct writeback_control *wbc)
 {
+	struct xfs_writepage_ctx wpc = {
+		.io_type = XFS_IO_OVERWRITE,
+	};
+	int			ret;
+
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
-	return generic_writepages(mapping, wbc);
+	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
+	return xfs_writepage_submit(&wpc, wbc, ret);
 }
 
 /*
-- 
2.5.0



* [PATCH 4/8] xfs: xfs_cluster_write is redundant
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (2 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 3/8] xfs: Introduce writeback context for writepages Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10  8:47 ` [PATCH 5/8] xfs: factor mapping out of xfs_do_writepage Dave Chinner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_cluster_write() is not necessary now that xfs_vm_writepages()
aggregates writepage calls across a single mapping. This means we no
longer need to do page lookups in xfs_cluster_write, so writeback
only needs to look up the page cache once per page being written.
This also removes a large amount of mostly duplicate code between
xfs_do_writepage() and xfs_convert_page().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_aops.c | 214 ++----------------------------------------------------
 1 file changed, 6 insertions(+), 208 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c93de73..c9e14d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -656,179 +656,6 @@ xfs_check_page_type(
 	return false;
 }
 
-/*
- * Allocate & map buffers for page given the extent map. Write it out.
- * except for the original page of a writepage, this is called on
- * delalloc/unwritten pages only, for the original page it is possible
- * that the page has no mapping at all.
- */
-STATIC int
-xfs_convert_page(
-	struct inode		*inode,
-	struct page		*page,
-	loff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc)
-{
-	struct buffer_head	*bh, *head;
-	xfs_off_t		end_offset;
-	unsigned long		p_offset;
-	int			len, page_dirty;
-	int			count = 0, done = 0, uptodate = 1;
-	xfs_off_t		offset = page_offset(page);
-
-	if (page->index != tindex)
-		goto fail;
-	if (!trylock_page(page))
-		goto fail;
-	if (PageWriteback(page))
-		goto fail_unlock_page;
-	if (page->mapping != inode->i_mapping)
-		goto fail_unlock_page;
-	if (!xfs_check_page_type(page, wpc->ioend->io_type, false))
-		goto fail_unlock_page;
-
-	/*
-	 * page_dirty is initially a count of buffers on the page before
-	 * EOF and is decremented as we move each into a cleanable state.
-	 *
-	 * Derivation:
-	 *
-	 * End offset is the highest offset that this page should represent.
-	 * If we are on the last page, (end_offset & (PAGE_CACHE_SIZE - 1))
-	 * will evaluate non-zero and be less than PAGE_CACHE_SIZE and
-	 * hence give us the correct page_dirty count. On any other page,
-	 * it will be zero and in that case we need page_dirty to be the
-	 * count of buffers on the page.
-	 */
-	end_offset = min_t(unsigned long long,
-			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
-			i_size_read(inode));
-
-	/*
-	 * If the current map does not span the entire page we are about to try
-	 * to write, then give up. The only way we can write a page that spans
-	 * multiple mappings in a single writeback iteration is via the
-	 * xfs_vm_writepage() function. Data integrity writeback requires the
-	 * entire page to be written in a single attempt, otherwise the part of
-	 * the page we don't write here doesn't get written as part of the data
-	 * integrity sync.
-	 *
-	 * For normal writeback, we also don't attempt to write partial pages
-	 * here as it simply means that write_cache_pages() will see it under
-	 * writeback and ignore the page until some point in the future, at
-	 * which time this will be the only page in the file that needs
-	 * writeback.  Hence for more optimal IO patterns, we should always
-	 * avoid partial page writeback due to multiple mappings on a page here.
-	 */
-	if (!xfs_imap_valid(inode, &wpc->imap, end_offset))
-		goto fail_unlock_page;
-
-	len = 1 << inode->i_blkbits;
-	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
-					PAGE_CACHE_SIZE);
-	p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
-	page_dirty = p_offset / len;
-
-	/*
-	 * The moment we find a buffer that doesn't match our current type
-	 * specification or can't be written, abort the loop and start
-	 * writeback. As per the above xfs_imap_valid() check, only
-	 * xfs_vm_writepage() can handle partial page writeback fully - we are
-	 * limited here to the buffers that are contiguous with the current
-	 * ioend, and hence a buffer we can't write breaks that contiguity and
-	 * we have to defer the rest of the IO to xfs_vm_writepage().
-	 */
-	bh = head = page_buffers(page);
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-		if (!(PageUptodate(page) || buffer_uptodate(bh))) {
-			done = 1;
-			break;
-		}
-
-		if (buffer_unwritten(bh) || buffer_delay(bh) ||
-		    buffer_mapped(bh)) {
-			if (buffer_unwritten(bh))
-				wpc->io_type = XFS_IO_UNWRITTEN;
-			else if (buffer_delay(bh))
-				wpc->io_type = XFS_IO_DELALLOC;
-			else
-				wpc->io_type = XFS_IO_OVERWRITE;
-
-			/*
-			 * imap should always be valid because of the above
-			 * partial page end_offset check on the imap.
-			 */
-			ASSERT(xfs_imap_valid(inode, &wpc->imap, offset));
-
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-
-			page_dirty--;
-			count++;
-		} else {
-			done = 1;
-			break;
-		}
-	} while (offset += len, (bh = bh->b_this_page) != head);
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	if (count) {
-		if (--wbc->nr_to_write <= 0 &&
-		    wbc->sync_mode == WB_SYNC_NONE)
-			done = 1;
-	}
-	xfs_start_page_writeback(page, !page_dirty, count);
-
-	return done;
- fail_unlock_page:
-	unlock_page(page);
- fail:
-	return 1;
-}
-
-/*
- * Convert & write out a cluster of pages in the same extent as defined
- * by mp and following the start page.
- */
-STATIC void
-xfs_cluster_write(
-	struct inode		*inode,
-	pgoff_t			tindex,
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc,
-	pgoff_t			tlast)
-{
-	struct pagevec		pvec;
-	int			done = 0, i;
-
-	pagevec_init(&pvec, 0);
-	while (!done && tindex <= tlast) {
-		unsigned len = min_t(pgoff_t, PAGEVEC_SIZE, tlast - tindex + 1);
-
-		if (!pagevec_lookup(&pvec, inode->i_mapping, tindex, len))
-			break;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			done = xfs_convert_page(inode, pvec.pages[i], tindex++,
-						wpc, wbc);
-			if (done)
-				break;
-		}
-
-		pagevec_release(&pvec);
-		cond_resched();
-	}
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -945,7 +772,7 @@ xfs_do_writepage(
 	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
-	pgoff_t                 end_index, last_index;
+	pgoff_t                 end_index;
 	ssize_t			len;
 	int			err, uptodate = 1;
 	int			count = 0;
@@ -975,12 +802,9 @@ xfs_do_writepage(
 	if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
 		goto redirty;
 
-	/* Is this page beyond the end of the file? */
-	offset = i_size_read(inode);
-	end_index = offset >> PAGE_CACHE_SHIFT;
-	last_index = (offset - 1) >> PAGE_CACHE_SHIFT;
-
 	/*
+	 * Is this page beyond the end of the file?
+	 *
 	 * The page index is less than the end_index, adjust the end_offset
 	 * to the highest offset that this page should represent.
 	 * -----------------------------------------------------
@@ -991,6 +815,8 @@ xfs_do_writepage(
 	 * |     desired writeback range    |      see else    |
 	 * ---------------------------------^------------------|
 	 */
+	offset = i_size_read(inode);
+	end_index = offset >> PAGE_CACHE_SHIFT;
 	if (page->index < end_index)
 		end_offset = (xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT;
 	else {
@@ -1119,35 +945,7 @@ xfs_do_writepage(
 
 	xfs_start_page_writeback(page, 1, count);
 
-	/* if there is no IO to be submitted for this page, we are done */
-	if (!count)
-		return 0;
-
-	ASSERT(wpc->iohead);
-	ASSERT(err == 0);
-
-	/*
-	 * Any errors from this point onwards need tobe reported through the IO
-	 * completion path as we have marked the initial page as under writeback
-	 * and unlocked it.
-	 */
-	if (wpc->imap_valid) {
-		xfs_off_t		end_index;
-
-		end_index = wpc->imap.br_startoff + wpc->imap.br_blockcount;
-
-		/* to bytes */
-		end_index <<= inode->i_blkbits;
-
-		/* to pages */
-		end_index = (end_index - 1) >> PAGE_CACHE_SHIFT;
-
-		/* check against file size */
-		if (end_index > last_index)
-			end_index = last_index;
-
-		xfs_cluster_write(inode, page->index + 1, wpc, wbc, end_index);
-	}
+	ASSERT(wpc->iohead || !count);
 	return 0;
 
 error:
-- 
2.5.0


* [PATCH 5/8] xfs: factor mapping out of xfs_do_writepage
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (3 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 4/8] xfs: xfs_cluster_write is redundant Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10  8:47 ` [PATCH 6/8] xfs: don't chain ioends during writepage submission Dave Chinner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Separate out the bufferhead based mapping from the writepage code so
that we have a clear separation of the page operations and the
bufferhead state.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_aops.c | 232 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 122 insertions(+), 110 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c9e14d3..95cfb27 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -753,6 +753,127 @@ xfs_writepage_submit(
 	return status;
 }
 
+static int
+xfs_writepage_map(
+	struct xfs_writepage_ctx *wpc,
+	struct inode		*inode,
+	struct page		*page,
+	loff_t			offset,
+	__uint64_t              end_offset)
+{
+	struct buffer_head	*bh, *head;
+	ssize_t			len = 1 << inode->i_blkbits;
+	int			error = 0;
+	int			uptodate = 1;
+	int			count = 0;
+
+	bh = head = page_buffers(page);
+	offset = page_offset(page);
+
+	do {
+		if (offset >= end_offset)
+			break;
+		if (!buffer_uptodate(bh))
+			uptodate = 0;
+
+		/*
+		 * set_page_dirty dirties all buffers in a page, independent
+		 * of their state.  The dirty state however is entirely
+		 * meaningless for holes (!mapped && uptodate), so skip
+		 * buffers covering holes here.
+		 */
+		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (buffer_unwritten(bh)) {
+			if (wpc->io_type != XFS_IO_UNWRITTEN) {
+				wpc->io_type = XFS_IO_UNWRITTEN;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_delay(bh)) {
+			if (wpc->io_type != XFS_IO_DELALLOC) {
+				wpc->io_type = XFS_IO_DELALLOC;
+				wpc->imap_valid = false;
+			}
+		} else if (buffer_uptodate(bh)) {
+			if (wpc->io_type != XFS_IO_OVERWRITE) {
+				wpc->io_type = XFS_IO_OVERWRITE;
+				wpc->imap_valid = false;
+			}
+		} else {
+			if (PageUptodate(page))
+				ASSERT(buffer_mapped(bh));
+			/*
+			 * This buffer is not uptodate and will not be
+			 * written to disk.  Ensure that we will put any
+			 * subsequent writeable buffers into a new
+			 * ioend.
+			 */
+			wpc->imap_valid = false;
+			continue;
+		}
+
+		if (wpc->imap_valid)
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		if (!wpc->imap_valid) {
+			error = xfs_map_blocks(inode, offset, &wpc->imap,
+					     wpc->io_type);
+			if (error)
+				goto out_error;
+			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
+							 offset);
+		}
+		if (wpc->imap_valid) {
+			lock_buffer(bh);
+			if (wpc->io_type != XFS_IO_OVERWRITE)
+				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+			xfs_add_to_ioend(inode, bh, offset, wpc);
+			count++;
+		}
+
+		if (!wpc->iohead)
+			wpc->iohead = wpc->ioend;
+
+	} while (offset += len, ((bh = bh->b_this_page) != head));
+
+	if (uptodate && bh == head)
+		SetPageUptodate(page);
+
+	xfs_start_page_writeback(page, 1, count);
+	ASSERT(wpc->iohead || !count);
+	return 0;
+
+out_error:
+	/*
+	 * On error, we have to fail the iohead here because we locked buffers
+	 * in the ioend chain. If we don't do this, we'll deadlock invalidating
+	 * the page as that tries to lock the buffers on the page. Also, because
+	 * we may have set pages under writeback, we have to make sure we run
+	 * IO completion to mark the error state of the IO appropriately, so we
+	 * can't cancel the ioend directly here. That means we have to mark this
+	 * page as under writeback if we included any buffers from it in the
+	 * ioend chain so that completion treats it correctly.
+	 *
+	 * If we didn't include the page in the ioend, then we can simply
+	 * discard and unlock it as there are no other users of the page or it's
+	 * buffers right now. The caller will still need to trigger submission
+	 * of outstanding ioends on the writepage context so they are treated
+	 * correctly on error.
+	 */
+	if (count)
+		xfs_start_page_writeback(page, 0, count);
+	else {
+		xfs_aops_discard_page(page);
+		ClearPageUptodate(page);
+		unlock_page(page);
+	}
+	mapping_set_error(page->mapping, error);
+	return error;
+}
+
 /*
  * Write out a dirty page.
  *
@@ -769,13 +890,9 @@ xfs_do_writepage(
 {
 	struct xfs_writepage_ctx *wpc = data;
 	struct inode		*inode = page->mapping->host;
-	struct buffer_head	*bh, *head;
 	loff_t			offset;
 	__uint64_t              end_offset;
 	pgoff_t                 end_index;
-	ssize_t			len;
-	int			err, uptodate = 1;
-	int			count = 0;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -868,112 +985,7 @@ xfs_do_writepage(
 		end_offset = offset;
 	}
 
-	len = 1 << inode->i_blkbits;
-
-	bh = head = page_buffers(page);
-	offset = page_offset(page);
-
-	do {
-		if (offset >= end_offset)
-			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
-
-		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
-		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (buffer_unwritten(bh)) {
-			if (wpc->io_type != XFS_IO_UNWRITTEN) {
-				wpc->io_type = XFS_IO_UNWRITTEN;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_delay(bh)) {
-			if (wpc->io_type != XFS_IO_DELALLOC) {
-				wpc->io_type = XFS_IO_DELALLOC;
-				wpc->imap_valid = false;
-			}
-		} else if (buffer_uptodate(bh)) {
-			if (wpc->io_type != XFS_IO_OVERWRITE) {
-				wpc->io_type = XFS_IO_OVERWRITE;
-				wpc->imap_valid = false;
-			}
-		} else {
-			if (PageUptodate(page))
-				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
-			 */
-			wpc->imap_valid = 0;
-			continue;
-		}
-
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
-		if (!wpc->imap_valid) {
-			err = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
-			if (err)
-				goto error;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
-		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
-			count++;
-		}
-
-		if (!wpc->iohead)
-			wpc->iohead = wpc->ioend;
-
-	} while (offset += len, ((bh = bh->b_this_page) != head));
-
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
-	xfs_start_page_writeback(page, 1, count);
-
-	ASSERT(wpc->iohead || !count);
-	return 0;
-
-error:
-	/*
-	 * On error, we have to fail the iohead here because we buffers locked
-	 * in the ioend chain. If we don't do this, we'll deadlock invalidating
-	 * the page as that tries to lock the buffers on the page. Also, because
-	 * we may have set pages under writeback, we have to make sure we run
-	 * IO completion to mark the error state of the IO appropriately, so we
-	 * can't cancel the ioend directly here. That means we have to mark this
-	 * page as under writeback if we included any buffers from it in the
-	 * ioend chain so that completion treats it correctly.
-	 *
-	 * If we didn't include the page in the ioend, then we can simply
-	 * discard and unlock it as there are no other users of the page or it's
-	 * buffers right now. The caller will still need to trigger submission
-	 * of outstanding ioends on the writepage context so they are treated
-	 * correctly on error.
-	 */
-	if (count)
-		xfs_start_page_writeback(page, 0, count);
-	else {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
-	}
-	mapping_set_error(page->mapping, err);
-	return err;
+	return xfs_writepage_map(wpc, inode, page, offset, end_offset);
 
 redirty:
 	redirty_page_for_writepage(wbc, page);
-- 
2.5.0


* [PATCH 6/8] xfs: don't chain ioends during writepage submission
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (4 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 5/8] xfs: factor mapping out of xfs_do_writepage Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10 11:36   ` Christoph Hellwig
  2016-02-10  8:47 ` [PATCH 7/8] [RFC] xfs: build bios directly in xfs_add_to_ioend Dave Chinner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently we can build a long ioend chain during ->writepages that
gets attached to the writepage context. IO submission only then
occurs when we finish all the writepage processing. This means we
can have many ioends allocated and pending, and this violates the
forward progress guarantees we need to provide for mempool allocation.
i.e. we really should only have one ioend being built at a time,
otherwise we may drain the mempool trying to allocate a new ioend
and that blocks submission, completion and freeing of ioends that
are already in progress.

To prevent this situation from happening, we need to submit ioends
for IO as soon as they are ready for dispatch rather than queuing
them for later submission. This means the ioends have bios built
immediately and they get queued on any plug that is currently active.
Hence if we schedule away from writeback, the ioends that have been
built will make forwards progress due to the plug flushing on
context switch. This will also prevent context switches from
creating unnecessary IO submission latency.

We can't completely avoid having nested IO allocation - when we have
a block size smaller than a page size, we still need to hold the
ioend submission until after we have marked the current page dirty.
Hence we may need multiple ioends to be held while the current page
is completely mapped and made ready for IO dispatch. We cannot avoid
this problem - the current code already has this ioend chaining
within a page so we can mostly ignore that it occurs.
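
The forward progress problem described above can be sketched in a small
userspace model. Everything here is hypothetical (toy names, a bounded
counter standing in for the mempool); a failed allocation below represents
the mempool_alloc() call that would block forever in the kernel once the
reserve pool is drained and nothing in flight can free a reserve:

```c
#include <assert.h>

/* Toy mempool with a fixed number of pre-allocated reserves. In the
 * kernel, allocation would sleep waiting for a free reserve; here a
 * failed allocation stands in for that deadlock. */
#define POOL_SIZE 2

static int pool_free = POOL_SIZE;

static int ioend_alloc(void)	/* returns 0 on success */
{
	if (pool_free == 0)
		return -1;	/* kernel: would block forever here */
	pool_free--;
	return 0;
}

static void ioend_submit(void)	/* submission completes and frees */
{
	pool_free++;
}

/* Chained policy: build n ioends, only submit them all at the end. */
static int write_chained(int n)
{
	for (int i = 0; i < n; i++)
		if (ioend_alloc())
			return -1;	/* pool drained: no forward progress */
	for (int i = 0; i < n; i++)
		ioend_submit();
	return 0;
}

/* Immediate policy: at most one ioend outstanding at a time. */
static int write_immediate(int n)
{
	for (int i = 0; i < n; i++) {
		if (ioend_alloc())
			return -1;
		ioend_submit();	/* dispatch as soon as it is ready */
	}
	return 0;
}
```

With a pool of two reserves, the chained policy stalls as soon as it tries
to hold a third ioend before submitting anything, while the immediate
policy completes a writeback pass of any length.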

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 233 ++++++++++++++++++++++++++++--------------------------
 fs/xfs/xfs_aops.h |   2 +-
 2 files changed, 120 insertions(+), 115 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 95cfb27..210f18e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -47,7 +47,6 @@ struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
 	bool			imap_valid;
 	unsigned int		io_type;
-	struct xfs_ioend	*iohead;
 	struct xfs_ioend	*ioend;
 	sector_t		last_block;
 };
@@ -283,7 +282,7 @@ xfs_alloc_ioend(
 	 */
 	atomic_set(&ioend->io_remaining, 1);
 	ioend->io_error = 0;
-	ioend->io_list = NULL;
+	INIT_LIST_HEAD(&ioend->io_list);
 	ioend->io_type = type;
 	ioend->io_inode = inode;
 	ioend->io_buffer_head = NULL;
@@ -458,110 +457,90 @@ static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
 }
 
 /*
- * Submit all of the bios for all of the ioends we have saved up, covering the
- * initial writepage page and also any probed pages.
- *
- * Because we may have multiple ioends spanning a page, we need to start
- * writeback on all the buffers before we submit them for I/O. If we mark the
- * buffers as we got, then we can end up with a page that only has buffers
- * marked async write and I/O complete on can occur before we mark the other
- * buffers async write.
- *
- * The end result of this is that we trip a bug in end_page_writeback() because
- * we call it twice for the one page as the code in end_buffer_async_write()
- * assumes that all buffers on the page are started at the same time.
- *
- * The fix is two passes across the ioend list - one to start writeback on the
- * buffer_heads, and then submit them for I/O on the second pass.
+ * Submit all of the bios for an ioend. We are only passed a single ioend at a
+ * time; the caller is responsible for chaining prior to submission.
  *
  * If @fail is non-zero, it means that we have a situation where some part of
  * the submission process has failed after we have marked paged for writeback
  * and unlocked them. In this situation, we need to fail the ioend chain rather
  * than submit it to IO. This typically only happens on a filesystem shutdown.
  */
-STATIC void
+STATIC int
 xfs_submit_ioend(
 	struct writeback_control *wbc,
 	xfs_ioend_t		*ioend,
 	int			fail)
 {
-	xfs_ioend_t		*head = ioend;
-	xfs_ioend_t		*next;
 	struct buffer_head	*bh;
 	struct bio		*bio;
 	sector_t		lastblock = 0;
 
-	/* Pass 1 - start writeback */
-	do {
-		next = ioend->io_list;
-		for (bh = ioend->io_buffer_head; bh; bh = bh->b_private)
-			xfs_start_buffer_writeback(bh);
-	} while ((ioend = next) != NULL);
+	/* Reserve log space if we might write beyond the on-disk inode size. */
+	if (!fail &&
+	     ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
+		fail = xfs_setfilesize_trans_alloc(ioend);
+	/*
+	 * If we are failing the IO now, just mark the ioend with an
+	 * error and finish it. This will run IO completion immediately
+	 * as there is only one reference to the ioend at this point in
+	 * time.
+	 */
+	if (fail) {
+		ioend->io_error = fail;
+		xfs_finish_ioend(ioend);
+		return fail;
+	}
 
-	/* Pass 2 - submit I/O */
-	ioend = head;
-	do {
-		next = ioend->io_list;
-		bio = NULL;
+	bio = NULL;
+	for (bh = ioend->io_buffer_head; bh; bh = bh->b_private) {
 
-		/*
-		 * If we are failing the IO now, just mark the ioend with an
-		 * error and finish it. This will run IO completion immediately
-		 * as there is only one reference to the ioend at this point in
-		 * time.
-		 */
-		if (fail) {
-			ioend->io_error = fail;
-			xfs_finish_ioend(ioend);
-			continue;
+		if (!bio) {
+retry:
+			bio = xfs_alloc_ioend_bio(bh);
+		} else if (bh->b_blocknr != lastblock + 1) {
+			xfs_submit_ioend_bio(wbc, ioend, bio);
+			goto retry;
 		}
 
-		for (bh = ioend->io_buffer_head; bh; bh = bh->b_private) {
-
-			if (!bio) {
- retry:
-				bio = xfs_alloc_ioend_bio(bh);
-			} else if (bh->b_blocknr != lastblock + 1) {
-				xfs_submit_ioend_bio(wbc, ioend, bio);
-				goto retry;
-			}
-
-			if (xfs_bio_add_buffer(bio, bh) != bh->b_size) {
-				xfs_submit_ioend_bio(wbc, ioend, bio);
-				goto retry;
-			}
-
-			lastblock = bh->b_blocknr;
-		}
-		if (bio)
+		if (xfs_bio_add_buffer(bio, bh) != bh->b_size) {
 			xfs_submit_ioend_bio(wbc, ioend, bio);
-		xfs_finish_ioend(ioend);
-	} while ((ioend = next) != NULL);
+			goto retry;
+		}
+
+		lastblock = bh->b_blocknr;
+	}
+	if (bio)
+		xfs_submit_ioend_bio(wbc, ioend, bio);
+	xfs_finish_ioend(ioend);
+	return 0;
 }
 
 /*
  * Test to see if we've been building up a completion structure for
  * earlier buffers -- if so, we try to append to this ioend if we
  * can, otherwise we finish off any current ioend and start another.
- * Return true if we've finished the given ioend.
+ * Return the ioend we finished off so that the caller can submit it
+ * once it has finished processing the dirty page.
  */
-STATIC void
+STATIC struct xfs_ioend *
 xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
 	struct xfs_writepage_ctx *wpc)
 {
+	struct xfs_ioend	*prev = NULL;
+
 	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
 	    bh->b_blocknr != wpc->last_block + 1) {
 		struct xfs_ioend	*new;
 
+		prev = wpc->ioend;
+
 		new = xfs_alloc_ioend(inode, wpc->io_type);
 		new->io_offset = offset;
 		new->io_buffer_head = bh;
 		new->io_buffer_tail = bh;
-		if (wpc->ioend)
-			wpc->ioend->io_list = new;
 		wpc->ioend = new;
 	} else {
 		wpc->ioend->io_buffer_tail->b_private = bh;
@@ -571,6 +550,8 @@ xfs_add_to_ioend(
 	bh->b_private = NULL;
 	wpc->ioend->io_size += bh->b_size;
 	wpc->last_block = bh->b_blocknr;
+	xfs_start_buffer_writeback(bh);
+	return prev;
 }
 
 STATIC void
@@ -732,41 +713,40 @@ out_invalidate:
 	return;
 }
 
-static int
-xfs_writepage_submit(
-	struct xfs_writepage_ctx *wpc,
-	struct writeback_control *wbc,
-	int			status)
-{
-	struct blk_plug		plug;
-
-	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
-	    xfs_ioend_is_append(wpc->ioend))
-		status = xfs_setfilesize_trans_alloc(wpc->ioend);
-
-	if (wpc->iohead) {
-		blk_start_plug(&plug);
-		xfs_submit_ioend(wbc, wpc->iohead, status);
-		blk_finish_plug(&plug);
-	}
-	return status;
-}
-
+/*
+ * We implement an immediate ioend submission policy here to avoid needing to
+ * chain multiple ioends and hence nest mempool allocations which can violate
+ * forward progress guarantees we need to provide. The current ioend we are
+ * adding buffers to is cached on the writepage context, and if the new buffer
+ * does not append to the cached ioend it will create a new ioend and cache that
+ * instead.
+ *
+ * If a new ioend is created and cached, the old ioend is returned and queued
+ * locally for submission once the entire page is processed or an error has been
+ * detected.  While ioends are submitted immediately after they are completed,
+ * batching optimisations are provided by higher level block plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
+ */
 static int
 xfs_writepage_map(
 	struct xfs_writepage_ctx *wpc,
+	struct writeback_control *wbc,
 	struct inode		*inode,
 	struct page		*page,
 	loff_t			offset,
 	__uint64_t              end_offset)
 {
+	LIST_HEAD(submit_list);
+	struct xfs_ioend	*ioend, *next;
 	struct buffer_head	*bh, *head;
 	ssize_t			len = 1 << inode->i_blkbits;
 	int			error = 0;
 	int			uptodate = 1;
 	int			count = 0;
 
+
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
 
@@ -830,46 +810,67 @@ xfs_writepage_map(
 			lock_buffer(bh);
 			if (wpc->io_type != XFS_IO_OVERWRITE)
 				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc);
+			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
+			if (ioend)
+				list_add_tail(&ioend->io_list, &submit_list);
 			count++;
 		}
 
-		if (!wpc->iohead)
-			wpc->iohead = wpc->ioend;
-
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
 	if (uptodate && bh == head)
 		SetPageUptodate(page);
 
-	xfs_start_page_writeback(page, 1, count);
-	ASSERT(wpc->iohead || !count);
-	return 0;
+	ASSERT(wpc->ioend || !count);
 
 out_error:
 	/*
-	 * On error, we have to fail the iohead here because we locked buffers
-	 * in the ioend chain. If we don't do this, we'll deadlock invalidating
-	 * the page as that tries to lock the buffers on the page. Also, because
-	 * we may have set pages under writeback, we have to make sure we run
-	 * IO completion to mark the error state of the IO appropriately, so we
-	 * can't cancel the ioend directly here. That means we have to mark this
-	 * page as under writeback if we included any buffers from it in the
-	 * ioend chain so that completion treats it correctly.
+	 * On error, we have to fail the ioend here because we have locked
+	 * buffers in the ioend. If we don't do this, we'll deadlock
+	 * invalidating the page as that tries to lock the buffers on the page.
+	 * Also, because we may have set pages under writeback, we have to make
+	 * sure we run IO completion to mark the error state of the IO
+	 * appropriately, so we can't cancel the ioend directly here. That means
+	 * we have to mark this page as under writeback if we included any
+	 * buffers from it in the ioend chain so that completion treats it
+	 * correctly.
 	 *
-	 * If we didn't include the page in the ioend, then we can simply
-	 * discard and unlock it as there are no other users of the page or it's
-	 * buffers right now. The caller will still need to trigger submission
-	 * of outstanding ioends on the writepage context so they are treated
-	 * correctly on error.
+	 * If we didn't include the page in the ioend, then on error we can
+	 * simply discard and unlock it as there are no other users of the page
+	 * or it's buffers right now. The caller will still need to trigger
+	 * submission of outstanding ioends on the writepage context so they are
+	 * treated correctly on error.
 	 */
-	if (count)
-		xfs_start_page_writeback(page, 0, count);
-	else {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
+	if (count) {
+		xfs_start_page_writeback(page, !error, count);
+
+		/*
+		 * Preserve the original error if there was one, otherwise catch
+		 * submission errors here and propagate into subsequent ioend
+		 * submissions.
+		 */
+		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
+			int error2;
+
+			list_del_init(&ioend->io_list);
+			error2 = xfs_submit_ioend(wbc, ioend, error);
+			if (error2 && !error)
+				error = error2;
+		}
+	} else {
+		/*
+		 * We can end up here with no error and nothing to write if we
+		 * race with a partial page truncate on a sub-page block sized
+		 * filesystem
+		 */
+		ASSERT(list_empty(&submit_list));
+		if (error) {
+			xfs_aops_discard_page(page);
+			ClearPageUptodate(page);
+		}
 		unlock_page(page);
 	}
+
 	mapping_set_error(page->mapping, error);
 	return error;
 }
@@ -985,7 +986,7 @@ xfs_do_writepage(
 		end_offset = offset;
 	}
 
-	return xfs_writepage_map(wpc, inode, page, offset, end_offset);
+	return xfs_writepage_map(wpc, wbc, inode, page, offset, end_offset);
 
 redirty:
 	redirty_page_for_writepage(wbc, page);
@@ -1004,7 +1005,9 @@ xfs_vm_writepage(
 	int			ret;
 
 	ret = xfs_do_writepage(page, wbc, &wpc);
-	return xfs_writepage_submit(&wpc, wbc, ret);
+	if (wpc.ioend)
+		xfs_submit_ioend(wbc, wpc.ioend, ret);
+	return ret;
 }
 
 STATIC int
@@ -1019,7 +1022,9 @@ xfs_vm_writepages(
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
 	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
-	return xfs_writepage_submit(&wpc, wbc, ret);
+	if (wpc.ioend)
+		xfs_submit_ioend(wbc, wpc.ioend, ret);
+	return ret;
 }
 
 /*
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index f6ffc9a..d76e15d 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -39,7 +39,7 @@ enum {
  * It can manage several multi-page bio's at once.
  */
 typedef struct xfs_ioend {
-	struct xfs_ioend	*io_list;	/* next ioend in chain */
+	struct list_head	io_list;	/* next ioend in chain */
 	unsigned int		io_type;	/* delalloc / unwritten */
 	int			io_error;	/* I/O error code */
 	atomic_t		io_remaining;	/* hold count */
-- 
2.5.0


* [PATCH 7/8] [RFC] xfs: build bios directly in xfs_add_to_ioend
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (5 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 6/8] xfs: don't chain ioends during writepage submission Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10  8:47 ` [PATCH 8/8] [RFC] xfs: don't release bios on completion immediately Dave Chinner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Currently adding a buffer to the ioend and then building a bio from
the buffer list are two separate operations. We don't build the bios
and submit them until the ioend is submitted, and this places a
fixed dependency on bufferhead chaining in the ioend.

The first step to removing the bufferhead chaining in the ioend is
on the IO submission side. We can build the bio directly as we add
the buffers to the ioend chain, thereby removing the need for a
later "buffer-to-bio" submission loop. This allows us to submit
bios on large ioends as soon as we cannot add more data to the bio.

These bios then get captured by the active plug, and hence will be
dispatched as soon as either the plug overflows or we schedule away
from the writeback context. This will reduce submission latency for
large IOs, but will also allow more timely request queue based
writeback blocking when the device becomes congested.
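
A toy model of the submit-as-you-fill behaviour (illustrative only; none
of these names are the kernel bio API): blocks are appended to the current
"bio" as they are mapped, and it is dispatched as soon as it fills up or
the next block is discontiguous, rather than in a separate buffer-to-bio
loop at ioend submission time:

```c
#include <assert.h>

/* Toy stand-in for a bio: a bounded run of contiguous blocks. */
#define BIO_MAX_BLOCKS 4

struct toy_bio {
	long	start;	/* first block of the current run */
	int	count;	/* blocks accumulated so far */
};

static int submissions;		/* how many "bios" were dispatched */

static void bio_submit(struct toy_bio *bio)
{
	if (bio->count) {
		submissions++;
		bio->count = 0;
	}
}

/* Add one block as it is mapped; submit eagerly when the bio is full
 * or the new block is not contiguous with the current run. */
static void add_block(struct toy_bio *bio, long blocknr)
{
	if (bio->count &&
	    (bio->count == BIO_MAX_BLOCKS ||
	     blocknr != bio->start + bio->count))
		bio_submit(bio);
	if (!bio->count)
		bio->start = blocknr;
	bio->count++;
}

/* Returns the number of bios dispatched for a list of blocks. */
static int build(const long *blocks, int n)
{
	struct toy_bio	bio = { 0, 0 };

	submissions = 0;
	for (int i = 0; i < n; i++)
		add_block(&bio, blocks[i]);
	bio_submit(&bio);	/* flush the final partial bio */
	return submissions;
}
```

Each eager submission lands on the active plug, so batching still happens
at the block layer rather than by holding bios back in the ioend.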

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 118 ++++++++++++++++++++++++++----------------------------
 fs/xfs/xfs_aops.h |   1 +
 2 files changed, 58 insertions(+), 61 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 210f18e..91dd65b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -274,6 +274,7 @@ xfs_alloc_ioend(
 	xfs_ioend_t		*ioend;
 
 	ioend = mempool_alloc(xfs_ioend_pool, GFP_NOFS);
+	memset(ioend, 0, sizeof(*ioend));
 
 	/*
 	 * Set the count to 1 initially, which will prevent an I/O
@@ -281,16 +282,9 @@ xfs_alloc_ioend(
 	 * all the I/O from calling the completion routine too early.
 	 */
 	atomic_set(&ioend->io_remaining, 1);
-	ioend->io_error = 0;
 	INIT_LIST_HEAD(&ioend->io_list);
 	ioend->io_type = type;
 	ioend->io_inode = inode;
-	ioend->io_buffer_head = NULL;
-	ioend->io_buffer_tail = NULL;
-	ioend->io_offset = 0;
-	ioend->io_size = 0;
-	ioend->io_append_trans = NULL;
-
 	INIT_WORK(&ioend->io_work, xfs_end_io);
 	return ioend;
 }
@@ -457,13 +451,18 @@ static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
 }
 
 /*
- * Submit all of the bios for an ioend. We are only passed a single ioend at a
- * time; the caller is responsible for chaining prior to submission.
+ * Submit the bio for an ioend. We are passed an ioend with a bio attached to
+ * it, and we submit that bio. The ioend may be used for multiple bio
+ * submissions, so we only want to allocate an append transaction for the ioend
+ * once. In the case of multiple bio submission, each bio will take an IO
+ * reference to the ioend to ensure that the ioend completion is only done once
+ * all bios have been submitted and the ioend is really done.
  *
  * If @fail is non-zero, it means that we have a situation where some part of
  * the submission process has failed after we have marked paged for writeback
- * and unlocked them. In this situation, we need to fail the ioend chain rather
- * than submit it to IO. This typically only happens on a filesystem shutdown.
+ * and unlocked them. In this situation, we need to fail the bio and ioend
+ * rather than submit it to IO. This typically only happens on a filesystem
+ * shutdown.
  */
 STATIC int
 xfs_submit_ioend(
@@ -471,48 +470,34 @@ xfs_submit_ioend(
 	xfs_ioend_t		*ioend,
 	int			fail)
 {
-	struct buffer_head	*bh;
-	struct bio		*bio;
-	sector_t		lastblock = 0;
+	if (!ioend->io_bio || fail)
+		goto error_finish;
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
-	if (!fail &&
-	     ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
+	if (ioend->io_type != XFS_IO_UNWRITTEN &&
+	    xfs_ioend_is_append(ioend) && !ioend->io_append_trans) {
 		fail = xfs_setfilesize_trans_alloc(ioend);
-	/*
-	 * If we are failing the IO now, just mark the ioend with an
-	 * error and finish it. This will run IO completion immediately
-	 * as there is only one reference to the ioend at this point in
-	 * time.
-	 */
-	if (fail) {
-		ioend->io_error = fail;
-		xfs_finish_ioend(ioend);
-		return fail;
+		if (fail)
+			goto error_finish;
 	}
 
-	bio = NULL;
-	for (bh = ioend->io_buffer_head; bh; bh = bh->b_private) {
-
-		if (!bio) {
-retry:
-			bio = xfs_alloc_ioend_bio(bh);
-		} else if (bh->b_blocknr != lastblock + 1) {
-			xfs_submit_ioend_bio(wbc, ioend, bio);
-			goto retry;
-		}
-
-		if (xfs_bio_add_buffer(bio, bh) != bh->b_size) {
-			xfs_submit_ioend_bio(wbc, ioend, bio);
-			goto retry;
-		}
-
-		lastblock = bh->b_blocknr;
-	}
-	if (bio)
-		xfs_submit_ioend_bio(wbc, ioend, bio);
+	xfs_submit_ioend_bio(wbc, ioend, ioend->io_bio);
+	ioend->io_bio = NULL;
 	xfs_finish_ioend(ioend);
 	return 0;
+
+	/*
+	 * If we are failing the IO now, just mark the ioend with an error and
+	 * finish it, releasing the active bio if there is one.  This will run
+	 * IO completion immediately as there is only one reference to the ioend
+	 * at this point in time.
+	 */
+error_finish:
+	if (ioend->io_bio)
+		bio_put(ioend->io_bio);
+	ioend->io_error = fail;
+	xfs_finish_ioend(ioend);
+	return fail;
 }
 
 /*
@@ -527,30 +512,41 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	struct buffer_head	*bh,
 	xfs_off_t		offset,
-	struct xfs_writepage_ctx *wpc)
+	struct xfs_writepage_ctx *wpc,
+	struct writeback_control *wbc)
 {
 	struct xfs_ioend	*prev = NULL;
+	struct xfs_ioend	*ioend = wpc->ioend;
 
-	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
+	if (!ioend || wpc->io_type != ioend->io_type ||
 	    bh->b_blocknr != wpc->last_block + 1) {
-		struct xfs_ioend	*new;
+		prev = ioend;
+		ioend = xfs_alloc_ioend(inode, wpc->io_type);
+		ioend->io_offset = offset;
+		ioend->io_buffer_head = bh;
+		ioend->io_buffer_tail = bh;
+		wpc->ioend = ioend;
+	} else {
+		ioend->io_buffer_tail->b_private = bh;
+		ioend->io_buffer_tail = bh;
+	}
+	bh->b_private = NULL;
 
-		prev = wpc->ioend;
+	/* add the bh to the current bio */
+retry:
+	if (!ioend->io_bio)
+		ioend->io_bio = xfs_alloc_ioend_bio(bh);
 
-		new = xfs_alloc_ioend(inode, wpc->io_type);
-		new->io_offset = offset;
-		new->io_buffer_head = bh;
-		new->io_buffer_tail = bh;
-		wpc->ioend = new;
-	} else {
-		wpc->ioend->io_buffer_tail->b_private = bh;
-		wpc->ioend->io_buffer_tail = bh;
+	if (xfs_bio_add_buffer(ioend->io_bio, bh) != bh->b_size) {
+		xfs_submit_ioend_bio(wbc, ioend, ioend->io_bio);
+		ioend->io_bio = NULL;
+		goto retry;
 	}
 
-	bh->b_private = NULL;
-	wpc->ioend->io_size += bh->b_size;
+	ioend->io_size += bh->b_size;
 	wpc->last_block = bh->b_blocknr;
 	xfs_start_buffer_writeback(bh);
+	ASSERT(wpc->ioend != prev);
 	return prev;
 }
 
@@ -810,7 +806,7 @@ xfs_writepage_map(
 			lock_buffer(bh);
 			if (wpc->io_type != XFS_IO_OVERWRITE)
 				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
+			ioend = xfs_add_to_ioend(inode, bh, offset, wpc, wbc);
 			if (ioend)
 				list_add_tail(&ioend->io_list, &submit_list);
 			count++;
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index d76e15d..dac64c2 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -50,6 +50,7 @@ typedef struct xfs_ioend {
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
 	struct xfs_trans	*io_append_trans;/* xact. for size update */
+	struct bio		*io_bio;	/* bio being built */
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/8] [RFC] xfs: don't release bios on completion immediately
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (6 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 7/8] [RFC] xfs: build bios directly in xfs_add_to_ioend Dave Chinner
@ 2016-02-10  8:47 ` Dave Chinner
  2016-02-10  9:05 ` [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Christoph Hellwig
  2016-02-10 18:25 ` Christoph Hellwig
  9 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-10  8:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Completion of an ioend requires us to walk the bufferhead list to
end writeback on all the bufferheads. This, in turn, is needed so
that we can end writeback on all the pages we just did IO on.

To remove our dependency on bufferheads in writeback, we need to
turn this around the other way - we need to walk the pages we've
just completed IO on, and then walk the buffers attached to the
pages and complete their IO. In doing this, we remove the
requirement for the ioend to track bufferheads directly.

To enable IO completion to walk all the pages we've submitted IO on,
we need to keep the bios that we used for IO around until the ioend
has been completed. We can do this simply by chaining the bios to
the ioend at completion time, and then walking their pages directly
just before destroying the ioend.
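As an aside, the scheme described above can be modelled in plain userspace C. Everything below (all sim_* names included) is an illustrative sketch of the idea, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative userspace model: completed bios are chained onto the
 * ioend via bi_private, and only the final destroy step walks each
 * bio's pages.  All sim_* names are local to this sketch.
 */
struct sim_bio {
	struct sim_bio	*bi_private;	/* next completed bio in the chain */
	int		npages;		/* pages carried by this bio */
};

struct sim_ioend {
	struct sim_bio	*io_bio_done;		/* chain of completed bios */
	struct sim_bio	*io_bio_done_tail;	/* tail for O(1) append */
};

/* bio completion: just chain the bio; page state is untouched here */
static void sim_end_bio(struct sim_ioend *ioend, struct sim_bio *bio)
{
	bio->bi_private = NULL;
	if (!ioend->io_bio_done)
		ioend->io_bio_done = bio;
	else
		ioend->io_bio_done_tail->bi_private = bio;
	ioend->io_bio_done_tail = bio;
}

/* ioend destruction: walk every chained bio, ending page writeback */
static int sim_destroy_ioend(struct sim_ioend *ioend)
{
	struct sim_bio *bio, *next;
	int pages_finished = 0;

	for (bio = ioend->io_bio_done; bio; bio = next) {
		next = bio->bi_private;
		/* the real code ends writeback per page, then bio_put()s */
		pages_finished += bio->npages;
	}
	return pages_finished;
}
```

The point of the model is that page-state updates move entirely out of the per-bio completion path and into the single ioend teardown.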

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 88 ++++++++++++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_aops.h |  5 ++--
 2 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 91dd65b..a15a032 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -84,25 +84,71 @@ xfs_find_bdev_for_inode(
 }
 
 /*
- * We're now finished for good with this ioend structure.
- * Update the page state via the associated buffer_heads,
- * release holds on the inode and bio, and finally free
- * up memory.  Do not use the ioend after this.
+ * We're now finished for good with this page.  Update the page state via the
+ * associated buffer_heads, paying attention to the start and end offsets that
+ * we need to process on the page.
+ */
+static void
+xfs_finish_page_writeback(
+	struct page	*page,
+	unsigned int	start,
+	unsigned int	end,
+	int		error)
+{
+	struct buffer_head	*head, *bh;
+	unsigned int		off = 0;
+
+	bh = head = page_buffers(page);
+
+	do {
+		if (start > off)
+			goto next_bh;
+		if (off > end)
+			break;
+		bh->b_end_io(bh, !error);
+next_bh:
+		off += bh->b_size;
+	} while ((bh = bh->b_this_page) != head);
+}
+
+/*
+ * We're now finished for good with this ioend structure.  Update the page
+ * state, release holds on bios, and finally free up memory.  Do not use the
+ * ioend after this.
  */
 STATIC void
 xfs_destroy_ioend(
-	xfs_ioend_t		*ioend)
+	struct xfs_ioend	*ioend)
 {
-	struct buffer_head	*bh, *next;
+	struct bio		*bio, *next;
 
-	for (bh = ioend->io_buffer_head; bh; bh = next) {
-		next = bh->b_private;
-		bh->b_end_io(bh, !ioend->io_error);
+	for (bio = ioend->io_bio_done; bio; bio = next) {
+		struct bio_vec  *bvec;
+		int             i;
+
+		next = bio->bi_private;
+		bio->bi_private = NULL;
+
+		/* walk each page on bio, ending page IO on them */
+		bio_for_each_segment_all(bvec, bio, i) {
+			struct page	*page = bvec->bv_page;
+			unsigned int	off, end_off;
+
+			off = bvec->bv_offset;
+			end_off = off + bvec->bv_len - 1;
+			ASSERT(off < PAGE_SIZE);
+			ASSERT(end_off <= PAGE_SIZE);
+			xfs_finish_page_writeback(page, off, end_off,
+						  ioend->io_error);
+
+		}
+		bio_put(bio);
 	}
 
 	mempool_free(ioend, xfs_ioend_pool);
 }
 
+
 /*
  * Fast and loose check if this write could update the on-disk inode size.
  */
@@ -286,6 +332,7 @@ xfs_alloc_ioend(
 	ioend->io_type = type;
 	ioend->io_inode = inode;
 	INIT_WORK(&ioend->io_work, xfs_end_io);
+	spin_lock_init(&ioend->io_lock);
 	return ioend;
 }
 
@@ -365,15 +412,21 @@ STATIC void
 xfs_end_bio(
 	struct bio		*bio)
 {
-	xfs_ioend_t		*ioend = bio->bi_private;
+	struct xfs_ioend	*ioend = bio->bi_private;
+	unsigned long		flags;
 
-	if (!ioend->io_error)
-		ioend->io_error = bio->bi_error;
-
-	/* Toss bio and pass work off to an xfsdatad thread */
 	bio->bi_private = NULL;
 	bio->bi_end_io = NULL;
-	bio_put(bio);
+
+	spin_lock_irqsave(&ioend->io_lock, flags);
+	if (!ioend->io_error)
+	       ioend->io_error = bio->bi_error;
+	if (!ioend->io_bio_done)
+		ioend->io_bio_done = bio;
+	else
+		ioend->io_bio_done_tail->bi_private = bio;
+	ioend->io_bio_done_tail = bio;
+	spin_unlock_irqrestore(&ioend->io_lock, flags);
 
 	xfs_finish_ioend(ioend);
 }
@@ -523,12 +576,7 @@ xfs_add_to_ioend(
 		prev = ioend;
 		ioend = xfs_alloc_ioend(inode, wpc->io_type);
 		ioend->io_offset = offset;
-		ioend->io_buffer_head = bh;
-		ioend->io_buffer_tail = bh;
 		wpc->ioend = ioend;
-	} else {
-		ioend->io_buffer_tail->b_private = bh;
-		ioend->io_buffer_tail = bh;
 	}
 	bh->b_private = NULL;
 
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index dac64c2..aaa74cb 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -44,13 +44,14 @@ typedef struct xfs_ioend {
 	int			io_error;	/* I/O error code */
 	atomic_t		io_remaining;	/* hold count */
 	struct inode		*io_inode;	/* file being written to */
-	struct buffer_head	*io_buffer_head;/* buffer linked list head */
-	struct buffer_head	*io_buffer_tail;/* buffer linked list tail */
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
 	struct xfs_trans	*io_append_trans;/* xact. for size update */
 	struct bio		*io_bio;	/* bio being built */
+	struct bio		*io_bio_done;	/* bios completed */
+	struct bio		*io_bio_done_tail; /* bios completed */
+	spinlock_t		io_lock;	/* for bio completion list */
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
-- 
2.5.0


* Re: [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (7 preceding siblings ...)
  2016-02-10  8:47 ` [PATCH 8/8] [RFC] xfs: don't release bios on completion immediately Dave Chinner
@ 2016-02-10  9:05 ` Christoph Hellwig
  2016-02-10 18:25 ` Christoph Hellwig
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-10  9:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Seems like only 4 patches made it through...

On Wed, Feb 10, 2016 at 07:47:15PM +1100, Dave Chinner wrote:
> 
> Hi folks,
> 
> This is the 4th version of this series to followup on review
> comments from Brian and Christoph. There are three new patches in
> this series. Patch 2/8 is split out from patch 3/8 to document the
> removal of the ioend cancelling code, spearate from the introduction
> of the writepage context. Patches 7 and 8 are new patches (as in the
> first time I've posted them) to demonstrate how to remove the IO
> completion dependency on recording the bufferehads attached to the
> ioend. This is the first step in removing bufferheads from the
> writepage IO path - these are FYI patches, not patches I want to
> have committed immediately.
> 
> The changes all run through xfstests on 4k and 1k block size
> filesystems fine, and I think i addressed all the review comments.
> Patch 6/8 changed enough that I removed Brian's reviewed-by from it
> and so it will need to be looked at completely again.
> 
> -Dave.
> 
> Version 4:
> - split xfs_ioend_cancel removal into it's own patch (patch 2)
> - cleaned up submission of ioends to be consistent w.r.t. success
>   and error paths (patch 3)
> - made variables names less verbose (patch 6)
> - got rid of xfs_writepage_submit() (patch 6)
> - use a struct list_head for ioend chaining (patch 6)
> - consolidated ioend submission paths to use common code (patch 6)
> 
---end quoted text---


* Re: [PATCH 2/8] xfs: remove xfs_cancel_ioend
  2016-02-10  8:47 ` [PATCH 2/8] xfs: remove xfs_cancel_ioend Dave Chinner
@ 2016-02-10 11:28   ` Christoph Hellwig
  2016-02-11  0:21     ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-10 11:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +static int
> +xfs_writepage_submit(
> +	struct xfs_ioend	*ioend,
> +	struct xfs_ioend	*iohead,
> +	struct writeback_control *wbc,
> +	int			status)
> +{
> +	struct blk_plug		plug;
> +
> +	/* Reserve log space if we might write beyond the on-disk inode size. */
> +	if (!status && ioend && ioend->io_type != XFS_IO_UNWRITTEN &&
> +	    xfs_ioend_is_append(ioend))
> +		status = xfs_setfilesize_trans_alloc(ioend);
> +
> +	if (iohead) {
> +		blk_start_plug(&plug);
> +		xfs_submit_ioend(wbc, iohead, status);
> +		blk_finish_plug(&plug);
> +	}
> +	return status;
> +}

We return the xfs_setfilesize_trans_alloc failure status here,
but none of the callers pick it up.  The way this is handled later
changes a bit, but even at the end of the series only 1 of the
three callers handles the error.


* Re: [PATCH 3/8] xfs: Introduce writeback context for writepages
  2016-02-10  8:47 ` [PATCH 3/8] xfs: Introduce writeback context for writepages Dave Chinner
@ 2016-02-10 11:31   ` Christoph Hellwig
  2016-02-11  0:25     ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-10 11:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};
> +	int			ret;
> +
> +	ret = xfs_do_writepage(page, wbc, &wpc);
> +	return xfs_writepage_submit(&wpc, wbc, ret);
> +}
> +
> +STATIC int
>  xfs_vm_writepages(
>  	struct address_space	*mapping,
>  	struct writeback_control *wbc)
>  {
> +	struct xfs_writepage_ctx wpc = {
> +		.io_type = XFS_IO_OVERWRITE,
> +	};

Shouldn't we start out with an invalid (0) state, and just move
the actual states up to start from 1?
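Concretely, something like this sketch, where 0 is reserved as an explicit invalid state (the SIM_* names only loosely mirror the kernel's io_type values, and the struct is a stand-in, not the real xfs_writepage_ctx):

```c
#include <assert.h>

/*
 * Reserve 0 as an explicit "invalid" io_type so a zero-initialised
 * writepage context can never be mistaken for a valid state.
 */
enum {
	SIM_IO_INVALID = 0,	/* uninitialised context */
	SIM_IO_DELALLOC,	/* delayed allocation blocks */
	SIM_IO_UNWRITTEN,	/* allocated, uninitialised data */
	SIM_IO_OVERWRITE,	/* already allocated blocks */
};

struct sim_writepage_ctx {
	unsigned int	io_type;	/* SIM_IO_INVALID until first map */
};
```

With this layout a freshly zeroed context is self-evidently uninitialised instead of silently looking like an overwrite.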

Otherwise looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/8] xfs: don't chain ioends during writepage submission
  2016-02-10  8:47 ` [PATCH 6/8] xfs: don't chain ioends during writepage submission Dave Chinner
@ 2016-02-10 11:36   ` Christoph Hellwig
  2016-02-11  0:26     ` Dave Chinner
  2016-02-11  6:39     ` Dave Chinner
  0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-10 11:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> -STATIC void
> +STATIC int
>  xfs_submit_ioend(
>  	struct writeback_control *wbc,
>  	xfs_ioend_t		*ioend,
>  	int			fail)

Now that almost all of the function is rewritten, can you rename
fail to error or status?  fail always suggests a boolean to me and
is rather confusing.

> + * Return the ioend we finished off so that the caller can submit it
> + * once it has finished processing the dirty page.
>   */
> -STATIC void
> +STATIC struct xfs_ioend *
>  xfs_add_to_ioend(
>  	struct inode		*inode,
>  	struct buffer_head	*bh,
>  	xfs_off_t		offset,
>  	struct xfs_writepage_ctx *wpc)
>  {
> +	struct xfs_ioend	*prev = NULL;
> +
>  	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
>  	    bh->b_blocknr != wpc->last_block + 1) {
>  		struct xfs_ioend	*new;
>  
> +		prev = wpc->ioend;

Looking at the new list_head based code it might be easier to just
pass in a pointer to the submit_list and add the previous ioend
here directly.

> +	LIST_HEAD(submit_list);
> +	struct xfs_ioend	*ioend, *next;
>  	struct buffer_head	*bh, *head;
>  	ssize_t			len = 1 << inode->i_blkbits;
>  	int			error = 0;
>  	int			uptodate = 1;
>  	int			count = 0;

The count variable is pointless now - we only check for it being
non-zero, and we can do the same with a list_empty on submit_list.
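For reference, the list_empty() test boils down to the head pointing at itself. This is modelled here on a minimal local copy of a circular doubly-linked list, not the real <linux/list.h>:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal userspace model of the suggestion: instead of a separate
 * count of ioends added to submit_list, just test the list itself.
 * The sim_* names are local; the kernel uses list_head helpers.
 */
struct sim_list_head {
	struct sim_list_head *next, *prev;
};

static void sim_list_init(struct sim_list_head *h)
{
	h->next = h->prev = h;	/* empty list points at itself */
}

static void sim_list_add_tail(struct sim_list_head *n,
			      struct sim_list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static int sim_list_empty(const struct sim_list_head *h)
{
	return h->next == h;
}
```

An "anything queued?" check is then just `!sim_list_empty(&submit_list)`, with no counter to keep in sync.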

>  
> +
>  	bh = head = page_buffers(page);
>  	offset = page_offset(page);

nit: pointless second empty line added above

>  	ret = xfs_do_writepage(page, wbc, &wpc);
> -	return xfs_writepage_submit(&wpc, wbc, ret);
> +	if (wpc.ioend)
> +		xfs_submit_ioend(wbc, wpc.ioend, ret);
> +	return ret;
>  }
>  
>  STATIC int
> @@ -1019,7 +1022,9 @@ xfs_vm_writepages(
>  
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
>  	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
> -	return xfs_writepage_submit(&wpc, wbc, ret);
> +	if (wpc.ioend)
> +		xfs_submit_ioend(wbc, wpc.ioend, ret);
> +	return ret;

And this is where ignoring the xfs_setfilesize_trans_alloc errors
reappears after a previous patch mostly fixed it up.


* Re: [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
  2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
                   ` (8 preceding siblings ...)
  2016-02-10  9:05 ` [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Christoph Hellwig
@ 2016-02-10 18:25 ` Christoph Hellwig
  2016-02-10 21:25   ` Dave Chinner
  9 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-10 18:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Feb 10, 2016 at 07:47:15PM +1100, Dave Chinner wrote:
> of the writepage context. Patches 7 and 8 are new patches (as in the
> first time I've posted them) to demonstrate how to remove the IO
> completion dependency on recording the bufferehads attached to the
> ioend. This is the first step in removing bufferheads from the
> writepage IO path - these are FYI patches, not patches I want to
> have committed immediately.

This looks interesting.  I played around with this a bit and ported
my patch to embed the main bio into struct ioend on top of it.  My older
version allowed to chain additional bios to get ioends larger than
1MB (or PAGE_SIZE * 256), but that will require additional work
in the new completion handler.  It does however simplify things quite
a bit, and I suspect a lot of that simplification could be kept even
with chained bios.  I've attached the patch below for reference:

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a15a032..53c04a2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -111,44 +111,29 @@ next_bh:
 	} while ((bh = bh->b_this_page) != head);
 }
 
-/*
- * We're now finished for good with this ioend structure.  Update the page
- * state, release holds on bios, and finally free up memory.  Do not use the
- * ioend after this.
- */
 STATIC void
-xfs_destroy_ioend(
-	struct xfs_ioend	*ioend)
+__xfs_end_bio(
+	struct bio		*bio,
+	int			error)
 {
-	struct bio		*bio, *next;
-
-	for (bio = ioend->io_bio_done; bio; bio = next) {
-		struct bio_vec  *bvec;
-		int             i;
-
-		next = bio->bi_private;
-		bio->bi_private = NULL;
+	struct bio_vec  	*bvec;
+	int			i;
 
-		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i) {
-			struct page	*page = bvec->bv_page;
-			unsigned int	off, end_off;
+	/* walk each page on bio, ending page IO on them */
+	bio_for_each_segment_all(bvec, bio, i) {
+		struct page	*page = bvec->bv_page;
+		unsigned int	off = bvec->bv_offset;
+		unsigned int	end_off = off + bvec->bv_len - 1;
 
-			off = bvec->bv_offset;
-			end_off = off + bvec->bv_len - 1;
-			ASSERT(off < PAGE_SIZE);
-			ASSERT(end_off <= PAGE_SIZE);
-			xfs_finish_page_writeback(page, off, end_off,
-						  ioend->io_error);
+		ASSERT(off < PAGE_SIZE);
+		ASSERT(end_off <= PAGE_SIZE);
 
-		}
-		bio_put(bio);
+		xfs_finish_page_writeback(page, off, end_off, error);
 	}
 
-	mempool_free(ioend, xfs_ioend_pool);
+	bio_put(bio);
 }
 
-
 /*
  * Fast and loose check if this write could update the on-disk inode size.
  */
@@ -220,7 +205,8 @@ xfs_setfilesize(
 
 STATIC int
 xfs_setfilesize_ioend(
-	struct xfs_ioend	*ioend)
+	struct xfs_ioend	*ioend,
+	int			error)
 {
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
 	struct xfs_trans	*tp = ioend->io_append_trans;
@@ -234,53 +220,32 @@ xfs_setfilesize_ioend(
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	/* we abort the update if there was an IO error */
-	if (ioend->io_error) {
+	if (error) {
 		xfs_trans_cancel(tp);
-		return ioend->io_error;
+		return error;
 	}
 
 	return xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
 }
 
 /*
- * Schedule IO completion handling on the final put of an ioend.
- *
- * If there is no work to do we might as well call it a day and free the
- * ioend right now.
- */
-STATIC void
-xfs_finish_ioend(
-	struct xfs_ioend	*ioend)
-{
-	if (atomic_dec_and_test(&ioend->io_remaining)) {
-		struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
-
-		if (ioend->io_type == XFS_IO_UNWRITTEN)
-			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
-		else if (ioend->io_append_trans)
-			queue_work(mp->m_data_workqueue, &ioend->io_work);
-		else
-			xfs_destroy_ioend(ioend);
-	}
-}
-
-/*
  * IO write completion.
  */
 STATIC void
 xfs_end_io(
 	struct work_struct *work)
 {
-	xfs_ioend_t	*ioend = container_of(work, xfs_ioend_t, io_work);
-	struct xfs_inode *ip = XFS_I(ioend->io_inode);
-	int		error = 0;
+	struct xfs_ioend	*ioend =
+		container_of(work, struct xfs_ioend, io_work);
+	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	int			error = ioend->io_bio.bi_error;
 
 	/*
 	 * Set an error if the mount has shut down and proceed with end I/O
 	 * processing so it can perform whatever cleanups are necessary.
 	 */
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
-		ioend->io_error = -EIO;
+		error = -EIO;
 
 	/*
 	 * For unwritten extents we need to issue transactions to convert a
@@ -290,20 +255,34 @@ xfs_end_io(
 	 * on error.
 	 */
 	if (ioend->io_type == XFS_IO_UNWRITTEN) {
-		if (ioend->io_error)
+		if (error)
 			goto done;
 		error = xfs_iomap_write_unwritten(ip, ioend->io_offset,
 						  ioend->io_size);
 	} else if (ioend->io_append_trans) {
-		error = xfs_setfilesize_ioend(ioend);
+		error = xfs_setfilesize_ioend(ioend, error);
 	} else {
 		ASSERT(!xfs_ioend_is_append(ioend));
 	}
 
 done:
-	if (error)
-		ioend->io_error = error;
-	xfs_destroy_ioend(ioend);
+	__xfs_end_bio(&ioend->io_bio, error);
+}
+
+STATIC void
+xfs_end_bio(
+	struct bio		*bio)
+{
+	struct xfs_ioend	*ioend = 
+		container_of(bio, struct xfs_ioend, io_bio);
+	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
+
+	if (ioend->io_type == XFS_IO_UNWRITTEN)
+		queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
+	else if (ioend->io_append_trans)
+		queue_work(mp->m_data_workqueue, &ioend->io_work);
+	else
+		__xfs_end_bio(bio, bio->bi_error);
 }
 
 /*
@@ -312,27 +291,24 @@ done:
  * We'll need to extend this for updating the ondisk inode size later
  * (vs. incore size).
  */
-STATIC xfs_ioend_t *
+STATIC struct xfs_ioend *
 xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type)
 {
-	xfs_ioend_t		*ioend;
+	struct xfs_ioend	*ioend;
+	struct bio		*bio;
 
-	ioend = mempool_alloc(xfs_ioend_pool, GFP_NOFS);
-	memset(ioend, 0, sizeof(*ioend));
+	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, xfs_ioend_bioset);
+	bio->bi_end_io = xfs_end_bio;
+
+	ioend = container_of(bio, struct xfs_ioend, io_bio);
+	memset(ioend, 0, offsetof(struct xfs_ioend, io_bio));
 
-	/*
-	 * Set the count to 1 initially, which will prevent an I/O
-	 * completion callback from happening before we have started
-	 * all the I/O from calling the completion routine too early.
-	 */
-	atomic_set(&ioend->io_remaining, 1);
 	INIT_LIST_HEAD(&ioend->io_list);
 	ioend->io_type = type;
 	ioend->io_inode = inode;
 	INIT_WORK(&ioend->io_work, xfs_end_io);
-	spin_lock_init(&ioend->io_lock);
 	return ioend;
 }
 
@@ -405,56 +381,6 @@ xfs_imap_valid(
 		offset < imap->br_startoff + imap->br_blockcount;
 }
 
-/*
- * BIO completion handler for buffered IO.
- */
-STATIC void
-xfs_end_bio(
-	struct bio		*bio)
-{
-	struct xfs_ioend	*ioend = bio->bi_private;
-	unsigned long		flags;
-
-	bio->bi_private = NULL;
-	bio->bi_end_io = NULL;
-
-	spin_lock_irqsave(&ioend->io_lock, flags);
-	if (!ioend->io_error)
-	       ioend->io_error = bio->bi_error;
-	if (!ioend->io_bio_done)
-		ioend->io_bio_done = bio;
-	else
-		ioend->io_bio_done_tail->bi_private = bio;
-	ioend->io_bio_done_tail = bio;
-	spin_unlock_irqrestore(&ioend->io_lock, flags);
-
-	xfs_finish_ioend(ioend);
-}
-
-STATIC void
-xfs_submit_ioend_bio(
-	struct writeback_control *wbc,
-	xfs_ioend_t		*ioend,
-	struct bio		*bio)
-{
-	atomic_inc(&ioend->io_remaining);
-	bio->bi_private = ioend;
-	bio->bi_end_io = xfs_end_bio;
-	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio);
-}
-
-STATIC struct bio *
-xfs_alloc_ioend_bio(
-	struct buffer_head	*bh)
-{
-	struct bio		*bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES);
-
-	ASSERT(bio->bi_private == NULL);
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio->bi_bdev = bh->b_bdev;
-	return bio;
-}
-
 STATIC void
 xfs_start_buffer_writeback(
 	struct buffer_head	*bh)
@@ -520,10 +446,10 @@ static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
 STATIC int
 xfs_submit_ioend(
 	struct writeback_control *wbc,
-	xfs_ioend_t		*ioend,
+	struct xfs_ioend	*ioend,
 	int			fail)
 {
-	if (!ioend->io_bio || fail)
+	if (fail)
 		goto error_finish;
 
 	/* Reserve log space if we might write beyond the on-disk inode size. */
@@ -534,9 +460,8 @@ xfs_submit_ioend(
 			goto error_finish;
 	}
 
-	xfs_submit_ioend_bio(wbc, ioend, ioend->io_bio);
-	ioend->io_bio = NULL;
-	xfs_finish_ioend(ioend);
+	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE,
+			&ioend->io_bio);
 	return 0;
 
 	/*
@@ -546,10 +471,8 @@ xfs_submit_ioend(
 	 * at this point in time.
 	 */
 error_finish:
-	if (ioend->io_bio)
-		bio_put(ioend->io_bio);
-	ioend->io_error = fail;
-	xfs_finish_ioend(ioend);
+	ioend->io_bio.bi_error = fail;
+	bio_endio(&ioend->io_bio);
 	return fail;
 }
 
@@ -573,23 +496,22 @@ xfs_add_to_ioend(
 
 	if (!ioend || wpc->io_type != ioend->io_type ||
 	    bh->b_blocknr != wpc->last_block + 1) {
+retry:
 		prev = ioend;
+
 		ioend = xfs_alloc_ioend(inode, wpc->io_type);
 		ioend->io_offset = offset;
+
+		ioend->io_bio.bi_iter.bi_sector =
+			bh->b_blocknr * (bh->b_size >> 9);
+		ioend->io_bio.bi_bdev = bh->b_bdev;
+
 		wpc->ioend = ioend;
 	}
-	bh->b_private = NULL;
 
 	/* add the bh to the current bio */
-retry:
-	if (!ioend->io_bio)
-		ioend->io_bio = xfs_alloc_ioend_bio(bh);
-
-	if (xfs_bio_add_buffer(ioend->io_bio, bh) != bh->b_size) {
-		xfs_submit_ioend_bio(wbc, ioend, ioend->io_bio);
-		ioend->io_bio = NULL;
+	if (xfs_bio_add_buffer(&ioend->io_bio, bh) != bh->b_size)
 		goto retry;
-	}
 
 	ioend->io_size += bh->b_size;
 	wpc->last_block = bh->b_blocknr;
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index aaa74cb..acc5716 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -18,7 +18,7 @@
 #ifndef __XFS_AOPS_H__
 #define __XFS_AOPS_H__
 
-extern mempool_t *xfs_ioend_pool;
+extern struct bio_set *xfs_ioend_bioset;
 
 /*
  * Types of I/O for bmap clustering and I/O completion tracking.
@@ -35,24 +35,18 @@ enum {
 	{ XFS_IO_OVERWRITE,		"overwrite" }
 
 /*
- * xfs_ioend struct manages large extent writes for XFS.
- * It can manage several multi-page bio's at once.
+ * Structure for buffered I/O completions.
  */
-typedef struct xfs_ioend {
+struct xfs_ioend {
 	struct list_head	io_list;	/* next ioend in chain */
 	unsigned int		io_type;	/* delalloc / unwritten */
-	int			io_error;	/* I/O error code */
-	atomic_t		io_remaining;	/* hold count */
 	struct inode		*io_inode;	/* file being written to */
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
 	struct xfs_trans	*io_append_trans;/* xact. for size update */
-	struct bio		*io_bio;	/* bio being built */
-	struct bio		*io_bio_done;	/* bios completed */
-	struct bio		*io_bio_done_tail; /* bios completed */
-	spinlock_t		io_lock;	/* for bio completion list */
-} xfs_ioend_t;
+	struct bio		io_bio;
+};
 
 extern const struct address_space_operations xfs_address_space_operations;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 59c9b7b..725c36c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -57,8 +57,7 @@
 #include <linux/parser.h>
 
 static const struct super_operations xfs_super_operations;
-static kmem_zone_t *xfs_ioend_zone;
-mempool_t *xfs_ioend_pool;
+struct bio_set *xfs_ioend_bioset;
 
 static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
 #ifdef DEBUG
@@ -1646,20 +1645,15 @@ MODULE_ALIAS_FS("xfs");
 STATIC int __init
 xfs_init_zones(void)
 {
-
-	xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend");
-	if (!xfs_ioend_zone)
+	xfs_ioend_bioset = bioset_create(4 * MAX_BUF_PER_PAGE,
+			offsetof(struct xfs_ioend, io_bio));
+	if (!xfs_ioend_bioset)
 		goto out;
 
-	xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE,
-						  xfs_ioend_zone);
-	if (!xfs_ioend_pool)
-		goto out_destroy_ioend_zone;
-
 	xfs_log_ticket_zone = kmem_zone_init(sizeof(xlog_ticket_t),
 						"xfs_log_ticket");
 	if (!xfs_log_ticket_zone)
-		goto out_destroy_ioend_pool;
+		goto out_free_ioend_bioset;
 
 	xfs_bmap_free_item_zone = kmem_zone_init(sizeof(xfs_bmap_free_item_t),
 						"xfs_bmap_free_item");
@@ -1755,10 +1749,8 @@ xfs_init_zones(void)
 	kmem_zone_destroy(xfs_bmap_free_item_zone);
  out_destroy_log_ticket_zone:
 	kmem_zone_destroy(xfs_log_ticket_zone);
- out_destroy_ioend_pool:
-	mempool_destroy(xfs_ioend_pool);
- out_destroy_ioend_zone:
-	kmem_zone_destroy(xfs_ioend_zone);
+ out_free_ioend_bioset:
+ 	bioset_free(xfs_ioend_bioset);
  out:
 	return -ENOMEM;
 }
@@ -1784,9 +1776,7 @@ xfs_destroy_zones(void)
 	kmem_zone_destroy(xfs_btree_cur_zone);
 	kmem_zone_destroy(xfs_bmap_free_item_zone);
 	kmem_zone_destroy(xfs_log_ticket_zone);
-	mempool_destroy(xfs_ioend_pool);
-	kmem_zone_destroy(xfs_ioend_zone);
-
+ 	bioset_free(xfs_ioend_bioset);
 }
 
 STATIC int __init


* Re: [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
  2016-02-10 18:25 ` Christoph Hellwig
@ 2016-02-10 21:25   ` Dave Chinner
  2016-02-11 15:13     ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2016-02-10 21:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 10:25:38AM -0800, Christoph Hellwig wrote:
> On Wed, Feb 10, 2016 at 07:47:15PM +1100, Dave Chinner wrote:
> > of the writepage context. Patches 7 and 8 are new patches (as in the
> > first time I've posted them) to demonstrate how to remove the IO
> > completion dependency on recording the bufferehads attached to the
> > ioend. This is the first step in removing bufferheads from the
> > writepage IO path - these are FYI patches, not patches I want to
> > have committed immediately.
> 
> This looks interesting.  I played around with this a bit and ported
> my patch to embedd the main bio into struct ioend to it.  My older
> version allowed to chain additional bios to get ioends larger than
> 1MB (or PAGE_SIZE * 256), but that will require additional work
> in the new completion handler.  It does however simplify things quite
> a bit, and I suspect a lot of that simplification could be kept even
> with chained bios.  I've attached the patch below for reference:

I really like the idea, especially how using a bioset encapsulates
the ioend and binds the life cycle to the bio. It also removes a
heap of code, too.

We really do need some form of chaining here, though. If we don't,
then we'll be doing unwritten extent conversion or file size update
transactions for every 1MB bio completion instead of once for
however large an ioend writepages can build.
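As a back-of-the-envelope illustration of that trade-off (assumed sizes, not measurements):

```c
#include <assert.h>

/*
 * One completion transaction (unwritten conversion or size update)
 * is paid per ioend.  With one ~1MB bio per ioend, a large writeback
 * pays one per bio; with chained bios, one per however large an
 * ioend writepages can build.  Sizes here are assumptions.
 */
static long completion_transactions(long writeback_mb, long ioend_mb)
{
	/* ceiling division: number of ioends covering the writeback */
	return (writeback_mb + ioend_mb - 1) / ioend_mb;
}
```

So a 256MB writeback pays 256 transactions at 1MB per ioend, but a single transaction if the whole range ends up in one chained ioend.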

I think this is definitely worth pursuing - are you going to get any
time to work on this in the next couple of months, Christoph (i.e.
to target the 4.7 merge window)?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/8] xfs: remove xfs_cancel_ioend
  2016-02-10 11:28   ` Christoph Hellwig
@ 2016-02-11  0:21     ` Dave Chinner
  2016-02-11 15:14       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2016-02-11  0:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 03:28:00AM -0800, Christoph Hellwig wrote:
> > +static int
> > +xfs_writepage_submit(
> > +	struct xfs_ioend	*ioend,
> > +	struct xfs_ioend	*iohead,
> > +	struct writeback_control *wbc,
> > +	int			status)
> > +{
> > +	struct blk_plug		plug;
> > +
> > +	/* Reserve log space if we might write beyond the on-disk inode size. */
> > +	if (!status && ioend && ioend->io_type != XFS_IO_UNWRITTEN &&
> > +	    xfs_ioend_is_append(ioend))
> > +		status = xfs_setfilesize_trans_alloc(ioend);
> > +
> > +	if (iohead) {
> > +		blk_start_plug(&plug);
> > +		xfs_submit_ioend(wbc, iohead, status);
> > +		blk_finish_plug(&plug);
> > +	}
> > +	return status;
> > +}
> 
> We return the xfs_setfilesize_trans_alloc failure status here,
> but none of the callers pick it up.  The way this is handled later
> changes a bit, but even at the end of the series only 1 of the
> three callers handles the error.

I'll propagate it through where it makes sense. If we already have an
error, then we aren't going to call xfs_setfilesize_trans_alloc()
anyway, so checking the return value only matters in the non-error
cases.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/8] xfs: Introduce writeback context for writepages
  2016-02-10 11:31   ` Christoph Hellwig
@ 2016-02-11  0:25     ` Dave Chinner
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-11  0:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 03:31:26AM -0800, Christoph Hellwig wrote:
> > +	struct xfs_writepage_ctx wpc = {
> > +		.io_type = XFS_IO_OVERWRITE,
> > +	};
> > +	int			ret;
> > +
> > +	ret = xfs_do_writepage(page, wbc, &wpc);
> > +	return xfs_writepage_submit(&wpc, wbc, ret);
> > +}
> > +
> > +STATIC int
> >  xfs_vm_writepages(
> >  	struct address_space	*mapping,
> >  	struct writeback_control *wbc)
> >  {
> > +	struct xfs_writepage_ctx wpc = {
> > +		.io_type = XFS_IO_OVERWRITE,
> > +	};
> 
> Shouldn't we start out with an invalid (0) state, and just move
> the actual states up to start from 1?

This is just a translation of the existing code - the imap_valid
flag being initialised to zero ensures the io_type is correctly
initialised if it differs from XFS_IO_OVERWRITE. I guess there's no
harm in changing it.
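
The suggestion above, an explicit invalid zero state so a zero-initialised
context is never mistaken for a valid overwrite mapping, looks roughly like
this; the enum names are illustrative, not the actual fs/xfs definitions:

```c
#include <assert.h>

/* Zero is reserved as "no valid mapping yet". */
enum wb_io_type {
	WB_IO_INVALID = 0,	/* a zero-initialised context starts here */
	WB_IO_DELALLOC,
	WB_IO_UNWRITTEN,
	WB_IO_OVERWRITE,
};

/* Simplified stand-in for the writepage context. */
struct wb_ctx {
	enum wb_io_type	io_type;
	int		imap_valid;
};
```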

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 6/8] xfs: don't chain ioends during writepage submission
  2016-02-10 11:36   ` Christoph Hellwig
@ 2016-02-11  0:26     ` Dave Chinner
  2016-02-11  6:39     ` Dave Chinner
  1 sibling, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-11  0:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 03:36:09AM -0800, Christoph Hellwig wrote:
> > -STATIC void
> > +STATIC int
> >  xfs_submit_ioend(
> >  	struct writeback_control *wbc,
> >  	xfs_ioend_t		*ioend,
> >  	int			fail)
> 
> Now that almost all of the function is rewritten, can you rename
> fail to error or status?  fail always suggests a boolean to me and
> is rather confusing.

*nod*

> > + * Return the ioend we finished off so that the caller can submit it
> > + * once it has finished processing the dirty page.
> >   */
> > -STATIC void
> > +STATIC struct xfs_ioend *
> >  xfs_add_to_ioend(
> >  	struct inode		*inode,
> >  	struct buffer_head	*bh,
> >  	xfs_off_t		offset,
> >  	struct xfs_writepage_ctx *wpc)
> >  {
> > +	struct xfs_ioend	*prev = NULL;
> > +
> >  	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> >  	    bh->b_blocknr != wpc->last_block + 1) {
> >  		struct xfs_ioend	*new;
> >  
> > +		prev = wpc->ioend;
> 
> Looking at the new list_head based code it might be easier to just
> pass in a pointer to the submit_list and add the previous ioend
> here directly.

OK.
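
A toy userspace model of that suggestion, with a plain array standing in
for the kernel's struct list_head and all names invented for illustration:
the add-to-ioend step takes the submit list and queues the previous ioend
itself when it has to open a new one.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SUBMIT 16

/* One contiguous run of blocks. */
struct ioend_m { long start_block, last_block; };

/* Simplified writepage context: current ioend and last block seen. */
struct wp_ctx {
	struct ioend_m	*ioend;
	long		last_block;
};

/*
 * When the new block is not contiguous with the current ioend, queue
 * the finished ioend on the caller's submit list and start a new one,
 * rather than returning it for the caller to chain.
 */
static void add_to_ioend_model(struct wp_ctx *wpc, long block,
			       struct ioend_m **submit_list, int *nr)
{
	if (!wpc->ioend || block != wpc->last_block + 1) {
		if (wpc->ioend)
			submit_list[(*nr)++] = wpc->ioend;
		wpc->ioend = calloc(1, sizeof(*wpc->ioend));
		wpc->ioend->start_block = block;
	}
	wpc->ioend->last_block = block;
	wpc->last_block = block;
}
```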

> >  	ret = xfs_do_writepage(page, wbc, &wpc);
> > -	return xfs_writepage_submit(&wpc, wbc, ret);
> > +	if (wpc.ioend)
> > +		xfs_submit_ioend(wbc, wpc.ioend, ret);
> > +	return ret;
> >  }
> >  
> >  STATIC int
> > @@ -1019,7 +1022,9 @@ xfs_vm_writepages(
> >  
> >  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> >  	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
> > -	return xfs_writepage_submit(&wpc, wbc, ret);
> > +	if (wpc.ioend)
> > +		xfs_submit_ioend(wbc, wpc.ioend, ret);
> > +	return ret;
> 
> And this is where ignoring the xfs_setfilesize_trans_alloc errors
> reappears after a previous patch mostly fixed it up.

will fix.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 6/8] xfs: don't chain ioends during writepage submission
  2016-02-10 11:36   ` Christoph Hellwig
  2016-02-11  0:26     ` Dave Chinner
@ 2016-02-11  6:39     ` Dave Chinner
  1 sibling, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-11  6:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 10, 2016 at 03:36:09AM -0800, Christoph Hellwig wrote:
> > +	LIST_HEAD(submit_list);
> > +	struct xfs_ioend	*ioend, *next;
> >  	struct buffer_head	*bh, *head;
> >  	ssize_t			len = 1 << inode->i_blkbits;
> >  	int			error = 0;
> >  	int			uptodate = 1;
> >  	int			count = 0;
> 
> The count variable is pointless now - we only check for it being
> non-zero, and we can do the same with a list_emptry on submit_list.

Actually, it's not pointless - we need it to determine what to do
with the page. We have to start writeback on the page if there were
no errors, but the number of buffers we added to the ioend
determines if we need to end page writeback immediately (i.e. no
buffers added). This is separate from whether we allocated a new ioend
on the page and hence have an ioend to submit on the submit_list.

That said, looking at this did point out a bug in this
version of the patch - if no buffers were added to the ioends, then
we didn't clear the dirty bit on the page and transition through the
writeback state, as the original code was doing. I did trip over
this in a test run, so it's a real problem that needed to be fixed.
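
The decision described above (the error controls whether writeback starts
at all, while the buffer count controls whether it must be ended
immediately) reduces to a small model; the states and names here are
illustrative only, not the kernel's actual page state machine:

```c
#include <assert.h>

/* Possible page outcomes in this simplified model. */
enum page_action {
	PAGE_REDIRTY,		/* error: leave the page dirty for retry */
	PAGE_END_WRITEBACK,	/* clean but nothing queued: finish now */
	PAGE_IN_FLIGHT,		/* buffers queued: IO completion ends it */
};

/*
 * The error decides whether writeback happens; the count of buffers
 * added decides whether writeback has to be ended immediately.
 */
static enum page_action writepage_outcome(int error, int count)
{
	if (error)
		return PAGE_REDIRTY;
	return count ? PAGE_IN_FLIGHT : PAGE_END_WRITEBACK;
}
```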

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
  2016-02-10 21:25   ` Dave Chinner
@ 2016-02-11 15:13     ` Christoph Hellwig
  2016-02-11 20:12       ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-11 15:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Feb 11, 2016 at 08:25:51AM +1100, Dave Chinner wrote:
> I really like the idea, especially how using a bioset encapsulates
> the ioend and binds the life cycle to the bio. It also removes a
> heap of code, too.
> 
> We really do need some form of chaining here, though. If we don't,
> then we'll be doing unwritten extent or set file size transactions
> for every 1MB bio completion instead of once for however large an
> ioend writepages can build.
> 
> I think this is definitely worth pursuing - are you going to get any
> time to work on this in the next couple of months, Christoph (i.e.
> to target the 4.7 merge window)?
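
The amortisation argument in the quoted paragraph can be made concrete
with a toy count: without chaining, each bio completion pays for its own
transaction; with chaining, one transaction covers the whole ioend. The
numbers are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count how many completion transactions an ioend costs: one per bio
 * if completions are independent, one total if the bios are chained
 * so that only the final completion runs the transaction.
 */
static long transactions_needed(long ioend_bytes, long bio_bytes, bool chained)
{
	long bios = (ioend_bytes + bio_bytes - 1) / bio_bytes;	/* round up */
	return chained ? 1 : bios;
}
```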

Yes, I've just started working on this and would love to get it off
my table ASAP.  I was doing this on top of your initial writeback
changes, so let me know if I should go back to that version, or
whether you think the last two patches will be ready in time as well?

* Re: [PATCH 2/8] xfs: remove xfs_cancel_ioend
  2016-02-11  0:21     ` Dave Chinner
@ 2016-02-11 15:14       ` Christoph Hellwig
  2016-02-11 20:59         ` Dave Chinner
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-11 15:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Feb 11, 2016 at 11:21:37AM +1100, Dave Chinner wrote:
> I'll propagate it through where it makes sense. If we already have an
> error, then we aren't going to call xfs_setfilesize_trans_alloc()
> anyway, so checking the return value only matters in the non-error
> cases.

Oh, I missed that we don't care about the failure case.  Maybe we
should just call xfs_setfilesize_trans_alloc unconditionally, and move
the conditionals into it so that it's a no-op if no transaction is needed?
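
That suggestion, folding the conditionals into the helper so callers can
invoke it unconditionally, can be sketched as a userspace model; the
names are made up and allocation is assumed to always succeed:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified ioend: just the properties the helper needs to inspect. */
struct ioend_model {
	bool	is_unwritten;
	bool	is_append;
	bool	have_trans;	/* set once a transaction was allocated */
};

/*
 * The "do we need a transaction at all?" checks live inside the
 * helper, so it is a safe no-op when no on-disk size update will be
 * needed and the caller never has to guard the call.
 */
static int setfilesize_trans_alloc_model(struct ioend_model *io)
{
	if (io->is_unwritten || !io->is_append)
		return 0;	/* nothing to do */
	io->have_trans = true;	/* pretend allocation succeeded */
	return 0;
}
```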

* Re: [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write
  2016-02-11 15:13     ` Christoph Hellwig
@ 2016-02-11 20:12       ` Dave Chinner
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-11 20:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Feb 11, 2016 at 07:13:01AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 08:25:51AM +1100, Dave Chinner wrote:
> > I really like the idea, especially how using a bioset encapsulates
> > the ioend and binds the life cycle to the bio. It also removes a
> > heap of code, too.
> > 
> > We really do need some form of chaining here, though. If we don't,
> > then we'll be doing unwritten extent or set file size transactions
> > for every 1MB bio completion instead of once for however large an
> > ioend writepages can build.
> > 
> > I think this is definitely worth pursuing - are you going to get any
> > time to work on this in the next couple of months, Christoph (i.e.
> > to target the 4.7 merge window)?
> 
> Yes, I've just started working on this and would love to get it off
> my table ASAP.  I was doing this on top of your initial writeback
> changes, so let me know if I should go back to that version, or
> whether you think the last two patches will be ready in time as well?

Keep my extra two patches in your series. I'm going to try to spend
the rest of this cycle (for the 4.6 merge window) on Darrick's
reflink patchset, but I can carry whatever patches you finish off
until the appropriate time to merge them is.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 2/8] xfs: remove xfs_cancel_ioend
  2016-02-11 15:14       ` Christoph Hellwig
@ 2016-02-11 20:59         ` Dave Chinner
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Chinner @ 2016-02-11 20:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Feb 11, 2016 at 07:14:25AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 11:21:37AM +1100, Dave Chinner wrote:
> > I'll propagate it through where it makes sense. If we already have an
> > error, then we aren't going to call xfs_setfilesize_trans_alloc()
> > anyway, so checking the return value only matters in the non-error
> > cases.
> 
> Oh, I missed that we don't care about the failure case.  Maybe we
> should just call xfs_setfilesize_trans_alloc unconditionally, and move
> the conditionals into it so that it's a no-op if no transaction is needed?

OK, but let's make cleanups like this at the start of the next
batch of work on this code, so the current series can be finalised
and made available for wider testing....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

end of thread, other threads:[~2016-02-11 20:59 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-10  8:47 [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Dave Chinner
2016-02-10  8:47 ` [PATCH 1/8] xfs: remove nonblocking mode from xfs_vm_writepage Dave Chinner
2016-02-10  8:47 ` [PATCH 2/8] xfs: remove xfs_cancel_ioend Dave Chinner
2016-02-10 11:28   ` Christoph Hellwig
2016-02-11  0:21     ` Dave Chinner
2016-02-11 15:14       ` Christoph Hellwig
2016-02-11 20:59         ` Dave Chinner
2016-02-10  8:47 ` [PATCH 3/8] xfs: Introduce writeback context for writepages Dave Chinner
2016-02-10 11:31   ` Christoph Hellwig
2016-02-11  0:25     ` Dave Chinner
2016-02-10  8:47 ` [PATCH 4/8] xfs: xfs_cluster_write is redundant Dave Chinner
2016-02-10  8:47 ` [PATCH 5/8] xfs: factor mapping out of xfs_do_writepage Dave Chinner
2016-02-10  8:47 ` [PATCH 6/8] xfs: don't chain ioends during writepage submission Dave Chinner
2016-02-10 11:36   ` Christoph Hellwig
2016-02-11  0:26     ` Dave Chinner
2016-02-11  6:39     ` Dave Chinner
2016-02-10  8:47 ` [PATCH 7/8] [RFC] xfs: build bios directly in xfs_add_to_ioend Dave Chinner
2016-02-10  8:47 ` [PATCH 8/8] [RFC] xfs: don't release bios on completion immediately Dave Chinner
2016-02-10  9:05 ` [PATCH 0/8 v4] xfs: get rid of xfs_cluster_write Christoph Hellwig
2016-02-10 18:25 ` Christoph Hellwig
2016-02-10 21:25   ` Dave Chinner
2016-02-11 15:13     ` Christoph Hellwig
2016-02-11 20:12       ` Dave Chinner
