All of lore.kernel.org
 help / color / mirror / Atom feed
* stop using ioends for direct write completions
@ 2016-01-14 10:10 Christoph Hellwig
  2016-01-14 10:10 ` [PATCH] xfs: don't use " Christoph Hellwig
  2016-01-28 13:16 ` stop using " Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-14 10:10 UTC (permalink / raw)
  To: xfs; +Cc: darrick.wong

The patch below rewrites the direct write I/O completion path to avoid
the use of ioends.  Besides simplifying the code this will also allow
us to make COW I/O a type of it's own for the buffer I/O path.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] xfs: don't use ioends for direct write completions
  2016-01-14 10:10 stop using ioends for direct write completions Christoph Hellwig
@ 2016-01-14 10:10 ` Christoph Hellwig
  2016-01-28 13:16 ` stop using " Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-14 10:10 UTC (permalink / raw)
  To: xfs; +Cc: darrick.wong

We only need to communicate two bits of information to the direct I/O
completion handler:

 (1) do we need to convert any unwritten extents in the range
 (2) do we need to check if we need to update the inode size based
     on the range passed to the completion handler

We can use the private data passed to the get_block handler and the
completion handler as a simple bitmask to communicate this information
instead of the current complicated infrastructure reusing the ioends
from the buffer I/O path, and thus avoiding a memory allocation and
a context switch for any non-trivial direct write.  As a nice side
effect we also decouple the direct I/O path implementation from that
of the buffered I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 210 +++++++++++++++++++----------------------------------
 fs/xfs/xfs_trace.h |   9 +--
 2 files changed, 77 insertions(+), 142 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 379c089..ff2c571 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -36,6 +36,10 @@
 #include <linux/pagevec.h>
 #include <linux/writeback.h>
 
+/* flags for direct write completions */
+#define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
+#define XFS_DIO_FLAG_APPEND	(1 << 1)
+
 void
 xfs_count_page_state(
 	struct page		*page,
@@ -1238,27 +1242,8 @@ xfs_vm_releasepage(
 }
 
 /*
- * When we map a DIO buffer, we may need to attach an ioend that describes the
- * type of write IO we are doing. This passes to the completion function the
- * operations it needs to perform. If the mapping is for an overwrite wholly
- * within the EOF then we don't need an ioend and so we don't allocate one.
- * This avoids the unnecessary overhead of allocating and freeing ioends for
- * workloads that don't require transactions on IO completion.
- *
- * If we get multiple mappings in a single IO, we might be mapping different
- * types. But because the direct IO can only have a single private pointer, we
- * need to ensure that:
- *
- * a) i) the ioend spans the entire region of unwritten mappings; or
- *    ii) the ioend spans all the mappings that cross or are beyond EOF; and
- * b) if it contains unwritten extents, it is *permanently* marked as such
- *
- * We could do this by chaining ioends like buffered IO does, but we only
- * actually get one IO completion callback from the direct IO, and that spans
- * the entire IO regardless of how many mappings and IOs are needed to complete
- * the DIO. There is only going to be one reference to the ioend and its life
- * cycle is constrained by the DIO completion code. hence we don't need
- * reference counting here.
+ * When we map a DIO buffer, we may need to pass flags to
+ * xfs_end_io_direct_write to tell it what kind of write IO we are doing.
  *
  * Note that for DIO, an IO to the highest supported file block offset (i.e.
  * 2^63 - 1FSB bytes) will result in the offset + count overflowing a signed 64
@@ -1266,68 +1251,26 @@ xfs_vm_releasepage(
  * extending the file size. We won't know for sure until IO completion is run
  * and the actual max write offset is communicated to the IO completion
  * routine.
- *
- * For DAX page faults, we are preparing to never see unwritten extents here,
- * nor should we ever extend the inode size. Hence we will soon have nothing to
- * do here for this case, ensuring we don't have to provide an IO completion
- * callback to free an ioend that we don't actually need for a fault into the
- * page at offset (2^63 - 1FSB) bytes.
  */
-
 static void
 xfs_map_direct(
 	struct inode		*inode,
 	struct buffer_head	*bh_result,
 	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset,
-	bool			dax_fault)
+	xfs_off_t		offset)
 {
-	struct xfs_ioend	*ioend;
+	uintptr_t		*flags = (uintptr_t *)&bh_result->b_private;
 	xfs_off_t		size = bh_result->b_size;
-	int			type;
 
-	if (ISUNWRITTEN(imap))
-		type = XFS_IO_UNWRITTEN;
-	else
-		type = XFS_IO_OVERWRITE;
+	trace_xfs_get_blocks_map_direct(XFS_I(inode), offset, size,
+		ISUNWRITTEN(imap) ? XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, imap);
 
-	trace_xfs_gbmap_direct(XFS_I(inode), offset, size, type, imap);
-
-	if (dax_fault) {
-		ASSERT(type == XFS_IO_OVERWRITE);
-		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
-					    imap);
-		return;
-	}
-
-	if (bh_result->b_private) {
-		ioend = bh_result->b_private;
-		ASSERT(ioend->io_size > 0);
-		ASSERT(offset >= ioend->io_offset);
-		if (offset + size > ioend->io_offset + ioend->io_size)
-			ioend->io_size = offset - ioend->io_offset + size;
-
-		if (type == XFS_IO_UNWRITTEN && type != ioend->io_type)
-			ioend->io_type = XFS_IO_UNWRITTEN;
-
-		trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
-					      ioend->io_size, ioend->io_type,
-					      imap);
-	} else if (type == XFS_IO_UNWRITTEN ||
-		   offset + size > i_size_read(inode) ||
-		   offset + size < 0) {
-		ioend = xfs_alloc_ioend(inode, type);
-		ioend->io_offset = offset;
-		ioend->io_size = size;
-
-		bh_result->b_private = ioend;
+	if (ISUNWRITTEN(imap)) {
+		*flags |= XFS_DIO_FLAG_UNWRITTEN;
+		set_buffer_defer_completion(bh_result);
+	} else if (offset + size > i_size_read(inode) || offset + size < 0) {
+		*flags |= XFS_DIO_FLAG_APPEND;
 		set_buffer_defer_completion(bh_result);
-
-		trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
-					   imap);
-	} else {
-		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
-					    imap);
 	}
 }
 
@@ -1498,9 +1441,12 @@ __xfs_get_blocks(
 		if (ISUNWRITTEN(&imap))
 			set_buffer_unwritten(bh_result);
 		/* direct IO needs special help */
-		if (create && direct)
-			xfs_map_direct(inode, bh_result, &imap, offset,
-				       dax_fault);
+		if (create && direct) {
+			if (dax_fault)
+				ASSERT(!ISUNWRITTEN(&imap));
+			else
+				xfs_map_direct(inode, bh_result, &imap, offset);
+		}
 	}
 
 	/*
@@ -1570,17 +1516,34 @@ xfs_get_blocks_dax_fault(
 	return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
 }
 
-static void
-__xfs_end_io_direct_write(
-	struct inode		*inode,
-	struct xfs_ioend	*ioend,
+/*
+ * Complete a direct I/O write request.
+ *
+ * xfs_map_direct passes us some flags in the private data to tell us what to
+ * do.  If not flags are set, then the write IO is an overwrite wholly within
+ * the existing allocated file size and so there is nothing for us to do.
+ *
+ * Note that in this case the completion can be called in interrupt context,
+ * whereas if we have flags set we will always be called in task context
+ * (i.e. from a workqueue).
+ */
+STATIC void
+xfs_end_io_direct_write(
+	struct kiocb		*iocb,
 	loff_t			offset,
-	ssize_t			size)
+	ssize_t			size,
+	void			*private)
 {
-	struct xfs_mount	*mp = XFS_I(inode)->i_mount;
+	struct inode		*inode = file_inode(iocb->ki_filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	uintptr_t		flags = (uintptr_t)private;
+	int			error = 0;
+
+	trace_xfs_end_io_direct_write(ip, offset, size);
 
-	if (XFS_FORCED_SHUTDOWN(mp) || ioend->io_error)
-		goto out_end_io;
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return;
 
 	/*
 	 * dio completion end_io functions are only called on writes if more
@@ -1589,23 +1552,17 @@ __xfs_end_io_direct_write(
 	ASSERT(size > 0);
 
 	/*
-	 * The ioend only maps whole blocks, while the IO may be sector aligned.
-	 * Hence the ioend offset/size may not match the IO offset/size exactly.
-	 * Because we don't map overwrites within EOF into the ioend, the offset
-	 * may not match, but only if the endio spans EOF.  Either way, write
-	 * the IO sizes into the ioend so that completion processing does the
-	 * right thing.
-	 */
-	ASSERT(offset + size <= ioend->io_offset + ioend->io_size);
-	ioend->io_size = size;
-	ioend->io_offset = offset;
-
-	/*
-	 * The ioend tells us whether we are doing unwritten extent conversion
+	 * The flags tell us whether we are doing unwritten extent conversion
 	 * or an append transaction that updates the on-disk file size. These
 	 * cases are the only cases where we should *potentially* be needing
 	 * to update the VFS inode size.
-	 *
+	 */
+	if (flags == 0) {
+		ASSERT(offset + size <= i_size_read(inode));
+		return;
+	}
+
+	/*
 	 * We need to update the in-core inode size here so that we don't end up
 	 * with the on-disk inode size being outside the in-core inode size. We
 	 * have no other method of updating EOF for AIO, so always do it here
@@ -1616,54 +1573,33 @@ __xfs_end_io_direct_write(
 	 * here can result in EOF moving backwards and Bad Things Happen when
 	 * that occurs.
 	 */
-	spin_lock(&XFS_I(inode)->i_flags_lock);
+	spin_lock(&ip->i_flags_lock);
 	if (offset + size > i_size_read(inode))
 		i_size_write(inode, offset + size);
-	spin_unlock(&XFS_I(inode)->i_flags_lock);
+	spin_unlock(&ip->i_flags_lock);
 
-	/*
-	 * If we are doing an append IO that needs to update the EOF on disk,
-	 * do the transaction reserve now so we can use common end io
-	 * processing. Stashing the error (if there is one) in the ioend will
-	 * result in the ioend processing passing on the error if it is
-	 * possible as we can't return it from here.
-	 */
-	if (ioend->io_type == XFS_IO_OVERWRITE)
-		ioend->io_error = xfs_setfilesize_trans_alloc(ioend);
+	if (flags & XFS_DIO_FLAG_UNWRITTEN) {
+		trace_xfs_end_io_direct_write_unwritten(ip, offset, size);
 
-out_end_io:
-	xfs_end_io(&ioend->io_work);
-	return;
-}
+		error = xfs_iomap_write_unwritten(ip, offset, size);
 
-/*
- * Complete a direct I/O write request.
- *
- * The ioend structure is passed from __xfs_get_blocks() to tell us what to do.
- * If no ioend exists (i.e. @private == NULL) then the write IO is an overwrite
- * wholly within the EOF and so there is nothing for us to do. Note that in this
- * case the completion can be called in interrupt context, whereas if we have an
- * ioend we will always be called in task context (i.e. from a workqueue).
- */
-STATIC void
-xfs_end_io_direct_write(
-	struct kiocb		*iocb,
-	loff_t			offset,
-	ssize_t			size,
-	void			*private)
-{
-	struct inode		*inode = file_inode(iocb->ki_filp);
-	struct xfs_ioend	*ioend = private;
+		WARN_ON_ONCE(error);
+	} else if (flags & XFS_DIO_FLAG_APPEND) {
+		struct xfs_trans *tp;
 
-	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
-				     ioend ? ioend->io_type : 0, NULL);
+		trace_xfs_end_io_direct_write_append(ip, offset, size);
 
-	if (!ioend) {
-		ASSERT(offset + size <= i_size_read(inode));
-		return;
-	}
+		tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
 
-	__xfs_end_io_direct_write(inode, ioend, offset, size);
+		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
+		if (error) {
+			xfs_trans_cancel(tp);
+			WARN_ON_ONCE(error);
+			return;
+		}
+		error = xfs_setfilesize(ip, tp, offset, size);
+		WARN_ON_ONCE(error);
+	}
 }
 
 static inline ssize_t
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 391d797..c8d5842 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1296,11 +1296,7 @@ DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
 DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_none);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio);
+DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
@@ -1340,6 +1336,9 @@ DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
 DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
 DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
 DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_unwritten);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_append);
 
 DECLARE_EVENT_CLASS(xfs_itrunc_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),
-- 
1.9.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-14 10:10 stop using ioends for direct write completions Christoph Hellwig
  2016-01-14 10:10 ` [PATCH] xfs: don't use " Christoph Hellwig
@ 2016-01-28 13:16 ` Christoph Hellwig
  2016-01-28 20:53   ` Darrick J. Wong
  2016-01-29 14:12   ` Brian Foster
  1 sibling, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-28 13:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: darrick.wong, xfs

Any chance to get a review for this?  It should really help
with sorting out the buffered I/O COW code.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 13:16 ` stop using " Christoph Hellwig
@ 2016-01-28 20:53   ` Darrick J. Wong
  2016-01-28 21:10     ` Christoph Hellwig
  2016-01-29 14:12   ` Brian Foster
  1 sibling, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2016-01-28 20:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph Hellwig, xfs

On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> Any chance to get a review for this?  It should really help
> with sorting out the buffered I/O COW code.

It looks reasonable to me.  I separated the dio and buffered CoW remap paths a
couple of weeks ago, because it seems that IO errors only get passed back as a
return value from __blockdev_direct_IO, therefore the remapping has to be done
from xfs_vm_do_dio anyway because we don't want to remap if the write fails.
Just yesterday I removed the "is_cow" flag from the ioend, so now we're back to
having a separate XFS_IO_COW ioend type.

So... reflink doesn't need the patch but OTOH directio doesn't really need 
the overhead of allocating an ioend anyway. :)

--D

> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 20:53   ` Darrick J. Wong
@ 2016-01-28 21:10     ` Christoph Hellwig
  2016-01-28 21:58       ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-28 21:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Jan 28, 2016 at 12:53:33PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> > Any chance to get a review for this?  It should really help
> > with sorting out the buffered I/O COW code.
> 
> It looks reasonable to me.  I separated the dio and buffered CoW remap paths a
> couple of weeks ago, because it seems that IO errors only get passed back as a
> return value from __blockdev_direct_IO, therefore the remapping has to be done
> from xfs_vm_do_dio anyway because we don't want to remap if the write fails.
> Just yesterday I removed the "is_cow" flag from the ioend, so now we're back to
> having a separate XFS_IO_COW ioend type.

For direct I/O we will need something like this to properly support AIO
writes.

> So... reflink doesn't need the patch but OTOH directio doesn't really need 
> the overhead of allocating an ioend anyway. :)

Yep.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 21:10     ` Christoph Hellwig
@ 2016-01-28 21:58       ` Darrick J. Wong
  2016-01-28 22:02         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2016-01-28 21:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Jan 28, 2016 at 10:10:56PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 28, 2016 at 12:53:33PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> > > Any chance to get a review for this?  It should really help
> > > with sorting out the buffered I/O COW code.
> > 
> > It looks reasonable to me.  I separated the dio and buffered CoW remap paths a
> > couple of weeks ago, because it seems that IO errors only get passed back as a
> > return value from __blockdev_direct_IO, therefore the remapping has to be done
> > from xfs_vm_do_dio anyway because we don't want to remap if the write fails.
> > Just yesterday I removed the "is_cow" flag from the ioend, so now we're back to
> > having a separate XFS_IO_COW ioend type.
> 
> For direct I/O we will need something like this to properly support AIO
> writes.

Aw, snap, I knew I'd forgotten something.  Yep, we'll need that... I think
xfs_end_io_direct_write will have to sniff out the error status from "size"
and either remap or discard the CoW allocations as appropriate.

Heh, guess I'd better go write some aio tests. :)

> > So... reflink doesn't need the patch but OTOH directio doesn't really need 
> > the overhead of allocating an ioend anyway. :)
> 
> Yep.

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 21:58       ` Darrick J. Wong
@ 2016-01-28 22:02         ` Christoph Hellwig
  2016-01-28 22:31           ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-28 22:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Jan 28, 2016 at 01:58:53PM -0800, Darrick J. Wong wrote:
> Aw, snap, I knew I'd forgotten something.  Yep, we'll need that... I think
> xfs_end_io_direct_write will have to sniff out the error status from "size"
> and either remap or discard the CoW allocations as appropriate.

I'd rather fix the direct I/O code to give us that information directly
(pun intended).  I'll add that to my short term todo list as it seems
useful for the existing code as well.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 22:02         ` Christoph Hellwig
@ 2016-01-28 22:31           ` Darrick J. Wong
  2016-01-29  8:01             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2016-01-28 22:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Jan 28, 2016 at 11:02:55PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 28, 2016 at 01:58:53PM -0800, Darrick J. Wong wrote:
> > Aw, snap, I knew I'd forgotten something.  Yep, we'll need that... I think
> > xfs_end_io_direct_write will have to sniff out the error status from "size"
> > and either remap or discard the CoW allocations as appropriate.
> 
> I'd rather fix the direct I/O code to give us that information directly
> (pun intended).  I'll add that to my short term todo list as it seems
> useful for the existing code as well.

Ok.

As for the generic/154... the code on github is sadly not very bisectable,
other than to say that until yesterday I was still tacking new code onto
the end of the patchset.

That said, I was seeing occasional hangs in generic/154 (I needed to roll
the transactions between key parts of wrapping up a CoW) and they seem to
have gone away, so it's possible that I've fixed it already.

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 22:31           ` Darrick J. Wong
@ 2016-01-29  8:01             ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2016-01-29  8:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Jan 28, 2016 at 02:31:31PM -0800, Darrick J. Wong wrote:
> As for the generic/154... the code on github is sadly not very bisectable,
> other than to say that until yesterday I was still tacking new code onto
> the end of the patchset.
> 
> That said, I was seeing occasional hangs in generic/154 (I needed to roll
> the transactions between key parts of wrapping up a CoW) and they seem to
> have gone away, so it's possible that I've fixed it already.

I'm seeing them with a HEAD of 9628553acf5ff4225e047d580094ffab64605f50,
but only when using NFS.  If you've got something newer feel free
to push it out, otherwise I'll have to spend some more time
trying to figure out what's going on with this version.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-28 13:16 ` stop using " Christoph Hellwig
  2016-01-28 20:53   ` Darrick J. Wong
@ 2016-01-29 14:12   ` Brian Foster
  2016-02-01 21:54     ` Darrick J. Wong
  2016-02-02 11:20     ` Christoph Hellwig
  1 sibling, 2 replies; 17+ messages in thread
From: Brian Foster @ 2016-01-29 14:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Christoph Hellwig, darrick.wong

On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> Any chance to get a review for this?  It should really help
> with sorting out the buffered I/O COW code.
> 

I haven't taken a closer look at the patch yet. I was kind of waiting
for Dave to chime in because I'm a little confused over the back and
forth nature of dio/ioend completion lately.

My understanding is that the original requirement for ioends here was to
track the state necessary in order to defer (to wq) completions that had
to allocate transactions. Eventually, the deferred buffer state was
implemented and we no longer required an ioend for that, so we removed
the ioends here:

  2ba6623 xfs: don't allocate an ioend for direct I/O completions

Then just a couple months later, we merged the following to re-add the
ioend into the dio path:

  d5cc2e3f xfs: DIO needs an ioend for writes

I recall reviewing that one, but looking back afaict the ioend was used
simply to enable reuse of the ioend completion code. Now we have this
patch which presumably removes much of that code to eliminate the ioend
allocation overhead.

Neither this patch nor the previous has any technical reasoning for one
approach over the other in the commit logs, so afaics this appears to be
a matter of preference. Can we get some agreement (or discussion) on
what the right interface is to transfer information to dio completion?
E.g., is this allocation overhead noticeable? Is ioend usage problematic
for other reasons (such as copy-on-write)? Going back to the previous
patch, were there explicit reasons for using ioends aside from code
reuse? Note that the subsequent commit 6dfa1b67 ("xfs: handle DIO
overwrite EOF update completion correctly") does refer to some problems
not running dio completions in the right context... Dave?

Brian

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-29 14:12   ` Brian Foster
@ 2016-02-01 21:54     ` Darrick J. Wong
  2016-02-02 11:17       ` Christoph Hellwig
  2016-02-02 15:26       ` Brian Foster
  2016-02-02 11:20     ` Christoph Hellwig
  1 sibling, 2 replies; 17+ messages in thread
From: Darrick J. Wong @ 2016-02-01 21:54 UTC (permalink / raw)
  To: Dave Chinner, Brian Foster; +Cc: Christoph Hellwig, Christoph Hellwig, xfs

On Fri, Jan 29, 2016 at 09:12:33AM -0500, Brian Foster wrote:
> On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> > Any chance to get a review for this?  It should really help
> > with sorting out the buffered I/O COW code.
> > 
> 
> I haven't taken a closer look at the patch yet. I was kind of waiting
> for Dave to chime in because I'm a little confused over the back and
> forth nature of dio/ioend completion lately.
> 
> My understanding is that the original requirement for ioends here was to
> track the state necessary in order to defer (to wq) completions that had
> to allocate transactions. Eventually, the deferred buffer state was
> implemented and we no longer required an ioend for that, so we removed
> the ioends here:
> 
>   2ba6623 xfs: don't allocate an ioend for direct I/O completions
> 
> Then just a couple months later, we merged the following to re-add the
> ioend into the dio path:
> 
>   d5cc2e3f xfs: DIO needs an ioend for writes
> 
> I recall reviewing that one, but looking back afaict the ioend was used
> simply to enable reuse of the ioend completion code. Now we have this
> patch which presumably removes much of that code to eliminate the ioend
> allocation overhead.
> 
> Neither this patch nor the previous has any technical reasoning for one
> approach over the other in the commit logs, so afaics this appears to be
> a matter of preference. Can we get some agreement (or discussion) on
> what the right interface is to transfer information to dio completion?
> E.g., is this allocation overhead noticeable? Is ioend usage problematic
> for other reasons (such as copy-on-write)? Going back to the previous
> patch, were there explicit reasons for using ioends aside from code
> reuse? Note that the subsequent commit 6dfa1b67 ("xfs: handle DIO
> overwrite EOF update completion correctly") does refer to some problems
> not running dio completions in the right context... Dave?

I don't know the reason /for/ using ioends, but I'll share an argument in favor
of Christoph's patch to remove them:

Each ioend has a type code which determines what we do at the end of the IO.
For buffered writes, we can create as many ioends as necessary to handle all
the different dispositions of all the blocks in the range of dirty pages.

For directio, however, the requirements are a little different -- we want to be
able to say "change all the extents in this part of the file from unwritten to
written", and to be able to change i_size after a write.  We can reuse the
ioend structure to encode these desires until CoW came along.  Now we also want
to be able to say "remap all extents in this part of a file if the write
succeeds".  Since we can only have one type code per ioend, either we have to
hide extra offset/size fields in the ioend for this purpose, add a flags field,
revise the code to create a bunch of ioends for each disposition type,
unconditionally try to remap blocks for any reflink'd inode, or do what
Christoph proposes, which is to stop (ab)using ioends and simply encode status
flags into dio->private.

Christoph's approach seems like the easiest, since we're already provided
with the offset and size, and both the unwritten extent conversion and the
CoW remap code know how to iterate the data fork to look for the extents
they want to alter.

The overhead probably goes down once we get rid of the memory allocation,
but the primary purpose is to make life easier for CoW to avoid unnecessary
overhead.

Also: I've noticed that if I write 8MB file to a file backed by a failing
device, xfs_end_io_direct_write() gets called with size == 8MB even if the
writes failed.  If the write was to an unwritten extent, the extent will be
converted to a regular extent even though the blocks contain old file contents
and other garbage, which can subsequently be read off the disk.

For CoW I'd only want to remap the blocks if the write totally succeeds.  If a
disk error happens, the EIO code gets passed back to userspace, and it seems
logical to me that the file contents should remain unchanged.

To that end, I'm changing dio_iodone_t to pass an error code, and I think I
want to change XFS to avoid doing unwritten extent conversions when the write
fails.  Does that make sense?

--D

> 
> Brian
> 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-02-01 21:54     ` Darrick J. Wong
@ 2016-02-02 11:17       ` Christoph Hellwig
  2016-02-02 15:26       ` Brian Foster
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2016-02-02 11:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Brian Foster, Christoph Hellwig, xfs

On Mon, Feb 01, 2016 at 01:54:07PM -0800, Darrick J. Wong wrote:
> Also: I've noticed that if I write 8MB file to a file backed by a failing
> device, xfs_end_io_direct_write() gets called with size == 8MB even if the
> writes failed.  If the write was to an unwritten extent, the extent will be
> converted to a regular extent even though the blocks contain old file contents
> and other garbage, which can subsequently be read off the disk.
> 
> For CoW I'd only want to remap the blocks if the write totally succeeds.  If a
> disk error happens, the EIO code gets passed back to userspace, and it seems
> logical to me that the file contents should remain unchanged.
> 
> To that end, I'm changing dio_iodone_t to pass an error code, and I think I
> want to change XFS to avoid doing unwritten extent conversions when the write
> fails.  Does that make sense?

I have some of these patches prepared already, I'm just too busy keeping
up with all your changes.  I'll rebase and post this ASAP!

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-01-29 14:12   ` Brian Foster
  2016-02-01 21:54     ` Darrick J. Wong
@ 2016-02-02 11:20     ` Christoph Hellwig
  2016-02-02 15:31       ` Brian Foster
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-02-02 11:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, xfs, Christoph Hellwig, darrick.wong

On Fri, Jan 29, 2016 at 09:12:33AM -0500, Brian Foster wrote:
> My understanding is that the original requirement for ioends here was to
> track the state necessary in order to defer (to wq) completions that had
> to allocate transactions. Eventually, the deferred buffer state was
> implemented and we no longer required an ioend for that, so we removed
> the ioends here:
> 
>   2ba6623 xfs: don't allocate an ioend for direct I/O completions
> 
> Then just a couple months later, we merged the following to re-add the
> ioend into the dio path:
> 
>   d5cc2e3f xfs: DIO needs an ioend for writes

And I complained about that one!  I didn't have time to provide a full
analysis and the patches around this one were criticial enough.  But this
patch has the analysis on why ioends are not needed and bad which should
have been the full review back then.  Note that this new version is
better than the old ioend removal as it does not just use the private
data as a boolean, but as a bitmask so that additional information
such as the COW status can be communicated.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-02-01 21:54     ` Darrick J. Wong
  2016-02-02 11:17       ` Christoph Hellwig
@ 2016-02-02 15:26       ` Brian Foster
  1 sibling, 0 replies; 17+ messages in thread
From: Brian Foster @ 2016-02-02 15:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Christoph Hellwig, xfs

On Mon, Feb 01, 2016 at 01:54:07PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 29, 2016 at 09:12:33AM -0500, Brian Foster wrote:
> > On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> > > Any chance to get a review for this?  It should really help
> > > with sorting out the buffered I/O COW code.
> > > 
> > 
> > I haven't taken a closer look at the patch yet. I was kind of waiting
> > for Dave to chime in because I'm a little confused over the back and
> > forth nature of dio/ioend completion lately.
> > 
> > My understanding is that the original requirement for ioends here was to
> > track the state necessary in order to defer (to wq) completions that had
> > to allocate transactions. Eventually, the deferred buffer state was
> > implemented and we no longer required an ioend for that, so we removed
> > the ioends here:
> > 
> >   2ba6623 xfs: don't allocate an ioend for direct I/O completions
> > 
> > Then just a couple months later, we merged the following to re-add the
> > ioend into the dio path:
> > 
> >   d5cc2e3f xfs: DIO needs an ioend for writes
> > 
> > I recall reviewing that one, but looking back afaict the ioend was used
> > simply to enable reuse of the ioend completion code. Now we have this
> > patch which presumably removes much of that code to eliminate the ioend
> > allocation overhead.
> > 
> > Neither this patch nor the previous has any technical reasoning for one
> > approach over the other in the commit logs, so afaics this appears to be
> > a matter of preference. Can we get some agreement (or discussion) on
> > what the right interface is to transfer information to dio completion?
> > E.g., is this allocation overhead noticeable? Is ioend usage problematic
> > for other reasons (such as copy-on-write)? Going back to the previous
> > patch, were there explicit reasons for using ioends aside from code
> > reuse? Note that the subsequent commit 6dfa1b67 ("xfs: handle DIO
> > overwrite EOF update completion correctly") does refer to some problems
> > not running dio completions in the right context... Dave?
> 
> I don't know the reason /for/ using ioends, but I'll share an argument in favor
> of Christoph's patch to remove them:
> 

The only thing I can gather from the historical information is that they
are used to facilitate reuse of the completion path code responsible for
unwritten conversion, inode size updates, etc. (i.e., convenience).

> Each ioend has a type code which determines what we do at the end of the IO.
> For buffered writes, we can create as many ioends as necessary to handle all
> the different dispositions of all the blocks in the range of dirty pages.
> 
> For directio, however, the requirements are a little different -- we want to be
> able to say "change all the extents in this part of the file from unwritten to
> written", and to be able to change i_size after a write.  We can reuse the
> ioend structure to encode these desires until CoW came along.  Now we also want
> to be able to say "remap all extents in this part of a file if the write
> succeeds".  Since we can only have one type code per ioend, either we have to
> hide extra offset/size fields in the ioend for this purpose, add a flags field,
> revise the code to create a bunch of ioends for each disposition type,
> unconditionally try to remap blocks for any reflink'd inode, or do what
> Christoph proposes, which is to stop (ab)using ioends and simply encode status
> flags into dio->private.
> 
> Christoph's approach seems like the easiest, since we're already provided
> with the offset and size, and both the unwritten extent conversion and the
> CoW remap code know how to iterate the data fork to look for the extents
> they want to alter.
> 
> The overhead probably goes down once we get rid of the memory allocation,
> but the primary purpose is to make life easier for CoW to avoid unnecessary
> overhead.
> 

That sounds reasonable to me, though I'd probably have to see the cow
bits to grok/appreciate the complexity avoided by killing off ioends
here. I wonder if then perhaps the right thing to do is package this
change with the associated cow bits (as a dependency) wherever/once they
are available..?

> Also: I've noticed that if I write 8MB file to a file backed by a failing
> device, xfs_end_io_direct_write() gets called with size == 8MB even if the
> writes failed.  If the write was to an unwritten extent, the extent will be
> converted to a regular extent even though the blocks contain old file contents
> and other garbage, which can subsequently be read off the disk.
> 
> For CoW I'd only want to remap the blocks if the write totally succeeds.  If a
> disk error happens, the EIO code gets passed back to userspace, and it seems
> logical to me that the file contents should remain unchanged.
> 
> To that end, I'm changing dio_iodone_t to pass an error code, and I think I
> want to change XFS to avoid doing unwritten extent conversions when the write
> fails.  Does that make sense?
> 

Yep, sounds like a fail to me.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-02-02 11:20     ` Christoph Hellwig
@ 2016-02-02 15:31       ` Brian Foster
  2016-02-02 16:42         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2016-02-02 15:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph Hellwig, darrick.wong, xfs

On Tue, Feb 02, 2016 at 12:20:46PM +0100, Christoph Hellwig wrote:
> On Fri, Jan 29, 2016 at 09:12:33AM -0500, Brian Foster wrote:
> > My understanding is that the original requirement for ioends here was to
> > track the state necessary in order to defer (to wq) completions that had
> > to allocate transactions. Eventually, the deferred buffer state was
> > implemented and we no longer required an ioend for that, so we removed
> > the ioends here:
> > 
> >   2ba6623 xfs: don't allocate an ioend for direct I/O completions
> > 
> > Then just a couple months later, we merged the following to re-add the
> > ioend into the dio path:
> > 
> >   d5cc2e3f xfs: DIO needs an ioend for writes
> 
> And I complained about that one!  I didn't have time to provide a full
> analysis and the patches around this one were criticial enough.  But this
> patch has the analysis on why ioends are not needed and bad which should
> have been the full review back then.  Note that this new version is
> better than the old ioend removal as it does not just use the private
> data as a boolean, but as a bitmask so that additional information
> such as the COW status can be communicated.
> 

FWIW, I don't see any such review comments against the three versions of
the "DIO needs an ioend for writes" patch I have in my mailbox, but I
easily could have missed something..? But if there wasn't time, then
fair enough.

I'm just looking for context. I don't have much of an opinion on which
approach is used here. If it simplifies COW, then that seems good enough
reason to me to take this approach. I'm pointing this out more because
this code seems to have been rewritten the last couple of times we
needed to fix something, which makes backports particularly annoying.
The two patches above were associated with a broader enhancement and a
bug fix (respectively) as a sort of justification, whereas this post had
a much more vague purpose from what I could tell, and therefore why I at
least hadn't taken the time to review it.

If COW is the primary motivator, perhaps we can bundle it with that
work?

Brian

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-02-02 15:31       ` Brian Foster
@ 2016-02-02 16:42         ` Christoph Hellwig
  2016-02-03 22:22           ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2016-02-02 16:42 UTC (permalink / raw)
  To: Brian Foster; +Cc: darrick.wong, xfs

On Tue, Feb 02, 2016 at 10:31:18AM -0500, Brian Foster wrote:
> FWIW, I don't see any such review comments against the three versions of
> the "DIO needs an ioend for writes" patch I have in my mailbox, but I
> easily could have missed something..? But if there wasn't time, then
> fair enough.

I'll have to look at the mailboxes, but I remember Dave sending this
out and complaining.

> I'm just looking for context. I don't have much of an opinion on which
> approach is used here. If it simplifies COW, then that seems good enough
> reason to me to take this approach. I'm pointing this out more because
> this code seems to have been rewritten the last couple of times we
> needed to fix something, which makes backports particularly annoying.
> The two patches above were associated with a broader enhancement and a
> bug fix (respectively) as a sort of justification, whereas this post had
> a much more vague purpose from what I could tell, and therefore why I at
> least hadn't taken the time to review it.
> 
> If COW is the primary motivator, perhaps we can bundle it with that
> work?

The prime motivator is to:

 (1) avoid a pointless memory allocation
 (2) avoid a pointless context switch
 (3) avoid pointless code complexity

COW is just another case where these show up.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: stop using ioends for direct write completions
  2016-02-02 16:42         ` Christoph Hellwig
@ 2016-02-03 22:22           ` Dave Chinner
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2016-02-03 22:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, xfs, darrick.wong

On Tue, Feb 02, 2016 at 05:42:37PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 02, 2016 at 10:31:18AM -0500, Brian Foster wrote:
> > FWIW, I don't see any such review comments against the three versions of
> > the "DIO needs an ioend for writes" patch I have in my mailbox, but I
> > easily could have missed something..? But if there wasn't time, then
> > fair enough.
> 
> I'll have to look at the mailboxes, but I remember Dave sending this
> out and complaining.

I don't recall the exact discussion that was had, but at the time it
was an evil that I couldn't see a way of avoiding, and with no other
solution being presented.

ISTR a tie-in with the DAX code, too, but that's gone away now with
the block zeroing during allocation rather than using unwritten
extents and completions for this.

> > If COW is the primary motivator, perhaps we can bundle it with that
> > work?
> 
> The prime motivator is to:
> 
>  (1) avoid a pointless memory allocation
>  (2) avoid a pointless context switch
>  (3) avoid pointless code complexity
> 
> COW is just another case where these show up.

*nod*

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-02-03 22:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-14 10:10 stop using ioends for direct write completions Christoph Hellwig
2016-01-14 10:10 ` [PATCH] xfs: don't use " Christoph Hellwig
2016-01-28 13:16 ` stop using " Christoph Hellwig
2016-01-28 20:53   ` Darrick J. Wong
2016-01-28 21:10     ` Christoph Hellwig
2016-01-28 21:58       ` Darrick J. Wong
2016-01-28 22:02         ` Christoph Hellwig
2016-01-28 22:31           ` Darrick J. Wong
2016-01-29  8:01             ` Christoph Hellwig
2016-01-29 14:12   ` Brian Foster
2016-02-01 21:54     ` Darrick J. Wong
2016-02-02 11:17       ` Christoph Hellwig
2016-02-02 15:26       ` Brian Foster
2016-02-02 11:20     ` Christoph Hellwig
2016-02-02 15:31       ` Brian Foster
2016-02-02 16:42         ` Christoph Hellwig
2016-02-03 22:22           ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.