* vfs/xfs: directio updates to ease COW handling V2
From: Christoph Hellwig @ 2016-02-03 18:40 UTC
  To: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

The first patch ensures ->end_io is always called for direct I/O requests
that pass one in, even for a zero-length write or after an error
occurred.  The existing users have been updated to ignore these cases,
but XFS will make use of them in the future, and a comment in ext4
suggests they might be useful there as well.

The other two simplify the XFS direct I/O code.

Changes since V1:
 - allow ->end_io to return errors
 - a comment spelling fix
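
For reference, a minimal sketch of the ->end_io contract after this
series (the handler name below is made up, not taken from any patch):

#include <linux/fs.h>	/* dio_iodone_t, struct kiocb (post-series) */

/*
 * Example handler matching the new dio_iodone_t: it is invoked whenever
 * a handler was passed in, even for zero-length or failed writes, and
 * any error it returns is propagated to the direct I/O caller.
 */
static int example_dio_end_io(struct kiocb *iocb, loff_t offset,
			      ssize_t size, void *private)
{
	if (size <= 0)		/* nothing was written, nothing to do */
		return 0;

	/* per-filesystem completion work goes here */
	return 0;
}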



* [PATCH 1/3] direct-io: always call ->end_io if non-NULL
From: Christoph Hellwig @ 2016-02-03 18:40 UTC
  To: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

This way we can pass errors back to the file system, and allow it to
perform the cleanup required for all direct I/O invocations.

Also allow the ->end_io handlers to return errors on their own, so that
I/O completion errors can be passed on to the callers.
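
In condensed form, the convention on the submitter side looks like the
sketch below (illustrative only, with an invented helper name; the real
change is in the fs/direct-io.c and fs/dax.c hunks that follow):

#include <linux/fs.h>	/* dio_iodone_t, struct kiocb */

/*
 * Sketch: "ret" is the byte count or negative error accumulated so far.
 * The callback is always called when set, and a completion error it
 * returns overrides the byte count.
 */
static ssize_t sketch_dio_complete(dio_iodone_t *end_io, struct kiocb *iocb,
				   loff_t offset, ssize_t ret, void *private)
{
	if (end_io) {
		int err = end_io(iocb, offset, ret, private);

		if (err)
			ret = err;
	}
	return ret;
}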

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c           |  9 +++++++--
 fs/direct-io.c     |  9 +++++++--
 fs/ext4/inode.c    |  9 +++++++--
 fs/ocfs2/aops.c    |  7 ++++++-
 fs/xfs/xfs_aops.c  | 13 +++++++------
 include/linux/fs.h |  2 +-
 6 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4fd6b0c..e38b2c5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -267,8 +267,13 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
 	if ((flags & DIO_LOCKING) && iov_iter_rw(iter) == READ)
 		inode_unlock(inode);
 
-	if ((retval > 0) && end_io)
-		end_io(iocb, pos, retval, bh.b_private);
+	if (end_io) {
+		int err;
+
+		err = end_io(iocb, pos, retval, bh.b_private);
+		if (err)
+			retval = err;
+	}
 
 	if (!(flags & DIO_SKIP_DIO_COUNT))
 		inode_dio_end(inode);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 1b2f7ff..9c6f885 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -253,8 +253,13 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
 	if (ret == 0)
 		ret = transferred;
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, transferred, dio->private);
+	if (dio->end_io) {
+		int err;
+
+		err = dio->end_io(dio->iocb, offset, ret, dio->private);
+		if (err)
+			ret = err;
+	}
 
 	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
 		inode_dio_end(dio->inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 83bc8bf..9db04dd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3161,14 +3161,17 @@ out:
 }
 #endif
 
-static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
+static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 			    ssize_t size, void *private)
 {
         ext4_io_end_t *io_end = iocb->private;
 
+	if (size <= 0)
+		return 0;
+
 	/* if not async direct IO just return */
 	if (!io_end)
-		return;
+		return 0;
 
 	ext_debug("ext4_end_io_dio(): io_end 0x%p "
 		  "for inode %lu, iocb 0x%p, offset %llu, size %zd\n",
@@ -3179,6 +3182,8 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 	io_end->offset = offset;
 	io_end->size = size;
 	ext4_put_io_end(io_end);
+
+	return 0;
 }
 
 /*
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 794fd15..5dcc5f5 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -620,7 +620,7 @@ bail:
  * particularly interested in the aio/dio case.  We use the rw_lock DLM lock
  * to protect io on one node from truncation on another.
  */
-static void ocfs2_dio_end_io(struct kiocb *iocb,
+static int ocfs2_dio_end_io(struct kiocb *iocb,
 			     loff_t offset,
 			     ssize_t bytes,
 			     void *private)
@@ -628,6 +628,9 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
 	struct inode *inode = file_inode(iocb->ki_filp);
 	int level;
 
+	if (bytes <= 0)
+		return 0;
+
 	/* this io's submitter should not have unlocked this before we could */
 	BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
 
@@ -644,6 +647,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
 		level = ocfs2_iocb_rw_locked_level(iocb);
 		ocfs2_rw_unlock(inode, level);
 	}
+
+	return 0;
 }
 
 static int ocfs2_releasepage(struct page *page, gfp_t wait)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 379c089..295aaff 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1645,7 +1645,7 @@ out_end_io:
  * case the completion can be called in interrupt context, whereas if we have an
  * ioend we will always be called in task context (i.e. from a workqueue).
  */
-STATIC void
+STATIC int
 xfs_end_io_direct_write(
 	struct kiocb		*iocb,
 	loff_t			offset,
@@ -1655,15 +1655,19 @@ xfs_end_io_direct_write(
 	struct inode		*inode = file_inode(iocb->ki_filp);
 	struct xfs_ioend	*ioend = private;
 
+	if (size <= 0)
+		return 0;
+
 	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
 				     ioend ? ioend->io_type : 0, NULL);
 
 	if (!ioend) {
 		ASSERT(offset + size <= i_size_read(inode));
-		return;
+		return 0;
 	}
 
 	__xfs_end_io_direct_write(inode, ioend, offset, size);
+	return 0;
 }
 
 static inline ssize_t
@@ -1672,10 +1676,7 @@ xfs_vm_do_dio(
 	struct kiocb		*iocb,
 	struct iov_iter		*iter,
 	loff_t			offset,
-	void			(*endio)(struct kiocb	*iocb,
-					 loff_t		offset,
-					 ssize_t	size,
-					 void		*private),
+	dio_iodone_t		endio,
 	int			flags)
 {
 	struct block_device	*bdev;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a20462..d7f37bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -70,7 +70,7 @@ extern int sysctl_protected_hardlinks;
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
-typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
+typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 			ssize_t bytes, void *private);
 typedef void (dax_iodone_t)(struct buffer_head *bh_map, int uptodate);
 
-- 
2.1.4


* [PATCH 2/3] xfs: don't use ioends for direct write completions
From: Christoph Hellwig @ 2016-02-03 18:40 UTC
  To: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

We only need to communicate two bits of information to the direct I/O
completion handler:

 (1) do we need to convert any unwritten extents in the range?
 (2) do we need to update the inode size based on the range passed
     to the completion handler?

We can use the private data passed to the get_block handler and the
completion handler as a simple bitmask to communicate this information
instead of the current complicated infrastructure that reuses the
ioends from the buffer I/O path, and thus avoid a memory allocation
and a context switch for any non-trivial direct write.  As a nice side
effect we also decouple the direct I/O path implementation from that
of the buffered I/O path.
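
To make the mechanism concrete, here is a sketch condensed from the
patch below (the two helper names are invented; the flag values are the
ones the patch introduces) of how the flags travel through the private
pointer:

#include <linux/buffer_head.h>

#define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
#define XFS_DIO_FLAG_APPEND	(1 << 1)

/* get_block side: encode the flags directly in bh->b_private */
static void sketch_map_direct(struct buffer_head *bh, bool unwritten,
			      bool past_eof)
{
	uintptr_t *flags = (uintptr_t *)&bh->b_private;

	if (unwritten)
		*flags |= XFS_DIO_FLAG_UNWRITTEN;
	else if (past_eof)
		*flags |= XFS_DIO_FLAG_APPEND;
}

/* completion side: the same bits arrive back as the opaque pointer */
static int sketch_end_io_direct_write(loff_t offset, ssize_t size,
				      void *private)
{
	uintptr_t flags = (uintptr_t)private;

	if (flags & XFS_DIO_FLAG_UNWRITTEN)
		return 0;	/* convert unwritten extents here */
	if (flags & XFS_DIO_FLAG_APPEND)
		return 0;	/* log the new on-disk file size here */
	return 0;		/* overwrite within EOF: nothing to do */
}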

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c  | 216 ++++++++++++++++++-----------------------------------
 fs/xfs/xfs_trace.h |   9 +--
 2 files changed, 75 insertions(+), 150 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 295aaff..f008a4f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -36,6 +36,10 @@
 #include <linux/pagevec.h>
 #include <linux/writeback.h>
 
+/* flags for direct write completions */
+#define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
+#define XFS_DIO_FLAG_APPEND	(1 << 1)
+
 void
 xfs_count_page_state(
 	struct page		*page,
@@ -1238,27 +1242,8 @@ xfs_vm_releasepage(
 }
 
 /*
- * When we map a DIO buffer, we may need to attach an ioend that describes the
- * type of write IO we are doing. This passes to the completion function the
- * operations it needs to perform. If the mapping is for an overwrite wholly
- * within the EOF then we don't need an ioend and so we don't allocate one.
- * This avoids the unnecessary overhead of allocating and freeing ioends for
- * workloads that don't require transactions on IO completion.
- *
- * If we get multiple mappings in a single IO, we might be mapping different
- * types. But because the direct IO can only have a single private pointer, we
- * need to ensure that:
- *
- * a) i) the ioend spans the entire region of unwritten mappings; or
- *    ii) the ioend spans all the mappings that cross or are beyond EOF; and
- * b) if it contains unwritten extents, it is *permanently* marked as such
- *
- * We could do this by chaining ioends like buffered IO does, but we only
- * actually get one IO completion callback from the direct IO, and that spans
- * the entire IO regardless of how many mappings and IOs are needed to complete
- * the DIO. There is only going to be one reference to the ioend and its life
- * cycle is constrained by the DIO completion code. hence we don't need
- * reference counting here.
+ * When we map a DIO buffer, we may need to pass flags to
+ * xfs_end_io_direct_write to tell it what kind of write IO we are doing.
  *
  * Note that for DIO, an IO to the highest supported file block offset (i.e.
  * 2^63 - 1FSB bytes) will result in the offset + count overflowing a signed 64
@@ -1266,68 +1251,26 @@ xfs_vm_releasepage(
  * extending the file size. We won't know for sure until IO completion is run
  * and the actual max write offset is communicated to the IO completion
  * routine.
- *
- * For DAX page faults, we are preparing to never see unwritten extents here,
- * nor should we ever extend the inode size. Hence we will soon have nothing to
- * do here for this case, ensuring we don't have to provide an IO completion
- * callback to free an ioend that we don't actually need for a fault into the
- * page at offset (2^63 - 1FSB) bytes.
  */
-
 static void
 xfs_map_direct(
 	struct inode		*inode,
 	struct buffer_head	*bh_result,
 	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset,
-	bool			dax_fault)
+	xfs_off_t		offset)
 {
-	struct xfs_ioend	*ioend;
+	uintptr_t		*flags = (uintptr_t *)&bh_result->b_private;
 	xfs_off_t		size = bh_result->b_size;
-	int			type;
-
-	if (ISUNWRITTEN(imap))
-		type = XFS_IO_UNWRITTEN;
-	else
-		type = XFS_IO_OVERWRITE;
-
-	trace_xfs_gbmap_direct(XFS_I(inode), offset, size, type, imap);
-
-	if (dax_fault) {
-		ASSERT(type == XFS_IO_OVERWRITE);
-		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
-					    imap);
-		return;
-	}
 
-	if (bh_result->b_private) {
-		ioend = bh_result->b_private;
-		ASSERT(ioend->io_size > 0);
-		ASSERT(offset >= ioend->io_offset);
-		if (offset + size > ioend->io_offset + ioend->io_size)
-			ioend->io_size = offset - ioend->io_offset + size;
-
-		if (type == XFS_IO_UNWRITTEN && type != ioend->io_type)
-			ioend->io_type = XFS_IO_UNWRITTEN;
-
-		trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
-					      ioend->io_size, ioend->io_type,
-					      imap);
-	} else if (type == XFS_IO_UNWRITTEN ||
-		   offset + size > i_size_read(inode) ||
-		   offset + size < 0) {
-		ioend = xfs_alloc_ioend(inode, type);
-		ioend->io_offset = offset;
-		ioend->io_size = size;
+	trace_xfs_get_blocks_map_direct(XFS_I(inode), offset, size,
+		ISUNWRITTEN(imap) ? XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, imap);
 
-		bh_result->b_private = ioend;
+	if (ISUNWRITTEN(imap)) {
+		*flags |= XFS_DIO_FLAG_UNWRITTEN;
+		set_buffer_defer_completion(bh_result);
+	} else if (offset + size > i_size_read(inode) || offset + size < 0) {
+		*flags |= XFS_DIO_FLAG_APPEND;
 		set_buffer_defer_completion(bh_result);
-
-		trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
-					   imap);
-	} else {
-		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
-					    imap);
 	}
 }
 
@@ -1498,9 +1441,12 @@ __xfs_get_blocks(
 		if (ISUNWRITTEN(&imap))
 			set_buffer_unwritten(bh_result);
 		/* direct IO needs special help */
-		if (create && direct)
-			xfs_map_direct(inode, bh_result, &imap, offset,
-				       dax_fault);
+		if (create && direct) {
+			if (dax_fault)
+				ASSERT(!ISUNWRITTEN(&imap));
+			else
+				xfs_map_direct(inode, bh_result, &imap, offset);
+		}
 	}
 
 	/*
@@ -1570,42 +1516,50 @@ xfs_get_blocks_dax_fault(
 	return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
 }
 
-static void
-__xfs_end_io_direct_write(
-	struct inode		*inode,
-	struct xfs_ioend	*ioend,
+/*
+ * Complete a direct I/O write request.
+ *
+ * xfs_map_direct passes us some flags in the private data to tell us what to
+ * do.  If no flags are set, then the write IO is an overwrite wholly within
+ * the existing allocated file size and so there is nothing for us to do.
+ *
+ * Note that in this case the completion can be called in interrupt context,
+ * whereas if we have flags set we will always be called in task context
+ * (i.e. from a workqueue).
+ */
+STATIC int
+xfs_end_io_direct_write(
+	struct kiocb		*iocb,
 	loff_t			offset,
-	ssize_t			size)
+	ssize_t			size,
+	void			*private)
 {
-	struct xfs_mount	*mp = XFS_I(inode)->i_mount;
+	struct inode		*inode = file_inode(iocb->ki_filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	uintptr_t		flags = (uintptr_t)private;
+	int			error = 0;
 
-	if (XFS_FORCED_SHUTDOWN(mp) || ioend->io_error)
-		goto out_end_io;
+	trace_xfs_end_io_direct_write(ip, offset, size);
 
-	/*
-	 * dio completion end_io functions are only called on writes if more
-	 * than 0 bytes was written.
-	 */
-	ASSERT(size > 0);
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
 
-	/*
-	 * The ioend only maps whole blocks, while the IO may be sector aligned.
-	 * Hence the ioend offset/size may not match the IO offset/size exactly.
-	 * Because we don't map overwrites within EOF into the ioend, the offset
-	 * may not match, but only if the endio spans EOF.  Either way, write
-	 * the IO sizes into the ioend so that completion processing does the
-	 * right thing.
-	 */
-	ASSERT(offset + size <= ioend->io_offset + ioend->io_size);
-	ioend->io_size = size;
-	ioend->io_offset = offset;
+	if (size <= 0)
+		return size;
 
 	/*
-	 * The ioend tells us whether we are doing unwritten extent conversion
+	 * The flags tell us whether we are doing unwritten extent conversions
 	 * or an append transaction that updates the on-disk file size. These
 	 * cases are the only cases where we should *potentially* be needing
 	 * to update the VFS inode size.
-	 *
+	 */
+	if (flags == 0) {
+		ASSERT(offset + size <= i_size_read(inode));
+		return 0;
+	}
+
+	/*
 	 * We need to update the in-core inode size here so that we don't end up
 	 * with the on-disk inode size being outside the in-core inode size. We
 	 * have no other method of updating EOF for AIO, so always do it here
@@ -1616,58 +1570,30 @@ __xfs_end_io_direct_write(
 	 * here can result in EOF moving backwards and Bad Things Happen when
 	 * that occurs.
 	 */
-	spin_lock(&XFS_I(inode)->i_flags_lock);
+	spin_lock(&ip->i_flags_lock);
 	if (offset + size > i_size_read(inode))
 		i_size_write(inode, offset + size);
-	spin_unlock(&XFS_I(inode)->i_flags_lock);
+	spin_unlock(&ip->i_flags_lock);
 
-	/*
-	 * If we are doing an append IO that needs to update the EOF on disk,
-	 * do the transaction reserve now so we can use common end io
-	 * processing. Stashing the error (if there is one) in the ioend will
-	 * result in the ioend processing passing on the error if it is
-	 * possible as we can't return it from here.
-	 */
-	if (ioend->io_type == XFS_IO_OVERWRITE)
-		ioend->io_error = xfs_setfilesize_trans_alloc(ioend);
+	if (flags & XFS_DIO_FLAG_UNWRITTEN) {
+		trace_xfs_end_io_direct_write_unwritten(ip, offset, size);
 
-out_end_io:
-	xfs_end_io(&ioend->io_work);
-	return;
-}
+		error = xfs_iomap_write_unwritten(ip, offset, size);
+	} else if (flags & XFS_DIO_FLAG_APPEND) {
+		struct xfs_trans *tp;
 
-/*
- * Complete a direct I/O write request.
- *
- * The ioend structure is passed from __xfs_get_blocks() to tell us what to do.
- * If no ioend exists (i.e. @private == NULL) then the write IO is an overwrite
- * wholly within the EOF and so there is nothing for us to do. Note that in this
- * case the completion can be called in interrupt context, whereas if we have an
- * ioend we will always be called in task context (i.e. from a workqueue).
- */
-STATIC int
-xfs_end_io_direct_write(
-	struct kiocb		*iocb,
-	loff_t			offset,
-	ssize_t			size,
-	void			*private)
-{
-	struct inode		*inode = file_inode(iocb->ki_filp);
-	struct xfs_ioend	*ioend = private;
+		trace_xfs_end_io_direct_write_append(ip, offset, size);
 
-	if (size <= 0)
-		return 0;
-
-	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
-				     ioend ? ioend->io_type : 0, NULL);
-
-	if (!ioend) {
-		ASSERT(offset + size <= i_size_read(inode));
-		return 0;
+		tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
+		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
+		if (error) {
+			xfs_trans_cancel(tp);
+			return error;
+		}
+		error = xfs_setfilesize(ip, tp, offset, size);
 	}
 
-	__xfs_end_io_direct_write(inode, ioend, offset, size);
-	return 0;
+	return error;
 }
 
 static inline ssize_t
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 391d797..c8d5842 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1296,11 +1296,7 @@ DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
 DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_none);
-DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio);
+DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
 
 DECLARE_EVENT_CLASS(xfs_simple_io_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
@@ -1340,6 +1336,9 @@ DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
 DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
 DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
 DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_unwritten);
+DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_append);
 
 DECLARE_EVENT_CLASS(xfs_itrunc_class,
 	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),
-- 
2.1.4


* [PATCH 3/3] xfs: fold xfs_vm_do_dio into xfs_vm_direct_IO
From: Christoph Hellwig @ 2016-02-03 18:40 UTC
  To: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f008a4f..8163910 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1596,38 +1596,30 @@ xfs_end_io_direct_write(
 	return error;
 }
 
-static inline ssize_t
-xfs_vm_do_dio(
-	struct inode		*inode,
+STATIC ssize_t
+xfs_vm_direct_IO(
 	struct kiocb		*iocb,
 	struct iov_iter		*iter,
-	loff_t			offset,
-	dio_iodone_t		endio,
-	int			flags)
+	loff_t			offset)
 {
+	struct inode		*inode = iocb->ki_filp->f_mapping->host;
+	dio_iodone_t		*endio = NULL;
+	int			flags = 0;
 	struct block_device	*bdev;
 
-	if (IS_DAX(inode))
+	if (iov_iter_rw(iter) == WRITE) {
+		endio = xfs_end_io_direct_write;
+		flags = DIO_ASYNC_EXTEND;
+	}
+
+	if (IS_DAX(inode)) {
 		return dax_do_io(iocb, inode, iter, offset,
 				 xfs_get_blocks_direct, endio, 0);
+	}
 
 	bdev = xfs_find_bdev_for_inode(inode);
 	return  __blockdev_direct_IO(iocb, inode, bdev, iter, offset,
-				     xfs_get_blocks_direct, endio, NULL, flags);
-}
-
-STATIC ssize_t
-xfs_vm_direct_IO(
-	struct kiocb		*iocb,
-	struct iov_iter		*iter,
-	loff_t			offset)
-{
-	struct inode		*inode = iocb->ki_filp->f_mapping->host;
-
-	if (iov_iter_rw(iter) == WRITE)
-		return xfs_vm_do_dio(inode, iocb, iter, offset,
-				     xfs_end_io_direct_write, DIO_ASYNC_EXTEND);
-	return xfs_vm_do_dio(inode, iocb, iter, offset, NULL, 0);
+			xfs_get_blocks_direct, endio, NULL, flags);
 }
 
 /*
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: vfs/xfs: directio updates to ease COW handling V2
  2016-02-03 18:40 ` Christoph Hellwig
  (?)
@ 2016-02-03 19:43   ` Jeff Moyer
  -1 siblings, 0 replies; 52+ messages in thread
From: Jeff Moyer @ 2016-02-03 19:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

Hi, Christoph,

Can you explain a bit what you mean by easing COW handling?  Whenever I
see COW referenced near DIO, my mind always turns to g_u_p vs. fork.

Thanks!
Jeff

Christoph Hellwig <hch@lst.de> writes:

> See http://www.infradead.org/rpr.html
>
> The first patch ensures ->end_io is always called for direct I/O requests
> that pass it in, even if there was a zero length write, or if an error
> occured.  The existing users have been updated to ignore it, but XFS
> will make use of it in the future, and a comment in ext4 suggests it
> might be useful for it as well.
>
> The other two simplify the XFS direct I/O code.
>
> Changes since V1:
>  - allow ->end_io to return errors
>  - a comment spelling fix
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/3] direct-io: always call ->end_io if non-NULL
  2016-02-03 18:40   ` Christoph Hellwig
  (?)
@ 2016-02-03 19:55     ` Darrick J. Wong
  -1 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2016-02-03 19:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Wed, Feb 03, 2016 at 07:40:14PM +0100, Christoph Hellwig wrote:
> This way we can pass back errors to the file system, and allow for
> cleanup required for all direct I/O invocations.
> 
> Also allow the ->end_io handlers to return errors on their own, so that
> I/O completion errors can be passed on to the callers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c           |  9 +++++++--
>  fs/direct-io.c     |  9 +++++++--
>  fs/ext4/inode.c    |  9 +++++++--
>  fs/ocfs2/aops.c    |  7 ++++++-
>  fs/xfs/xfs_aops.c  | 13 +++++++------
>  include/linux/fs.h |  2 +-
>  6 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 4fd6b0c..e38b2c5 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -267,8 +267,13 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
>  	if ((flags & DIO_LOCKING) && iov_iter_rw(iter) == READ)
>  		inode_unlock(inode);
>  
> -	if ((retval > 0) && end_io)
> -		end_io(iocb, pos, retval, bh.b_private);
> +	if (end_io) {
> +		int err;
> +
> +		err = end_io(iocb, pos, retval, bh.b_private);
> +		if (err)
> +			retval = err;

This will have the effect of a later error superseding an earlier error.  I'm
under the impression that code should generally preserve the first error, since
some side effect of that probably caused the rest of the errors.

That said, my guess is that 95% of the time err is set, retval and err will
both be -EIO anyway.  I'm not particularly passionate about whether or not we
preserve the first error code.
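
For illustration only, not code from this series: the "first error wins"
variant being described here would look roughly like this in dio_complete()
(names as in the hunk above), adopting err only when ret does not already
carry an earlier error:

	if (dio->end_io) {
		int err;

		err = dio->end_io(dio->iocb, offset, ret, dio->private);
		if (err && ret >= 0)	/* keep the earlier error, if any */
			ret = err;
	}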

> +	}
>  
>  	if (!(flags & DIO_SKIP_DIO_COUNT))
>  		inode_dio_end(inode);
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 1b2f7ff..9c6f885 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -253,8 +253,13 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
>  	if (ret == 0)
>  		ret = transferred;
>  
> -	if (dio->end_io && dio->result)
> -		dio->end_io(dio->iocb, offset, transferred, dio->private);
> +	if (dio->end_io) {
> +		int err;
> +
> +		err = dio->end_io(dio->iocb, offset, ret, dio->private);
> +		if (err)
> +			ret = err;

Same comment here.  Other than that, everything vfs looks ok to me.

> +	}
>  
>  	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
>  		inode_dio_end(dio->inode);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 83bc8bf..9db04dd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3161,14 +3161,17 @@ out:
>  }
>  #endif
>  
> -static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
> +static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
>  			    ssize_t size, void *private)
>  {
>          ext4_io_end_t *io_end = iocb->private;
>  
> +	if (size <= 0)
> +		return 0;

This leaks the ext4_io_end_t, if there was one.  Granted, that only happens
during an AIO DIO to an unwritten extent, but in any case I suggest removing
this hunk and...

> +
>  	/* if not async direct IO just return */
>  	if (!io_end)
> -		return;
> +		return 0;
>  
>  	ext_debug("ext4_end_io_dio(): io_end 0x%p "
>  		  "for inode %lu, iocb 0x%p, offset %llu, size %zd\n",
> @@ -3179,6 +3182,8 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
>  	io_end->offset = offset;
>  	io_end->size = size;

...if size < 0, then call ext4_clear_io_unwritten_flag to neutralize the ioend:

if (size < 0)
	ext4_clear_io_unwritten_flag(io_end);
else
  	io_end->size = size;
ext4_put_io_end(io_end);
return 0;

You'll probably have to stuff ext4_clear_io_unwritten_flag into a header file;
it's currently a static function somewhere.
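
Put together, the suggestion above would make the handler look roughly like
this (a sketch only, assuming ext4_clear_io_unwritten_flag() gets exported
via a header; ext_debug() and the unchanged context lines of the hunk are
elided):

	static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
				   ssize_t size, void *private)
	{
		ext4_io_end_t *io_end = iocb->private;

		/* if not async direct IO just return */
		if (!io_end)
			return 0;

		io_end->offset = offset;
		if (size < 0)
			/* failed I/O: neutralize the unwritten conversion */
			ext4_clear_io_unwritten_flag(io_end);
		else
			io_end->size = size;
		ext4_put_io_end(io_end);

		return 0;
	}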

>  	ext4_put_io_end(io_end);
> +
> +	return 0;
>  }
>  
>  /*
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 794fd15..5dcc5f5 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -620,7 +620,7 @@ bail:
>   * particularly interested in the aio/dio case.  We use the rw_lock DLM lock
>   * to protect io on one node from truncation on another.
>   */
> -static void ocfs2_dio_end_io(struct kiocb *iocb,
> +static int ocfs2_dio_end_io(struct kiocb *iocb,
>  			     loff_t offset,
>  			     ssize_t bytes,
>  			     void *private)
> @@ -628,6 +628,9 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	int level;
>  
> +	if (bytes <= 0)
> +		return 0;
> +

I suspect we still need to unlock the mutexes later on in this function.
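
To make the concern concrete, a deliberately generic sketch; the foo_*()
helpers are made up for illustration and are not ocfs2 functions:

	/* illustrative only: success-only work vs. unconditional teardown */
	static int foo_dio_end_io(struct kiocb *iocb, loff_t offset,
				  ssize_t bytes, void *private)
	{
		if (bytes > 0)
			foo_account_write(iocb, offset, bytes);

		/*
		 * Locks taken at submission time must be dropped even when
		 * bytes <= 0, so the unlock cannot sit behind an early return.
		 */
		foo_rw_unlock(iocb);
		return 0;
	}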

>  	/* this io's submitter should not have unlocked this before we could */
>  	BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
>  
> @@ -644,6 +647,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
>  		level = ocfs2_iocb_rw_locked_level(iocb);
>  		ocfs2_rw_unlock(inode, level);
>  	}

Do we still need an accurate value for bytes in the conditional above
even if the IO errored out?

> +
> +	return 0;
>  }
>  
>  static int ocfs2_releasepage(struct page *page, gfp_t wait)
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 379c089..295aaff 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1645,7 +1645,7 @@ out_end_io:
>   * case the completion can be called in interrupt context, whereas if we have an
>   * ioend we will always be called in task context (i.e. from a workqueue).
>   */
> -STATIC void
> +STATIC int
>  xfs_end_io_direct_write(
>  	struct kiocb		*iocb,
>  	loff_t			offset,
> @@ -1655,15 +1655,19 @@ xfs_end_io_direct_write(
>  	struct inode		*inode = file_inode(iocb->ki_filp);
>  	struct xfs_ioend	*ioend = private;
>  
> +	if (size <= 0)
> +		return 0;

Same thing here, I think we can end up leaking the ioend.

--D

> +
>  	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
>  				     ioend ? ioend->io_type : 0, NULL);
>  
>  	if (!ioend) {
>  		ASSERT(offset + size <= i_size_read(inode));
> -		return;
> +		return 0;
>  	}
>  
>  	__xfs_end_io_direct_write(inode, ioend, offset, size);
> +	return 0;
>  }
>  
>  static inline ssize_t
> @@ -1672,10 +1676,7 @@ xfs_vm_do_dio(
>  	struct kiocb		*iocb,
>  	struct iov_iter		*iter,
>  	loff_t			offset,
> -	void			(*endio)(struct kiocb	*iocb,
> -					 loff_t		offset,
> -					 ssize_t	size,
> -					 void		*private),
> +	dio_iodone_t		endio,
>  	int			flags)
>  {
>  	struct block_device	*bdev;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1a20462..d7f37bf 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -70,7 +70,7 @@ extern int sysctl_protected_hardlinks;
>  struct buffer_head;
>  typedef int (get_block_t)(struct inode *inode, sector_t iblock,
>  			struct buffer_head *bh_result, int create);
> -typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> +typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  			ssize_t bytes, void *private);
>  typedef void (dax_iodone_t)(struct buffer_head *bh_map, int uptodate);
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vfs/xfs: directio updates to ease COW handling V2
  2016-02-03 19:43   ` Jeff Moyer
  (?)
@ 2016-02-03 20:01     ` Darrick J. Wong
  -1 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2016-02-03 20:01 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, ocfs2-devel, xfs

On Wed, Feb 03, 2016 at 02:43:02PM -0500, Jeff Moyer wrote:
> Hi, Christoph,
> 
> Can you explain a bit what you mean by easing COW handling?  Whenever I
> see COW referenced near DIO, my mind always turns to g_u_p vs. fork.

Just to clarify, I'm talking about copy on write for disk blocks, not for
memory pages.

Basically, XFS implements (disk block) copy on write for (perfectly block
aligned) directio writes by allocating a set of replacement blocks, mapping the
dio writes to the new blocks, and playing a punch/remap trick to map the new
blocks into the file at the appropriate offset.

If the write fails then we don't want to do the remap, so the dio_complete handler
has to know whether or not the IO succeeded, hence the new parameter.
Furthermore, if the write succeeds but the remap fails, we also want to be able
to report that to userspace, hence the change of return value from void to int.

(If the dio write isn't block aligned, we fall back to the page cache.)
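
As a rough sketch of the completion this enables (illustrative only;
fs_remap_cow_blocks() is a made-up placeholder, not an XFS function):

	static int fs_cow_dio_end_io(struct kiocb *iocb, loff_t offset,
				     ssize_t size, void *private)
	{
		struct inode *inode = file_inode(iocb->ki_filp);

		if (size <= 0)		/* write failed or wrote nothing... */
			return 0;	/* ...so leave the old blocks mapped */

		/*
		 * Punch out the old blocks and map the new ones in their
		 * place; an error here now propagates back to the caller.
		 */
		return fs_remap_cow_blocks(inode, offset, size);
	}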

Hope that helps,

--D

> 
> Thanks!
> Jeff
> 
> Christoph Hellwig <hch@lst.de> writes:
> 
> > See http://www.infradead.org/rpr.html
> >
> > The first patch ensures ->end_io is always called for direct I/O requests
> > that pass it in, even if there was a zero length write, or if an error
> > occured.  The existing users have been updated to ignore it, but XFS
> > will make use of it in the future, and a comment in ext4 suggests it
> > might be useful for it as well.
> >
> > The other two simplify the XFS direct I/O code.
> >
> > Changes since V1:
> >  - allow ->end_io to return errors
> >  - a comment spelling fix
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: vfs/xfs: directio updates to ease COW handling V2
  2016-02-03 20:01     ` Darrick J. Wong
  (?)
@ 2016-02-03 21:53       ` Jeff Moyer
  -1 siblings, 0 replies; 52+ messages in thread
From: Jeff Moyer @ 2016-02-03 21:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, ocfs2-devel, xfs

"Darrick J. Wong" <darrick.wong@oracle.com> writes:

> On Wed, Feb 03, 2016 at 02:43:02PM -0500, Jeff Moyer wrote:
>> Hi, Christoph,
>> 
>> Can you explain a bit what you mean by easing COW handling?  Whenever I
>> see COW referenced near DIO, my mind always turns to g_u_p vs. fork.
>
> Just to clarify, I'm talking about copy on write for disk blocks, not for
> memory pages.
>
> Basically, XFS implements (disk block) copy on write for (perfectly block
> aligned) directio writes by allocating a set of replacement blocks, mapping the
> dio writes to the new blocks, and playing a punch/remap trick to map the new
> blocks into the file at the appropriate offset.
>
> If the write fails then we don't want do the remap, so the dio_complete handler
> has to know whether or not the IO succeeded, hence the new parameter.
> Furthermore, if the write succeeds but the remap fails, we also want to be able
> to report that to userspace, hence the change of return value from void to int.
>
> (If the dio write isn't block aligned, we fall back to the page cache.)
>
> Hope that helps,

Eric said the magic word: reflink.  I think I've got it now.

Thanks!
Jeff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/3] direct-io: always call ->end_io if non-NULL
  2016-02-03 19:55     ` Darrick J. Wong
  (?)
@ 2016-02-04  7:14       ` Christoph Hellwig
  -1 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2016-02-04  7:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Wed, Feb 03, 2016 at 11:55:31AM -0800, Darrick J. Wong wrote:
> This will have the effect of a later error superseding an earlier error.  I'm
> under the impression that code should generally preserve the first error, since
> some side effect of that probably caused the rest of the errors.
> 
> That said, my guess is that 95% of the time err is set, retval and err will
> both be -EIO anyway.  I'm not particularly passionate about whether or not we
> preserve the first error code.

This leaves it up to the file system whether to pass the value through
or not.  Note that before the call ret will usually hold the positive
number of bytes written, so checking whether it is 'set' wouldn't be enough
even with some special casing in the callers.
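
Spelled out as a sketch (fs_commit_dio() is a made-up placeholder), this is
the convention a handler now sees and where that choice is made:

	static int fs_dio_end_io(struct kiocb *iocb, loff_t offset,
				 ssize_t size, void *private)
	{
		/* size == 0: nothing written; size < 0: the core code's error */
		if (size <= 0)
			return 0;	/* pass the caller's own ret through */

		/*
		 * size > 0: the data is on disk; a non-zero return from the
		 * filesystem's completion work replaces the byte count.
		 */
		return fs_commit_dio(iocb, offset, size, private);
	}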

> > +static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
> >  			    ssize_t size, void *private)
> >  {
> >          ext4_io_end_t *io_end = iocb->private;
> >  
> > +	if (size <= 0)
> > +		return 0;
> 
> This leaks the ext4_io_end_t, if there was one.  Granted, that only happens
> during an AIO DIO to an unwritten extent, but in any case I suggest removing
> this hunk and...

It's the same behavior as before - and if you look at ext4_ext_direct_IO
it seems to expect this and works around it.

> > +	if (bytes <= 0)
> > +		return 0;
> > +
> 
> I suspect we still need to unlock the mutexes later on in this function.
> 
> >  	/* this io's submitter should not have unlocked this before we could */
> >  	BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
> >  
> > @@ -644,6 +647,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
> >  		level = ocfs2_iocb_rw_locked_level(iocb);
> >  		ocfs2_rw_unlock(inode, level);
> >  	}
> 
> Do we need to still have an accurate value for bytes the conditional above
> even if the IO errored out?  

Again, no changes to the old behavior.  ocfs2 has some magic stuffed
in iocb->private to deal with the locked state of an iocb, and while
I don't fully understand it, I suspect it's there to handle the existing
odd ->end_io calling conventions.  Cleaning this up would be nice,
but let's keep that a separate patch.

> >  	struct kiocb		*iocb,
> >  	loff_t			offset,
> > @@ -1655,15 +1655,19 @@ xfs_end_io_direct_write(
> >  	struct inode		*inode = file_inode(iocb->ki_filp);
> >  	struct xfs_ioend	*ioend = private;
> >  
> > +	if (size <= 0)
> > +		return 0;
> 
> Same thing here, I think we can end up leaking the ioend.

This keeps the existing behavior.  But either way, at least for
XFS all this will be properly fixed in the next patch anyway.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/3] direct-io: always call ->end_io if non-NULL
  2016-02-04  7:14       ` Christoph Hellwig
  (?)
@ 2016-02-04  8:17         ` Darrick J. Wong
  -1 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2016-02-04  8:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Thu, Feb 04, 2016 at 08:14:55AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 03, 2016 at 11:55:31AM -0800, Darrick J. Wong wrote:
> > This will have the effect of a later error superseding an earlier error.  I'm
> > under the impression that code should generally preserve the first error, since
> > some side effect of that probably caused the rest of the errors.
> > 
> > That said, my guess is that 95% of the time err is set, retval and err will
> > both be -EIO anyway.  I'm not particularly passionate about whether or not we
> > preserve the first error code.
> 
> This leaves the option to the file system to pass the value through
> or not.  Note that ret before the call will usually have the positive
> number of bytes written, so checking if it's 'set' wouldn't be enough
> even if adding some special casing in the callers.

Ok, I can live with that.

> > > +static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
> > >  			    ssize_t size, void *private)
> > >  {
> > >          ext4_io_end_t *io_end = iocb->private;
> > >  
> > > +	if (size <= 0)
> > > +		return 0;
> > 
> > This leaks the ext4_io_end_t, if there was one.  Granted, that only happens
> > during an AIO DIO to an unwritten extent, but in any case I suggest removing
> > this hunk and...
> 
> It's the same behavior as before - and if you look at ext4_ext_direct_IO
> it seems to expect this and works around it.

Gotcha.  That's right, so I'll stop worrying about these. :)

--D

> 
> > > +	if (bytes <= 0)
> > > +		return 0;
> > > +
> > 
> > I suspect we still need to unlock the mutexes later on in this function.
> > 
> > >  	/* this io's submitter should not have unlocked this before we could */
> > >  	BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
> > >  
> > > @@ -644,6 +647,8 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
> > >  		level = ocfs2_iocb_rw_locked_level(iocb);
> > >  		ocfs2_rw_unlock(inode, level);
> > >  	}
> > 
> > Do we need to still have an accurate value for bytes the conditional above
> > even if the IO errored out?  
> 
> Again, no changes to the old behavior.  ocfs has some magic stuffed
> in iocb->private to deal with the locked state of an iocb, and while
> I don't fully understand it I suspect it's to handle the existing
> odd ->end_io calling conventions.  Cleaning this up would be nice,
> but let's keep that a separate patch.
> 
> > >  	struct kiocb		*iocb,
> > >  	loff_t			offset,
> > > @@ -1655,15 +1655,19 @@ xfs_end_io_direct_write(
> > >  	struct inode		*inode = file_inode(iocb->ki_filp);
> > >  	struct xfs_ioend	*ioend = private;
> > >  
> > > +	if (size <= 0)
> > > +		return 0;
> > 
> > Same thing here, I think we can end up leaking the ioend.
> 
> This keeps the existing behavior.  But either way, at least for
> XFS all this will be properly fixed in the next patch anyway.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-03 18:40   ` Christoph Hellwig
  (?)
@ 2016-02-05 21:57     ` Darrick J. Wong
  -1 siblings, 0 replies; 52+ messages in thread
From: Darrick J. Wong @ 2016-02-05 21:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Wed, Feb 03, 2016 at 07:40:15PM +0100, Christoph Hellwig wrote:
> We only need to communicate two bits of information to the direct I/O
> completion handler:
> 
>  (1) do we need to convert any unwritten extents in the range
>  (2) do we need to check if we need to update the inode size based
>      on the range passed to the completion handler
> 
> We can use the private data passed to the get_block handler and the
> completion handler as a simple bitmask to communicate this information
> instead of the current complicated infrastructure reusing the ioends
> from the buffer I/O path, and thus avoiding a memory allocation and
> a context switch for any non-trivial direct write.  As a nice side
> effect we also decouple the direct I/O path implementation from that
> of the buffered I/O path.
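
The bitmask idea described in the quoted paragraph above can be
illustrated with a small, hedged sketch: two flag bits are OR-ed into the
pointer-sized private cookie at mapping time and decoded at completion
time, so no per-IO allocation is needed.  The DIO_SKETCH_* names and the
two helper functions are stand-ins for this example, not the
xfs_map_direct()/xfs_end_io_direct_write() code in the patch below.

#include <stdint.h>
#include <stdio.h>

#define DIO_SKETCH_UNWRITTEN	(1u << 0)	/* convert unwritten extents */
#define DIO_SKETCH_APPEND	(1u << 1)	/* may need an i_size update */

/* Mapping step: record what completion will have to do. */
static void map_step(void **private, int unwritten, int beyond_eof)
{
	uintptr_t *flags = (uintptr_t *)private;

	if (unwritten)
		*flags |= DIO_SKETCH_UNWRITTEN;
	else if (beyond_eof)
		*flags |= DIO_SKETCH_APPEND;
}

/* Completion step: decode the cookie instead of walking an ioend. */
static void completion_step(void *private)
{
	uintptr_t flags = (uintptr_t)private;

	if (flags & DIO_SKETCH_UNWRITTEN)
		puts("convert unwritten extents");
	else if (flags & DIO_SKETCH_APPEND)
		puts("update the on-disk file size");
	else
		puts("plain overwrite inside EOF, nothing to do");
}

int main(void)
{
	void *private = NULL;

	map_step(&private, 1, 0);
	completion_step(private);
	return 0;
}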
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_aops.c  | 216 ++++++++++++++++++-----------------------------------
>  fs/xfs/xfs_trace.h |   9 +--
>  2 files changed, 75 insertions(+), 150 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 295aaff..f008a4f 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -36,6 +36,10 @@
>  #include <linux/pagevec.h>
>  #include <linux/writeback.h>
>  
> +/* flags for direct write completions */
> +#define XFS_DIO_FLAG_UNWRITTEN	(1 << 0)
> +#define XFS_DIO_FLAG_APPEND	(1 << 1)
> +
>  void
>  xfs_count_page_state(
>  	struct page		*page,
> @@ -1238,27 +1242,8 @@ xfs_vm_releasepage(
>  }
>  
>  /*
> - * When we map a DIO buffer, we may need to attach an ioend that describes the
> - * type of write IO we are doing. This passes to the completion function the
> - * operations it needs to perform. If the mapping is for an overwrite wholly
> - * within the EOF then we don't need an ioend and so we don't allocate one.
> - * This avoids the unnecessary overhead of allocating and freeing ioends for
> - * workloads that don't require transactions on IO completion.
> - *
> - * If we get multiple mappings in a single IO, we might be mapping different
> - * types. But because the direct IO can only have a single private pointer, we
> - * need to ensure that:
> - *
> - * a) i) the ioend spans the entire region of unwritten mappings; or
> - *    ii) the ioend spans all the mappings that cross or are beyond EOF; and
> - * b) if it contains unwritten extents, it is *permanently* marked as such
> - *
> - * We could do this by chaining ioends like buffered IO does, but we only
> - * actually get one IO completion callback from the direct IO, and that spans
> - * the entire IO regardless of how many mappings and IOs are needed to complete
> - * the DIO. There is only going to be one reference to the ioend and its life
> - * cycle is constrained by the DIO completion code. hence we don't need
> - * reference counting here.
> + * When we map a DIO buffer, we may need to pass flags to
> + * xfs_end_io_direct_write to tell it what kind of write IO we are doing.
>   *
>   * Note that for DIO, an IO to the highest supported file block offset (i.e.
>   * 2^63 - 1FSB bytes) will result in the offset + count overflowing a signed 64
> @@ -1266,68 +1251,26 @@ xfs_vm_releasepage(
>   * extending the file size. We won't know for sure until IO completion is run
>   * and the actual max write offset is communicated to the IO completion
>   * routine.
> - *
> - * For DAX page faults, we are preparing to never see unwritten extents here,
> - * nor should we ever extend the inode size. Hence we will soon have nothing to
> - * do here for this case, ensuring we don't have to provide an IO completion
> - * callback to free an ioend that we don't actually need for a fault into the
> - * page at offset (2^63 - 1FSB) bytes.
>   */
> -
>  static void
>  xfs_map_direct(
>  	struct inode		*inode,
>  	struct buffer_head	*bh_result,
>  	struct xfs_bmbt_irec	*imap,
> -	xfs_off_t		offset,
> -	bool			dax_fault)
> +	xfs_off_t		offset)
>  {
> -	struct xfs_ioend	*ioend;
> +	uintptr_t		*flags = (uintptr_t *)&bh_result->b_private;
>  	xfs_off_t		size = bh_result->b_size;
> -	int			type;
> -
> -	if (ISUNWRITTEN(imap))
> -		type = XFS_IO_UNWRITTEN;
> -	else
> -		type = XFS_IO_OVERWRITE;
> -
> -	trace_xfs_gbmap_direct(XFS_I(inode), offset, size, type, imap);
> -
> -	if (dax_fault) {
> -		ASSERT(type == XFS_IO_OVERWRITE);
> -		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
> -					    imap);
> -		return;
> -	}
>  
> -	if (bh_result->b_private) {
> -		ioend = bh_result->b_private;
> -		ASSERT(ioend->io_size > 0);
> -		ASSERT(offset >= ioend->io_offset);
> -		if (offset + size > ioend->io_offset + ioend->io_size)
> -			ioend->io_size = offset - ioend->io_offset + size;
> -
> -		if (type == XFS_IO_UNWRITTEN && type != ioend->io_type)
> -			ioend->io_type = XFS_IO_UNWRITTEN;
> -
> -		trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
> -					      ioend->io_size, ioend->io_type,
> -					      imap);
> -	} else if (type == XFS_IO_UNWRITTEN ||
> -		   offset + size > i_size_read(inode) ||
> -		   offset + size < 0) {
> -		ioend = xfs_alloc_ioend(inode, type);
> -		ioend->io_offset = offset;
> -		ioend->io_size = size;
> +	trace_xfs_get_blocks_map_direct(XFS_I(inode), offset, size,
> +		ISUNWRITTEN(imap) ? XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, imap);
>  
> -		bh_result->b_private = ioend;
> +	if (ISUNWRITTEN(imap)) {
> +		*flags |= XFS_DIO_FLAG_UNWRITTEN;
> +		set_buffer_defer_completion(bh_result);
> +	} else if (offset + size > i_size_read(inode) || offset + size < 0) {
> +		*flags |= XFS_DIO_FLAG_APPEND;
>  		set_buffer_defer_completion(bh_result);
> -
> -		trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
> -					   imap);
> -	} else {
> -		trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
> -					    imap);
>  	}
>  }
>  
> @@ -1498,9 +1441,12 @@ __xfs_get_blocks(
>  		if (ISUNWRITTEN(&imap))
>  			set_buffer_unwritten(bh_result);
>  		/* direct IO needs special help */
> -		if (create && direct)
> -			xfs_map_direct(inode, bh_result, &imap, offset,
> -				       dax_fault);
> +		if (create && direct) {
> +			if (dax_fault)
> +				ASSERT(!ISUNWRITTEN(&imap));
> +			else
> +				xfs_map_direct(inode, bh_result, &imap, offset);
> +		}
>  	}
>  
>  	/*
> @@ -1570,42 +1516,50 @@ xfs_get_blocks_dax_fault(
>  	return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
>  }
>  
> -static void
> -__xfs_end_io_direct_write(
> -	struct inode		*inode,
> -	struct xfs_ioend	*ioend,
> +/*
> + * Complete a direct I/O write request.
> + *
> + * xfs_map_direct passes us some flags in the private data to tell us what to
> + * do.  If no flags are set, then the write IO is an overwrite wholly within
> + * the existing allocated file size and so there is nothing for us to do.
> + *
> + * Note that in this case the completion can be called in interrupt context,
> + * whereas if we have flags set we will always be called in task context
> + * (i.e. from a workqueue).
> + */
> +STATIC int
> +xfs_end_io_direct_write(
> +	struct kiocb		*iocb,
>  	loff_t			offset,
> -	ssize_t			size)
> +	ssize_t			size,
> +	void			*private)
>  {
> -	struct xfs_mount	*mp = XFS_I(inode)->i_mount;
> +	struct inode		*inode = file_inode(iocb->ki_filp);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	uintptr_t		flags = (uintptr_t)private;
> +	int			error = 0;
>  
> -	if (XFS_FORCED_SHUTDOWN(mp) || ioend->io_error)
> -		goto out_end_io;
> +	trace_xfs_end_io_direct_write(ip, offset, size);
>  
> -	/*
> -	 * dio completion end_io functions are only called on writes if more
> -	 * than 0 bytes was written.
> -	 */
> -	ASSERT(size > 0);
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return -EIO;
>  
> -	/*
> -	 * The ioend only maps whole blocks, while the IO may be sector aligned.
> -	 * Hence the ioend offset/size may not match the IO offset/size exactly.
> -	 * Because we don't map overwrites within EOF into the ioend, the offset
> -	 * may not match, but only if the endio spans EOF.  Either way, write
> -	 * the IO sizes into the ioend so that completion processing does the
> -	 * right thing.
> -	 */
> -	ASSERT(offset + size <= ioend->io_offset + ioend->io_size);
> -	ioend->io_size = size;
> -	ioend->io_offset = offset;
> +	if (size <= 0)
> +		return size;
>  
>  	/*
> -	 * The ioend tells us whether we are doing unwritten extent conversion
> +	 * The flags tell us whether we are doing unwritten extent conversions
>  	 * or an append transaction that updates the on-disk file size. These
>  	 * cases are the only cases where we should *potentially* be needing
>  	 * to update the VFS inode size.
> -	 *
> +	 */
> +	if (flags == 0) {
> +		ASSERT(offset + size <= i_size_read(inode));
> +		return 0;
> +	}
> +
> +	/*
>  	 * We need to update the in-core inode size here so that we don't end up
>  	 * with the on-disk inode size being outside the in-core inode size. We
>  	 * have no other method of updating EOF for AIO, so always do it here
> @@ -1616,58 +1570,30 @@ __xfs_end_io_direct_write(
>  	 * here can result in EOF moving backwards and Bad Things Happen when
>  	 * that occurs.
>  	 */
> -	spin_lock(&XFS_I(inode)->i_flags_lock);
> +	spin_lock(&ip->i_flags_lock);
>  	if (offset + size > i_size_read(inode))
>  		i_size_write(inode, offset + size);
> -	spin_unlock(&XFS_I(inode)->i_flags_lock);
> +	spin_unlock(&ip->i_flags_lock);
>  
> -	/*
> -	 * If we are doing an append IO that needs to update the EOF on disk,
> -	 * do the transaction reserve now so we can use common end io
> -	 * processing. Stashing the error (if there is one) in the ioend will
> -	 * result in the ioend processing passing on the error if it is
> -	 * possible as we can't return it from here.
> -	 */
> -	if (ioend->io_type == XFS_IO_OVERWRITE)
> -		ioend->io_error = xfs_setfilesize_trans_alloc(ioend);
> +	if (flags & XFS_DIO_FLAG_UNWRITTEN) {
> +		trace_xfs_end_io_direct_write_unwritten(ip, offset, size);
>  
> -out_end_io:
> -	xfs_end_io(&ioend->io_work);
> -	return;
> -}
> +		error = xfs_iomap_write_unwritten(ip, offset, size);
> +	} else if (flags & XFS_DIO_FLAG_APPEND) {
> +		struct xfs_trans *tp;
>  
> -/*
> - * Complete a direct I/O write request.
> - *
> - * The ioend structure is passed from __xfs_get_blocks() to tell us what to do.
> - * If no ioend exists (i.e. @private == NULL) then the write IO is an overwrite
> - * wholly within the EOF and so there is nothing for us to do. Note that in this
> - * case the completion can be called in interrupt context, whereas if we have an
> - * ioend we will always be called in task context (i.e. from a workqueue).
> - */
> -STATIC int
> -xfs_end_io_direct_write(
> -	struct kiocb		*iocb,
> -	loff_t			offset,
> -	ssize_t			size,
> -	void			*private)
> -{
> -	struct inode		*inode = file_inode(iocb->ki_filp);
> -	struct xfs_ioend	*ioend = private;
> +		trace_xfs_end_io_direct_write_append(ip, offset, size);
>  
> -	if (size <= 0)
> -		return 0;
> -
> -	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
> -				     ioend ? ioend->io_type : 0, NULL);
> -
> -	if (!ioend) {
> -		ASSERT(offset + size <= i_size_read(inode));
> -		return 0;
> +		tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
> +		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
> +		if (error) {
> +			xfs_trans_cancel(tp);
> +			return error;
> +		}
> +		error = xfs_setfilesize(ip, tp, offset, size);

Don't we need a xfs_trans_commit() here?

--D
>  	}
>  
> -	__xfs_end_io_direct_write(inode, ioend, offset, size);
> -	return 0;
> +	return error;
>  }
>  
>  static inline ssize_t
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 391d797..c8d5842 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1296,11 +1296,7 @@ DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
>  DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
>  DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
>  DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
> -DEFINE_IOMAP_EVENT(xfs_gbmap_direct);
> -DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new);
> -DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update);
> -DEFINE_IOMAP_EVENT(xfs_gbmap_direct_none);
> -DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio);
> +DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
>  
>  DECLARE_EVENT_CLASS(xfs_simple_io_class,
>  	TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
> @@ -1340,6 +1336,9 @@ DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
>  DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
>  DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
>  DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
> +DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
> +DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_unwritten);
> +DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write_append);
>  
>  DECLARE_EVENT_CLASS(xfs_itrunc_class,
>  	TP_PROTO(struct xfs_inode *ip, xfs_fsize_t new_size),
> -- 
> 2.1.4
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-05 21:57     ` Darrick J. Wong
  (?)
@ 2016-02-05 22:36       ` Dave Chinner
  -1 siblings, 0 replies; 52+ messages in thread
From: Dave Chinner @ 2016-02-05 22:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Fri, Feb 05, 2016 at 01:57:18PM -0800, Darrick J. Wong wrote:
> On Wed, Feb 03, 2016 at 07:40:15PM +0100, Christoph Hellwig wrote:
> > -	struct kiocb		*iocb,
> > -	loff_t			offset,
> > -	ssize_t			size,
> > -	void			*private)
> > -{
> > -	struct inode		*inode = file_inode(iocb->ki_filp);
> > -	struct xfs_ioend	*ioend = private;
> > +		trace_xfs_end_io_direct_write_append(ip, offset, size);
> >  
> > -	if (size <= 0)
> > -		return 0;
> > -
> > -	trace_xfs_gbmap_direct_endio(XFS_I(inode), offset, size,
> > -				     ioend ? ioend->io_type : 0, NULL);
> > -
> > -	if (!ioend) {
> > -		ASSERT(offset + size <= i_size_read(inode));
> > -		return 0;
> > +		tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
> > +		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
> > +		if (error) {
> > +			xfs_trans_cancel(tp);
> > +			return error;
> > +		}
> > +		error = xfs_setfilesize(ip, tp, offset, size);
> 
> Don't we need a xfs_trans_commit() here?

No, xfs_setfilesize() does that.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
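
The one-line answer above reflects an ownership rule that is easy to
sketch in a hedged, userspace form: a helper that is handed a transaction
always disposes of it, committing on success and cancelling on failure, so
the caller never commits it again.  The names below are invented for the
example and are not the XFS transaction API.

#include <stdio.h>
#include <stdlib.h>

struct trans {
	const char *label;
};

/*
 * The helper consumes the transaction: "commit" on success, "cancel" on
 * failure.  Either way the caller must not touch it again afterwards.
 */
static int set_filesize_sketch(struct trans *tp, long long offset,
			       long long size)
{
	if (offset < 0 || size <= 0) {
		printf("cancel %s\n", tp->label);
		free(tp);
		return -22;		/* EINVAL-style failure */
	}
	printf("commit %s: new size %lld\n", tp->label, offset + size);
	free(tp);
	return 0;
}

int main(void)
{
	struct trans *tp = malloc(sizeof(*tp));

	if (!tp)
		return 1;
	tp->label = "fsync-ts transaction";

	/* No separate commit call here: the helper owns tp from now on. */
	return set_filesize_sketch(tp, 4096, 512) ? 1 : 0;
}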

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-03 18:40   ` Christoph Hellwig
  (?)
@ 2016-02-08  1:00     ` Dave Chinner
  -1 siblings, 0 replies; 52+ messages in thread
From: Dave Chinner @ 2016-02-08  1:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Wed, Feb 03, 2016 at 07:40:15PM +0100, Christoph Hellwig wrote:
> We only need to communicate two bits of information to the direct I/O
> completion handler:
> 
>  (1) do we need to convert any unwritten extents in the range
>  (2) do we need to check if we need to update the inode size based
>      on the range passed to the completion handler
> 
> We can use the private data passed to the get_block handler and the
> completion handler as a simple bitmask to communicate this information
> instead of the current complicated infrastructure reusing the ioends
> from the buffer I/O path, and thus avoiding a memory allocation and
> a context switch for any non-trivial direct write.  As a nice side
> effect we also decouple the direct I/O path implementation from that
> of the buffered I/O path.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>

This change is now dependent on the preceding direct IO API
changes. Do I a) take the DIO API change through the XFS tree, or
b) use the older version of the patch that didn't have this
dependency and let somebody else deal with the API change and merge
issues?

I'm happy to take the DIO API change through the XFS tree, if that's
the fastest/easiest way to get the necessary DIO subsystem fixes
into the mainline tree for XFS. As such, the for-next tree that I'm
building right now will include the DIO API change patch....

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-08  1:00     ` Dave Chinner
  (?)
@ 2016-02-08  6:17       ` Dave Chinner
  -1 siblings, 0 replies; 52+ messages in thread
From: Dave Chinner @ 2016-02-08  6:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Mon, Feb 08, 2016 at 12:00:26PM +1100, Dave Chinner wrote:
> On Wed, Feb 03, 2016 at 07:40:15PM +0100, Christoph Hellwig wrote:
> > We only need to communicate two bits of information to the direct I/O
> > completion handler:
> > 
> >  (1) do we need to convert any unwritten extents in the range
> >  (2) do we need to check if we need to update the inode size based
> >      on the range passed to the completion handler
> > 
> > We can use the private data passed to the get_block handler and the
> > completion handler as a simple bitmask to communicate this information
> > instead of the current complicated infrastructure reusing the ioends
> > from the buffer I/O path, and thus avoiding a memory allocation and
> > a context switch for any non-trivial direct write.  As a nice side
> > effect we also decouple the direct I/O path implementation from that
> > of the buffered I/O path.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> This change is now dependent on the preceding direct IO API
> changes. Do I a) take the DIO API change through the XFS tree, or
> b) use the older version of the patch that didn't have this
> dependency and let somebody else deal with the API change and merge
> issues?
> 
> I'm happy to take the DIO API change through the XFS tree, if that's
> the fastest/easiest way to get the necessary DIO subsystem fixes
> into the mainline tree for XFS. As such, the for-next tree that I'm
> building right now will include the DIO API change patch....

Right now this series is in a stable branch in the XFS tree:

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-dio-fix-4.6

If you want to push it through some other tree, please let me know
when/where it is committed so I can rebuild the XFS for-next branch
appropriately from a stable commit/branch...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-08  6:17       ` Dave Chinner
  (?)
@ 2016-02-08  7:31         ` Christoph Hellwig
  -1 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2016-02-08  7:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Mon, Feb 08, 2016 at 05:17:31PM +1100, Dave Chinner wrote:
> Right now this series is in a stable branch in the XFS tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-dio-fix-4.6
> 
> If you want to push it through some other tree, please let me know
> when/where it is committed so I can rebuild the XFS for-next branch
> appropriately from a stable commit/branch...

That's how I think it should be handled.  This would also allow the
ext4 and ocfs2 maintainers to depend on the stable branch to clean
up their direct I/O completion handling in this merge window if they
want to.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-08  7:31         ` Christoph Hellwig
  (?)
@ 2016-02-08  9:16           ` Dave Chinner
  -1 siblings, 0 replies; 52+ messages in thread
From: Dave Chinner @ 2016-02-08  9:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Mon, Feb 08, 2016 at 08:31:21AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 08, 2016 at 05:17:31PM +1100, Dave Chinner wrote:
> > Right now this series is in a stable branch in the XFS tree:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-dio-fix-4.6
> > 
> > If you want to push it through some other tree, please let me know
> > when/where it is committed so I can rebuild the XFS for-next branch
> > appropriately from a stable commit/branch...
> 
> That's how I think it should be handled.  This would also allow the
> ext4 and ocfs2 maintainers to depend on the stable branch to clean
> up their direct I/O completion handling in this merge window if they
> want to.

I can't tell whether you are saying that what I've done is fine as long
as the xfs-dio-fix-4.6 branch is stable (so others can pull it), or
whether the series should go through some other tree. Can you clarify,
Christoph?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/3] xfs: don't use ioends for direct write completions
  2016-02-08  9:16           ` Dave Chinner
  (?)
@ 2016-02-08  9:22             ` Christoph Hellwig
  -1 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2016-02-08  9:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs, linux-ext4, ocfs2-devel

On Mon, Feb 08, 2016 at 08:16:44PM +1100, Dave Chinner wrote:
> I can't tell if you are saying what I've done is fine if the
> xfs-dio-fix-4.6 branch is stable (so others can pull it) or whether
> it should be in some other tree. Can you clarify, Christoph?

What you are doing is fine, thanks!

^ permalink raw reply	[flat|nested] 52+ messages in thread

Thread overview: 52+ messages
2016-02-03 18:40 vfs/xfs: directio updates to ease COW handling V2 Christoph Hellwig
2016-02-03 18:40 ` [PATCH 1/3] direct-io: always call ->end_io if non-NULL Christoph Hellwig
2016-02-03 19:55   ` Darrick J. Wong
2016-02-04  7:14     ` Christoph Hellwig
2016-02-04  8:17       ` Darrick J. Wong
2016-02-03 18:40 ` [PATCH 2/3] xfs: don't use ioends for direct write completions Christoph Hellwig
2016-02-05 21:57   ` Darrick J. Wong
2016-02-05 22:36     ` Dave Chinner
2016-02-08  1:00   ` Dave Chinner
2016-02-08  6:17     ` Dave Chinner
2016-02-08  7:31       ` Christoph Hellwig
2016-02-08  9:16         ` Dave Chinner
2016-02-08  9:22           ` Christoph Hellwig
2016-02-03 18:40 ` [PATCH 3/3] xfs: fold xfs_vm_do_dio into xfs_vm_direct_IO Christoph Hellwig
2016-02-03 19:43 ` vfs/xfs: directio updates to ease COW handling V2 Jeff Moyer
2016-02-03 20:01   ` Darrick J. Wong
2016-02-03 21:53     ` Jeff Moyer
