All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] splice: i_mutex vs splice write deadlock V2
@ 2011-08-08  6:45 ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

generic_file_splice_write() takes the inode->i_mutex after the
filesystem has taken whatever locks it needs to ensure sanity.
however, this typically violates the locking order of filesystems
with their own locks in that the order is usually i_mutex ->
filesystem lock.

XFS is such a case, and generic_file_splice_write() is generating
lockdep warnings because of lock inversions between the
inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a
reported case of fio causing a deadlock when it mixes IO types
(e.g. splice vs direct IO).

Version 2:
- add a new function to take an actor to do the work of splicing the
  data to the file. Convert generic_file_splice_write to use this,
  and make XFS call it with a different actor fuction that avoids
  the i_mutex.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] splice: i_mutex vs splice write deadlock V2
@ 2011-08-08  6:45 ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

generic_file_splice_write() takes the inode->i_mutex after the
filesystem has taken whatever locks it needs to ensure sanity.
however, this typically violates the locking order of filesystems
with their own locks in that the order is usually i_mutex ->
filesystem lock.

XFS is such a case, and generic_file_splice_write() is generating
lockdep warnings because of lock inversions between the
inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a
reported case of fio causing a deadlock when it mixes IO types
(e.g. splice vs direct IO).

Version 2:
- add a new function to take an actor to do the work of splicing the
  data to the file. Convert generic_file_splice_write to use this,
  and make XFS call it with a different actor fuction that avoids
  the i_mutex.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2011-08-08  6:45 ` Dave Chinner
@ 2011-08-08  6:45   ` Dave Chinner
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so modify the generic splice code to use an
actor function for the code that currently requires the i_mutex to
be taken.  Convert generic_file_splice_write() to use this new
interface supplying a generic actor function that performs the same
actions as the existing code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/splice.c            |   58 ++++++++++++++++++++++++++++++++++++++----------
 include/linux/splice.h |    5 ++++
 2 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index fa2defa..a02dddc 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
 }
 
 /**
- * generic_file_splice_write - splice data from a pipe to a file
+ * splice_write_to_file - splice data from a pipe to a file
  * @pipe:	pipe info
  * @out:	file to write to
  * @ppos:	position in @out
  * @len:	number of bytes to splice
  * @flags:	splice modifier flags
+ * @actor:	worker that does the splicing from the pipe to the file
  *
  * Description:
  *    Will either move or copy pages (determined by @flags options) from
  *    the given pipe inode to the given file.
  *
  */
-ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
+			     loff_t *ppos, size_t len, unsigned int flags,
+			     splice_write_actor actor)
 {
 	struct address_space *mapping = out->f_mapping;
-	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
 		.total_len = len,
 		.flags = flags,
@@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		ret = file_remove_suid(out);
-		if (!ret) {
-			file_update_time(out);
-			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
-		}
-		mutex_unlock(&inode->i_mutex);
+		ret = actor(pipe, &sd);
+
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	return ret;
 }
+EXPORT_SYMBOL(splice_write_to_file);
 
+static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
+					       struct splice_desc *sd)
+{
+	struct file *out = sd->u.file;
+	struct inode *inode = out->f_mapping->host;
+	ssize_t ret;
+
+	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+	ret = file_remove_suid(out);
+	if (!ret) {
+		file_update_time(out);
+		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return splice_write_to_file(pipe, out, ppos, len, flags,
+				    generic_file_splice_write_actor);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 26e5b61..bd14281 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
 			   struct splice_desc *);
 typedef int (splice_direct_actor)(struct pipe_inode_info *,
 				  struct splice_desc *);
+typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
+				  struct splice_desc *);
 
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
@@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
 
+extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
+				    loff_t *, size_t, unsigned int,
+				    splice_write_actor *);
 /*
  * for dynamic pipe sizing
  */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
@ 2011-08-08  6:45   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so modify the generic splice code to use an
actor function for the code that currently requires the i_mutex to
be taken.  Convert generic_file_splice_write() to use this new
interface supplying a generic actor function that performs the same
actions as the existing code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/splice.c            |   58 ++++++++++++++++++++++++++++++++++++++----------
 include/linux/splice.h |    5 ++++
 2 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index fa2defa..a02dddc 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
 }
 
 /**
- * generic_file_splice_write - splice data from a pipe to a file
+ * splice_write_to_file - splice data from a pipe to a file
  * @pipe:	pipe info
  * @out:	file to write to
  * @ppos:	position in @out
  * @len:	number of bytes to splice
  * @flags:	splice modifier flags
+ * @actor:	worker that does the splicing from the pipe to the file
  *
  * Description:
  *    Will either move or copy pages (determined by @flags options) from
  *    the given pipe inode to the given file.
  *
  */
-ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
+			     loff_t *ppos, size_t len, unsigned int flags,
+			     splice_write_actor actor)
 {
 	struct address_space *mapping = out->f_mapping;
-	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
 		.total_len = len,
 		.flags = flags,
@@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		ret = file_remove_suid(out);
-		if (!ret) {
-			file_update_time(out);
-			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
-		}
-		mutex_unlock(&inode->i_mutex);
+		ret = actor(pipe, &sd);
+
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	return ret;
 }
+EXPORT_SYMBOL(splice_write_to_file);
 
+static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
+					       struct splice_desc *sd)
+{
+	struct file *out = sd->u.file;
+	struct inode *inode = out->f_mapping->host;
+	ssize_t ret;
+
+	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+	ret = file_remove_suid(out);
+	if (!ret) {
+		file_update_time(out);
+		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return splice_write_to_file(pipe, out, ppos, len, flags,
+				    generic_file_splice_write_actor);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 26e5b61..bd14281 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
 			   struct splice_desc *);
 typedef int (splice_direct_actor)(struct pipe_inode_info *,
 				  struct splice_desc *);
+typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
+				  struct splice_desc *);
 
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
@@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
 
+extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
+				    loff_t *, size_t, unsigned int,
+				    splice_write_actor *);
 /*
  * for dynamic pipe sizing
  */
-- 
1.7.5.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] xfs: fix splice/direct-IO deadlock
  2011-08-08  6:45 ` Dave Chinner
@ 2011-08-08  6:45   ` Dave Chinner
  -1 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

lockdep reports splice vs direct-io write lock inversions due to
generic_file_splice_write() taking the inode->i_mutex inside
XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can
deadlock.  Use splice_write_to_file() with an actor that does not
take the i_mutex to avoid these problems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   41 +++++++++++++++++++++++++++++------------
 1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 62a5022..959d8b2 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -38,6 +38,7 @@
 
 #include <linux/dcache.h>
 #include <linux/falloc.h>
+#include <linux/splice.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -431,13 +432,29 @@ xfs_aio_write_newsize_update(
 	}
 }
 
+static ssize_t
+xfs_file_splice_write_actor(
+	struct pipe_inode_info	*pipe,
+	struct splice_desc	*sd)
+{
+	struct file		*out = sd->u.file;
+	ssize_t ret;
+
+	ret = file_remove_suid(out);
+	if (!ret) {
+		file_update_time(out);
+		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+
+	return ret;
+}
+
 /*
- * xfs_file_splice_write() does not use xfs_rw_ilock() because
- * generic_file_splice_write() takes the i_mutex itself. This, in theory,
- * couuld cause lock inversions between the aio_write path and the splice path
- * if someone is doing concurrent splice(2) based writes and write(2) based
- * writes to the same inode. The only real way to fix this is to re-implement
- * the generic code here with correct locking orders.
+ * xfs_file_splice_write() does not use the generic file splice write path
+ * because that takes the i_mutex, causing lock inversions with the IOLOCK.
+ * Instead, we call splice_write_to_file() directly with our own actor that does
+ * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like
+ * the rest of the code and hence avoid lock inversions and deadlocks.
  */
 STATIC ssize_t
 xfs_file_splice_write(
@@ -461,22 +478,22 @@ xfs_file_splice_write(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_ilock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
 
 	new_size = *ppos + count;
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (new_size > ip->i_size)
 		ip->i_new_size = new_size;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
-	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
+	ret = splice_write_to_file(pipe, outfilp, ppos, count, flags,
+					xfs_file_splice_write_actor);
 
 	xfs_aio_write_isize_update(inode, ppos, ret);
 	xfs_aio_write_newsize_update(ip, new_size);
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
 	return ret;
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] xfs: fix splice/direct-IO deadlock
@ 2011-08-08  6:45   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-08-08  6:45 UTC (permalink / raw)
  To: xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

lockdep reports splice vs direct-io write lock inversions due to
generic_file_splice_write() taking the inode->i_mutex inside
XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can
deadlock.  Use splice_write_to_file() with an actor that does not
take the i_mutex to avoid these problems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   41 +++++++++++++++++++++++++++++------------
 1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 62a5022..959d8b2 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -38,6 +38,7 @@
 
 #include <linux/dcache.h>
 #include <linux/falloc.h>
+#include <linux/splice.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -431,13 +432,29 @@ xfs_aio_write_newsize_update(
 	}
 }
 
+static ssize_t
+xfs_file_splice_write_actor(
+	struct pipe_inode_info	*pipe,
+	struct splice_desc	*sd)
+{
+	struct file		*out = sd->u.file;
+	ssize_t ret;
+
+	ret = file_remove_suid(out);
+	if (!ret) {
+		file_update_time(out);
+		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+
+	return ret;
+}
+
 /*
- * xfs_file_splice_write() does not use xfs_rw_ilock() because
- * generic_file_splice_write() takes the i_mutex itself. This, in theory,
- * couuld cause lock inversions between the aio_write path and the splice path
- * if someone is doing concurrent splice(2) based writes and write(2) based
- * writes to the same inode. The only real way to fix this is to re-implement
- * the generic code here with correct locking orders.
+ * xfs_file_splice_write() does not use the generic file splice write path
+ * because that takes the i_mutex, causing lock inversions with the IOLOCK.
+ * Instead, we call splice_write_to_file() directly with our own actor that does
+ * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like
+ * the rest of the code and hence avoid lock inversions and deadlocks.
  */
 STATIC ssize_t
 xfs_file_splice_write(
@@ -461,22 +478,22 @@ xfs_file_splice_write(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_ilock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
 
 	new_size = *ppos + count;
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (new_size > ip->i_size)
 		ip->i_new_size = new_size;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
-	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
+	ret = splice_write_to_file(pipe, outfilp, ppos, count, flags,
+					xfs_file_splice_write_actor);
 
 	xfs_aio_write_isize_update(inode, ppos, ret);
 	xfs_aio_write_newsize_update(ip, new_size);
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
 	return ret;
 }
 
-- 
1.7.5.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2011-08-08  6:45   ` Dave Chinner
@ 2011-08-09 11:36     ` Jan Kara
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2011-08-09 11:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel

On Mon 08-08-11 16:45:26, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS holds locks that should be nested inside the inode->i_mutex when
> generic_file_splice_write is called. This function takes the
> i_mutex, and so we get a lock inversion that triggers lockdep
> warnings and has been found to cause real deadlocks.
> 
> XFS does not need the splice code to take the i_mutex to do the page
> cache manipulation, so modify the generic splice code to use an
> actor function for the code that currently requires the i_mutex to
> be taken.  Convert generic_file_splice_write() to use this new
> interface supplying a generic actor function that performs the same
> actions as the existing code.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
  The patch looks good to me.
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/splice.c            |   58 ++++++++++++++++++++++++++++++++++++++----------
>  include/linux/splice.h |    5 ++++
>  2 files changed, 51 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index fa2defa..a02dddc 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
>  }
>  
>  /**
> - * generic_file_splice_write - splice data from a pipe to a file
> + * splice_write_to_file - splice data from a pipe to a file
>   * @pipe:	pipe info
>   * @out:	file to write to
>   * @ppos:	position in @out
>   * @len:	number of bytes to splice
>   * @flags:	splice modifier flags
> + * @actor:	worker that does the splicing from the pipe to the file
>   *
>   * Description:
>   *    Will either move or copy pages (determined by @flags options) from
>   *    the given pipe inode to the given file.
>   *
>   */
> -ssize_t
> -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> -			  loff_t *ppos, size_t len, unsigned int flags)
> +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
> +			     loff_t *ppos, size_t len, unsigned int flags,
> +			     splice_write_actor actor)
>  {
>  	struct address_space *mapping = out->f_mapping;
> -	struct inode *inode = mapping->host;
>  	struct splice_desc sd = {
>  		.total_len = len,
>  		.flags = flags,
> @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>  		if (ret <= 0)
>  			break;
>  
> -		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
> -		ret = file_remove_suid(out);
> -		if (!ret) {
> -			file_update_time(out);
> -			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
> -		}
> -		mutex_unlock(&inode->i_mutex);
> +		ret = actor(pipe, &sd);
> +
>  	} while (ret > 0);
>  	splice_from_pipe_end(pipe, &sd);
>  
> @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>  
>  	return ret;
>  }
> +EXPORT_SYMBOL(splice_write_to_file);
>  
> +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
> +					       struct splice_desc *sd)
> +{
> +	struct file *out = sd->u.file;
> +	struct inode *inode = out->f_mapping->host;
> +	ssize_t ret;
> +
> +	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
> +	ret = file_remove_suid(out);
> +	if (!ret) {
> +		file_update_time(out);
> +		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
> +	}
> +	mutex_unlock(&inode->i_mutex);
> +
> +	return ret;
> +}
> +
> +/**
> + * generic_file_splice_write - splice data from a pipe to a file
> + * @pipe:	pipe info
> + * @out:	file to write to
> + * @ppos:	position in @out
> + * @len:	number of bytes to splice
> + * @flags:	splice modifier flags
> + *
> + * Description:
> + *    Will either move or copy pages (determined by @flags options) from
> + *    the given pipe inode to the given file.
> + *
> + */
> +ssize_t
> +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> +			  loff_t *ppos, size_t len, unsigned int flags)
> +{
> +	return splice_write_to_file(pipe, out, ppos, len, flags,
> +				    generic_file_splice_write_actor);
> +}
>  EXPORT_SYMBOL(generic_file_splice_write);
>  
>  static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> diff --git a/include/linux/splice.h b/include/linux/splice.h
> index 26e5b61..bd14281 100644
> --- a/include/linux/splice.h
> +++ b/include/linux/splice.h
> @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
>  			   struct splice_desc *);
>  typedef int (splice_direct_actor)(struct pipe_inode_info *,
>  				  struct splice_desc *);
> +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
> +				  struct splice_desc *);
>  
>  extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
>  				loff_t *, size_t, unsigned int,
> @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
>  extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
>  				      splice_direct_actor *);
>  
> +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
> +				    loff_t *, size_t, unsigned int,
> +				    splice_write_actor *);
>  /*
>   * for dynamic pipe sizing
>   */
> -- 
> 1.7.5.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking
@ 2011-08-09 11:36     ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2011-08-09 11:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs

On Mon 08-08-11 16:45:26, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS holds locks that should be nested inside the inode->i_mutex when
> generic_file_splice_write is called. This function takes the
> i_mutex, and so we get a lock inversion that triggers lockdep
> warnings and has been found to cause real deadlocks.
> 
> XFS does not need the splice code to take the i_mutex to do the page
> cache manipulation, so modify the generic splice code to use an
> actor function for the code that currently requires the i_mutex to
> be taken.  Convert generic_file_splice_write() to use this new
> interface supplying a generic actor function that performs the same
> actions as the existing code.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
  The patch looks good to me.
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/splice.c            |   58 ++++++++++++++++++++++++++++++++++++++----------
>  include/linux/splice.h |    5 ++++
>  2 files changed, 51 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index fa2defa..a02dddc 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
>  }
>  
>  /**
> - * generic_file_splice_write - splice data from a pipe to a file
> + * splice_write_to_file - splice data from a pipe to a file
>   * @pipe:	pipe info
>   * @out:	file to write to
>   * @ppos:	position in @out
>   * @len:	number of bytes to splice
>   * @flags:	splice modifier flags
> + * @actor:	worker that does the splicing from the pipe to the file
>   *
>   * Description:
>   *    Will either move or copy pages (determined by @flags options) from
>   *    the given pipe inode to the given file.
>   *
>   */
> -ssize_t
> -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> -			  loff_t *ppos, size_t len, unsigned int flags)
> +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
> +			     loff_t *ppos, size_t len, unsigned int flags,
> +			     splice_write_actor actor)
>  {
>  	struct address_space *mapping = out->f_mapping;
> -	struct inode *inode = mapping->host;
>  	struct splice_desc sd = {
>  		.total_len = len,
>  		.flags = flags,
> @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>  		if (ret <= 0)
>  			break;
>  
> -		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
> -		ret = file_remove_suid(out);
> -		if (!ret) {
> -			file_update_time(out);
> -			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
> -		}
> -		mutex_unlock(&inode->i_mutex);
> +		ret = actor(pipe, &sd);
> +
>  	} while (ret > 0);
>  	splice_from_pipe_end(pipe, &sd);
>  
> @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>  
>  	return ret;
>  }
> +EXPORT_SYMBOL(splice_write_to_file);
>  
> +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
> +					       struct splice_desc *sd)
> +{
> +	struct file *out = sd->u.file;
> +	struct inode *inode = out->f_mapping->host;
> +	ssize_t ret;
> +
> +	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
> +	ret = file_remove_suid(out);
> +	if (!ret) {
> +		file_update_time(out);
> +		ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
> +	}
> +	mutex_unlock(&inode->i_mutex);
> +
> +	return ret;
> +}
> +
> +/**
> + * generic_file_splice_write - splice data from a pipe to a file
> + * @pipe:	pipe info
> + * @out:	file to write to
> + * @ppos:	position in @out
> + * @len:	number of bytes to splice
> + * @flags:	splice modifier flags
> + *
> + * Description:
> + *    Will either move or copy pages (determined by @flags options) from
> + *    the given pipe inode to the given file.
> + *
> + */
> +ssize_t
> +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> +			  loff_t *ppos, size_t len, unsigned int flags)
> +{
> +	return splice_write_to_file(pipe, out, ppos, len, flags,
> +				    generic_file_splice_write_actor);
> +}
>  EXPORT_SYMBOL(generic_file_splice_write);
>  
>  static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> diff --git a/include/linux/splice.h b/include/linux/splice.h
> index 26e5b61..bd14281 100644
> --- a/include/linux/splice.h
> +++ b/include/linux/splice.h
> @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
>  			   struct splice_desc *);
>  typedef int (splice_direct_actor)(struct pipe_inode_info *,
>  				  struct splice_desc *);
> +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
> +				  struct splice_desc *);
>  
>  extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
>  				loff_t *, size_t, unsigned int,
> @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
>  extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
>  				      splice_direct_actor *);
>  
> +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
> +				    loff_t *, size_t, unsigned int,
> +				    splice_write_actor *);
>  /*
>   * for dynamic pipe sizing
>   */
> -- 
> 1.7.5.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2011-08-08  6:45   ` Dave Chinner
@ 2011-08-10 10:12     ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking
@ 2011-08-10 10:12     ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock
  2011-08-08  6:45   ` Dave Chinner
@ 2011-08-10 10:17     ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel

On Mon, Aug 08, 2011 at 04:45:27PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> lockdep reports splice vs direct-io write lock inversions due to
> generic_file_splice_write() taking the inode->i_mutex inside
> XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can
> deadlock.  Use splice_write_to_file() with an actor that does not
> take the i_mutex to avoid these problems.

I don't think the locking model is quite correct yet.  We'll still
hold the iolock and i_mutex over
splice_from_pipe_begin/splice_from_pipe_next/splice_from_pipe_end,
which call into the pipe code, and take other i_mutex instances that
may deadlock against ours.

It also means we hold the iolock over generic_write_sync which calls
into ->fsync which takes the iolock with my latests changes.

I think we'll have to take the locking and i_size updates into the
actor, and behave like the other filesystems.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock
@ 2011-08-10 10:17     ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, xfs

On Mon, Aug 08, 2011 at 04:45:27PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> lockdep reports splice vs direct-io write lock inversions due to
> generic_file_splice_write() taking the inode->i_mutex inside
> XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can
> deadlock.  Use splice_write_to_file() with an actor that does not
> take the i_mutex to avoid these problems.

I don't think the locking model is quite correct yet.  We'll still
hold the iolock and i_mutex over
splice_from_pipe_begin/splice_from_pipe_next/splice_from_pipe_end,
which call into the pipe code, and take other i_mutex instances that
may deadlock against ours.

It also means we hold the iolock over generic_write_sync which calls
into ->fsync which takes the iolock with my latests changes.

I think we'll have to take the locking and i_size updates into the
actor, and behave like the other filesystems.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock V2
  2011-08-08  6:45 ` Dave Chinner
@ 2011-08-10 10:19   ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel, mfasheh, jlbec

On Mon, Aug 08, 2011 at 04:45:25PM +1000, Dave Chinner wrote:
> generic_file_splice_write() takes the inode->i_mutex after the
> filesystem has taken whatever locks it needs to ensure sanity.
> however, this typically violates the locking order of filesystems
> with their own locks in that the order is usually i_mutex ->
> filesystem lock.
> 
> XFS is such a case, and generic_file_splice_write() is generating
> lockdep warnings because of lock inversions between the
> inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a
> reported case of fio causing a deadlock when it mixes IO types
> (e.g. splice vs direct IO).

Another case is ocfs2, which looks like a perfect candidate to be
converted over to your infrastructure.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock V2
@ 2011-08-10 10:19   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2011-08-10 10:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, mfasheh, jlbec, xfs

On Mon, Aug 08, 2011 at 04:45:25PM +1000, Dave Chinner wrote:
> generic_file_splice_write() takes the inode->i_mutex after the
> filesystem has taken whatever locks it needs to ensure sanity.
> however, this typically violates the locking order of filesystems
> with their own locks in that the order is usually i_mutex ->
> filesystem lock.
> 
> XFS is such a case, and generic_file_splice_write() is generating
> lockdep warnings because of lock inversions between the
> inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a
> reported case of fio causing a deadlock when it mixes IO types
> (e.g. splice vs direct IO).

Another case is ocfs2, which looks like a perfect candidate to be
converted over to your infrastructure.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2012-11-28  2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner
@ 2012-11-28  2:12   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2012-11-28  2:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: xfs

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so modify the generic splice code to use an
actor function for the code that currently requires the i_mutex to
be taken.  Convert generic_file_splice_write() to use this new
interface supplying a generic actor function that performs the same
actions as the existing code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/splice.c            |   64 ++++++++++++++++++++++++++++++++++++------------
 include/linux/splice.h |    5 ++++
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 13e5b47..66d4f24 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -970,24 +970,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
 }
 
 /**
- * generic_file_splice_write - splice data from a pipe to a file
+ * splice_write_to_file - splice data from a pipe to a file
  * @pipe:	pipe info
  * @out:	file to write to
  * @ppos:	position in @out
  * @len:	number of bytes to splice
  * @flags:	splice modifier flags
+ * @actor:	worker that does the splicing from the pipe to the file
  *
  * Description:
  *    Will either move or copy pages (determined by @flags options) from
  *    the given pipe inode to the given file.
  *
  */
-ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
+			     loff_t *ppos, size_t len, unsigned int flags,
+			     splice_write_actor actor)
 {
 	struct address_space *mapping = out->f_mapping;
-	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
 		.total_len = len,
 		.flags = flags,
@@ -996,7 +996,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	};
 	ssize_t ret;
 
-	sb_start_write(inode->i_sb);
+	sb_start_write(mapping->host->i_sb);
 
 	pipe_lock(pipe);
 
@@ -1006,15 +1006,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		ret = file_remove_suid(out);
-		if (!ret) {
-			ret = file_update_time(out);
-			if (!ret)
-				ret = splice_from_pipe_feed(pipe, &sd,
-							    pipe_to_file);
-		}
-		mutex_unlock(&inode->i_mutex);
+		ret = actor(pipe, &sd);
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1036,11 +1028,51 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
-	sb_end_write(inode->i_sb);
+	sb_end_write(mapping->host->i_sb);
 
 	return ret;
 }
+EXPORT_SYMBOL(splice_write_to_file);
 
+static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
+					       struct splice_desc *sd)
+{
+	struct file *out = sd->u.file;
+	struct inode *inode = out->f_mapping->host;
+	ssize_t ret;
+
+	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+	ret = file_remove_suid(out);
+	if (!ret) {
+		ret = file_update_time(out);
+		if (!ret)
+			ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return splice_write_to_file(pipe, out, ppos, len, flags,
+				    generic_file_splice_write_actor);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 09a545a..44ce6f6b 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -62,6 +62,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
 			   struct splice_desc *);
 typedef int (splice_direct_actor)(struct pipe_inode_info *,
 				  struct splice_desc *);
+typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
+				  struct splice_desc *);
 
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
@@ -83,6 +85,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
 
+extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
+				    loff_t *, size_t, unsigned int,
+				    splice_write_actor *);
 /*
  * for dynamic pipe sizing
  */
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
@ 2012-11-28  2:12   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2012-11-28  2:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: xfs

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so modify the generic splice code to use an
actor function for the code that currently requires the i_mutex to
be taken.  Convert generic_file_splice_write() to use this new
interface supplying a generic actor function that performs the same
actions as the existing code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/splice.c            |   64 ++++++++++++++++++++++++++++++++++++------------
 include/linux/splice.h |    5 ++++
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 13e5b47..66d4f24 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -970,24 +970,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
 }
 
 /**
- * generic_file_splice_write - splice data from a pipe to a file
+ * splice_write_to_file - splice data from a pipe to a file
  * @pipe:	pipe info
  * @out:	file to write to
  * @ppos:	position in @out
  * @len:	number of bytes to splice
  * @flags:	splice modifier flags
+ * @actor:	worker that does the splicing from the pipe to the file
  *
  * Description:
  *    Will either move or copy pages (determined by @flags options) from
  *    the given pipe inode to the given file.
  *
  */
-ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out,
+			     loff_t *ppos, size_t len, unsigned int flags,
+			     splice_write_actor actor)
 {
 	struct address_space *mapping = out->f_mapping;
-	struct inode *inode = mapping->host;
 	struct splice_desc sd = {
 		.total_len = len,
 		.flags = flags,
@@ -996,7 +996,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	};
 	ssize_t ret;
 
-	sb_start_write(inode->i_sb);
+	sb_start_write(mapping->host->i_sb);
 
 	pipe_lock(pipe);
 
@@ -1006,15 +1006,7 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		ret = file_remove_suid(out);
-		if (!ret) {
-			ret = file_update_time(out);
-			if (!ret)
-				ret = splice_from_pipe_feed(pipe, &sd,
-							    pipe_to_file);
-		}
-		mutex_unlock(&inode->i_mutex);
+		ret = actor(pipe, &sd);
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1036,11 +1028,51 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
-	sb_end_write(inode->i_sb);
+	sb_end_write(mapping->host->i_sb);
 
 	return ret;
 }
+EXPORT_SYMBOL(splice_write_to_file);
 
+static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe,
+					       struct splice_desc *sd)
+{
+	struct file *out = sd->u.file;
+	struct inode *inode = out->f_mapping->host;
+	ssize_t ret;
+
+	mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+	ret = file_remove_suid(out);
+	if (!ret) {
+		ret = file_update_time(out);
+		if (!ret)
+			ret = splice_from_pipe_feed(pipe, sd, pipe_to_file);
+	}
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return splice_write_to_file(pipe, out, ppos, len, flags,
+				    generic_file_splice_write_actor);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 09a545a..44ce6f6b 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -62,6 +62,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
 			   struct splice_desc *);
 typedef int (splice_direct_actor)(struct pipe_inode_info *,
 				  struct splice_desc *);
+typedef ssize_t (splice_write_actor)(struct pipe_inode_info *,
+				  struct splice_desc *);
 
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
@@ -83,6 +85,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
 
+extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *,
+				    loff_t *, size_t, unsigned int,
+				    splice_write_actor *);
 /*
  * for dynamic pipe sizing
  */
-- 
1.7.10

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
@ 2011-07-18  4:04   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-07-18  4:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, xfs

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so add a new function
generic_file_splice_write_unlocked() that avoids the locking of the
i_mutex for XFS to call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/splice.c        |   39 +++++++++++++++++++++++++++++++++++----
 include/linux/fs.h |    2 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index aa866d3..c15137d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -980,8 +980,9 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
  *
  */
 ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+__generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags,
+			  int need_imutex)
 {
 	struct address_space *mapping = out->f_mapping;
 	struct inode *inode = mapping->host;
@@ -1001,13 +1002,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		if (need_imutex)
+			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
 		ret = file_remove_suid(out);
 		if (!ret) {
 			file_update_time(out);
 			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
 		}
-		mutex_unlock(&inode->i_mutex);
+		if (need_imutex)
+			mutex_unlock(&inode->i_mutex);
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1033,8 +1036,36 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	return ret;
 }
 
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 1);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
+ssize_t
+generic_file_splice_write_unlocked(struct pipe_inode_info *pipe,
+				   struct file *out, loff_t *ppos,
+				   size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 0);
+}
+EXPORT_SYMBOL(generic_file_splice_write_unlocked);
+
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			  struct splice_desc *sd)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 54c49e5..3a8b984 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2368,6 +2368,8 @@ extern ssize_t default_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t generic_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
+extern ssize_t generic_file_splice_write_unlocked(struct pipe_inode_info *,
+		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
@ 2011-07-18  4:04   ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2011-07-18  4:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, xfs

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so add a new function
generic_file_splice_write_unlocked() that avoids the locking of the
i_mutex for XFS to call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/splice.c        |   39 +++++++++++++++++++++++++++++++++++----
 include/linux/fs.h |    2 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index aa866d3..c15137d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -980,8 +980,9 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
  *
  */
 ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+__generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags,
+			  int need_imutex)
 {
 	struct address_space *mapping = out->f_mapping;
 	struct inode *inode = mapping->host;
@@ -1001,13 +1002,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		if (need_imutex)
+			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
 		ret = file_remove_suid(out);
 		if (!ret) {
 			file_update_time(out);
 			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
 		}
-		mutex_unlock(&inode->i_mutex);
+		if (need_imutex)
+			mutex_unlock(&inode->i_mutex);
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1033,8 +1036,36 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	return ret;
 }
 
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 1);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
+ssize_t
+generic_file_splice_write_unlocked(struct pipe_inode_info *pipe,
+				   struct file *out, loff_t *ppos,
+				   size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 0);
+}
+EXPORT_SYMBOL(generic_file_splice_write_unlocked);
+
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			  struct splice_desc *sd)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 54c49e5..3a8b984 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2368,6 +2368,8 @@ extern ssize_t default_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t generic_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
+extern ssize_t generic_file_splice_write_unlocked(struct pipe_inode_info *,
+		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
-- 
1.7.5.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-11-28  2:13 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-08  6:45 [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Dave Chinner
2011-08-08  6:45 ` Dave Chinner
2011-08-08  6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
2011-08-08  6:45   ` Dave Chinner
2011-08-09 11:36   ` Jan Kara
2011-08-09 11:36     ` Jan Kara
2011-08-10 10:12   ` Christoph Hellwig
2011-08-10 10:12     ` Christoph Hellwig
2011-08-08  6:45 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner
2011-08-08  6:45   ` Dave Chinner
2011-08-10 10:17   ` Christoph Hellwig
2011-08-10 10:17     ` Christoph Hellwig
2011-08-10 10:19 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Christoph Hellwig
2011-08-10 10:19   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2012-11-28  2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner
2012-11-28  2:12 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
2012-11-28  2:12   ` Dave Chinner
2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
2011-07-18  4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
2011-07-18  4:04   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.