linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] splice: i_mutex vs splice write deadlock
@ 2011-07-18  4:04 Dave Chinner
  2011-07-18  4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dave Chinner @ 2011-07-18  4:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, xfs

generic_file_splice_write() takes the inode->i_mutex after the
filesystem has taken whatever locks it needs to ensure sanity.
however, this typically violates the locking order of filesystems
with their own locks in that the order is usually i_mutex ->
filesystem lock.

XFS is such a case, and generic_file_splice_write() is generating
lockdep warnings because of lock inversions between the
inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a
reported case of fio causing a deadlock when it mixes IO types
(e.g. splice vs direct IO).

This patch set introduces generic_file_splice_write_unlocked() and
factors the code such that __generic_file_splice_write() will only
lock the i_mutex if called from the locked variant. The second patch
modifies XFS to use the new function.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] vfs: split generic splice code from i_mutex locking
  2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
@ 2011-07-18  4:04 ` Dave Chinner
  2011-07-18  4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner
  2011-07-19  3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2011-07-18  4:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, xfs

From: Dave Chinner <dchinner@redhat.com>

XFS holds locks that should be nested inside the inode->i_mutex when
generic_file_splice_write is called. This function takes the
i_mutex, and so we get a lock inversion that triggers lockdep
warnings and has been found to cause real deadlocks.

XFS does not need the splice code to take the i_mutex to do the page
cache manipulation, so add a new function
generic_file_splice_write_unlocked() that avoids the locking of the
i_mutex for XFS to call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/splice.c        |   39 +++++++++++++++++++++++++++++++++++----
 include/linux/fs.h |    2 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index aa866d3..c15137d 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -980,8 +980,9 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
  *
  */
 ssize_t
-generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
-			  loff_t *ppos, size_t len, unsigned int flags)
+__generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags,
+			  int need_imutex)
 {
 	struct address_space *mapping = out->f_mapping;
 	struct inode *inode = mapping->host;
@@ -1001,13 +1002,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		if (ret <= 0)
 			break;
 
-		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		if (need_imutex)
+			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
 		ret = file_remove_suid(out);
 		if (!ret) {
 			file_update_time(out);
 			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
 		}
-		mutex_unlock(&inode->i_mutex);
+		if (need_imutex)
+			mutex_unlock(&inode->i_mutex);
 	} while (ret > 0);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -1033,8 +1036,36 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	return ret;
 }
 
+/**
+ * generic_file_splice_write - splice data from a pipe to a file
+ * @pipe:	pipe info
+ * @out:	file to write to
+ * @ppos:	position in @out
+ * @len:	number of bytes to splice
+ * @flags:	splice modifier flags
+ *
+ * Description:
+ *    Will either move or copy pages (determined by @flags options) from
+ *    the given pipe inode to the given file.
+ *
+ */
+ssize_t
+generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
+			  loff_t *ppos, size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 1);
+}
 EXPORT_SYMBOL(generic_file_splice_write);
 
+ssize_t
+generic_file_splice_write_unlocked(struct pipe_inode_info *pipe,
+				   struct file *out, loff_t *ppos,
+				   size_t len, unsigned int flags)
+{
+	return __generic_file_splice_write(pipe, out, ppos, len, flags, 0);
+}
+EXPORT_SYMBOL(generic_file_splice_write_unlocked);
+
 static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 			  struct splice_desc *sd)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 54c49e5..3a8b984 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2368,6 +2368,8 @@ extern ssize_t default_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t generic_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
+extern ssize_t generic_file_splice_write_unlocked(struct pipe_inode_info *,
+		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] xfs: fix splice/direct-IO deadlock
  2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
  2011-07-18  4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
@ 2011-07-18  4:04 ` Dave Chinner
  2011-07-19  3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2011-07-18  4:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, xfs

From: Dave Chinner <dchinner@redhat.com>

lockdep reports splice vs direct-io write lock inversions due to
generic_file_splice_write() taking the inode->i_mutex inside
XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can
deadlock.  Use generic_file_splice_write_unlocked() because the
i_mutex does not need to be held over the operations that are done
in the XFS splice write path.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index f51a384..1e641e6 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -463,7 +463,8 @@ xfs_file_splice_write(
 
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
-	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
+	ret = generic_file_splice_write_unlocked(pipe, outfilp, ppos,
+						count, flags);
 
 	xfs_aio_write_isize_update(inode, ppos, ret);
 	xfs_aio_write_newsize_update(ip);
-- 
1.7.5.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock
  2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
  2011-07-18  4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
  2011-07-18  4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner
@ 2011-07-19  3:10 ` Christoph Hellwig
  2011-07-19  4:07   ` Dave Chinner
  2 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2011-07-19  3:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, xfs

I don't really like this very much.  Not taking the i_mutex at all
makes the splice_write method in XFS use different locking than
everyone else, and different from the normal XFS write path.

For example ocfs2 which has the same locking issues just has an
own implementation of the splice_write method, which isn't
too nice but at least marginally better.  I think the right
fix for both xfs and ocfs2 would be to have a generic_file_splice_write
variant that takes an "actor" function pointer, which defaults to
a smaller wrapper around file_remove_suid, file_update_time and
splice_from_pipe_feed, and then XFS and ocfs2 can provide their
own actors that add the additional locking.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock
  2011-07-19  3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig
@ 2011-07-19  4:07   ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2011-07-19  4:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, xfs

On Mon, Jul 18, 2011 at 11:10:03PM -0400, Christoph Hellwig wrote:
> I don't really like this very much.  Not taking the i_mutex at all
> makes the splice_write method in XFS use different locking than
> everyone else, and different from the normal XFS write path.
> 
> For example ocfs2 which has the same locking issues just has an
> own implementation of the splice_write method, which isn't
> too nice but at least marginally better.  I think the right
> fix for both xfs and ocfs2 would be to have a generic_file_splice_write
> variant that takes an "actor" function pointer, which defaults to
> a smaller wrapper around file_remove_suid, file_update_time and
> splice_from_pipe_feed, and then XFS and ocfs2 can provide their
> own actors that add the additional locking.

Yeah I thought about doing that, but wanted to try a simpler version
first. I'll code up the actor variant.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-07-19  4:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18  4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner
2011-07-18  4:04 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner
2011-07-18  4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner
2011-07-19  3:10 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock Christoph Hellwig
2011-07-19  4:07   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).