* [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 @ 2011-08-08 6:45 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel generic_file_splice_write() takes the inode->i_mutex after the filesystem has taken whatever locks it needs to ensure sanity. however, this typically violates the locking order of filesystems with their own locks in that the order is usually i_mutex -> filesystem lock. XFS is such a case, and generic_file_splice_write() is generating lockdep warnings because of lock inversions between the inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a reported case of fio causing a deadlock when it mixes IO types (e.g. splice vs direct IO). Version 2: - add a new function to take an actor to do the work of splicing the data to the file. Convert generic_file_splice_write to use this, and make XFS call it with a different actor fuction that avoids the i_mutex. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 @ 2011-08-08 6:45 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel generic_file_splice_write() takes the inode->i_mutex after the filesystem has taken whatever locks it needs to ensure sanity. however, this typically violates the locking order of filesystems with their own locks in that the order is usually i_mutex -> filesystem lock. XFS is such a case, and generic_file_splice_write() is generating lockdep warnings because of lock inversions between the inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a reported case of fio causing a deadlock when it mixes IO types (e.g. splice vs direct IO). Version 2: - add a new function to take an actor to do the work of splicing the data to the file. Convert generic_file_splice_write to use this, and make XFS call it with a different actor fuction that avoids the i_mutex. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 ` Dave Chinner @ 2011-08-08 6:45 ` Dave Chinner -1 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel From: Dave Chinner <dchinner@redhat.com> XFS holds locks that should be nested inside the inode->i_mutex when generic_file_splice_write is called. This function takes the i_mutex, and so we get a lock inversion that triggers lockdep warnings and has been found to cause real deadlocks. XFS does not need the splice code to take the i_mutex to do the page cache manipulation, so modify the generic splice code to use an actor function for the code that currently requires the i_mutex to be taken. Convert generic_file_splice_write() to use this new interface supplying a generic actor function that performs the same actions as the existing code. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- include/linux/splice.h | 5 ++++ 2 files changed, 51 insertions(+), 12 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index fa2defa..a02dddc 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, } /** - * generic_file_splice_write - splice data from a pipe to a file + * splice_write_to_file - splice data from a pipe to a file * @pipe: pipe info * @out: file to write to * @ppos: position in @out * @len: number of bytes to splice * @flags: splice modifier flags + * @actor: worker that does the splicing from the pipe to the file * * Description: * Will either move or copy pages (determined by @flags options) from * the given pipe inode to the given file. * */ -ssize_t -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags, + splice_write_actor actor) { struct address_space *mapping = out->f_mapping; - struct inode *inode = mapping->host; struct splice_desc sd = { .total_len = len, .flags = flags, @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); - ret = file_remove_suid(out); - if (!ret) { - file_update_time(out); - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); - } - mutex_unlock(&inode->i_mutex); + ret = actor(pipe, &sd); + } while (ret > 0); splice_from_pipe_end(pipe, &sd); @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, return ret; } +EXPORT_SYMBOL(splice_write_to_file); +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct inode *inode = out->f_mapping->host; + ssize_t ret; + + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + ret = file_remove_suid(out); + if (!ret) { + file_update_time(out); + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + mutex_unlock(&inode->i_mutex); + + return ret; +} + +/** + * generic_file_splice_write - splice data from a pipe to a file + * @pipe: pipe info + * @out: file to write to + * @ppos: position in @out + * @len: number of bytes to splice + * @flags: splice modifier flags + * + * Description: + * Will either move or copy pages (determined by @flags options) from + * the given pipe inode to the given file. + * + */ +ssize_t +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + return splice_write_to_file(pipe, out, ppos, len, flags, + generic_file_splice_write_actor); +} EXPORT_SYMBOL(generic_file_splice_write); static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, diff --git a/include/linux/splice.h b/include/linux/splice.h index 26e5b61..bd14281 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, struct splice_desc *); typedef int (splice_direct_actor)(struct pipe_inode_info *, struct splice_desc *); +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, + struct splice_desc *); extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int, @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, splice_direct_actor *); +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, + loff_t *, size_t, unsigned int, + splice_write_actor *); /* * for dynamic pipe sizing */ -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 1/2] vfs: split generic splice code from i_mutex locking @ 2011-08-08 6:45 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel From: Dave Chinner <dchinner@redhat.com> XFS holds locks that should be nested inside the inode->i_mutex when generic_file_splice_write is called. This function takes the i_mutex, and so we get a lock inversion that triggers lockdep warnings and has been found to cause real deadlocks. XFS does not need the splice code to take the i_mutex to do the page cache manipulation, so modify the generic splice code to use an actor function for the code that currently requires the i_mutex to be taken. Convert generic_file_splice_write() to use this new interface supplying a generic actor function that performs the same actions as the existing code. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- include/linux/splice.h | 5 ++++ 2 files changed, 51 insertions(+), 12 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index fa2defa..a02dddc 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, } /** - * generic_file_splice_write - splice data from a pipe to a file + * splice_write_to_file - splice data from a pipe to a file * @pipe: pipe info * @out: file to write to * @ppos: position in @out * @len: number of bytes to splice * @flags: splice modifier flags + * @actor: worker that does the splicing from the pipe to the file * * Description: * Will either move or copy pages (determined by @flags options) from * the given pipe inode to the given file. * */ -ssize_t -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags, + splice_write_actor actor) { struct address_space *mapping = out->f_mapping; - struct inode *inode = mapping->host; struct splice_desc sd = { .total_len = len, .flags = flags, @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, if (ret <= 0) break; - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); - ret = file_remove_suid(out); - if (!ret) { - file_update_time(out); - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); - } - mutex_unlock(&inode->i_mutex); + ret = actor(pipe, &sd); + } while (ret > 0); splice_from_pipe_end(pipe, &sd); @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, return ret; } +EXPORT_SYMBOL(splice_write_to_file); +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct inode *inode = out->f_mapping->host; + ssize_t ret; + + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + ret = file_remove_suid(out); + if (!ret) { + file_update_time(out); + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + mutex_unlock(&inode->i_mutex); + + return ret; +} + +/** + * generic_file_splice_write - splice data from a pipe to a file + * @pipe: pipe info + * @out: file to write to + * @ppos: position in @out + * @len: number of bytes to splice + * @flags: splice modifier flags + * + * Description: + * Will either move or copy pages (determined by @flags options) from + * the given pipe inode to the given file. + * + */ +ssize_t +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + return splice_write_to_file(pipe, out, ppos, len, flags, + generic_file_splice_write_actor); +} EXPORT_SYMBOL(generic_file_splice_write); static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, diff --git a/include/linux/splice.h b/include/linux/splice.h index 26e5b61..bd14281 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, struct splice_desc *); typedef int (splice_direct_actor)(struct pipe_inode_info *, struct splice_desc *); +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, + struct splice_desc *); extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int, @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, splice_direct_actor *); +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, + loff_t *, size_t, unsigned int, + splice_write_actor *); /* * for dynamic pipe sizing */ -- 1.7.5.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 ` Dave Chinner @ 2011-08-09 11:36 ` Jan Kara -1 siblings, 0 replies; 22+ messages in thread From: Jan Kara @ 2011-08-09 11:36 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel On Mon 08-08-11 16:45:26, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > XFS holds locks that should be nested inside the inode->i_mutex when > generic_file_splice_write is called. This function takes the > i_mutex, and so we get a lock inversion that triggers lockdep > warnings and has been found to cause real deadlocks. > > XFS does not need the splice code to take the i_mutex to do the page > cache manipulation, so modify the generic splice code to use an > actor function for the code that currently requires the i_mutex to > be taken. Convert generic_file_splice_write() to use this new > interface supplying a generic actor function that performs the same > actions as the existing code. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> The patch looks good to me. Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- > include/linux/splice.h | 5 ++++ > 2 files changed, 51 insertions(+), 12 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index fa2defa..a02dddc 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, > } > > /** > - * generic_file_splice_write - splice data from a pipe to a file > + * splice_write_to_file - splice data from a pipe to a file > * @pipe: pipe info > * @out: file to write to > * @ppos: position in @out > * @len: number of bytes to splice > * @flags: splice modifier flags > + * @actor: worker that does the splicing from the pipe to the file > * > * Description: > * Will either move or copy pages (determined by @flags options) from > * the given pipe inode to the given file. > * > */ > -ssize_t > -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > - loff_t *ppos, size_t len, unsigned int flags) > +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags, > + splice_write_actor actor) > { > struct address_space *mapping = out->f_mapping; > - struct inode *inode = mapping->host; > struct splice_desc sd = { > .total_len = len, > .flags = flags, > @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > if (ret <= 0) > break; > > - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > - ret = file_remove_suid(out); > - if (!ret) { > - file_update_time(out); > - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); > - } > - mutex_unlock(&inode->i_mutex); > + ret = actor(pipe, &sd); > + > } while (ret > 0); > splice_from_pipe_end(pipe, &sd); > > @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > > return ret; > } > +EXPORT_SYMBOL(splice_write_to_file); > > +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, > + struct splice_desc *sd) > +{ > + struct file *out = sd->u.file; > + struct inode *inode = out->f_mapping->host; > + ssize_t ret; > + > + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > + ret = file_remove_suid(out); > + if (!ret) { > + file_update_time(out); > + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); > + } > + mutex_unlock(&inode->i_mutex); > + > + return ret; > +} > + > +/** > + * generic_file_splice_write - splice data from a pipe to a file > + * @pipe: pipe info > + * @out: file to write to > + * @ppos: position in @out > + * @len: number of bytes to splice > + * @flags: splice modifier flags > + * > + * Description: > + * Will either move or copy pages (determined by @flags options) from > + * the given pipe inode to the given file. > + * > + */ > +ssize_t > +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags) > +{ > + return splice_write_to_file(pipe, out, ppos, len, flags, > + generic_file_splice_write_actor); > +} > EXPORT_SYMBOL(generic_file_splice_write); > > static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > diff --git a/include/linux/splice.h b/include/linux/splice.h > index 26e5b61..bd14281 100644 > --- a/include/linux/splice.h > +++ b/include/linux/splice.h > @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, > struct splice_desc *); > typedef int (splice_direct_actor)(struct pipe_inode_info *, > struct splice_desc *); > +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, > + struct splice_desc *); > > extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, > loff_t *, size_t, unsigned int, > @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, > extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, > splice_direct_actor *); > > +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, > + loff_t *, size_t, unsigned int, > + splice_write_actor *); > /* > * for dynamic pipe sizing > */ > -- > 1.7.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking @ 2011-08-09 11:36 ` Jan Kara 0 siblings, 0 replies; 22+ messages in thread From: Jan Kara @ 2011-08-09 11:36 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, xfs On Mon 08-08-11 16:45:26, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > XFS holds locks that should be nested inside the inode->i_mutex when > generic_file_splice_write is called. This function takes the > i_mutex, and so we get a lock inversion that triggers lockdep > warnings and has been found to cause real deadlocks. > > XFS does not need the splice code to take the i_mutex to do the page > cache manipulation, so modify the generic splice code to use an > actor function for the code that currently requires the i_mutex to > be taken. Convert generic_file_splice_write() to use this new > interface supplying a generic actor function that performs the same > actions as the existing code. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> The patch looks good to me. Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/splice.c | 58 ++++++++++++++++++++++++++++++++++++++---------- > include/linux/splice.h | 5 ++++ > 2 files changed, 51 insertions(+), 12 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index fa2defa..a02dddc 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -967,24 +967,24 @@ ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out, > } > > /** > - * generic_file_splice_write - splice data from a pipe to a file > + * splice_write_to_file - splice data from a pipe to a file > * @pipe: pipe info > * @out: file to write to > * @ppos: position in @out > * @len: number of bytes to splice > * @flags: splice modifier flags > + * @actor: worker that does the splicing from the pipe to the file > * > * Description: > * Will either move or copy pages (determined by @flags options) from > * the given pipe inode to the given file. > * > */ > -ssize_t > -generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > - loff_t *ppos, size_t len, unsigned int flags) > +ssize_t splice_write_to_file(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags, > + splice_write_actor actor) > { > struct address_space *mapping = out->f_mapping; > - struct inode *inode = mapping->host; > struct splice_desc sd = { > .total_len = len, > .flags = flags, > @@ -1001,13 +1001,8 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > if (ret <= 0) > break; > > - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > - ret = file_remove_suid(out); > - if (!ret) { > - file_update_time(out); > - ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file); > - } > - mutex_unlock(&inode->i_mutex); > + ret = actor(pipe, &sd); > + > } while (ret > 0); > splice_from_pipe_end(pipe, &sd); > > @@ -1032,7 +1027,46 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > > return ret; > } > +EXPORT_SYMBOL(splice_write_to_file); > > +static ssize_t generic_file_splice_write_actor(struct pipe_inode_info *pipe, > + struct splice_desc *sd) > +{ > + struct file *out = sd->u.file; > + struct inode *inode = out->f_mapping->host; > + ssize_t ret; > + > + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); > + ret = file_remove_suid(out); > + if (!ret) { > + file_update_time(out); > + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); > + } > + mutex_unlock(&inode->i_mutex); > + > + return ret; > +} > + > +/** > + * generic_file_splice_write - splice data from a pipe to a file > + * @pipe: pipe info > + * @out: file to write to > + * @ppos: position in @out > + * @len: number of bytes to splice > + * @flags: splice modifier flags > + * > + * Description: > + * Will either move or copy pages (determined by @flags options) from > + * the given pipe inode to the given file. > + * > + */ > +ssize_t > +generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out, > + loff_t *ppos, size_t len, unsigned int flags) > +{ > + return splice_write_to_file(pipe, out, ppos, len, flags, > + generic_file_splice_write_actor); > +} > EXPORT_SYMBOL(generic_file_splice_write); > > static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > diff --git a/include/linux/splice.h b/include/linux/splice.h > index 26e5b61..bd14281 100644 > --- a/include/linux/splice.h > +++ b/include/linux/splice.h > @@ -61,6 +61,8 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *, > struct splice_desc *); > typedef int (splice_direct_actor)(struct pipe_inode_info *, > struct splice_desc *); > +typedef ssize_t (splice_write_actor)(struct pipe_inode_info *, > + struct splice_desc *); > > extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *, > loff_t *, size_t, unsigned int, > @@ -82,6 +84,9 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *, > extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *, > splice_direct_actor *); > > +extern ssize_t splice_write_to_file(struct pipe_inode_info *, struct file *, > + loff_t *, size_t, unsigned int, > + splice_write_actor *); > /* > * for dynamic pipe sizing > */ > -- > 1.7.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.cz> SUSE Labs, CR _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking 2011-08-08 6:45 ` Dave Chinner @ 2011-08-10 10:12 ` Christoph Hellwig -1 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:12 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] vfs: split generic splice code from i_mutex locking @ 2011-08-10 10:12 ` Christoph Hellwig 0 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:12 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, xfs Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2011-08-08 6:45 ` Dave Chinner @ 2011-08-08 6:45 ` Dave Chinner -1 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Use splice_write_to_file() with an actor that does not take the i_mutex to avoid these problems. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_file.c | 41 +++++++++++++++++++++++++++++------------ 1 files changed, 29 insertions(+), 12 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c index 62a5022..959d8b2 100644 --- a/fs/xfs/linux-2.6/xfs_file.c +++ b/fs/xfs/linux-2.6/xfs_file.c @@ -38,6 +38,7 @@ #include <linux/dcache.h> #include <linux/falloc.h> +#include <linux/splice.h> static const struct vm_operations_struct xfs_file_vm_ops; @@ -431,13 +432,29 @@ xfs_aio_write_newsize_update( } } +static ssize_t +xfs_file_splice_write_actor( + struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + ssize_t ret; + + ret = file_remove_suid(out); + if (!ret) { + file_update_time(out); + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + + return ret; +} + /* - * xfs_file_splice_write() does not use xfs_rw_ilock() because - * generic_file_splice_write() takes the i_mutex itself. This, in theory, - * couuld cause lock inversions between the aio_write path and the splice path - * if someone is doing concurrent splice(2) based writes and write(2) based - * writes to the same inode. The only real way to fix this is to re-implement - * the generic code here with correct locking orders. + * xfs_file_splice_write() does not use the generic file splice write path + * because that takes the i_mutex, causing lock inversions with the IOLOCK. + * Instead, we call splice_write_to_file() directly with our own actor that does + * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like + * the rest of the code and hence avoid lock inversions and deadlocks. */ STATIC ssize_t xfs_file_splice_write( @@ -461,22 +478,22 @@ xfs_file_splice_write( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO; - xfs_ilock(ip, XFS_IOLOCK_EXCL); + xfs_rw_ilock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL); new_size = *ppos + count; - - xfs_ilock(ip, XFS_ILOCK_EXCL); if (new_size > ip->i_size) ip->i_new_size = new_size; - xfs_iunlock(ip, XFS_ILOCK_EXCL); + + xfs_rw_iunlock(ip, XFS_ILOCK_EXCL); trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = splice_write_to_file(pipe, outfilp, ppos, count, flags, + xfs_file_splice_write_actor); xfs_aio_write_isize_update(inode, ppos, ret); xfs_aio_write_newsize_update(ip, new_size); - xfs_iunlock(ip, XFS_IOLOCK_EXCL); + xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2011-08-08 6:45 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-08-08 6:45 UTC (permalink / raw) To: xfs; +Cc: linux-fsdevel From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Use splice_write_to_file() with an actor that does not take the i_mutex to avoid these problems. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_file.c | 41 +++++++++++++++++++++++++++++------------ 1 files changed, 29 insertions(+), 12 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c index 62a5022..959d8b2 100644 --- a/fs/xfs/linux-2.6/xfs_file.c +++ b/fs/xfs/linux-2.6/xfs_file.c @@ -38,6 +38,7 @@ #include <linux/dcache.h> #include <linux/falloc.h> +#include <linux/splice.h> static const struct vm_operations_struct xfs_file_vm_ops; @@ -431,13 +432,29 @@ xfs_aio_write_newsize_update( } } +static ssize_t +xfs_file_splice_write_actor( + struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + ssize_t ret; + + ret = file_remove_suid(out); + if (!ret) { + file_update_time(out); + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + + return ret; +} + /* - * xfs_file_splice_write() does not use xfs_rw_ilock() because - * generic_file_splice_write() takes the i_mutex itself. This, in theory, - * couuld cause lock inversions between the aio_write path and the splice path - * if someone is doing concurrent splice(2) based writes and write(2) based - * writes to the same inode. The only real way to fix this is to re-implement - * the generic code here with correct locking orders. + * xfs_file_splice_write() does not use the generic file splice write path + * because that takes the i_mutex, causing lock inversions with the IOLOCK. + * Instead, we call splice_write_to_file() directly with our own actor that does + * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like + * the rest of the code and hence avoid lock inversions and deadlocks. */ STATIC ssize_t xfs_file_splice_write( @@ -461,22 +478,22 @@ xfs_file_splice_write( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO; - xfs_ilock(ip, XFS_IOLOCK_EXCL); + xfs_rw_ilock(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL); new_size = *ppos + count; - - xfs_ilock(ip, XFS_ILOCK_EXCL); if (new_size > ip->i_size) ip->i_new_size = new_size; - xfs_iunlock(ip, XFS_ILOCK_EXCL); + + xfs_rw_iunlock(ip, XFS_ILOCK_EXCL); trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = splice_write_to_file(pipe, outfilp, ppos, count, flags, + xfs_file_splice_write_actor); xfs_aio_write_isize_update(inode, ppos, ret); xfs_aio_write_newsize_update(ip, new_size); - xfs_iunlock(ip, XFS_IOLOCK_EXCL); + xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } -- 1.7.5.4 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2011-08-08 6:45 ` Dave Chinner @ 2011-08-10 10:17 ` Christoph Hellwig -1 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:17 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel On Mon, Aug 08, 2011 at 04:45:27PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > lockdep reports splice vs direct-io write lock inversions due to > generic_file_splice_write() taking the inode->i_mutex inside > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > deadlock. Use splice_write_to_file() with an actor that does not > take the i_mutex to avoid these problems. I don't think the locking model is quite correct yet. We'll still hold the iolock and i_mutex over splice_from_pipe_begin/splice_from_pipe_next/splice_from_pipe_end, which call into the pipe code, and take other i_mutex instances that may deadlock against ours. It also means we hold the iolock over generic_write_sync which calls into ->fsync which takes the iolock with my latests changes. I think we'll have to take the locking and i_size updates into the actor, and behave like the other filesystems. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2011-08-10 10:17 ` Christoph Hellwig 0 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:17 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, xfs On Mon, Aug 08, 2011 at 04:45:27PM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > lockdep reports splice vs direct-io write lock inversions due to > generic_file_splice_write() taking the inode->i_mutex inside > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > deadlock. Use splice_write_to_file() with an actor that does not > take the i_mutex to avoid these problems. I don't think the locking model is quite correct yet. We'll still hold the iolock and i_mutex over splice_from_pipe_begin/splice_from_pipe_next/splice_from_pipe_end, which call into the pipe code, and take other i_mutex instances that may deadlock against ours. It also means we hold the iolock over generic_write_sync which calls into ->fsync which takes the iolock with my latests changes. I think we'll have to take the locking and i_size updates into the actor, and behave like the other filesystems. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 2011-08-08 6:45 ` Dave Chinner @ 2011-08-10 10:19 ` Christoph Hellwig -1 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:19 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-fsdevel, mfasheh, jlbec On Mon, Aug 08, 2011 at 04:45:25PM +1000, Dave Chinner wrote: > generic_file_splice_write() takes the inode->i_mutex after the > filesystem has taken whatever locks it needs to ensure sanity. > however, this typically violates the locking order of filesystems > with their own locks in that the order is usually i_mutex -> > filesystem lock. > > XFS is such a case, and generic_file_splice_write() is generating > lockdep warnings because of lock inversions between the > inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a > reported case of fio causing a deadlock when it mixes IO types > (e.g. splice vs direct IO). Another case is ocfs2, which looks like a perfect candidate to be converted over to your infrastructure. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 @ 2011-08-10 10:19 ` Christoph Hellwig 0 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2011-08-10 10:19 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, mfasheh, jlbec, xfs On Mon, Aug 08, 2011 at 04:45:25PM +1000, Dave Chinner wrote: > generic_file_splice_write() takes the inode->i_mutex after the > filesystem has taken whatever locks it needs to ensure sanity. > however, this typically violates the locking order of filesystems > with their own locks in that the order is usually i_mutex -> > filesystem lock. > > XFS is such a case, and generic_file_splice_write() is generating > lockdep warnings because of lock inversions between the > inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a > reported case of fio causing a deadlock when it mixes IO types > (e.g. splice vs direct IO). Another case is ocfs2, which looks like a perfect candidate to be converted over to your infrastructure. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 0/2] splice: fix direct IO/splice deadlock @ 2012-11-28 2:12 Dave Chinner 2012-11-28 2:12 ` Dave Chinner 0 siblings, 1 reply; 22+ messages in thread From: Dave Chinner @ 2012-11-28 2:12 UTC (permalink / raw) To: linux-fsdevel; +Cc: xfs Hi Folks, These two patches have been sitting in my tree for some time. I think I've even posted them before. Basically, XFS can deadlock when you use splice and direct IO on the same file concurrently because the splice write inverts the locking order of the i_mutex and the xfs inode i_iolock. The first patch moves the guts of the i_mutex protected region of the splice write to an actor function, and the second uses this structure to enable XFS to provide an actor that uses the correct locking order and hence avoid the deadlock. Comments? Cheers, Dave. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2012-11-28 2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner @ 2012-11-28 2:12 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2012-11-28 2:12 UTC (permalink / raw) To: linux-fsdevel; +Cc: xfs From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer function and drive it inwards to the actor function that only locks the inode when the lock is really needed, Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_file.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 2e79c54..21b6f0b 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -39,6 +39,7 @@ #include <linux/dcache.h> #include <linux/falloc.h> #include <linux/pagevec.h> +#include <linux/splice.h> static const struct vm_operations_struct xfs_file_vm_ops; @@ -344,13 +345,33 @@ xfs_file_splice_read( return ret; } +static ssize_t +xfs_file_splice_write_actor( + struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct xfs_inode *ip = XFS_I(out->f_mapping->host); + ssize_t ret; + + xfs_rw_ilock(ip, XFS_IOLOCK_EXCL); + ret = file_remove_suid(out); + if (!ret) { + ret = file_update_time(out); + if (!ret) + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL); + + return ret; +} + /* - * xfs_file_splice_write() does not use xfs_rw_ilock() because - * generic_file_splice_write() takes the i_mutex itself. This, in theory, - * couuld cause lock inversions between the aio_write path and the splice path - * if someone is doing concurrent splice(2) based writes and write(2) based - * writes to the same inode. The only real way to fix this is to re-implement - * the generic code here with correct locking orders. + * xfs_file_splice_write() does not use the generic file splice write path + * because that takes the i_mutex, causing lock inversions with the IOLOCK. + * Instead, we call splice_write_to_file() directly with our own actor that does + * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like + * the rest of the code and hence avoid lock inversions and deadlocks. */ STATIC ssize_t xfs_file_splice_write( @@ -373,15 +394,12 @@ xfs_file_splice_write( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO; - xfs_ilock(ip, XFS_IOLOCK_EXCL); - trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = splice_write_to_file(pipe, outfilp, ppos, count, flags, + xfs_file_splice_write_actor); if (ret > 0) XFS_STATS_ADD(xs_write_bytes, ret); - - xfs_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } -- 1.7.10 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2012-11-28 2:12 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2012-11-28 2:12 UTC (permalink / raw) To: linux-fsdevel; +Cc: xfs From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer function and drive it inwards to the actor function that only locks the inode when the lock is really needed, Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_file.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 2e79c54..21b6f0b 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -39,6 +39,7 @@ #include <linux/dcache.h> #include <linux/falloc.h> #include <linux/pagevec.h> +#include <linux/splice.h> static const struct vm_operations_struct xfs_file_vm_ops; @@ -344,13 +345,33 @@ xfs_file_splice_read( return ret; } +static ssize_t +xfs_file_splice_write_actor( + struct pipe_inode_info *pipe, + struct splice_desc *sd) +{ + struct file *out = sd->u.file; + struct xfs_inode *ip = XFS_I(out->f_mapping->host); + ssize_t ret; + + xfs_rw_ilock(ip, XFS_IOLOCK_EXCL); + ret = file_remove_suid(out); + if (!ret) { + ret = file_update_time(out); + if (!ret) + ret = splice_from_pipe_feed(pipe, sd, pipe_to_file); + } + xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL); + + return ret; +} + /* - * xfs_file_splice_write() does not use xfs_rw_ilock() because - * generic_file_splice_write() takes the i_mutex itself. This, in theory, - * couuld cause lock inversions between the aio_write path and the splice path - * if someone is doing concurrent splice(2) based writes and write(2) based - * writes to the same inode. The only real way to fix this is to re-implement - * the generic code here with correct locking orders. + * xfs_file_splice_write() does not use the generic file splice write path + * because that takes the i_mutex, causing lock inversions with the IOLOCK. + * Instead, we call splice_write_to_file() directly with our own actor that does + * not take the i_mutex. This allows us to use the xfs_rw_ilock() functions like + * the rest of the code and hence avoid lock inversions and deadlocks. */ STATIC ssize_t xfs_file_splice_write( @@ -373,15 +394,12 @@ xfs_file_splice_write( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO; - xfs_ilock(ip, XFS_IOLOCK_EXCL); - trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = splice_write_to_file(pipe, outfilp, ppos, count, flags, + xfs_file_splice_write_actor); if (ret > 0) XFS_STATS_ADD(xs_write_bytes, ret); - - xfs_iunlock(ip, XFS_IOLOCK_EXCL); return ret; } -- 1.7.10 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2012-11-28 2:12 ` Dave Chinner @ 2012-11-28 16:07 ` Christoph Hellwig -1 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2012-11-28 16:07 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, xfs On Wed, Nov 28, 2012 at 01:12:48PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > lockdep reports splice vs direct-io write lock inversions due to > generic_file_splice_write() taking the inode->i_mutex inside > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer > function and drive it inwards to the actor function that only locks > the inode when the lock is really needed, punctuation? Otherwise the patch looks fine, but I'd love to understand why the generic code thes te I_MUTEX_CHILD annotation and we can get away without it. Also can you add a testcase for this to xfstests? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2012-11-28 16:07 ` Christoph Hellwig 0 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2012-11-28 16:07 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel, xfs On Wed, Nov 28, 2012 at 01:12:48PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > lockdep reports splice vs direct-io write lock inversions due to > generic_file_splice_write() taking the inode->i_mutex inside > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer > function and drive it inwards to the actor function that only locks > the inode when the lock is really needed, punctuation? Otherwise the patch looks fine, but I'd love to understand why the generic code thes te I_MUTEX_CHILD annotation and we can get away without it. Also can you add a testcase for this to xfstests? _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2012-11-28 16:07 ` Christoph Hellwig @ 2012-11-28 21:33 ` Dave Chinner -1 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2012-11-28 21:33 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-fsdevel, xfs On Wed, Nov 28, 2012 at 11:07:36AM -0500, Christoph Hellwig wrote: > On Wed, Nov 28, 2012 at 01:12:48PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > lockdep reports splice vs direct-io write lock inversions due to > > generic_file_splice_write() taking the inode->i_mutex inside > > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > > deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer > > function and drive it inwards to the actor function that only locks > > the inode when the lock is really needed, > > punctuation? > > Otherwise the patch looks fine, but I'd love to understand why the > generic code thes te I_MUTEX_CHILD annotation and we can get away > without it. > > Also can you add a testcase for this to xfstests? I've had trouble reproducing it reliably. The only cases I've seen it occur are some custom FIO workloads. I'll try again to see if I can come up with something that reliably hits it... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2012-11-28 21:33 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2012-11-28 21:33 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-fsdevel, xfs On Wed, Nov 28, 2012 at 11:07:36AM -0500, Christoph Hellwig wrote: > On Wed, Nov 28, 2012 at 01:12:48PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > lockdep reports splice vs direct-io write lock inversions due to > > generic_file_splice_write() taking the inode->i_mutex inside > > XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can > > deadlock. Remove the XFS_IOLOCK_EXCL locking context from the outer > > function and drive it inwards to the actor function that only locks > > the inode when the lock is really needed, > > punctuation? > > Otherwise the patch looks fine, but I'd love to understand why the > generic code thes te I_MUTEX_CHILD annotation and we can get away > without it. > > Also can you add a testcase for this to xfstests? I've had trouble reproducing it reliably. The only cases I've seen it occur are some custom FIO workloads. I'll try again to see if I can come up with something that reliably hits it... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 0/2] splice: i_mutex vs splice write deadlock @ 2011-07-18 4:04 Dave Chinner 2011-07-18 4:04 ` Dave Chinner 0 siblings, 1 reply; 22+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs generic_file_splice_write() takes the inode->i_mutex after the filesystem has taken whatever locks it needs to ensure sanity. however, this typically violates the locking order of filesystems with their own locks in that the order is usually i_mutex -> filesystem lock. XFS is such a case, and generic_file_splice_write() is generating lockdep warnings because of lock inversions between the inode->i_mutex and the XFS_I(inode)->i_iolock. There is also a reported case of fio causing a deadlock when it mixes IO types (e.g. splice vs direct IO). This patch set introduces generic_file_splice_write_unlocked() and factors the code such that __generic_file_splice_write() will only lock the i_mutex if called from the locked variant. The second patch modifies XFS to use the new function. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner @ 2011-07-18 4:04 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Use generic_file_splice_write_unlocked() because the i_mutex does not need to be held over the operations that are done in the XFS splice write path. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_file.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c index f51a384..1e641e6 100644 --- a/fs/xfs/linux-2.6/xfs_file.c +++ b/fs/xfs/linux-2.6/xfs_file.c @@ -463,7 +463,8 @@ xfs_file_splice_write( trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = generic_file_splice_write_unlocked(pipe, outfilp, ppos, + count, flags); xfs_aio_write_isize_update(inode, ppos, ret); xfs_aio_write_newsize_update(ip); -- 1.7.5.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/2] xfs: fix splice/direct-IO deadlock @ 2011-07-18 4:04 ` Dave Chinner 0 siblings, 0 replies; 22+ messages in thread From: Dave Chinner @ 2011-07-18 4:04 UTC (permalink / raw) To: linux-fsdevel; +Cc: linux-kernel, xfs From: Dave Chinner <dchinner@redhat.com> lockdep reports splice vs direct-io write lock inversions due to generic_file_splice_write() taking the inode->i_mutex inside XFS_IOLOCK_EXCL context. These lock contexts are inverted, hence can deadlock. Use generic_file_splice_write_unlocked() because the i_mutex does not need to be held over the operations that are done in the XFS splice write path. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_file.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c index f51a384..1e641e6 100644 --- a/fs/xfs/linux-2.6/xfs_file.c +++ b/fs/xfs/linux-2.6/xfs_file.c @@ -463,7 +463,8 @@ xfs_file_splice_write( trace_xfs_file_splice_write(ip, count, *ppos, ioflags); - ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags); + ret = generic_file_splice_write_unlocked(pipe, outfilp, ppos, + count, flags); xfs_aio_write_isize_update(inode, ppos, ret); xfs_aio_write_newsize_update(ip); -- 1.7.5.1 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2012-11-28 21:33 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-08-08 6:45 [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Dave Chinner 2011-08-08 6:45 ` Dave Chinner 2011-08-08 6:45 ` [PATCH 1/2] vfs: split generic splice code from i_mutex locking Dave Chinner 2011-08-08 6:45 ` Dave Chinner 2011-08-09 11:36 ` Jan Kara 2011-08-09 11:36 ` Jan Kara 2011-08-10 10:12 ` Christoph Hellwig 2011-08-10 10:12 ` Christoph Hellwig 2011-08-08 6:45 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner 2011-08-08 6:45 ` Dave Chinner 2011-08-10 10:17 ` Christoph Hellwig 2011-08-10 10:17 ` Christoph Hellwig 2011-08-10 10:19 ` [PATCH 0/2] splice: i_mutex vs splice write deadlock V2 Christoph Hellwig 2011-08-10 10:19 ` Christoph Hellwig -- strict thread matches above, loose matches on Subject: below -- 2012-11-28 2:12 [PATCH 0/2] splice: fix direct IO/splice deadlock Dave Chinner 2012-11-28 2:12 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner 2012-11-28 2:12 ` Dave Chinner 2012-11-28 16:07 ` Christoph Hellwig 2012-11-28 16:07 ` Christoph Hellwig 2012-11-28 21:33 ` Dave Chinner 2012-11-28 21:33 ` Dave Chinner 2011-07-18 4:04 [PATCH 0/2] splice: i_mutex vs splice write deadlock Dave Chinner 2011-07-18 4:04 ` [PATCH 2/2] xfs: fix splice/direct-IO deadlock Dave Chinner 2011-07-18 4:04 ` Dave Chinner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.