* [PATCH] xfs: Non-blocking inode locking in IO completion
@ 2010-02-17  5:36 Dave Chinner
  2010-02-17 19:29 ` Christoph Hellwig
  2010-02-25 20:06 ` [PATCH] " Alex Elder
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2010-02-17  5:36 UTC (permalink / raw)
  To: xfs

The introduction of barriers to DM loop devices (e.g. dm-crypt) has
created a new IO order completion dependency that XFS does not
handle. That is, the completion of log IOs (which have barriers) in
the loop filesystem are now dependent on completion of data IO in
the backing filesystem.

This can cause deadlocks when a flush daemon issues a log force with
an inode locked, because IO completion on the inode is blocked by
the inode lock. This in turn prevents further data IO completion
from occurring, which prevents the log IO from completing because
DM is waiting for data IO completion as well.

The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_aops.c |   93 ++++++++++++++++++++++++++-----------------
 1 files changed, 56 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 66abe36..1c65a2b 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -163,14 +163,17 @@ xfs_ioend_new_eof(
 }
 
 /*
- * Update on-disk file size now that data has been written to disk.
- * The current in-memory file size is i_size.  If a write is beyond
- * eof i_new_size will be the intended file size until i_size is
- * updated.  If this write does not extend all the way to the valid
- * file size then restrict this update to the end of the write.
+ * Update on-disk file size now that data has been written to disk.  The
+ * current in-memory file size is i_size.  If a write is beyond eof i_new_size
+ * will be the intended file size until i_size is updated.  If this write does
+ * not extend all the way to the valid file size then restrict this update to
+ * the end of the write.
+ *
+ * This function does not block as blocking on the inode lock in IO completion
+ * can lead to IO completion order dependency deadlocks.  If it can't get the
+ * inode ilock it will return EAGAIN. Callers must handle this.
  */
-
-STATIC void
+STATIC int
 xfs_setfilesize(
 	xfs_ioend_t		*ioend)
 {
@@ -181,9 +184,11 @@ xfs_setfilesize(
 	ASSERT(ioend->io_type != IOMAP_READ);
 
 	if (unlikely(ioend->io_error))
-		return;
+		return 0;
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		return EAGAIN;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	isize = xfs_ioend_new_eof(ioend);
 	if (isize) {
 		ip->i_d.di_size = isize;
@@ -191,6 +196,28 @@ xfs_setfilesize(
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return 0;
+}
+
+/*
+ * Schedule IO completion handling on a xfsdatad if this was
+ * the final hold on this ioend. If we are asked to wait,
+ * flush the workqueue.
+ */
+STATIC void
+xfs_finish_ioend(
+	xfs_ioend_t	*ioend,
+	int		wait)
+{
+	if (atomic_dec_and_test(&ioend->io_remaining)) {
+		struct workqueue_struct *wq;
+
+		wq = (ioend->io_type == IOMAP_UNWRITTEN) ?
+			xfsconvertd_workqueue : xfsdatad_workqueue;
+		queue_work(wq, &ioend->io_work);
+		if (wait)
+			flush_workqueue(wq);
+	}
 }
 
 /*
@@ -198,11 +225,11 @@ xfs_setfilesize(
  */
 STATIC void
 xfs_end_io(
-	struct work_struct	*work)
+	struct work_struct *work)
 {
-	xfs_ioend_t		*ioend =
-		container_of(work, xfs_ioend_t, io_work);
-	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	xfs_ioend_t	*ioend = container_of(work, xfs_ioend_t, io_work);
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	int		error;
 
 	/*
 	 * For unwritten extents we need to issue transactions to convert a
@@ -210,7 +237,6 @@ xfs_end_io(
 	 */
 	if (ioend->io_type == IOMAP_UNWRITTEN &&
 	    likely(!ioend->io_error && !XFS_FORCED_SHUTDOWN(ip->i_mount))) {
-		int error;
 
 		error = xfs_iomap_write_unwritten(ip, ioend->io_offset,
 						 ioend->io_size);
@@ -222,30 +248,23 @@ xfs_end_io(
 	 * We might have to update the on-disk file size after extending
 	 * writes.
 	 */
-	if (ioend->io_type != IOMAP_READ)
-		xfs_setfilesize(ioend);
-	xfs_destroy_ioend(ioend);
-}
-
-/*
- * Schedule IO completion handling on a xfsdatad if this was
- * the final hold on this ioend. If we are asked to wait,
- * flush the workqueue.
- */
-STATIC void
-xfs_finish_ioend(
-	xfs_ioend_t	*ioend,
-	int		wait)
-{
-	if (atomic_dec_and_test(&ioend->io_remaining)) {
-		struct workqueue_struct *wq;
-
-		wq = (ioend->io_type == IOMAP_UNWRITTEN) ?
-			xfsconvertd_workqueue : xfsdatad_workqueue;
-		queue_work(wq, &ioend->io_work);
-		if (wait)
-			flush_workqueue(wq);
+	if (ioend->io_type != IOMAP_READ) {
+		error = xfs_setfilesize(ioend);
+		ASSERT(!error || error == EAGAIN);
 	}
+
+	/*
+	 * If we didn't complete processing of the ioend, requeue it to the
+	 * tail of the workqueue for another attempt later. Otherwise destroy
+	 * it.
+	 */
+	if (error == EAGAIN) {
+		atomic_inc(&ioend->io_remaining);
+		xfs_finish_ioend(ioend, 0);
+		/* ensure we don't spin on blocked ioends */
+		delay(1);
+	} else
+		xfs_destroy_ioend(ioend);
 }
 
 /*
-- 
1.6.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: Non-blocking inode locking in IO completion
  2010-02-17  5:36 [PATCH] xfs: Non-blocking inode locking in IO completion Dave Chinner
@ 2010-02-17 19:29 ` Christoph Hellwig
  2010-02-17 21:13   ` Dave Chinner
  2010-02-25 20:06 ` [PATCH] " Alex Elder
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2010-02-17 19:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Feb 17, 2010 at 04:36:29PM +1100, Dave Chinner wrote:
> The introduction of barriers to DM loop devices (e.g. dm-crypt) has
> created a new IO order completion dependency that XFS does not
> handle. That is, the completion of log IOs (which have barriers) in
> the loop filesystem are now dependent on completion of data IO in
> the backing filesystem.

I don't think dm belongs in the picture here at all.  The problem
is simply with the loop device, which sits below dm-crypt in the
bugzilla reports.  The loop device in SuSE (and for a short time in
mainline until we saw unexplainable XFS lockups) implements barriers
using fsync.  That fsync turns a log I/O that issues a barrier in
the XFS filesystem inside the loop device into a data I/O in the
backing filesystem.  With this the rest of your description applies
again.

The patch looks good to me - while I hate introducing random delay()
calls I don't really see a way around this. 

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH] xfs: Non-blocking inode locking in IO completion
  2010-02-17 19:29 ` Christoph Hellwig
@ 2010-02-17 21:13   ` Dave Chinner
  2010-02-18 12:35     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-02-17 21:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Feb 17, 2010 at 02:29:38PM -0500, Christoph Hellwig wrote:
> On Wed, Feb 17, 2010 at 04:36:29PM +1100, Dave Chinner wrote:
> > The introduction of barriers to DM loop devices (e.g. dm-crypt) has
> > created a new IO order completion dependency that XFS does not
> > handle. That is, the completion of log IOs (which have barriers) in
> > the loop filesystem are now dependent on completion of data IO in
> > the backing filesystem.
> 
> I don't think dm belongs in the picture here at all.  The problem
> is simply with the loop device, which sits below dm-crypt in the
> bugzilla reports.  The loop device in SuSE (and for a short time in
> mainline until we saw unexplainable XFS lockups) implements barriers
> using fsync.  That fsync turns a log I/O that issues a barrier in
> the XFS filesystem inside the loop device into a data I/O in the
> backing filesystem.  With this the rest of your description applies
> again.

Fair point. I'll change the description to be more accurate.

> The patch looks good to me - while I hate introducing random delay()
> calls I don't really see a way around this. 

I thought about using queue_delayed_work(), but then the change
became much bigger and has other side effects like increasing the
size of the ioend structure.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: [PATCH] xfs: Non-blocking inode locking in IO completion
  2010-02-17 21:13   ` Dave Chinner
@ 2010-02-18 12:35     ` Christoph Hellwig
  2010-02-25 23:06       ` [PATCH V2] " Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2010-02-18 12:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Feb 18, 2010 at 08:13:12AM +1100, Dave Chinner wrote:
> > The patch looks good to me - while I hate introducing random delay()
> > calls I don't really see a way around this. 
> 
> I thought about using queue_delayed_work(), but then the change
> became much bigger and has other side effects like increasing the
> size of the ioend structure.

Yes, now that the normal work struct and the delayed work struct are
different it would be a pain, agreed.



* Re: [PATCH] xfs: Non-blocking inode locking in IO completion
  2010-02-17  5:36 [PATCH] xfs: Non-blocking inode locking in IO completion Dave Chinner
  2010-02-17 19:29 ` Christoph Hellwig
@ 2010-02-25 20:06 ` Alex Elder
  1 sibling, 0 replies; 7+ messages in thread
From: Alex Elder @ 2010-02-25 20:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, 2010-02-17 at 16:36 +1100, Dave Chinner wrote:
> The introduction of barriers to DM loop devices (e.g. dm-crypt) has
> created a new IO order completion dependency that XFS does not
> handle. That is, the completion of log IOs (which have barriers) in
> the loop filesystem are now dependent on completion of data IO in
> the backing filesystem.

One comment, below

. . .

> +	if (ioend->io_type != IOMAP_READ) {
> +		error = xfs_setfilesize(ioend);
> +		ASSERT(!error || error == EAGAIN);
>  	}
> +
> +	/*
> +	 * If we didn't complete processing of the ioend, requeue it to the
> +	 * tail of the workqueue for another attempt later. Otherwise destroy
> +	 * it.
> +	 */
> +	if (error == EAGAIN) {

It's not a problem now (and may never be), but it's
conceivable that error could have been set to the return
value of xfs_iomap_write_unwritten() and so carry the
value EAGAIN.  It might have been better to include this
block inside the if (ioend->io_type != IOMAP_READ) {
block, above.

(I'll take it as is, however...)

					-Alex




> +		atomic_inc(&ioend->io_remaining);
> +		xfs_finish_ioend(ioend, 0);
> +		/* ensure we don't spin on blocked ioends */
> +		delay(1);
> +	} else
> +		xfs_destroy_ioend(ioend);
>  }
>  
>  /*





* [PATCH V2] xfs: Non-blocking inode locking in IO completion
  2010-02-18 12:35     ` Christoph Hellwig
@ 2010-02-25 23:06       ` Dave Chinner
  2010-02-26 17:58         ` Alex Elder
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2010-02-25 23:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Feb 18, 2010 at 07:35:13AM -0500, Christoph Hellwig wrote:
> On Thu, Feb 18, 2010 at 08:13:12AM +1100, Dave Chinner wrote:
> > > The patch looks good to me - while I hate introducing random delay()
> > > calls I don't really see a way around this. 
> > 
> > I thought about using queue_delayed_work(), but then the change
> > became much bigger and has other side effects like increasing the
> > size of the ioend structure.
> 
> Yes, now that the normal work struct and the delayed work struct are
> different it would be a pain, agreed.

Version with updated commit message below.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: Non-blocking inode locking in IO completion

The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.

This can cause deadlocks when a flush daemon issues a log force with
an inode locked, because IO completion on the inode is blocked by
the inode lock. This in turn prevents further data IO completion
from occurring on all XFS filesystems on that CPU (due to the shared
nature of the completion queues). This then prevents the log IO from
completing because the log is waiting for data IO completion as
well.

The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_aops.c |   93 ++++++++++++++++++++++++++-----------------
 1 files changed, 56 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 66abe36..059f159 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -163,14 +163,17 @@ xfs_ioend_new_eof(
 }
 
 /*
- * Update on-disk file size now that data has been written to disk.
- * The current in-memory file size is i_size.  If a write is beyond
- * eof i_new_size will be the intended file size until i_size is
- * updated.  If this write does not extend all the way to the valid
- * file size then restrict this update to the end of the write.
+ * Update on-disk file size now that data has been written to disk.  The
+ * current in-memory file size is i_size.  If a write is beyond eof i_new_size
+ * will be the intended file size until i_size is updated.  If this write does
+ * not extend all the way to the valid file size then restrict this update to
+ * the end of the write.
+ *
+ * This function does not block as blocking on the inode lock in IO completion
+ * can lead to IO completion order dependency deadlocks.  If it can't get the
+ * inode ilock it will return EAGAIN. Callers must handle this.
  */
-
-STATIC void
+STATIC int
 xfs_setfilesize(
 	xfs_ioend_t		*ioend)
 {
@@ -181,9 +184,11 @@ xfs_setfilesize(
 	ASSERT(ioend->io_type != IOMAP_READ);
 
 	if (unlikely(ioend->io_error))
-		return;
+		return 0;
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		return EAGAIN;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	isize = xfs_ioend_new_eof(ioend);
 	if (isize) {
 		ip->i_d.di_size = isize;
@@ -191,6 +196,28 @@ xfs_setfilesize(
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return 0;
+}
+
+/*
+ * Schedule IO completion handling on a xfsdatad if this was
+ * the final hold on this ioend. If we are asked to wait,
+ * flush the workqueue.
+ */
+STATIC void
+xfs_finish_ioend(
+	xfs_ioend_t	*ioend,
+	int		wait)
+{
+	if (atomic_dec_and_test(&ioend->io_remaining)) {
+		struct workqueue_struct *wq;
+
+		wq = (ioend->io_type == IOMAP_UNWRITTEN) ?
+			xfsconvertd_workqueue : xfsdatad_workqueue;
+		queue_work(wq, &ioend->io_work);
+		if (wait)
+			flush_workqueue(wq);
+	}
 }
 
 /*
@@ -198,11 +225,11 @@ xfs_setfilesize(
  */
 STATIC void
 xfs_end_io(
-	struct work_struct	*work)
+	struct work_struct *work)
 {
-	xfs_ioend_t		*ioend =
-		container_of(work, xfs_ioend_t, io_work);
-	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	xfs_ioend_t	*ioend = container_of(work, xfs_ioend_t, io_work);
+	struct xfs_inode *ip = XFS_I(ioend->io_inode);
+	int		error = 0;
 
 	/*
 	 * For unwritten extents we need to issue transactions to convert a
@@ -210,7 +237,6 @@ xfs_end_io(
 	 */
 	if (ioend->io_type == IOMAP_UNWRITTEN &&
 	    likely(!ioend->io_error && !XFS_FORCED_SHUTDOWN(ip->i_mount))) {
-		int error;
 
 		error = xfs_iomap_write_unwritten(ip, ioend->io_offset,
 						 ioend->io_size);
@@ -222,30 +248,23 @@ xfs_end_io(
 	 * We might have to update the on-disk file size after extending
 	 * writes.
 	 */
-	if (ioend->io_type != IOMAP_READ)
-		xfs_setfilesize(ioend);
-	xfs_destroy_ioend(ioend);
-}
-
-/*
- * Schedule IO completion handling on a xfsdatad if this was
- * the final hold on this ioend. If we are asked to wait,
- * flush the workqueue.
- */
-STATIC void
-xfs_finish_ioend(
-	xfs_ioend_t	*ioend,
-	int		wait)
-{
-	if (atomic_dec_and_test(&ioend->io_remaining)) {
-		struct workqueue_struct *wq;
-
-		wq = (ioend->io_type == IOMAP_UNWRITTEN) ?
-			xfsconvertd_workqueue : xfsdatad_workqueue;
-		queue_work(wq, &ioend->io_work);
-		if (wait)
-			flush_workqueue(wq);
+	if (ioend->io_type != IOMAP_READ) {
+		error = xfs_setfilesize(ioend);
+		ASSERT(!error || error == EAGAIN);
 	}
+
+	/*
+	 * If we didn't complete processing of the ioend, requeue it to the
+	 * tail of the workqueue for another attempt later. Otherwise destroy
+	 * it.
+	 */
+	if (error == EAGAIN) {
+		atomic_inc(&ioend->io_remaining);
+		xfs_finish_ioend(ioend, 0);
+		/* ensure we don't spin on blocked ioends */
+		delay(1);
+	} else
+		xfs_destroy_ioend(ioend);
 }
 
 /*



* Re: [PATCH V2] xfs: Non-blocking inode locking in IO completion
  2010-02-25 23:06       ` [PATCH V2] " Dave Chinner
@ 2010-02-26 17:58         ` Alex Elder
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Elder @ 2010-02-26 17:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, 2010-02-26 at 10:06 +1100, Dave Chinner wrote:
> On Thu, Feb 18, 2010 at 07:35:13AM -0500, Christoph Hellwig wrote:
> > On Thu, Feb 18, 2010 at 08:13:12AM +1100, Dave Chinner wrote:
> > > > The patch looks good to me - while I hate introducing random delay()
> > > > calls I don't really see a way around this. 
> > > 
> > > I thought about using queue_delayed_work(), but then the change
> > > became much bigger and has other side effects like increasing the
> > > size of the ioend structure.
> > 
> > Yes, now that the normal work struct and the delayed work struct are
> > different it would be a pain, agreed.
> 
> Version with updated commit message below.

This looks good and I'll be incorporating this version.
Thanks for updating it.

					-Alex

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> xfs: Non-blocking inode locking in IO completion
> 
> The introduction of barriers to loop devices has created a new IO
> order completion dependency that XFS does not handle. The loop
> device implements barriers using fsync and so turns a log IO in the
> XFS filesystem on the loop device into a data IO in the backing
> filesystem. That is, the completion of log IOs in the loop
> filesystem are now dependent on completion of data IO in the backing
> filesystem.
. . .



Thread overview: 7+ messages
2010-02-17  5:36 [PATCH] xfs: Non-blocking inode locking in IO completion Dave Chinner
2010-02-17 19:29 ` Christoph Hellwig
2010-02-17 21:13   ` Dave Chinner
2010-02-18 12:35     ` Christoph Hellwig
2010-02-25 23:06       ` [PATCH V2] " Dave Chinner
2010-02-26 17:58         ` Alex Elder
2010-02-25 20:06 ` [PATCH] " Alex Elder
