All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: fix a class of shutdown hangs
@ 2012-03-23  1:47 Dave Chinner
  2012-03-23  1:47 ` [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk Dave Chinner
  2012-03-23  1:47 ` [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait() Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2012-03-23  1:47 UTC (permalink / raw)
  To: xfs

The shutdown hang these two patches fix was tripped over by RH QA
when some other (unknown) problem resulting in trying to remove an
EFI log item from the AIL when it was not in the AIL at all. The
shutdown was triggered from log IO completion context, and hung
trying to wait for log IO completion.

IOWs, the patches will allow the shutdown to finish and the
filesystem to be unmounted, but do nothing to fix the unknown
problem that caused the shutdown to be triggered.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk
  2012-03-23  1:47 [PATCH 0/2] xfs: fix a class of shutdown hangs Dave Chinner
@ 2012-03-23  1:47 ` Dave Chinner
  2012-03-24 17:01   ` Christoph Hellwig
  2012-03-23  1:47 ` [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait() Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-03-23  1:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_trans_ail_delete_bulk() can be called from different contexts so
if the itemis not in the AIL we need different shutdown for each
context.  Move the shutdown to the call locations to prepare for
changing the shutdown methods where appropriate.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c     |   12 ++++++++++--
 fs/xfs/xfs_dquot.c        |    9 ++++++---
 fs/xfs/xfs_dquot_item.c   |    5 ++++-
 fs/xfs/xfs_extfree_item.c |    7 ++++++-
 fs/xfs/xfs_iget.c         |    9 ++++++---
 fs/xfs/xfs_inode_item.c   |   16 +++++++++++++---
 fs/xfs/xfs_log_recover.c  |   12 +++++++-----
 fs/xfs/xfs_trans_ail.c    |   10 ++++++----
 fs/xfs/xfs_trans_priv.h   |    6 +++---
 9 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index eac97ef..3e5f654 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -454,8 +454,13 @@ xfs_buf_item_unpin(
 			bp->b_fspriv = NULL;
 			bp->b_iodone = NULL;
 		} else {
+			int error;
+
 			spin_lock(&ailp->xa_lock);
-			xfs_trans_ail_delete(ailp, (xfs_log_item_t *)bip);
+			error = xfs_trans_ail_delete(ailp, lip);
+			if (error == EFSCORRUPTED)
+				xfs_force_shutdown(ailp->xa_mount,
+						   SHUTDOWN_CORRUPT_INCORE);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
 		}
@@ -1030,6 +1035,7 @@ xfs_buf_iodone(
 	struct xfs_log_item	*lip)
 {
 	struct xfs_ail		*ailp = lip->li_ailp;
+	int			error;
 
 	ASSERT(BUF_ITEM(lip)->bli_buf == bp);
 
@@ -1045,6 +1051,8 @@ xfs_buf_iodone(
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, lip);
+	error = xfs_trans_ail_delete(ailp, lip);
+	if (error == EFSCORRUPTED)
+		xfs_force_shutdown(ailp->xa_mount, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 4be16a0..cd5bd4b 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -856,9 +856,12 @@ xfs_qm_dqflush_done(
 
 		/* xfs_trans_ail_delete() drops the AIL lock. */
 		spin_lock(&ailp->xa_lock);
-		if (lip->li_lsn == qip->qli_flush_lsn)
-			xfs_trans_ail_delete(ailp, lip);
-		else
+		if (lip->li_lsn == qip->qli_flush_lsn) {
+			int	error = xfs_trans_ail_delete(ailp, lip);
+			if (error == EFSCORRUPTED)
+				xfs_force_shutdown(ailp->xa_mount,
+						SHUTDOWN_CORRUPT_INCORE);
+		} else
 			spin_unlock(&ailp->xa_lock);
 	}
 
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 34baeae..69a098c 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -448,13 +448,16 @@ xfs_qm_qoffend_logitem_committed(
 	struct xfs_qoff_logitem	*qfe = QOFF_ITEM(lip);
 	struct xfs_qoff_logitem	*qfs = qfe->qql_start_lip;
 	struct xfs_ail		*ailp = qfs->qql_item.li_ailp;
+	int			error;
 
 	/*
 	 * Delete the qoff-start logitem from the AIL.
 	 * xfs_trans_ail_delete() drops the AIL lock.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, (xfs_log_item_t *)qfs);
+	error = xfs_trans_ail_delete(ailp, (struct xfs_log_item *)qfs);
+	if (error == EFSCORRUPTED)
+		xfs_force_shutdown(ailp->xa_mount, SHUTDOWN_CORRUPT_INCORE);
 
 	kmem_free(qfs);
 	kmem_free(qfe);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 35c2aff..4ccf2b6 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -62,9 +62,14 @@ __xfs_efi_release(
 	struct xfs_ail		*ailp = efip->efi_item.li_ailp;
 
 	if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
+		int	error;
+
 		spin_lock(&ailp->xa_lock);
 		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, &efip->efi_item);
+		error = xfs_trans_ail_delete(ailp, &efip->efi_item);
+		if (error == EFSCORRUPTED)
+			xfs_force_shutdown(ailp->xa_mount,
+						SHUTDOWN_CORRUPT_INCORE);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index bcc6c24..4700ba4 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -135,9 +135,12 @@ xfs_inode_free(
 				       XFS_FORCED_SHUTDOWN(ip->i_mount));
 		if (lip->li_flags & XFS_LI_IN_AIL) {
 			spin_lock(&ailp->xa_lock);
-			if (lip->li_flags & XFS_LI_IN_AIL)
-				xfs_trans_ail_delete(ailp, lip);
-			else
+			if (lip->li_flags & XFS_LI_IN_AIL) {
+				int	error = xfs_trans_ail_delete(ailp, lip);
+				if (error == EFSCORRUPTED)
+					xfs_force_shutdown(ailp->xa_mount,
+							SHUTDOWN_CORRUPT_INCORE);
+			} else
 				spin_unlock(&ailp->xa_lock);
 		}
 		xfs_inode_item_destroy(ip);
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 05d924e..b0a813f 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -837,7 +837,9 @@ xfs_iflush_done(
 	 */
 	if (need_ail) {
 		struct xfs_log_item *log_items[need_ail];
-		int i = 0;
+		int	i = 0;
+		int	error;
+
 		spin_lock(&ailp->xa_lock);
 		for (blip = lip; blip; blip = blip->li_bio_list) {
 			iip = INODE_ITEM(blip);
@@ -848,7 +850,10 @@ xfs_iflush_done(
 			ASSERT(i <= need_ail);
 		}
 		/* xfs_trans_ail_delete_bulk() drops the AIL lock. */
-		xfs_trans_ail_delete_bulk(ailp, log_items, i);
+		error = xfs_trans_ail_delete_bulk(ailp, log_items, i);
+		if (error == EFSCORRUPTED)
+			xfs_force_shutdown(ailp->xa_mount,
+						SHUTDOWN_CORRUPT_INCORE);
 	}
 
 
@@ -887,8 +892,13 @@ xfs_iflush_abort(
 		if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
 			spin_lock(&ailp->xa_lock);
 			if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
+				int	error;
 				/* xfs_trans_ail_delete() drops the AIL lock. */
-				xfs_trans_ail_delete(ailp, (xfs_log_item_t *)iip);
+				error = xfs_trans_ail_delete(ailp,
+							(xfs_log_item_t *)iip);
+				if (error == EFSCORRUPTED)
+					xfs_force_shutdown(ailp->xa_mount,
+							SHUTDOWN_CORRUPT_INCORE);
 			} else
 				spin_unlock(&ailp->xa_lock);
 		}
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 7c75c73..63df121 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2638,11 +2638,13 @@ xlog_recover_efd_pass2(
 		if (lip->li_type == XFS_LI_EFI) {
 			efip = (xfs_efi_log_item_t *)lip;
 			if (efip->efi_format.efi_id == efi_id) {
-				/*
-				 * xfs_trans_ail_delete() drops the
-				 * AIL lock.
-				 */
-				xfs_trans_ail_delete(ailp, lip);
+				int	error;
+
+				/* xfs_trans_ail_delete() drops the AIL lock. */
+				error = xfs_trans_ail_delete(ailp, lip);
+				if (error == EFSCORRUPTED)
+					xfs_force_shutdown(ailp->xa_mount,
+							SHUTDOWN_CORRUPT_INCORE);
 				xfs_efi_item_free(efip);
 				spin_lock(&ailp->xa_lock);
 				break;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 1dead07..4c94a1f 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -694,9 +694,10 @@ xfs_trans_ail_update_bulk(
  * of traffic on the lock, especially during IO completion.
  *
  * This function must be called with the AIL lock held.  The lock is dropped
- * before returning.
+ * before returning. If EFSCORRUPTED is returned, the caller must shut down the
+ * filesystem.
  */
-void
+int
 xfs_trans_ail_delete_bulk(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	**log_items,
@@ -718,9 +719,9 @@ xfs_trans_ail_delete_bulk(
 				xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 		"%s: attempting to delete a log item that is not in the AIL",
 						__func__);
-				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+				return EFSCORRUPTED;
 			}
-			return;
+			return 0;
 		}
 
 		xfs_ail_delete(ailp, lip);
@@ -735,6 +736,7 @@ xfs_trans_ail_delete_bulk(
 		xlog_assign_tail_lsn(ailp->xa_mount);
 		xfs_log_space_wake(ailp->xa_mount);
 	}
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 8ab2ced..8ca636a 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -89,15 +89,15 @@ xfs_trans_ail_update(
 	xfs_trans_ail_update_bulk(ailp, NULL, &lip, 1, lsn);
 }
 
-void	xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
+int	xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
 				struct xfs_log_item **log_items, int nr_items)
 				__releases(ailp->xa_lock);
-static inline void
+static inline int
 xfs_trans_ail_delete(
 	struct xfs_ail	*ailp,
 	xfs_log_item_t	*lip) __releases(ailp->xa_lock)
 {
-	xfs_trans_ail_delete_bulk(ailp, &lip, 1);
+	return xfs_trans_ail_delete_bulk(ailp, &lip, 1);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
-- 
1.7.9

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()
  2012-03-23  1:47 [PATCH 0/2] xfs: fix a class of shutdown hangs Dave Chinner
  2012-03-23  1:47 ` [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk Dave Chinner
@ 2012-03-23  1:47 ` Dave Chinner
  2012-03-24 17:03   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-03-23  1:47 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When a shutdown is triggered from failing to find an item in the AIL
during delete, we can be called from either metadata IO completion
context or from log IO completion context. In the case of log IO
completion context, we must indicate that this is a log error so
that the forced shutdown does not attempt to flush the log.

To flush the log whilst in log IO completion will cause a deadlock
as the shutdown won't proceed until log IO completes, and log Io
cannot complete because it has blocked waiting for itself to
complete....

We delete items in the AIL from log IO completion when we are
unpinning in-memory only items, or items that do not require
writeback to remove from the AIL (e.g. EFI/EFD items). Hence there
are several locations that need this treatment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c     |    2 +-
 fs/xfs/xfs_dquot_item.c   |    2 +-
 fs/xfs/xfs_extfree_item.c |    2 +-
 fs/xfs/xfs_inode.c        |    4 ++--
 fs/xfs/xfs_inode_item.c   |    6 ++++--
 fs/xfs/xfs_inode_item.h   |    2 +-
 6 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 3e5f654..07fac37 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -460,7 +460,7 @@ xfs_buf_item_unpin(
 			error = xfs_trans_ail_delete(ailp, lip);
 			if (error == EFSCORRUPTED)
 				xfs_force_shutdown(ailp->xa_mount,
-						   SHUTDOWN_CORRUPT_INCORE);
+						   SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
 		}
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 69a098c..452fb24 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -457,7 +457,7 @@ xfs_qm_qoffend_logitem_committed(
 	spin_lock(&ailp->xa_lock);
 	error = xfs_trans_ail_delete(ailp, (struct xfs_log_item *)qfs);
 	if (error == EFSCORRUPTED)
-		xfs_force_shutdown(ailp->xa_mount, SHUTDOWN_CORRUPT_INCORE);
+		xfs_force_shutdown(ailp->xa_mount, SHUTDOWN_LOG_IO_ERROR);
 
 	kmem_free(qfs);
 	kmem_free(qfe);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 4ccf2b6..40d1b0e 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -69,7 +69,7 @@ __xfs_efi_release(
 		error = xfs_trans_ail_delete(ailp, &efip->efi_item);
 		if (error == EFSCORRUPTED)
 			xfs_force_shutdown(ailp->xa_mount,
-						SHUTDOWN_CORRUPT_INCORE);
+						SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index bc46c0a..4fb2e99 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2377,7 +2377,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(iq);
+	xfs_iflush_abort(iq, false);
 	kmem_free(ilist);
 	xfs_perag_put(pag);
 	return XFS_ERROR(EFSCORRUPTED);
@@ -2503,7 +2503,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(ip);
+	xfs_iflush_abort(ip, false);
 	return XFS_ERROR(EFSCORRUPTED);
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index b0a813f..bd7fd73 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -883,7 +883,8 @@ xfs_iflush_done(
  */
 void
 xfs_iflush_abort(
-	xfs_inode_t		*ip)
+	struct xfs_inode	*ip,
+	bool			stale)
 {
 	xfs_inode_log_item_t	*iip = ip->i_itemp;
 
@@ -898,6 +899,7 @@ xfs_iflush_abort(
 							(xfs_log_item_t *)iip);
 				if (error == EFSCORRUPTED)
 					xfs_force_shutdown(ailp->xa_mount,
+						stale ? SHUTDOWN_LOG_IO_ERROR :
 							SHUTDOWN_CORRUPT_INCORE);
 			} else
 				spin_unlock(&ailp->xa_lock);
@@ -925,7 +927,7 @@ xfs_istale_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
+	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 41d61c3..376d4d0 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -165,7 +165,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
 extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
-extern void xfs_iflush_abort(struct xfs_inode *);
+extern void xfs_iflush_abort(struct xfs_inode *, bool);
 extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
 					 xfs_inode_log_format_t *);
 
-- 
1.7.9

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk
  2012-03-23  1:47 ` [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk Dave Chinner
@ 2012-03-24 17:01   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2012-03-24 17:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

>  	spin_lock(&ailp->xa_lock);
> -	xfs_trans_ail_delete(ailp, (xfs_log_item_t *)qfs);
> +	error = xfs_trans_ail_delete(ailp, (struct xfs_log_item *)qfs);

please make this a:

	error = xfs_trans_ail_delete(ailp, &qfs->qli_item);

while you're at it.

>  		if (lip->li_flags & XFS_LI_IN_AIL) {
>  			spin_lock(&ailp->xa_lock);
> -			if (lip->li_flags & XFS_LI_IN_AIL)
> -				xfs_trans_ail_delete(ailp, lip);
> -			else
> +			if (lip->li_flags & XFS_LI_IN_AIL) {
> +				int	error = xfs_trans_ail_delete(ailp, lip);
> +				if (error == EFSCORRUPTED)
> +					xfs_force_shutdown(ailp->xa_mount,
> +							SHUTDOWN_CORRUPT_INCORE);
> +			} else

Given that we already have two checks for XFS_LI_IN_AIL around the
call to xfs_trans_ail_delete I don't think we need to handle the error
here.

> @@ -887,8 +892,13 @@ xfs_iflush_abort(
>  		if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
>  			spin_lock(&ailp->xa_lock);
>  			if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
> +				int	error;
>  				/* xfs_trans_ail_delete() drops the AIL lock. */
> -				xfs_trans_ail_delete(ailp, (xfs_log_item_t *)iip);
> +				error = xfs_trans_ail_delete(ailp,
> +							(xfs_log_item_t *)iip);
> +				if (error == EFSCORRUPTED)
> +					xfs_force_shutdown(ailp->xa_mount,
> +							SHUTDOWN_CORRUPT_INCORE);

Same argument here.

Also please pass in &iip->>ili_item instead of the cast here.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()
  2012-03-23  1:47 ` [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait() Dave Chinner
@ 2012-03-24 17:03   ` Christoph Hellwig
  2012-03-24 22:46     ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2012-03-24 17:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Mar 23, 2012 at 12:47:43PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When a shutdown is triggered from failing to find an item in the AIL
> during delete, we can be called from either metadata IO completion
> context or from log IO completion context. In the case of log IO
> completion context, we must indicate that this is a log error so
> that the forced shutdown does not attempt to flush the log.
> 
> To flush the log whilst in log IO completion will cause a deadlock
> as the shutdown won't proceed until log IO completes, and log Io
> cannot complete because it has blocked waiting for itself to
> complete....
> 
> We delete items in the AIL from log IO completion when we are
> unpinning in-memory only items, or items that do not require
> writeback to remove from the AIL (e.g. EFI/EFD items). Hence there
> are several locations that need this treatment.

Looks good.  I wonder if it might be simple to simply pass a flags
argument to xfs_ail_delete(_bulk) which tells which kind of shutdown
to do.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()
  2012-03-24 17:03   ` Christoph Hellwig
@ 2012-03-24 22:46     ` Dave Chinner
  2012-03-25 11:28       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-03-24 22:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Mar 24, 2012 at 01:03:02PM -0400, Christoph Hellwig wrote:
> On Fri, Mar 23, 2012 at 12:47:43PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When a shutdown is triggered from failing to find an item in the AIL
> > during delete, we can be called from either metadata IO completion
> > context or from log IO completion context. In the case of log IO
> > completion context, we must indicate that this is a log error so
> > that the forced shutdown does not attempt to flush the log.
> > 
> > To flush the log whilst in log IO completion will cause a deadlock
> > as the shutdown won't proceed until log IO completes, and log Io
> > cannot complete because it has blocked waiting for itself to
> > complete....
> > 
> > We delete items in the AIL from log IO completion when we are
> > unpinning in-memory only items, or items that do not require
> > writeback to remove from the AIL (e.g. EFI/EFD items). Hence there
> > are several locations that need this treatment.
> 
> Looks good.  I wonder if it might be simple to simply pass a flags
> argument to xfs_ail_delete(_bulk) which tells which kind of shutdown
> to do.

I thought about doing that, but it seems strange and unusual to tell
code how to handle errors internally instead of returning the error
and letting the caller handle it...

I don't mind either way, though. If you prefer I pass in the
shutdown flag, I can change it all to do that....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()
  2012-03-24 22:46     ` Dave Chinner
@ 2012-03-25 11:28       ` Christoph Hellwig
  2012-03-27  9:45         ` [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()) Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2012-03-25 11:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Sun, Mar 25, 2012 at 09:46:49AM +1100, Dave Chinner wrote:
> > Looks good.  I wonder if it might be simple to simply pass a flags
> > argument to xfs_ail_delete(_bulk) which tells which kind of shutdown
> > to do.
> 
> I thought about doing that, but it seems strange and unusual to tell
> code how to handle errors internally instead of returning the error
> and letting the caller handle it...
> 
> I don't mind either way, though. If you prefer I pass in the
> shutdown flag, I can change it all to do that....

I don't have a strong opinion.  Moving the shutdown to the caller is
cleaner for sure, but it's a lot of churn when the other version would

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait())
  2012-03-25 11:28       ` Christoph Hellwig
@ 2012-03-27  9:45         ` Dave Chinner
  2012-03-27  9:57           ` Christoph Hellwig
  2012-04-06 17:42           ` Mark Tinguely
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2012-03-27  9:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Mar 25, 2012 at 07:28:01AM -0400, Christoph Hellwig wrote:
> On Sun, Mar 25, 2012 at 09:46:49AM +1100, Dave Chinner wrote:
> > > Looks good.  I wonder if it might be simple to simply pass a flags
> > > argument to xfs_ail_delete(_bulk) which tells which kind of shutdown
> > > to do.
> > 
> > I thought about doing that, but it seems strange and unusual to tell
> > code how to handle errors internally instead of returning the error
> > and letting the caller handle it...
> > 
> > I don't mind either way, though. If you prefer I pass in the
> > shutdown flag, I can change it all to do that....
> 
> I don't have a strong opinion.  Moving the shutdown to the caller is
> cleaner for sure, but it's a lot of churn when the other version would

Ok, Here's a "pass-in-the-shutdown-method" version of the change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


xfs: pass shutdown method into xfs_trans_ail_delete_bulk

From: Dave Chinner <dchinner@redhat.com>

xfs_trans_ail_delete_bulk() can be called from different contexts so
if the item is not in the AIL we need different shutdown for each
context.  Pass in the shutdown method needed so the correct action
can be taken.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

---
 fs/xfs/xfs_buf_item.c     |    4 ++--
 fs/xfs/xfs_dquot.c        |    2 +-
 fs/xfs/xfs_dquot_item.c   |    2 +-
 fs/xfs/xfs_extfree_item.c |    3 ++-
 fs/xfs/xfs_iget.c         |    3 ++-
 fs/xfs/xfs_inode.c        |    4 ++--
 fs/xfs/xfs_inode_item.c   |   23 +++++++++++++----------
 fs/xfs/xfs_inode_item.h   |    2 +-
 fs/xfs/xfs_log_recover.c  |    3 ++-
 fs/xfs/xfs_trans_ail.c    |    5 +++--
 fs/xfs/xfs_trans_priv.h   |    8 +++++---
 11 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index eac97ef..8111a65 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -455,7 +455,7 @@ xfs_buf_item_unpin(
 			bp->b_iodone = NULL;
 		} else {
 			spin_lock(&ailp->xa_lock);
-			xfs_trans_ail_delete(ailp, (xfs_log_item_t *)bip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
 		}
@@ -1045,6 +1045,6 @@ xfs_buf_iodone(
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, lip);
+	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 1155208..abbdb7a 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -857,7 +857,7 @@ xfs_qm_dqflush_done(
 		/* xfs_trans_ail_delete() drops the AIL lock. */
 		spin_lock(&ailp->xa_lock);
 		if (lip->li_lsn == qip->qli_flush_lsn)
-			xfs_trans_ail_delete(ailp, lip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 		else
 			spin_unlock(&ailp->xa_lock);
 	}
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 34baeae..4e3127b 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -454,7 +454,7 @@ xfs_qm_qoffend_logitem_committed(
 	 * xfs_trans_ail_delete() drops the AIL lock.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, (xfs_log_item_t *)qfs);
+	xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR);
 
 	kmem_free(qfs);
 	kmem_free(qfe);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 35c2aff..be61d1d 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -64,7 +64,8 @@ __xfs_efi_release(
 	if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
 		spin_lock(&ailp->xa_lock);
 		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, &efip->efi_item);
+		xfs_trans_ail_delete(ailp, &efip->efi_item,
+				     SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index bcc6c24..b9df823 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -136,7 +136,8 @@ xfs_inode_free(
 		if (lip->li_flags & XFS_LI_IN_AIL) {
 			spin_lock(&ailp->xa_lock);
 			if (lip->li_flags & XFS_LI_IN_AIL)
-				xfs_trans_ail_delete(ailp, lip);
+				xfs_trans_ail_delete(ailp, lip,
+						     SHUTDOWN_CORRUPT_INCORE);
 			else
 				spin_unlock(&ailp->xa_lock);
 		}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index bc46c0a..4fb2e99 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2377,7 +2377,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(iq);
+	xfs_iflush_abort(iq, false);
 	kmem_free(ilist);
 	xfs_perag_put(pag);
 	return XFS_ERROR(EFSCORRUPTED);
@@ -2503,7 +2503,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(ip);
+	xfs_iflush_abort(ip, false);
 	return XFS_ERROR(EFSCORRUPTED);
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 05d924e..696f565 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -848,7 +848,8 @@ xfs_iflush_done(
 			ASSERT(i <= need_ail);
 		}
 		/* xfs_trans_ail_delete_bulk() drops the AIL lock. */
-		xfs_trans_ail_delete_bulk(ailp, log_items, i);
+		xfs_trans_ail_delete_bulk(ailp, log_items, i,
+					  SHUTDOWN_CORRUPT_INCORE);
 	}
 
 
@@ -869,16 +870,15 @@ xfs_iflush_done(
 }
 
 /*
- * This is the inode flushing abort routine.  It is called
- * from xfs_iflush when the filesystem is shutting down to clean
- * up the inode state.
- * It is responsible for removing the inode item
- * from the AIL if it has not been re-logged, and unlocking the inode's
- * flush lock.
+ * This is the inode flushing abort routine.  It is called from xfs_iflush when
+ * the filesystem is shutting down to clean up the inode state.  It is
+ * responsible for removing the inode item from the AIL if it has not been
+ * re-logged, and unlocking the inode's flush lock.
  */
 void
 xfs_iflush_abort(
-	xfs_inode_t		*ip)
+	xfs_inode_t		*ip,
+	bool			stale)
 {
 	xfs_inode_log_item_t	*iip = ip->i_itemp;
 
@@ -888,7 +888,10 @@ xfs_iflush_abort(
 			spin_lock(&ailp->xa_lock);
 			if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
 				/* xfs_trans_ail_delete() drops the AIL lock. */
-				xfs_trans_ail_delete(ailp, (xfs_log_item_t *)iip);
+				xfs_trans_ail_delete(ailp, &iip->ili_item,
+						stale ?
+						     SHUTDOWN_LOG_IO_ERROR :
+						     SHUTDOWN_CORRUPT_INCORE);
 			} else
 				spin_unlock(&ailp->xa_lock);
 		}
@@ -915,7 +918,7 @@ xfs_istale_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
+	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 41d61c3..376d4d0 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -165,7 +165,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
 extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
-extern void xfs_iflush_abort(struct xfs_inode *);
+extern void xfs_iflush_abort(struct xfs_inode *, bool);
 extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
 					 xfs_inode_log_format_t *);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 7c75c73..41f0694 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2642,7 +2642,8 @@ xlog_recover_efd_pass2(
 				 * xfs_trans_ail_delete() drops the
 				 * AIL lock.
 				 */
-				xfs_trans_ail_delete(ailp, lip);
+				xfs_trans_ail_delete(ailp, lip,
+						     SHUTDOWN_CORRUPT_INCORE);
 				xfs_efi_item_free(efip);
 				spin_lock(&ailp->xa_lock);
 				break;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 1dead07..a6016e4 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -700,7 +700,8 @@ void
 xfs_trans_ail_delete_bulk(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	**log_items,
-	int			nr_items) __releases(ailp->xa_lock)
+	int			nr_items,
+	int			shutdown_type) __releases(ailp->xa_lock)
 {
 	xfs_log_item_t		*mlip;
 	int			mlip_changed = 0;
@@ -718,7 +719,7 @@ xfs_trans_ail_delete_bulk(
 				xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 		"%s: attempting to delete a log item that is not in the AIL",
 						__func__);
-				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+				xfs_force_shutdown(mp, shutdown_type);
 			}
 			return;
 		}
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 8ab2ced..449c881 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -90,14 +90,16 @@ xfs_trans_ail_update(
 }
 
 void	xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
-				struct xfs_log_item **log_items, int nr_items)
+				struct xfs_log_item **log_items, int nr_items,
+				int shutdown_type)
 				__releases(ailp->xa_lock);
 static inline void
 xfs_trans_ail_delete(
 	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip) __releases(ailp->xa_lock)
+	xfs_log_item_t	*lip,
+	int		shutdown_type) __releases(ailp->xa_lock)
 {
-	xfs_trans_ail_delete_bulk(ailp, &lip, 1);
+	xfs_trans_ail_delete_bulk(ailp, &lip, 1, shutdown_type);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait())
  2012-03-27  9:45         ` [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()) Dave Chinner
@ 2012-03-27  9:57           ` Christoph Hellwig
  2012-04-06 17:42           ` Mark Tinguely
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2012-03-27  9:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

This one looks a lot simpler to me, so it gets my vote.


Reviewed-by: Christoph Hellwig <hch@lst.de>

Alex, can we push this to Linus soonish?  I'll have to rebase the
AIL emptying changes on top of this one, so getting it in soon would
make my life a lot easier.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait())
  2012-03-27  9:45         ` [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()) Dave Chinner
  2012-03-27  9:57           ` Christoph Hellwig
@ 2012-04-06 17:42           ` Mark Tinguely
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2012-04-06 17:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 03/27/12 04:45, Dave Chinner wrote:
>
> xfs: pass shutdown method into xfs_trans_ail_delete_bulk
>
> From: Dave Chinner<dchinner@redhat.com>
>
> xfs_trans_ail_delete_bulk() can be called from different contexts so
> if the item is not in the AIL we need different shutdown for each
> context.  Pass in the shutdown method needed so the correct action
> can be taken.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-06 17:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-23  1:47 [PATCH 0/2] xfs: fix a class of shutdown hangs Dave Chinner
2012-03-23  1:47 ` [PATCH 1/2] xfs: move shutdown out of xfs_trans_ail_delete_bulk Dave Chinner
2012-03-24 17:01   ` Christoph Hellwig
2012-03-23  1:47 ` [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait() Dave Chinner
2012-03-24 17:03   ` Christoph Hellwig
2012-03-24 22:46     ` Dave Chinner
2012-03-25 11:28       ` Christoph Hellwig
2012-03-27  9:45         ` [PATCH v2] shutdown hang fix (was Re: [PATCH 2/2] xfs: avoid shutdown hang in xlog_wait()) Dave Chinner
2012-03-27  9:57           ` Christoph Hellwig
2012-04-06 17:42           ` Mark Tinguely

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.