* [PATCH 00/12] xfs: flush related error handling cleanups
@ 2020-04-17 15:08 Brian Foster
  2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
                   ` (12 more replies)
  0 siblings, 13 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This started out as what I intended to be a cleanup of xfsaild
error handling, prompted by the fact that unexpected errors are
effectively lost in the ->iop_push() handlers of flushable log items.
Some discussion with Dave on that is available here[1]. I was
thinking of making the behavior generic, but I'm not so sure that is
possible now given the error handling requirements of the associated
items.

While thinking through that, I ended up incorporating various cleanups
in the somewhat confusing and erratic error handling on the periphery of
xfsaild, such as the flush handlers. Most of these are straightforward
cleanups except for patch 9, which I think requires careful review and
is of debatable value. I have used patch 12 to run an hour or so of
highly concurrent fsstress load against it and will execute a longer run
over the weekend now that fstests has completed.

Thoughts, reviews, flames appreciated.

Brian

[1] https://lore.kernel.org/linux-xfs/20200331114653.GA53541@bfoster/

Brian Foster (12):
  xfs: refactor failed buffer resubmission into xfsaild
  xfs: factor out buffer I/O failure simulation code
  xfs: always attach iflush_done and simplify error handling
  xfs: remove unnecessary shutdown check from xfs_iflush()
  xfs: ratelimit unmount time per-buffer I/O error warning
  xfs: remove duplicate verification from xfs_qm_dqflush()
  xfs: abort consistently on dquot flush failure
  xfs: remove unnecessary quotaoff intent item push handler
  xfs: elide the AIL lock on log item failure tracking
  xfs: clean up AIL log item removal functions
  xfs: remove unused iflush stale parameter
  xfs: random buffer write failure errortag

 fs/xfs/libxfs/xfs_errortag.h  |   4 +-
 fs/xfs/libxfs/xfs_inode_buf.c |   7 +--
 fs/xfs/xfs_bmap_item.c        |   2 +-
 fs/xfs/xfs_buf.c              |  36 ++++++++---
 fs/xfs/xfs_buf.h              |   1 +
 fs/xfs/xfs_buf_item.c         |  86 ++++----------------------
 fs/xfs/xfs_buf_item.h         |   2 -
 fs/xfs/xfs_dquot.c            |  84 ++++++++------------------
 fs/xfs/xfs_dquot_item.c       |  31 +---------
 fs/xfs/xfs_error.c            |   3 +
 fs/xfs/xfs_extfree_item.c     |   2 +-
 fs/xfs/xfs_icache.c           |   2 +-
 fs/xfs/xfs_inode.c            | 110 +++++++++-------------------------
 fs/xfs/xfs_inode_item.c       |  39 +++---------
 fs/xfs/xfs_inode_item.h       |   2 +-
 fs/xfs/xfs_refcount_item.c    |   2 +-
 fs/xfs/xfs_rmap_item.c        |   2 +-
 fs/xfs/xfs_trans_ail.c        |  52 +++++++++++++++-
 fs/xfs/xfs_trans_priv.h       |  22 +++----
 19 files changed, 175 insertions(+), 314 deletions(-)

-- 
2.21.1


* [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-17 22:37   ` Allison Collins
  2020-04-20  2:45   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 02/12] xfs: factor out buffer I/O failure simulation code Brian Foster
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

Flush locked log items whose underlying buffers fail metadata
writeback are tagged with a special flag to indicate that the flush
lock is already held. This is currently implemented in the type
specific ->iop_push() callback, but the processing required for such
items is not type specific because we're only doing basic state
management on the underlying buffer.

Factor the failed log item handling out of the inode and dquot
->iop_push() callbacks and open code the buffer resubmit helper into
a single helper called from xfsaild_push_item(). This provides a
generic mechanism for handling failed metadata buffer writeback with
a bit less code.
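
For context, the XFS_LI_FAILED state consumed here is set at I/O
completion time from the ->iop_error() callbacks, which reduce to a
small helper in xfs_trans_priv.h; roughly (paraphrased from the
current tree, not part of this patch):

	static inline void
	xfs_set_li_failed(
		struct xfs_log_item	*lip,
		struct xfs_buf		*bp)
	{
		if (!test_and_set_bit(XFS_LI_FAILED, &lip->li_flags)) {
			xfs_buf_hold(bp);	/* failed item pins the buffer */
			lip->li_buf = bp;	/* stashed for resubmission */
		}
	}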

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c   | 39 ---------------------------------------
 fs/xfs/xfs_buf_item.h   |  2 --
 fs/xfs/xfs_dquot_item.c | 15 ---------------
 fs/xfs/xfs_inode_item.c | 15 ---------------
 fs/xfs/xfs_trans_ail.c  | 41 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 41 insertions(+), 71 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1545657c3ca0..8796adde2d12 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1248,42 +1248,3 @@ xfs_buf_iodone(
 	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
-
-/*
- * Requeue a failed buffer for writeback.
- *
- * We clear the log item failed state here as well, but we have to be careful
- * about reference counts because the only active reference counts on the buffer
- * may be the failed log items. Hence if we clear the log item failed state
- * before queuing the buffer for IO we can release all active references to
- * the buffer and free it, leading to use after free problems in
- * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
- * order we process them in - the buffer is locked, and we own the buffer list
- * so nothing on them is going to change while we are performing this action.
- *
- * Hence we can safely queue the buffer for IO before we clear the failed log
- * item state, therefore  always having an active reference to the buffer and
- * avoiding the transient zero-reference state that leads to use-after-free.
- *
- * Return true if the buffer was added to the buffer list, false if it was
- * already on the buffer list.
- */
-bool
-xfs_buf_resubmit_failed_buffers(
-	struct xfs_buf		*bp,
-	struct list_head	*buffer_list)
-{
-	struct xfs_log_item	*lip;
-	bool			ret;
-
-	ret = xfs_buf_delwri_queue(bp, buffer_list);
-
-	/*
-	 * XFS_LI_FAILED set/clear is protected by ail_lock, caller of this
-	 * function already have it acquired
-	 */
-	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
-		xfs_clear_li_failed(lip);
-
-	return ret;
-}
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 30114b510332..c9c57e2da932 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -59,8 +59,6 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
 			      struct xfs_log_item *);
 void	xfs_buf_iodone_callbacks(struct xfs_buf *);
 void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
-bool	xfs_buf_resubmit_failed_buffers(struct xfs_buf *,
-					struct list_head *);
 bool	xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
 
 extern kmem_zone_t	*xfs_buf_item_zone;
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index baad1748d0d1..5a7808299a32 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -145,21 +145,6 @@ xfs_qm_dquot_logitem_push(
 	if (atomic_read(&dqp->q_pincount) > 0)
 		return XFS_ITEM_PINNED;
 
-	/*
-	 * The buffer containing this item failed to be written back
-	 * previously. Resubmit the buffer for IO
-	 */
-	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
-		if (!xfs_buf_trylock(bp))
-			return XFS_ITEM_LOCKED;
-
-		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
-			rval = XFS_ITEM_FLUSHING;
-
-		xfs_buf_unlock(bp);
-		return rval;
-	}
-
 	if (!xfs_dqlock_nowait(dqp))
 		return XFS_ITEM_LOCKED;
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f779cca2346f..1d4d256a2e96 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -497,21 +497,6 @@ xfs_inode_item_push(
 	if (xfs_ipincount(ip) > 0)
 		return XFS_ITEM_PINNED;
 
-	/*
-	 * The buffer containing this item failed to be written back
-	 * previously. Resubmit the buffer for IO.
-	 */
-	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
-		if (!xfs_buf_trylock(bp))
-			return XFS_ITEM_LOCKED;
-
-		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
-			rval = XFS_ITEM_FLUSHING;
-
-		xfs_buf_unlock(bp);
-		return rval;
-	}
-
 	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
 		return XFS_ITEM_LOCKED;
 
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 564253550b75..0c709651a2c6 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -345,6 +345,45 @@ xfs_ail_delete(
 	xfs_trans_ail_cursor_clear(ailp, lip);
 }
 
+/*
+ * Requeue a failed buffer for writeback.
+ *
+ * We clear the log item failed state here as well, but we have to be careful
+ * about reference counts because the only active reference counts on the buffer
+ * may be the failed log items. Hence if we clear the log item failed state
+ * before queuing the buffer for IO we can release all active references to
+ * the buffer and free it, leading to use after free problems in
+ * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
+ * order we process them in - the buffer is locked, and we own the buffer list
+ * so nothing on them is going to change while we are performing this action.
+ *
+ * Hence we can safely queue the buffer for IO before we clear the failed log
+ * item state, therefore always having an active reference to the buffer and
+ * avoiding the transient zero-reference state that leads to use-after-free.
+ */
+static inline int
+xfsaild_push_failed(
+	struct xfs_log_item	*lip,
+	struct list_head	*buffer_list)
+{
+	struct xfs_buf		*bp = lip->li_buf;
+
+	if (!xfs_buf_trylock(bp))
+		return XFS_ITEM_LOCKED;
+
+	if (!xfs_buf_delwri_queue(bp, buffer_list)) {
+		xfs_buf_unlock(bp);
+		return XFS_ITEM_FLUSHING;
+	}
+
+	/* protected by ail_lock */
+	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
+		xfs_clear_li_failed(lip);
+
+	xfs_buf_unlock(bp);
+	return XFS_ITEM_SUCCESS;
+}
+
 static inline uint
 xfsaild_push_item(
 	struct xfs_ail		*ailp,
@@ -365,6 +404,8 @@ xfsaild_push_item(
 	 */
 	if (!lip->li_ops->iop_push)
 		return XFS_ITEM_PINNED;
+	if (test_bit(XFS_LI_FAILED, &lip->li_flags))
+		return xfsaild_push_failed(lip, &ailp->ail_buf_list);
 	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
 }
 
-- 
2.21.1


* [PATCH 02/12] xfs: factor out buffer I/O failure simulation code
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
  2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-17 22:37   ` Allison Collins
  2020-04-20  2:48   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 03/12] xfs: always attach iflush_done and simplify error handling Brian Foster
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

We use the same buffer I/O failure simulation code in a few
different places. It's not much code, but it's not necessarily
self-explanatory. Factor it into a helper and document it in one
place.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c      | 23 +++++++++++++++++++----
 fs/xfs/xfs_buf.h      |  1 +
 fs/xfs/xfs_buf_item.c | 22 +++-------------------
 fs/xfs/xfs_inode.c    |  7 +------
 4 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9ec3eaf1c618..93942d8e35dd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1248,6 +1248,24 @@ xfs_buf_ioerror_alert(
 			-bp->b_error);
 }
 
+/*
+ * To simulate an I/O failure, the buffer must be locked and held with at least
+ * three references. The LRU reference is dropped by the stale call. The buf
+ * item reference is dropped via ioend processing. The third reference is owned
+ * by the caller and is dropped on I/O completion if the buffer is XBF_ASYNC.
+ */
+void
+xfs_buf_iofail(
+	struct xfs_buf	*bp,
+	int		flags)
+{
+	bp->b_flags |= flags;
+	bp->b_flags &= ~XBF_DONE;
+	xfs_buf_stale(bp);
+	xfs_buf_ioerror(bp, -EIO);
+	xfs_buf_ioend(bp);
+}
+
 int
 xfs_bwrite(
 	struct xfs_buf		*bp)
@@ -1480,10 +1498,7 @@ __xfs_buf_submit(
 
 	/* on shutdown we stale and complete the buffer immediately */
 	if (XFS_FORCED_SHUTDOWN(bp->b_mount)) {
-		xfs_buf_ioerror(bp, -EIO);
-		bp->b_flags &= ~XBF_DONE;
-		xfs_buf_stale(bp);
-		xfs_buf_ioend(bp);
+		xfs_buf_iofail(bp, 0);
 		return -EIO;
 	}
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 9a04c53c2488..a6bce4702b2e 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -263,6 +263,7 @@ extern void __xfs_buf_ioerror(struct xfs_buf *bp, int error,
 		xfs_failaddr_t failaddr);
 #define xfs_buf_ioerror(bp, err) __xfs_buf_ioerror((bp), (err), __this_address)
 extern void xfs_buf_ioerror_alert(struct xfs_buf *bp, xfs_failaddr_t fa);
+void xfs_buf_iofail(struct xfs_buf *, int);
 
 extern int __xfs_buf_submit(struct xfs_buf *bp, bool);
 static inline int xfs_buf_submit(struct xfs_buf *bp)
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 8796adde2d12..72d37a4609d8 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -471,28 +471,12 @@ xfs_buf_item_unpin(
 		xfs_buf_relse(bp);
 	} else if (freed && remove) {
 		/*
-		 * There are currently two references to the buffer - the active
-		 * LRU reference and the buf log item. What we are about to do
-		 * here - simulate a failed IO completion - requires 3
-		 * references.
-		 *
-		 * The LRU reference is removed by the xfs_buf_stale() call. The
-		 * buf item reference is removed by the xfs_buf_iodone()
-		 * callback that is run by xfs_buf_do_callbacks() during ioend
-		 * processing (via the bp->b_iodone callback), and then finally
-		 * the ioend processing will drop the IO reference if the buffer
-		 * is marked XBF_ASYNC.
-		 *
-		 * Hence we need to take an additional reference here so that IO
-		 * completion processing doesn't free the buffer prematurely.
+		 * The buffer must be locked and held by the caller to simulate
+		 * an async I/O failure.
 		 */
 		xfs_buf_lock(bp);
 		xfs_buf_hold(bp);
-		bp->b_flags |= XBF_ASYNC;
-		xfs_buf_ioerror(bp, -EIO);
-		bp->b_flags &= ~XBF_DONE;
-		xfs_buf_stale(bp);
-		xfs_buf_ioend(bp);
+		xfs_buf_iofail(bp, XBF_ASYNC);
 	}
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d1772786af29..b539ee221ce5 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3629,12 +3629,7 @@ xfs_iflush_cluster(
 	 * xfs_buf_submit().
 	 */
 	ASSERT(bp->b_iodone);
-	bp->b_flags |= XBF_ASYNC;
-	bp->b_flags &= ~XBF_DONE;
-	xfs_buf_stale(bp);
-	xfs_buf_ioerror(bp, -EIO);
-	xfs_buf_ioend(bp);
-
+	xfs_buf_iofail(bp, XBF_ASYNC);
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 
 	/* abort the corrupt inode, as it was not attached to the buffer */
-- 
2.21.1


* [PATCH 03/12] xfs: always attach iflush_done and simplify error handling
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
  2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
  2020-04-17 15:08 ` [PATCH 02/12] xfs: factor out buffer I/O failure simulation code Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-18  0:07   ` Allison Collins
  2020-04-20  3:08   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The inode flush code has several layers of error handling between
the inode and cluster flushing code. If the inode flush fails before
acquiring the backing buffer, the inode flush is aborted. If the
cluster flush fails, the current inode flush is aborted and the
cluster buffer is failed to handle the initial inode and any others
that might have been attached before the error.

Since xfs_iflush() is the only caller of xfs_iflush_cluster(), the
error handling between the two can be condensed in the top-level
function. If we update xfs_iflush_int() to attach the item
completion handler to the buffer first, any errors that occur after
the first call to xfs_iflush_int() can be handled with a buffer
I/O failure.

Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
and consolidate with the existing error handling. This also replaces
the need to release the buffer because failing the buffer with
XBF_ASYNC drops the current reference.
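
Condensed from the hunks below, the resulting error flow in
xfs_iflush() is roughly:

	error = xfs_iflush_int(ip, bp);
	if (!error)
		error = xfs_iflush_cluster(ip, bp);
	if (error) {
		/* run completion callbacks and drop the reference */
		xfs_buf_iofail(bp, XBF_ASYNC);
		goto shutdown;
	}
	*bpp = bp;
	return 0;

abort:
	xfs_iflush_abort(ip, false);
shutdown:
	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
	return error;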

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_inode.c | 94 +++++++++++++++-------------------------------
 1 file changed, 30 insertions(+), 64 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b539ee221ce5..4c9971ec6fa6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3496,6 +3496,7 @@ xfs_iflush_cluster(
 	struct xfs_inode	**cilist;
 	struct xfs_inode	*cip;
 	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	int			error = 0;
 	int			nr_found;
 	int			clcount = 0;
 	int			i;
@@ -3588,11 +3589,10 @@ xfs_iflush_cluster(
 		 * re-check that it's dirty before flushing.
 		 */
 		if (!xfs_inode_clean(cip)) {
-			int	error;
 			error = xfs_iflush_int(cip, bp);
 			if (error) {
 				xfs_iunlock(cip, XFS_ILOCK_SHARED);
-				goto cluster_corrupt_out;
+				goto out_free;
 			}
 			clcount++;
 		} else {
@@ -3611,32 +3611,7 @@ xfs_iflush_cluster(
 	kmem_free(cilist);
 out_put:
 	xfs_perag_put(pag);
-	return 0;
-
-
-cluster_corrupt_out:
-	/*
-	 * Corruption detected in the clustering loop.  Invalidate the
-	 * inode buffer and shut down the filesystem.
-	 */
-	rcu_read_unlock();
-
-	/*
-	 * We'll always have an inode attached to the buffer for completion
-	 * process by the time we are called from xfs_iflush(). Hence we have
-	 * always need to do IO completion processing to abort the inodes
-	 * attached to the buffer.  handle them just like the shutdown case in
-	 * xfs_buf_submit().
-	 */
-	ASSERT(bp->b_iodone);
-	xfs_buf_iofail(bp, XBF_ASYNC);
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-
-	/* abort the corrupt inode, as it was not attached to the buffer */
-	xfs_iflush_abort(cip, false);
-	kmem_free(cilist);
-	xfs_perag_put(pag);
-	return -EFSCORRUPTED;
+	return error;
 }
 
 /*
@@ -3692,17 +3667,16 @@ xfs_iflush(
 	 */
 	if (XFS_FORCED_SHUTDOWN(mp)) {
 		error = -EIO;
-		goto abort_out;
+		goto abort;
 	}
 
 	/*
 	 * Get the buffer containing the on-disk inode. We are doing a try-lock
-	 * operation here, so we may get  an EAGAIN error. In that case, we
-	 * simply want to return with the inode still dirty.
+	 * operation here, so we may get an EAGAIN error. In that case, return
+	 * leaving the inode dirty.
 	 *
 	 * If we get any other error, we effectively have a corruption situation
-	 * and we cannot flush the inode, so we treat it the same as failing
-	 * xfs_iflush_int().
+	 * and we cannot flush the inode. Abort the flush and shut down.
 	 */
 	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
 			       0);
@@ -3711,14 +3685,7 @@ xfs_iflush(
 		return error;
 	}
 	if (error)
-		goto corrupt_out;
-
-	/*
-	 * First flush out the inode that xfs_iflush was called with.
-	 */
-	error = xfs_iflush_int(ip, bp);
-	if (error)
-		goto corrupt_out;
+		goto abort;
 
 	/*
 	 * If the buffer is pinned then push on the log now so we won't
@@ -3728,28 +3695,28 @@ xfs_iflush(
 		xfs_log_force(mp, 0);
 
 	/*
-	 * inode clustering: try to gather other inodes into this write
+	 * Flush the provided inode then attempt to gather others from the
+	 * cluster into the write.
 	 *
-	 * Note: Any error during clustering will result in the filesystem
-	 * being shut down and completion callbacks run on the cluster buffer.
-	 * As we have already flushed and attached this inode to the buffer,
-	 * it has already been aborted and released by xfs_iflush_cluster() and
-	 * so we have no further error handling to do here.
+	 * Note: Once we attempt to flush an inode, we must run buffer
+	 * completion callbacks on any failure. If this fails, simulate an I/O
+	 * failure on the buffer and shut down.
 	 */
-	error = xfs_iflush_cluster(ip, bp);
-	if (error)
-		return error;
+	error = xfs_iflush_int(ip, bp);
+	if (!error)
+		error = xfs_iflush_cluster(ip, bp);
+	if (error) {
+		xfs_buf_iofail(bp, XBF_ASYNC);
+		goto shutdown;
+	}
 
 	*bpp = bp;
 	return 0;
 
-corrupt_out:
-	if (bp)
-		xfs_buf_relse(bp);
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-abort_out:
-	/* abort the corrupt inode, as it was not attached to the buffer */
+abort:
 	xfs_iflush_abort(ip, false);
+shutdown:
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	return error;
 }
 
@@ -3798,6 +3765,13 @@ xfs_iflush_int(
 	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
 	ASSERT(iip != NULL && iip->ili_fields != 0);
 
+	/*
+	 * Attach the inode item callback to the buffer. Whether the flush
+	 * succeeds or not, buffer I/O completion processing is now required to
+	 * remove the inode from the AIL and release the flush lock.
+	 */
+	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
+
 	/* set *dip = inode's place in the buffer */
 	dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);
 
@@ -3913,14 +3887,6 @@ xfs_iflush_int(
 	xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
 				&iip->ili_item.li_lsn);
 
-	/*
-	 * Attach the function xfs_iflush_done to the inode's
-	 * buffer.  This will remove the inode from the AIL
-	 * and unlock the inode's flush lock when the inode is
-	 * completely written to disk.
-	 */
-	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
-
 	/* generate the checksum. */
 	xfs_dinode_calc_crc(mp, dip);
 
-- 
2.21.1


* [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush()
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (2 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 03/12] xfs: always attach iflush_done and simplify error handling Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-18  0:27   ` Allison Collins
  2020-04-20  3:10   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning Brian Foster
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The shutdown check in xfs_iflush() duplicates checks down in the
buffer code. If the fs is shut down, xfs_trans_read_buf_map() always
returns an error and falls into the same error path. Remove the
unnecessary check along with the warning in xfs_imap_to_bp()
that generates excessive noise in the log if the fs is shut down.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c |  7 +------
 fs/xfs/xfs_inode.c            | 13 -------------
 2 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 39c5a6e24915..b102e611bf54 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -172,12 +172,7 @@ xfs_imap_to_bp(
 				   (int)imap->im_len, buf_flags, &bp,
 				   &xfs_inode_buf_ops);
 	if (error) {
-		if (error == -EAGAIN) {
-			ASSERT(buf_flags & XBF_TRYLOCK);
-			return error;
-		}
-		xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
-			__func__, error);
+		ASSERT(error != -EAGAIN || (buf_flags & XBF_TRYLOCK));
 		return error;
 	}
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4c9971ec6fa6..98ee1b10d1b0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3657,19 +3657,6 @@ xfs_iflush(
 		return 0;
 	}
 
-	/*
-	 * This may have been unpinned because the filesystem is shutting
-	 * down forcibly. If that's the case we must not write this inode
-	 * to disk, because the log record didn't make it to disk.
-	 *
-	 * We also have to remove the log item from the AIL in this case,
-	 * as we wait for an empty AIL as part of the unmount process.
-	 */
-	if (XFS_FORCED_SHUTDOWN(mp)) {
-		error = -EIO;
-		goto abort;
-	}
-
 	/*
 	 * Get the buffer containing the on-disk inode. We are doing a try-lock
 	 * operation here, so we may get an EAGAIN error. In that case, return
-- 
2.21.1


* [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (3 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  3:19   ` Dave Chinner
  2020-04-20 18:50   ` Allison Collins
  2020-04-17 15:08 ` [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush() Brian Foster
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

At unmount time, XFS emits a warning for every in-core buffer that
might have undergone a write error. In practice this behavior is
probably reasonable given that the filesystem is likely short lived
once I/O errors begin to occur consistently. Under certain test or
otherwise expected error conditions, this can spam the logs and slow
down the unmount. Ratelimit the warning to prevent this problem
while still informing the user that errors have occurred.
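
For reference, xfs_alert_ratelimited() is backed by a static
per-call-site ratelimit state; the mechanism looks roughly like this
(paraphrased from fs/xfs/xfs_message.h, worth verifying against the
tree):

	#define xfs_printk_ratelimited(func, dev, fmt, ...)		\
	do {								\
		static DEFINE_RATELIMIT_STATE(_rs,			\
					      DEFAULT_RATELIMIT_INTERVAL, \
					      DEFAULT_RATELIMIT_BURST);	\
		if (__ratelimit(&_rs))					\
			func(dev, fmt, ##__VA_ARGS__);			\
	} while (0)

so repeated buffer failures at unmount collapse into a handful of
alerts rather than one per buffer.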

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 93942d8e35dd..5120fed06075 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
 			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
 			list_del_init(&bp->b_lru);
 			if (bp->b_flags & XBF_WRITE_FAIL) {
-				xfs_alert(btp->bt_mount,
-"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
+				xfs_alert_ratelimited(btp->bt_mount,
+"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
+"Please run xfs_repair to determine the extent of the problem.",
 					(long long)bp->b_bn);
-				xfs_alert(btp->bt_mount,
-"Please run xfs_repair to determine the extent of the problem.");
 			}
 			xfs_buf_rele(bp);
 		}
-- 
2.21.1


* [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush()
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (4 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  3:53   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 07/12] xfs: abort consistently on dquot flush failure Brian Foster
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The dquot read/write verifier calls xfs_dqblk_verify() on every
dquot in the buffer. Remove the duplicate call from
xfs_qm_dqflush().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_dquot.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index af2c8e5ceea0..73032c18a94a 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1071,7 +1071,6 @@ xfs_qm_dqflush(
 	struct xfs_buf		*bp;
 	struct xfs_dqblk	*dqb;
 	struct xfs_disk_dquot	*ddqp;
-	xfs_failaddr_t		fa;
 	int			error;
 
 	ASSERT(XFS_DQ_IS_LOCKED(dqp));
@@ -1116,19 +1115,6 @@ xfs_qm_dqflush(
 	dqb = bp->b_addr + dqp->q_bufoffset;
 	ddqp = &dqb->dd_diskdq;
 
-	/*
-	 * A simple sanity check in case we got a corrupted dquot.
-	 */
-	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
-	if (fa) {
-		xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS",
-				be32_to_cpu(ddqp->d_id), fa);
-		xfs_buf_relse(bp);
-		xfs_dqfunlock(dqp);
-		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-		return -EFSCORRUPTED;
-	}
-
 	/* This is the only portion of data that needs to persist */
 	memcpy(ddqp, &dqp->q_core, sizeof(struct xfs_disk_dquot));
 
-- 
2.21.1


* [PATCH 07/12] xfs: abort consistently on dquot flush failure
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (5 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush() Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  3:54   ` Dave Chinner
  2020-04-20 18:50   ` Allison Collins
  2020-04-17 15:08 ` [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler Brian Foster
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The dquot flush handler effectively aborts the dquot flush if the
filesystem is already shut down, but doesn't actually shut down if
the flush fails. Update xfs_qm_dqflush() to consistently abort the
dquot flush and shut down the fs if the flush fails with an
unexpected error.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_dquot.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 73032c18a94a..41750f797861 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1068,6 +1068,7 @@ xfs_qm_dqflush(
 	struct xfs_buf		**bpp)
 {
 	struct xfs_mount	*mp = dqp->q_mount;
+	struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
 	struct xfs_buf		*bp;
 	struct xfs_dqblk	*dqb;
 	struct xfs_disk_dquot	*ddqp;
@@ -1082,32 +1083,16 @@ xfs_qm_dqflush(
 
 	xfs_qm_dqunpin_wait(dqp);
 
-	/*
-	 * This may have been unpinned because the filesystem is shutting
-	 * down forcibly. If that's the case we must not write this dquot
-	 * to disk, because the log record didn't make it to disk.
-	 *
-	 * We also have to remove the log item from the AIL in this case,
-	 * as we wait for an emptry AIL as part of the unmount process.
-	 */
-	if (XFS_FORCED_SHUTDOWN(mp)) {
-		struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
-		dqp->dq_flags &= ~XFS_DQ_DIRTY;
-
-		xfs_trans_ail_remove(lip, SHUTDOWN_CORRUPT_INCORE);
-
-		error = -EIO;
-		goto out_unlock;
-	}
-
 	/*
 	 * Get the buffer containing the on-disk dquot
 	 */
 	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dqp->q_blkno,
 				   mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK,
 				   &bp, &xfs_dquot_buf_ops);
-	if (error)
+	if (error == -EAGAIN)
 		goto out_unlock;
+	if (error)
+		goto out_abort;
 
 	/*
 	 * Calculate the location of the dquot inside the buffer.
@@ -1161,6 +1146,10 @@ xfs_qm_dqflush(
 	*bpp = bp;
 	return 0;
 
+out_abort:
+	dqp->dq_flags &= ~XFS_DQ_DIRTY;
+	xfs_trans_ail_remove(lip, SHUTDOWN_CORRUPT_INCORE);
+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 out_unlock:
 	xfs_dqfunlock(dqp);
 	return error;
-- 
2.21.1


* [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (6 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 07/12] xfs: abort consistently on dquot flush failure Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  3:58   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 09/12] xfs: elide the AIL lock on log item failure tracking Brian Foster
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The quotaoff intent item push handler unconditionally returns locked
status because it remains AIL resident until removed by the
quotaoff end intent. xfsaild_push_item() already returns pinned
status for items (generally intents) without a push handler. This is
effectively the same behavior for the purpose of quotaoff, so remove
the unnecessary quotaoff push handler.
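
The fallback being relied on is the existing check at the top of
xfsaild_push_item() (seen in patch 1 of this series):

	/* an item without a push handler is treated as pinned */
	if (!lip->li_ops->iop_push)
		return XFS_ITEM_PINNED;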

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_dquot_item.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 5a7808299a32..582b3796a0c9 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -274,18 +274,6 @@ xfs_qm_qoff_logitem_format(
 	xlog_finish_iovec(lv, vecp, sizeof(struct xfs_qoff_logitem));
 }
 
-/*
- * There isn't much you can do to push a quotaoff item.  It is simply
- * stuck waiting for the log to be flushed to disk.
- */
-STATIC uint
-xfs_qm_qoff_logitem_push(
-	struct xfs_log_item	*lip,
-	struct list_head	*buffer_list)
-{
-	return XFS_ITEM_LOCKED;
-}
-
 STATIC xfs_lsn_t
 xfs_qm_qoffend_logitem_committed(
 	struct xfs_log_item	*lip,
@@ -318,14 +306,12 @@ static const struct xfs_item_ops xfs_qm_qoffend_logitem_ops = {
 	.iop_size	= xfs_qm_qoff_logitem_size,
 	.iop_format	= xfs_qm_qoff_logitem_format,
 	.iop_committed	= xfs_qm_qoffend_logitem_committed,
-	.iop_push	= xfs_qm_qoff_logitem_push,
 	.iop_release	= xfs_qm_qoff_logitem_release,
 };
 
 static const struct xfs_item_ops xfs_qm_qoff_logitem_ops = {
 	.iop_size	= xfs_qm_qoff_logitem_size,
 	.iop_format	= xfs_qm_qoff_logitem_format,
-	.iop_push	= xfs_qm_qoff_logitem_push,
 	.iop_release	= xfs_qm_qoff_logitem_release,
 };
 
-- 
2.21.1


* [PATCH 09/12] xfs: elide the AIL lock on log item failure tracking
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (7 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-17 15:08 ` [PATCH 10/12] xfs: clean up AIL log item removal functions Brian Foster
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The log item failed flag is used to indicate a log item was flushed
but the underlying buffer was not successfully written to disk. If
the error configuration allows for writeback retries, xfsaild uses
the flag to resubmit the underlying buffer without acquiring the
flush lock of the item.

The flag is currently set and cleared under the AIL lock and buffer
lock in the ->iop_error() callback, invoked via ->b_iodone() at I/O
completion time. The flag is checked at xfsaild push time under AIL
lock and cleared under buffer lock before resubmission. If I/O
eventually succeeds, the flag is cleared in the _done() handler for
the associated item type, again under AIL lock and buffer lock.

As far as I can tell, the only reason for holding the AIL lock
across sets/clears is to manage consistency between the log item
bitop state and the temporary buffer pointer that is attached to the
log item. The bit itself is used to manage consistency of the
attached buffer, but is not enough to guarantee the buffer is still
attached by the time xfsaild attempts to access it. However since
failure state is always set or cleared under buffer lock (either via
I/O completion or xfsaild), this particular case can be handled at
item push time by verifying failure state once under buffer lock.

Remove the AIL lock protection from the various bits of log item
failure state management and simplify the surrounding code where
applicable. Use the buffer lock in the xfsaild resubmit code to
detect failure state changes and temporarily treat the item as
locked.
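
Concretely, the crux is the push-side revalidation in the
xfs_trans_ail.c hunk below, condensed here for reference:

	bp = lip->li_buf;		/* racy read, may be NULL/stale */
	if (!bp || !xfs_buf_trylock(bp))
		return XFS_ITEM_LOCKED;
	if (!test_bit(XFS_LI_FAILED, &lip->li_flags)) {
		/* completion changed the state before we got the lock */
		xfs_buf_unlock(bp);
		return XFS_ITEM_LOCKED;
	}

Since failure state only ever changes under the buffer lock, the
recheck after the trylock closes the race with I/O completion without
holding the AIL lock.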

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c   |  4 ----
 fs/xfs/xfs_dquot.c      | 41 +++++++++++++++--------------------------
 fs/xfs/xfs_inode_item.c | 11 +++--------
 fs/xfs/xfs_trans_ail.c  | 12 ++++++++++--
 fs/xfs/xfs_trans_priv.h |  5 -----
 5 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 72d37a4609d8..6b000f855e13 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1046,7 +1046,6 @@ xfs_buf_do_callbacks_fail(
 	struct xfs_buf		*bp)
 {
 	struct xfs_log_item	*lip;
-	struct xfs_ail		*ailp;
 
 	/*
 	 * Buffer log item errors are handled directly by xfs_buf_item_push()
@@ -1058,13 +1057,10 @@ xfs_buf_do_callbacks_fail(
 
 	lip = list_first_entry(&bp->b_li_list, struct xfs_log_item,
 			li_bio_list);
-	ailp = lip->li_ailp;
-	spin_lock(&ailp->ail_lock);
 	list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
 		if (lip->li_ops->iop_error)
 			lip->li_ops->iop_error(lip, bp);
 	}
-	spin_unlock(&ailp->ail_lock);
 }
 
 static bool
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 41750f797861..953059235130 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1023,34 +1023,23 @@ xfs_qm_dqflush_done(
 	struct xfs_ail		*ailp = lip->li_ailp;
 
 	/*
-	 * We only want to pull the item from the AIL if its
-	 * location in the log has not changed since we started the flush.
-	 * Thus, we only bother if the dquot's lsn has
-	 * not changed. First we check the lsn outside the lock
-	 * since it's cheaper, and then we recheck while
-	 * holding the lock before removing the dquot from the AIL.
+	 * Only pull the item from the AIL if its location in the log has not
+	 * changed since it was flushed. Do a lockless check first to reduce
+	 * lock traffic.
 	 */
-	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags) &&
-	    ((lip->li_lsn == qip->qli_flush_lsn) ||
-	     test_bit(XFS_LI_FAILED, &lip->li_flags))) {
-
-		/* xfs_trans_ail_delete() drops the AIL lock. */
-		spin_lock(&ailp->ail_lock);
-		if (lip->li_lsn == qip->qli_flush_lsn) {
-			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
-		} else {
-			/*
-			 * Clear the failed state since we are about to drop the
-			 * flush lock
-			 */
-			xfs_clear_li_failed(lip);
-			spin_unlock(&ailp->ail_lock);
-		}
-	}
+	if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags) ||
+	    lip->li_lsn != qip->qli_flush_lsn)
+		goto out;
 
-	/*
-	 * Release the dq's flush lock since we're done with it.
-	 */
+	spin_lock(&ailp->ail_lock);
+	if (lip->li_lsn == qip->qli_flush_lsn)
+		/* xfs_trans_ail_delete() drops the AIL lock */
+		xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
+	else
+		spin_unlock(&ailp->ail_lock);
+
+out:
+	xfs_clear_li_failed(lip);
 	xfs_dqfunlock(dqp);
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 1d4d256a2e96..0ae61844b224 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -682,9 +682,7 @@ xfs_iflush_done(
 	 * Scan the buffer IO completions for other inodes being completed and
 	 * attach them to the current inode log item.
 	 */
-
 	list_add_tail(&lip->li_bio_list, &tmp);
-
 	list_for_each_entry_safe(blip, n, &bp->b_li_list, li_bio_list) {
 		if (lip->li_cb != xfs_iflush_done)
 			continue;
@@ -695,15 +693,13 @@ xfs_iflush_done(
 		 * the AIL lock.
 		 */
 		iip = INODE_ITEM(blip);
-		if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
-		    test_bit(XFS_LI_FAILED, &blip->li_flags))
+		if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
 			need_ail++;
 	}
 
 	/* make sure we capture the state of the initial inode. */
 	iip = INODE_ITEM(lip);
-	if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
-	    test_bit(XFS_LI_FAILED, &lip->li_flags))
+	if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
 		need_ail++;
 
 	/*
@@ -732,8 +728,6 @@ xfs_iflush_done(
 				xfs_lsn_t lsn = xfs_ail_delete_one(ailp, blip);
 				if (!tail_lsn && lsn)
 					tail_lsn = lsn;
-			} else {
-				xfs_clear_li_failed(blip);
 			}
 		}
 		xfs_ail_update_finish(ailp, tail_lsn);
@@ -746,6 +740,7 @@ xfs_iflush_done(
 	 */
 	list_for_each_entry_safe(blip, n, &tmp, li_bio_list) {
 		list_del_init(&blip->li_bio_list);
+		xfs_clear_li_failed(blip);
 		iip = INODE_ITEM(blip);
 		iip->ili_logged = 0;
 		iip->ili_last_fields = 0;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 0c709651a2c6..6af609070143 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -368,15 +368,23 @@ xfsaild_push_failed(
 {
 	struct xfs_buf		*bp = lip->li_buf;
 
-	if (!xfs_buf_trylock(bp))
+	/*
+	 * Log item state bits are racy so we cannot assume the temporary buffer
+	 * pointer is set. Treat the item as locked if the pointer is clear or
+	 * the failure state has changed once we've locked out I/O completion.
+	 */
+	if (!bp || !xfs_buf_trylock(bp))
 		return XFS_ITEM_LOCKED;
+	if (!test_bit(XFS_LI_FAILED, &lip->li_flags)) {
+		xfs_buf_unlock(bp);
+		return XFS_ITEM_LOCKED;
+	}
 
 	if (!xfs_buf_delwri_queue(bp, buffer_list)) {
 		xfs_buf_unlock(bp);
 		return XFS_ITEM_FLUSHING;
 	}
 
-	/* protected by ail_lock */
 	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
 		xfs_clear_li_failed(lip);
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 35655eac01a6..9135afdcee9d 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -158,9 +158,6 @@ xfs_clear_li_failed(
 {
 	struct xfs_buf	*bp = lip->li_buf;
 
-	ASSERT(test_bit(XFS_LI_IN_AIL, &lip->li_flags));
-	lockdep_assert_held(&lip->li_ailp->ail_lock);
-
 	if (test_and_clear_bit(XFS_LI_FAILED, &lip->li_flags)) {
 		lip->li_buf = NULL;
 		xfs_buf_rele(bp);
@@ -172,8 +169,6 @@ xfs_set_li_failed(
 	struct xfs_log_item	*lip,
 	struct xfs_buf		*bp)
 {
-	lockdep_assert_held(&lip->li_ailp->ail_lock);
-
 	if (!test_and_set_bit(XFS_LI_FAILED, &lip->li_flags)) {
 		xfs_buf_hold(bp);
 		lip->li_buf = bp;
-- 
2.21.1


* [PATCH 10/12] xfs: clean up AIL log item removal functions
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (8 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 09/12] xfs: elide the AIL lock on log item failure tracking Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  4:32   ` Dave Chinner
  2020-04-17 15:08 ` [PATCH 11/12] xfs: remove unused iflush stale parameter Brian Foster
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

We have two AIL removal functions with slightly different semantics.
xfs_trans_ail_delete() expects the caller to have the AIL lock and
for the associated item to be AIL resident. If not, the filesystem
is shut down. xfs_trans_ail_remove() acquires the AIL lock, checks
that the item is AIL resident and calls the former if so.

These semantics lead to confused usage between the two. For example,
the _remove() variant takes a shutdown parameter to pass to the
_delete() variant, but immediately returns if the AIL bit is not
set. This means that _remove() would never shut down if an item is
not AIL resident, even though it appears that many callers would
expect it to.

Make the following changes to clean up both of these functions:

- Most callers of xfs_trans_ail_delete() acquire the AIL lock just
  before the call. Update _delete() to acquire the lock and open
  code the couple of callers that make additional checks under AIL
  lock.
- Drop the unnecessary ailp parameter from _delete().
- Drop the unused shutdown parameter from _remove() and open code
  the implementation.

In summary, this leaves a _delete() variant that expects an AIL
resident item and a _remove() helper that checks the AIL bit. Audit
the existing callsites for use of the appropriate function and
update as necessary.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_bmap_item.c     |  2 +-
 fs/xfs/xfs_buf_item.c      | 21 ++++++++-------------
 fs/xfs/xfs_dquot.c         | 12 +++++++-----
 fs/xfs/xfs_dquot_item.c    |  2 +-
 fs/xfs/xfs_extfree_item.c  |  2 +-
 fs/xfs/xfs_inode_item.c    |  6 +-----
 fs/xfs/xfs_refcount_item.c |  2 +-
 fs/xfs/xfs_rmap_item.c     |  2 +-
 fs/xfs/xfs_trans_ail.c     |  3 ++-
 fs/xfs/xfs_trans_priv.h    | 17 +++++++++--------
 10 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index ee6f4229cebc..909221a4a8ab 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -51,7 +51,7 @@ xfs_bui_release(
 {
 	ASSERT(atomic_read(&buip->bui_refcount) > 0);
 	if (atomic_dec_and_test(&buip->bui_refcount)) {
-		xfs_trans_ail_remove(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);
+		xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_bui_item_free(buip);
 	}
 }
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 6b000f855e13..1bf1c14d4ebb 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -410,7 +410,6 @@ xfs_buf_item_unpin(
 {
 	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
 	xfs_buf_t		*bp = bip->bli_buf;
-	struct xfs_ail		*ailp = lip->li_ailp;
 	int			stale = bip->bli_flags & XFS_BLI_STALE;
 	int			freed;
 
@@ -463,8 +462,7 @@ xfs_buf_item_unpin(
 			list_del_init(&bp->b_li_list);
 			bp->b_iodone = NULL;
 		} else {
-			spin_lock(&ailp->ail_lock);
-			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
+			xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_log_item == NULL);
 		}
@@ -568,7 +566,7 @@ xfs_buf_item_put(
 	 * state.
 	 */
 	if (aborted)
-		xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
+		xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR);
 	xfs_buf_item_relse(bip->bli_buf);
 	return true;
 }
@@ -1209,22 +1207,19 @@ xfs_buf_iodone(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	struct xfs_ail		*ailp = lip->li_ailp;
-
 	ASSERT(BUF_ITEM(lip)->bli_buf == bp);
 
 	xfs_buf_rele(bp);
 
 	/*
-	 * If we are forcibly shutting down, this may well be
-	 * off the AIL already. That's because we simulate the
-	 * log-committed callbacks to unpin these buffers. Or we may never
-	 * have put this item on AIL because of the transaction was
-	 * aborted forcibly. xfs_trans_ail_delete() takes care of these.
+	 * If we are forcibly shutting down, this may well be off the AIL
+	 * already. That's because we simulate the log-committed callbacks to
+	 * unpin these buffers. Or we may never have put this item on the
+	 * AIL because the transaction was aborted forcibly.
+	 * xfs_trans_ail_delete() takes care of these.
 	 *
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
+	xfs_trans_ail_delete(lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 953059235130..996751dd6302 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1021,6 +1021,7 @@ xfs_qm_dqflush_done(
 	struct xfs_dq_logitem	*qip = (struct xfs_dq_logitem *)lip;
 	struct xfs_dquot	*dqp = qip->qli_dquot;
 	struct xfs_ail		*ailp = lip->li_ailp;
+	xfs_lsn_t		tail_lsn;
 
 	/*
 	 * Only pull the item from the AIL if its location in the log has not
@@ -1032,10 +1033,11 @@ xfs_qm_dqflush_done(
 		goto out;
 
 	spin_lock(&ailp->ail_lock);
-	if (lip->li_lsn == qip->qli_flush_lsn)
-		/* xfs_trans_ail_delete() drops the AIL lock */
-		xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
-	else
+	if (lip->li_lsn == qip->qli_flush_lsn) {
+		/* xfs_ail_update_finish() drops the AIL lock */
+		tail_lsn = xfs_ail_delete_one(ailp, lip);
+		xfs_ail_update_finish(ailp, tail_lsn);
+	} else
 		spin_unlock(&ailp->ail_lock);
 
 out:
@@ -1137,7 +1139,7 @@ xfs_qm_dqflush(
 
 out_abort:
 	dqp->dq_flags &= ~XFS_DQ_DIRTY;
-	xfs_trans_ail_remove(lip, SHUTDOWN_CORRUPT_INCORE);
+	xfs_trans_ail_remove(lip);
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 out_unlock:
 	xfs_dqfunlock(dqp);
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 582b3796a0c9..f129cfcc36be 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -329,7 +329,7 @@ xfs_qm_qoff_logitem_relse(
 	ASSERT(test_bit(XFS_LI_IN_AIL, &lip->li_flags) ||
 	       test_bit(XFS_LI_ABORTED, &lip->li_flags) ||
 	       XFS_FORCED_SHUTDOWN(lip->li_mountp));
-	xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
+	xfs_trans_ail_remove(lip);
 	kmem_free(lip->li_lv_shadow);
 	kmem_free(qoff);
 }
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6ea847f6e298..cd98eba24884 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -55,7 +55,7 @@ xfs_efi_release(
 {
 	ASSERT(atomic_read(&efip->efi_refcount) > 0);
 	if (atomic_dec_and_test(&efip->efi_refcount)) {
-		xfs_trans_ail_remove(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR);
+		xfs_trans_ail_delete(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 0ae61844b224..f8dd9bb8c851 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -763,11 +763,7 @@ xfs_iflush_abort(
 	xfs_inode_log_item_t	*iip = ip->i_itemp;
 
 	if (iip) {
-		if (test_bit(XFS_LI_IN_AIL, &iip->ili_item.li_flags)) {
-			xfs_trans_ail_remove(&iip->ili_item,
-					     stale ? SHUTDOWN_LOG_IO_ERROR :
-						     SHUTDOWN_CORRUPT_INCORE);
-		}
+		xfs_trans_ail_remove(&iip->ili_item);
 		iip->ili_logged = 0;
 		/*
 		 * Clear the ili_last_fields bits now that we know that the
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 8eeed73928cd..712939a015a9 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -50,7 +50,7 @@ xfs_cui_release(
 {
 	ASSERT(atomic_read(&cuip->cui_refcount) > 0);
 	if (atomic_dec_and_test(&cuip->cui_refcount)) {
-		xfs_trans_ail_remove(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR);
+		xfs_trans_ail_delete(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_cui_item_free(cuip);
 	}
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 4911b68f95dd..ff949b32c051 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -50,7 +50,7 @@ xfs_rui_release(
 {
 	ASSERT(atomic_read(&ruip->rui_refcount) > 0);
 	if (atomic_dec_and_test(&ruip->rui_refcount)) {
-		xfs_trans_ail_remove(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR);
+		xfs_trans_ail_delete(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR);
 		xfs_rui_item_free(ruip);
 	}
 }
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 6af609070143..80acdb89bd6e 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -872,13 +872,14 @@ xfs_ail_delete_one(
  */
 void
 xfs_trans_ail_delete(
-	struct xfs_ail		*ailp,
 	struct xfs_log_item	*lip,
 	int			shutdown_type)
 {
+	struct xfs_ail		*ailp = lip->li_ailp;
 	struct xfs_mount	*mp = ailp->ail_mount;
 	xfs_lsn_t		tail_lsn;
 
+	spin_lock(&ailp->ail_lock);
 	if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
 		spin_unlock(&ailp->ail_lock);
 		if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 9135afdcee9d..7563c78e2997 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -94,22 +94,23 @@ xfs_trans_ail_update(
 xfs_lsn_t xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
 void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
 			__releases(ailp->ail_lock);
-void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
-		int shutdown_type);
+void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
 
 static inline void
 xfs_trans_ail_remove(
-	struct xfs_log_item	*lip,
-	int			shutdown_type)
+	struct xfs_log_item	*lip)
 {
 	struct xfs_ail		*ailp = lip->li_ailp;
+	xfs_lsn_t		tail_lsn;
 
 	spin_lock(&ailp->ail_lock);
-	/* xfs_trans_ail_delete() drops the AIL lock */
-	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags))
-		xfs_trans_ail_delete(ailp, lip, shutdown_type);
-	else
+	/* xfs_ail_update_finish() drops the AIL lock */
+	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
+		tail_lsn = xfs_ail_delete_one(ailp, lip);
+		xfs_ail_update_finish(ailp, tail_lsn);
+	} else {
 		spin_unlock(&ailp->ail_lock);
+	}
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
-- 
2.21.1


* [PATCH 11/12] xfs: remove unused iflush stale parameter
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (9 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 10/12] xfs: clean up AIL log item removal functions Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  4:34   ` Dave Chinner
  2020-04-20 19:19   ` Allison Collins
  2020-04-17 15:08 ` [PATCH 12/12] xfs: random buffer write failure errortag Brian Foster
  2020-04-19 22:53 ` [PATCH 00/12] xfs: flush related error handling cleanups Dave Chinner
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

The stale parameter was used to control the now unused shutdown
parameter of xfs_trans_ail_remove().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_icache.c     | 2 +-
 fs/xfs/xfs_inode.c      | 2 +-
 fs/xfs/xfs_inode_item.c | 7 +++----
 fs/xfs/xfs_inode_item.h | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index a7be7a9e5c1a..9088716465e7 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1118,7 +1118,7 @@ xfs_reclaim_inode(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		xfs_iunpin_wait(ip);
 		/* xfs_iflush_abort() drops the flush lock */
-		xfs_iflush_abort(ip, false);
+		xfs_iflush_abort(ip);
 		goto reclaim;
 	}
 	if (xfs_ipincount(ip)) {
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 98ee1b10d1b0..502ffe857b12 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3701,7 +3701,7 @@ xfs_iflush(
 	return 0;
 
 abort:
-	xfs_iflush_abort(ip, false);
+	xfs_iflush_abort(ip);
 shutdown:
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	return error;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f8dd9bb8c851..b8cbd0c61ce0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -757,10 +757,9 @@ xfs_iflush_done(
  */
 void
 xfs_iflush_abort(
-	xfs_inode_t		*ip,
-	bool			stale)
+	struct xfs_inode		*ip)
 {
-	xfs_inode_log_item_t	*iip = ip->i_itemp;
+	struct xfs_inode_log_item	*iip = ip->i_itemp;
 
 	if (iip) {
 		xfs_trans_ail_remove(&iip->ili_item);
@@ -788,7 +787,7 @@ xfs_istale_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
+	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 07a60e74c39c..a68c114b79a0 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -34,7 +34,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
 extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
-extern void xfs_iflush_abort(struct xfs_inode *, bool);
+extern void xfs_iflush_abort(struct xfs_inode *);
 extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
 					 struct xfs_inode_log_format *);
 
-- 
2.21.1


* [PATCH 12/12] xfs: random buffer write failure errortag
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (10 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 11/12] xfs: remove unused iflush stale parameter Brian Foster
@ 2020-04-17 15:08 ` Brian Foster
  2020-04-20  4:37   ` Dave Chinner
  2020-04-20 22:42   ` Allison Collins
  2020-04-19 22:53 ` [PATCH 00/12] xfs: flush related error handling cleanups Dave Chinner
  12 siblings, 2 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-17 15:08 UTC (permalink / raw)
  To: linux-xfs

Introduce an error tag to randomly fail async buffer writes. This is
primarily to facilitate testing of the XFS error configuration
mechanism.
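
As a usage sketch (assuming the standard errortag sysfs layout; the
device name below is just an example):

	# fail 1 in 100 async buffer writes (XFS_RANDOM_DEFAULT) on
	# the filesystem backed by sda1
	echo 100 > /sys/fs/xfs/sda1/errortag/buf_ioerror

This pairs with the error configuration knobs under
/sys/fs/xfs/<dev>/error/ that the tag is meant to exercise.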

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/xfs_buf.c             | 6 ++++++
 fs/xfs/xfs_error.c           | 3 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 79e6c4fb1d8a..2486dab19023 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -55,7 +55,8 @@
 #define XFS_ERRTAG_FORCE_SCRUB_REPAIR			32
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
-#define XFS_ERRTAG_MAX					35
+#define XFS_ERRTAG_BUF_IOERROR				35
+#define XFS_ERRTAG_MAX					36
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -95,5 +96,6 @@
 #define XFS_RANDOM_FORCE_SCRUB_REPAIR			1
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
+#define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 5120fed06075..a305db779156 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1289,6 +1289,12 @@ xfs_buf_bio_end_io(
 	struct bio		*bio)
 {
 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
+	struct xfs_mount	*mp = bp->b_mount;
+
+	if (!bio->bi_status &&
+	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
+	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BUF_IOERROR))
+		bio->bi_status = errno_to_blk_status(-EIO);
 
 	/*
 	 * don't overwrite existing errors - otherwise we can lose errors on
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index a21e9cc6516a..7f6e20899473 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -53,6 +53,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SCRUB_REPAIR,
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
+	XFS_RANDOM_BUF_IOERROR,
 };
 
 struct xfs_errortag_attr {
@@ -162,6 +163,7 @@ XFS_ERRORTAG_ATTR_RW(buf_lru_ref,	XFS_ERRTAG_BUF_LRU_REF);
 XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
+XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -199,6 +201,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(force_repair),
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
+	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
 	NULL,
 };
 
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild
  2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
@ 2020-04-17 22:37   ` Allison Collins
  2020-04-20  2:45   ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-17 22:37 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> Flush locked log items whose underlying buffers fail metadata
> writeback are tagged with a special flag to indicate that the flush
> lock is already held. This is currently implemented in the type
> specific ->iop_push() callback, but the processing required for such
> items is not type specific because we're only doing basic state
> management on the underlying buffer.
> 
> Factor the failed log item handling out of the inode and dquot
> ->iop_push() callbacks and open code the buffer resubmit helper into
> a single helper called from xfsaild_push_item(). This provides a
> generic mechanism for handling failed metadata buffer writeback with
> a bit less code.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, I traced it through, and I think the re-factor is equivalent.
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_buf_item.c   | 39 ---------------------------------------
>   fs/xfs/xfs_buf_item.h   |  2 --
>   fs/xfs/xfs_dquot_item.c | 15 ---------------
>   fs/xfs/xfs_inode_item.c | 15 ---------------
>   fs/xfs/xfs_trans_ail.c  | 41 +++++++++++++++++++++++++++++++++++++++++
>   5 files changed, 41 insertions(+), 71 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 1545657c3ca0..8796adde2d12 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1248,42 +1248,3 @@ xfs_buf_iodone(
>   	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
>   	xfs_buf_item_free(BUF_ITEM(lip));
>   }
> -
> -/*
> - * Requeue a failed buffer for writeback.
> - *
> - * We clear the log item failed state here as well, but we have to be careful
> - * about reference counts because the only active reference counts on the buffer
> - * may be the failed log items. Hence if we clear the log item failed state
> - * before queuing the buffer for IO we can release all active references to
> - * the buffer and free it, leading to use after free problems in
> - * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
> - * order we process them in - the buffer is locked, and we own the buffer list
> - * so nothing on them is going to change while we are performing this action.
> - *
> - * Hence we can safely queue the buffer for IO before we clear the failed log
> - * item state, therefore  always having an active reference to the buffer and
> - * avoiding the transient zero-reference state that leads to use-after-free.
> - *
> - * Return true if the buffer was added to the buffer list, false if it was
> - * already on the buffer list.
> - */
> -bool
> -xfs_buf_resubmit_failed_buffers(
> -	struct xfs_buf		*bp,
> -	struct list_head	*buffer_list)
> -{
> -	struct xfs_log_item	*lip;
> -	bool			ret;
> -
> -	ret = xfs_buf_delwri_queue(bp, buffer_list);
> -
> -	/*
> -	 * XFS_LI_FAILED set/clear is protected by ail_lock, caller of this
> -	 * function already have it acquired
> -	 */
> -	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
> -		xfs_clear_li_failed(lip);
> -
> -	return ret;
> -}
> diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
> index 30114b510332..c9c57e2da932 100644
> --- a/fs/xfs/xfs_buf_item.h
> +++ b/fs/xfs/xfs_buf_item.h
> @@ -59,8 +59,6 @@ void	xfs_buf_attach_iodone(struct xfs_buf *,
>   			      struct xfs_log_item *);
>   void	xfs_buf_iodone_callbacks(struct xfs_buf *);
>   void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
> -bool	xfs_buf_resubmit_failed_buffers(struct xfs_buf *,
> -					struct list_head *);
>   bool	xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
>   
>   extern kmem_zone_t	*xfs_buf_item_zone;
> diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
> index baad1748d0d1..5a7808299a32 100644
> --- a/fs/xfs/xfs_dquot_item.c
> +++ b/fs/xfs/xfs_dquot_item.c
> @@ -145,21 +145,6 @@ xfs_qm_dquot_logitem_push(
>   	if (atomic_read(&dqp->q_pincount) > 0)
>   		return XFS_ITEM_PINNED;
>   
> -	/*
> -	 * The buffer containing this item failed to be written back
> -	 * previously. Resubmit the buffer for IO
> -	 */
> -	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> -		if (!xfs_buf_trylock(bp))
> -			return XFS_ITEM_LOCKED;
> -
> -		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> -			rval = XFS_ITEM_FLUSHING;
> -
> -		xfs_buf_unlock(bp);
> -		return rval;
> -	}
> -
>   	if (!xfs_dqlock_nowait(dqp))
>   		return XFS_ITEM_LOCKED;
>   
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index f779cca2346f..1d4d256a2e96 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -497,21 +497,6 @@ xfs_inode_item_push(
>   	if (xfs_ipincount(ip) > 0)
>   		return XFS_ITEM_PINNED;
>   
> -	/*
> -	 * The buffer containing this item failed to be written back
> -	 * previously. Resubmit the buffer for IO.
> -	 */
> -	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> -		if (!xfs_buf_trylock(bp))
> -			return XFS_ITEM_LOCKED;
> -
> -		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> -			rval = XFS_ITEM_FLUSHING;
> -
> -		xfs_buf_unlock(bp);
> -		return rval;
> -	}
> -
>   	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
>   		return XFS_ITEM_LOCKED;
>   
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 564253550b75..0c709651a2c6 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -345,6 +345,45 @@ xfs_ail_delete(
>   	xfs_trans_ail_cursor_clear(ailp, lip);
>   }
>   
> +/*
> + * Requeue a failed buffer for writeback.
> + *
> + * We clear the log item failed state here as well, but we have to be careful
> + * about reference counts because the only active reference counts on the buffer
> + * may be the failed log items. Hence if we clear the log item failed state
> + * before queuing the buffer for IO we can release all active references to
> + * the buffer and free it, leading to use after free problems in
> + * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
> + * order we process them in - the buffer is locked, and we own the buffer list
> + * so nothing on them is going to change while we are performing this action.
> + *
> + * Hence we can safely queue the buffer for IO before we clear the failed log
> + * item state, therefore  always having an active reference to the buffer and
> + * avoiding the transient zero-reference state that leads to use-after-free.
> + */
> +static inline int
> +xfsaild_push_failed(
> +	struct xfs_log_item	*lip,
> +	struct list_head	*buffer_list)
> +{
> +	struct xfs_buf		*bp = lip->li_buf;
> +
> +	if (!xfs_buf_trylock(bp))
> +		return XFS_ITEM_LOCKED;
> +
> +	if (!xfs_buf_delwri_queue(bp, buffer_list)) {
> +		xfs_buf_unlock(bp);
> +		return XFS_ITEM_FLUSHING;
> +	}
> +
> +	/* protected by ail_lock */
> +	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
> +		xfs_clear_li_failed(lip);
> +
> +	xfs_buf_unlock(bp);
> +	return XFS_ITEM_SUCCESS;
> +}
> +
>   static inline uint
>   xfsaild_push_item(
>   	struct xfs_ail		*ailp,
> @@ -365,6 +404,8 @@ xfsaild_push_item(
>   	 */
>   	if (!lip->li_ops->iop_push)
>   		return XFS_ITEM_PINNED;
> +	if (test_bit(XFS_LI_FAILED, &lip->li_flags))
> +		return xfsaild_push_failed(lip, &ailp->ail_buf_list);
>   	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
>   }
>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/12] xfs: factor out buffer I/O failure simulation code
  2020-04-17 15:08 ` [PATCH 02/12] xfs: factor out buffer I/O failure simulation code Brian Foster
@ 2020-04-17 22:37   ` Allison Collins
  2020-04-20  2:48   ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-17 22:37 UTC (permalink / raw)
  To: Brian Foster, linux-xfs

On 4/17/20 8:08 AM, Brian Foster wrote:
> We use the same buffer I/O failure simulation code in a few
> different places. It's not much code, but it's not necessarily
> self-explanatory. Factor it into a helper and document it in one
> place.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, looks ok to me:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_buf.c      | 23 +++++++++++++++++++----
>   fs/xfs/xfs_buf.h      |  1 +
>   fs/xfs/xfs_buf_item.c | 22 +++-------------------
>   fs/xfs/xfs_inode.c    |  7 +------
>   4 files changed, 24 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 9ec3eaf1c618..93942d8e35dd 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1248,6 +1248,24 @@ xfs_buf_ioerror_alert(
>   			-bp->b_error);
>   }
>   
> +/*
> +  * To simulate an I/O failure, the buffer must be locked and held with at least
> + * three references. The LRU reference is dropped by the stale call. The buf
> + * item reference is dropped via ioend processing. The third reference is owned
> + * by the caller and is dropped on I/O completion if the buffer is XBF_ASYNC.
> + */
> +void
> +xfs_buf_iofail(
> +	struct xfs_buf	*bp,
> +	int		flags)
> +{
> +	bp->b_flags |= flags;
> +	bp->b_flags &= ~XBF_DONE;
> +	xfs_buf_stale(bp);
> +	xfs_buf_ioerror(bp, -EIO);
> +	xfs_buf_ioend(bp);
> +}
> +
>   int
>   xfs_bwrite(
>   	struct xfs_buf		*bp)
> @@ -1480,10 +1498,7 @@ __xfs_buf_submit(
>   
>   	/* on shutdown we stale and complete the buffer immediately */
>   	if (XFS_FORCED_SHUTDOWN(bp->b_mount)) {
> -		xfs_buf_ioerror(bp, -EIO);
> -		bp->b_flags &= ~XBF_DONE;
> -		xfs_buf_stale(bp);
> -		xfs_buf_ioend(bp);
> +		xfs_buf_iofail(bp, 0);
>   		return -EIO;
>   	}
>   
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 9a04c53c2488..a6bce4702b2e 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -263,6 +263,7 @@ extern void __xfs_buf_ioerror(struct xfs_buf *bp, int error,
>   		xfs_failaddr_t failaddr);
>   #define xfs_buf_ioerror(bp, err) __xfs_buf_ioerror((bp), (err), __this_address)
>   extern void xfs_buf_ioerror_alert(struct xfs_buf *bp, xfs_failaddr_t fa);
> +void xfs_buf_iofail(struct xfs_buf *, int);
>   
>   extern int __xfs_buf_submit(struct xfs_buf *bp, bool);
>   static inline int xfs_buf_submit(struct xfs_buf *bp)
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 8796adde2d12..72d37a4609d8 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -471,28 +471,12 @@ xfs_buf_item_unpin(
>   		xfs_buf_relse(bp);
>   	} else if (freed && remove) {
>   		/*
> -		 * There are currently two references to the buffer - the active
> -		 * LRU reference and the buf log item. What we are about to do
> -		 * here - simulate a failed IO completion - requires 3
> -		 * references.
> -		 *
> -		 * The LRU reference is removed by the xfs_buf_stale() call. The
> -		 * buf item reference is removed by the xfs_buf_iodone()
> -		 * callback that is run by xfs_buf_do_callbacks() during ioend
> -		 * processing (via the bp->b_iodone callback), and then finally
> -		 * the ioend processing will drop the IO reference if the buffer
> -		 * is marked XBF_ASYNC.
> -		 *
> -		 * Hence we need to take an additional reference here so that IO
> -		 * completion processing doesn't free the buffer prematurely.
> +		 * The buffer must be locked and held by the caller to simulate
> +		 * an async I/O failure.
>   		 */
>   		xfs_buf_lock(bp);
>   		xfs_buf_hold(bp);
> -		bp->b_flags |= XBF_ASYNC;
> -		xfs_buf_ioerror(bp, -EIO);
> -		bp->b_flags &= ~XBF_DONE;
> -		xfs_buf_stale(bp);
> -		xfs_buf_ioend(bp);
> +		xfs_buf_iofail(bp, XBF_ASYNC);
>   	}
>   }
>   
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index d1772786af29..b539ee221ce5 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3629,12 +3629,7 @@ xfs_iflush_cluster(
>   	 * xfs_buf_submit().
>   	 */
>   	ASSERT(bp->b_iodone);
> -	bp->b_flags |= XBF_ASYNC;
> -	bp->b_flags &= ~XBF_DONE;
> -	xfs_buf_stale(bp);
> -	xfs_buf_ioerror(bp, -EIO);
> -	xfs_buf_ioend(bp);
> -
> +	xfs_buf_iofail(bp, XBF_ASYNC);
>   	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>   
>   	/* abort the corrupt inode, as it was not attached to the buffer */
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/12] xfs: always attach iflush_done and simplify error handling
  2020-04-17 15:08 ` [PATCH 03/12] xfs: always attach iflush_done and simplify error handling Brian Foster
@ 2020-04-18  0:07   ` Allison Collins
  2020-04-20 13:59     ` Brian Foster
  2020-04-20  3:08   ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: Allison Collins @ 2020-04-18  0:07 UTC (permalink / raw)
  To: Brian Foster, linux-xfs

On 4/17/20 8:08 AM, Brian Foster wrote:
> The inode flush code has several layers of error handling between
> the inode and cluster flushing code. If the inode flush fails before
> acquiring the backing buffer, the inode flush is aborted. If the
> cluster flush fails, the current inode flush is aborted and the
> cluster buffer is failed to handle the initial inode and any others
> that might have been attached before the error.
> 
> Since xfs_iflush() is the only caller of xfs_iflush_cluser(), the
> error handling between the two can be condensed in the top-level
> function. If we update xfs_iflush_int() to attach the item
> completion handler to the buffer first, any errors that occur after
> the first call to xfs_iflush_int() can be handled with a buffer
> I/O failure.
> 
> Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
> and consolidate with the existing error handling. This also replaces
> the need to release the buffer because failing the buffer with
> XBF_ASYNC drops the current reference.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>   fs/xfs/xfs_inode.c | 94 +++++++++++++++-------------------------------
>   1 file changed, 30 insertions(+), 64 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index b539ee221ce5..4c9971ec6fa6 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3496,6 +3496,7 @@ xfs_iflush_cluster(
>   	struct xfs_inode	**cilist;
>   	struct xfs_inode	*cip;
>   	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> +	int			error = 0;
>   	int			nr_found;
>   	int			clcount = 0;
>   	int			i;
> @@ -3588,11 +3589,10 @@ xfs_iflush_cluster(
>   		 * re-check that it's dirty before flushing.
>   		 */
>   		if (!xfs_inode_clean(cip)) {
> -			int	error;
>   			error = xfs_iflush_int(cip, bp);
>   			if (error) {
>   				xfs_iunlock(cip, XFS_ILOCK_SHARED);
> -				goto cluster_corrupt_out;
> +				goto out_free;
>   			}
>   			clcount++;
>   		} else {
> @@ -3611,32 +3611,7 @@ xfs_iflush_cluster(
>   	kmem_free(cilist);
>   out_put:
>   	xfs_perag_put(pag);
> -	return 0;
> -
> -
> -cluster_corrupt_out:
> -	/*
> -	 * Corruption detected in the clustering loop.  Invalidate the
> -	 * inode buffer and shut down the filesystem.
> -	 */
> -	rcu_read_unlock();
Hmm, I can't find where this unlock moved?  Is it not needed anymore?  I 
think I followed the rest of it though.

Reviewed-by: Allison Collins <allison.henderson@oracle.com>

Allison

> -
> -	/*
> -	 * We'll always have an inode attached to the buffer for completion
> -	 * process by the time we are called from xfs_iflush(). Hence we have
> -	 * always need to do IO completion processing to abort the inodes
> -	 * attached to the buffer.  handle them just like the shutdown case in
> -	 * xfs_buf_submit().
> -	 */
> -	ASSERT(bp->b_iodone);
> -	xfs_buf_iofail(bp, XBF_ASYNC);
> -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -
> -	/* abort the corrupt inode, as it was not attached to the buffer */
> -	xfs_iflush_abort(cip, false);
> -	kmem_free(cilist);
> -	xfs_perag_put(pag);
> -	return -EFSCORRUPTED;
> +	return error;
>   }
>   
>   /*
> @@ -3692,17 +3667,16 @@ xfs_iflush(
>   	 */
>   	if (XFS_FORCED_SHUTDOWN(mp)) {
>   		error = -EIO;
> -		goto abort_out;
> +		goto abort;
>   	}
>   
>   	/*
>   	 * Get the buffer containing the on-disk inode. We are doing a try-lock
> -	 * operation here, so we may get  an EAGAIN error. In that case, we
> -	 * simply want to return with the inode still dirty.
> +	 * operation here, so we may get an EAGAIN error. In that case, return
> +	 * leaving the inode dirty.
>   	 *
>   	 * If we get any other error, we effectively have a corruption situation
> -	 * and we cannot flush the inode, so we treat it the same as failing
> -	 * xfs_iflush_int().
> +	 * and we cannot flush the inode. Abort the flush and shut down.
>   	 */
>   	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
>   			       0);
> @@ -3711,14 +3685,7 @@ xfs_iflush(
>   		return error;
>   	}
>   	if (error)
> -		goto corrupt_out;
> -
> -	/*
> -	 * First flush out the inode that xfs_iflush was called with.
> -	 */
> -	error = xfs_iflush_int(ip, bp);
> -	if (error)
> -		goto corrupt_out;
> +		goto abort;
>   
>   	/*
>   	 * If the buffer is pinned then push on the log now so we won't
> @@ -3728,28 +3695,28 @@ xfs_iflush(
>   		xfs_log_force(mp, 0);
>   
>   	/*
> -	 * inode clustering: try to gather other inodes into this write
> +	 * Flush the provided inode then attempt to gather others from the
> +	 * cluster into the write.
>   	 *
> -	 * Note: Any error during clustering will result in the filesystem
> -	 * being shut down and completion callbacks run on the cluster buffer.
> -	 * As we have already flushed and attached this inode to the buffer,
> -	 * it has already been aborted and released by xfs_iflush_cluster() and
> -	 * so we have no further error handling to do here.
> +	 * Note: Once we attempt to flush an inode, we must run buffer
> +	 * completion callbacks on any failure. If this fails, simulate an I/O
> +	 * failure on the buffer and shut down.
>   	 */
> -	error = xfs_iflush_cluster(ip, bp);
> -	if (error)
> -		return error;
> +	error = xfs_iflush_int(ip, bp);
> +	if (!error)
> +		error = xfs_iflush_cluster(ip, bp);
> +	if (error) {
> +		xfs_buf_iofail(bp, XBF_ASYNC);
> +		goto shutdown;
> +	}
>   
>   	*bpp = bp;
>   	return 0;
>   
> -corrupt_out:
> -	if (bp)
> -		xfs_buf_relse(bp);
> -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -abort_out:
> -	/* abort the corrupt inode, as it was not attached to the buffer */
> +abort:
>   	xfs_iflush_abort(ip, false);
> +shutdown:
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>   	return error;
>   }
>   
> @@ -3798,6 +3765,13 @@ xfs_iflush_int(
>   	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
>   	ASSERT(iip != NULL && iip->ili_fields != 0);
>   
> +	/*
> +	 * Attach the inode item callback to the buffer. Whether the flush
> +	 * succeeds or not, buffer I/O completion processing is now required to
> +	 * remove the inode from the AIL and release the flush lock.
> +	 */
> +	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> +
>   	/* set *dip = inode's place in the buffer */
>   	dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);
>   
> @@ -3913,14 +3887,6 @@ xfs_iflush_int(
>   	xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
>   				&iip->ili_item.li_lsn);
>   
> -	/*
> -	 * Attach the function xfs_iflush_done to the inode's
> -	 * buffer.  This will remove the inode from the AIL
> -	 * and unlock the inode's flush lock when the inode is
> -	 * completely written to disk.
> -	 */
> -	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> -
>   	/* generate the checksum. */
>   	xfs_dinode_calc_crc(mp, dip);
>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush()
  2020-04-17 15:08 ` [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
@ 2020-04-18  0:27   ` Allison Collins
  2020-04-20  3:10   ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-18  0:27 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> The shutdown check in xfs_iflush() duplicates checks down in the
> buffer code. If the fs is shut down, xfs_trans_read_buf_map() always
> returns an error and falls into the same error path. Remove the
> unnecessary check along with the warning in xfs_imap_to_bp()
> that generates excessive noise in the log if the fs is shut down.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, I see the duplicate handler you are referring to.
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/libxfs/xfs_inode_buf.c |  7 +------
>   fs/xfs/xfs_inode.c            | 13 -------------
>   2 files changed, 1 insertion(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 39c5a6e24915..b102e611bf54 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -172,12 +172,7 @@ xfs_imap_to_bp(
>   				   (int)imap->im_len, buf_flags, &bp,
>   				   &xfs_inode_buf_ops);
>   	if (error) {
> -		if (error == -EAGAIN) {
> -			ASSERT(buf_flags & XBF_TRYLOCK);
> -			return error;
> -		}
> -		xfs_warn(mp, "%s: xfs_trans_read_buf() returned error %d.",
> -			__func__, error);
> +		ASSERT(error != -EAGAIN || (buf_flags & XBF_TRYLOCK));
>   		return error;
>   	}
>   
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 4c9971ec6fa6..98ee1b10d1b0 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3657,19 +3657,6 @@ xfs_iflush(
>   		return 0;
>   	}
>   
> -	/*
> -	 * This may have been unpinned because the filesystem is shutting
> -	 * down forcibly. If that's the case we must not write this inode
> -	 * to disk, because the log record didn't make it to disk.
> -	 *
> -	 * We also have to remove the log item from the AIL in this case,
> -	 * as we wait for an empty AIL as part of the unmount process.
> -	 */
> -	if (XFS_FORCED_SHUTDOWN(mp)) {
> -		error = -EIO;
> -		goto abort;
> -	}
> -
>   	/*
>   	 * Get the buffer containing the on-disk inode. We are doing a try-lock
>   	 * operation here, so we may get an EAGAIN error. In that case, return
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/12] xfs: flush related error handling cleanups
  2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
                   ` (11 preceding siblings ...)
  2020-04-17 15:08 ` [PATCH 12/12] xfs: random buffer write failure errortag Brian Foster
@ 2020-04-19 22:53 ` Dave Chinner
  2020-04-20 14:06   ` Brian Foster
  12 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-19 22:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:47AM -0400, Brian Foster wrote:
> Hi all,
> 
> This actually started as what I intended to be a cleanup of xfsaild
> error handling and the fact that unexpected errors are kind of lost in
> the ->iop_push() handlers of flushable log items. Some discussion with
> Dave on that is available here[1]. I was thinking of genericizing the
> behavior, but I'm not so sure that is possible now given the error
> handling requirements of the associated items.
> 
> While thinking through that, I ended up incorporating various cleanups
> in the somewhat confusing and erratic error handling on the periphery of
> xfsaild, such as the flush handlers. Most of these are straightforward
> cleanups except for patch 9, which I think requires careful review and
> is of debatable value. I have used patch 12 to run an hour or so of
> highly concurrent fsstress load against it and will execute a longer run
> over the weekend now that fstests has completed.
> 
> Thoughts, reviews, flames appreciated.

I'll need to do something thinking on this patchset - I have a
patchset that touches a lot of the same code I'm working on right
now to pin inode cluster buffers in memory when the inode is dirtied
so we don't get RMW cycles in AIL flushing.

That code gets rid of xfs_iflush() completely, removes dirty inodes
from the AIL and tracks only ordered cluster buffers in the AIL for
inode writeback (i.e. reduces AIL tracked log items by up to 30x).
It also only does inode writeback from the ordered cluster buffers.

The idea behind this is to make inode flushing completely
non-blocking, and to simplify inode cluster flushing to simply iterate
all the dirty inodes attached to the buffer. This gets rid of radix
tree lookups and races with reclaim, and gets rid of having to
special case a locked inode in the cluster iteration code.

I was looking at this as the model to then apply to dquot flushing,
too, because it currently does not have cluster flushing, and hence
flushes dquots individually, even though there can be multiple dirty
dquots per buffer. Some of this patchset moves the dquot flushing a
bit closer to the inode code, so those parts are going to be useful
regardless of everything else....

Do you have a git tree I could pull this from to see how bad the
conflicts are?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild
  2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
  2020-04-17 22:37   ` Allison Collins
@ 2020-04-20  2:45   ` Dave Chinner
  2020-04-20 13:58     ` Brian Foster
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  2:45 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:48AM -0400, Brian Foster wrote:
> Flush locked log items whose underlying buffers fail metadata
> writeback are tagged with a special flag to indicate that the flush
> lock is already held. This is currently implemented in the type
> specific ->iop_push() callback, but the processing required for such
> items is not type specific because we're only doing basic state
> management on the underlying buffer.
> 
> Factor the failed log item handling out of the inode and dquot
> ->iop_push() callbacks and open code the buffer resubmit helper into
> a single helper called from xfsaild_push_item(). This provides a
> generic mechanism for handling failed metadata buffer writeback with
> a bit less code.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
.....
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 564253550b75..0c709651a2c6 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -345,6 +345,45 @@ xfs_ail_delete(
>  	xfs_trans_ail_cursor_clear(ailp, lip);
>  }
>  
> +/*
> + * Requeue a failed buffer for writeback.
> + *
> + * We clear the log item failed state here as well, but we have to be careful
> + * about reference counts because the only active reference counts on the buffer
> + * may be the failed log items. Hence if we clear the log item failed state
> + * before queuing the buffer for IO we can release all active references to
> + * the buffer and free it, leading to use after free problems in
> + * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
> + * order we process them in - the buffer is locked, and we own the buffer list
> + * so nothing on them is going to change while we are performing this action.
> + *
> + * Hence we can safely queue the buffer for IO before we clear the failed log
> + * item state, therefore  always having an active reference to the buffer and
> + * avoiding the transient zero-reference state that leads to use-after-free.
> + */
> +static inline int
> +xfsaild_push_failed(

Bad name IMO. Makes me think it's an action that is taken when an
item push failed, not an action that resubmits a buffer that had an
IO failure.

i.e. "push" doesn't imply IO, and failure to push an item doesn't
mean there was an IO error - it may be locked, already flushing,
pinned, etc.

> +	struct xfs_log_item	*lip,
> +	struct list_head	*buffer_list)
> +{
> +	struct xfs_buf		*bp = lip->li_buf;

This also assumes that the log item has a buffer associated with it.
So perhaps "xfsaild_resubmit_failed_buffer()" would be more
appropriate here.

Other than that, the code is fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/12] xfs: factor out buffer I/O failure simulation code
  2020-04-17 15:08 ` [PATCH 02/12] xfs: factor out buffer I/O failure simulation code Brian Foster
  2020-04-17 22:37   ` Allison Collins
@ 2020-04-20  2:48   ` Dave Chinner
  2020-04-20 13:58     ` Brian Foster
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  2:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:49AM -0400, Brian Foster wrote:
> We use the same buffer I/O failure simulation code in a few
> different places. It's not much code, but it's not necessarily
> self-explanatory. Factor it into a helper and document it in one
> place.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf.c      | 23 +++++++++++++++++++----
>  fs/xfs/xfs_buf.h      |  1 +
>  fs/xfs/xfs_buf_item.c | 22 +++-------------------
>  fs/xfs/xfs_inode.c    |  7 +------
>  4 files changed, 24 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 9ec3eaf1c618..93942d8e35dd 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1248,6 +1248,24 @@ xfs_buf_ioerror_alert(
>  			-bp->b_error);
>  }
>  
> +/*
> +  * To simulate an I/O failure, the buffer must be locked and held with at least

Whitespace.

> + * three references. The LRU reference is dropped by the stale call. The buf
> + * item reference is dropped via ioend processing. The third reference is owned
> + * by the caller and is dropped on I/O completion if the buffer is XBF_ASYNC.
> + */
> +void
> +xfs_buf_iofail(
> +	struct xfs_buf	*bp,
> +	int		flags)
> +{
> +	bp->b_flags |= flags;
> +	bp->b_flags &= ~XBF_DONE;
> +	xfs_buf_stale(bp);
> +	xfs_buf_ioerror(bp, -EIO);
> +	xfs_buf_ioend(bp);
> +}

This function is an IO completion function. Can we call it
xfs_buf_ioend_fail(), please, to indicate that it both fails and
completes the IO in progress?


Otherwise ok.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/12] xfs: always attach iflush_done and simplify error handling
  2020-04-17 15:08 ` [PATCH 03/12] xfs: always attach iflush_done and simplify error handling Brian Foster
  2020-04-18  0:07   ` Allison Collins
@ 2020-04-20  3:08   ` Dave Chinner
  2020-04-20 14:00     ` Brian Foster
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:50AM -0400, Brian Foster wrote:
> The inode flush code has several layers of error handling between
> the inode and cluster flushing code. If the inode flush fails before
> acquiring the backing buffer, the inode flush is aborted. If the
> cluster flush fails, the current inode flush is aborted and the
> cluster buffer is failed to handle the initial inode and any others
> that might have been attached before the error.
> 
> Since xfs_iflush() is the only caller of xfs_iflush_cluser(), the

xfs_iflush_cluster()

> error handling between the two can be condensed in the top-level
> function. If we update xfs_iflush_int() to attach the item
> completion handler to the buffer first, any errors that occur after
> the first call to xfs_iflush_int() can be handled with a buffer
> I/O failure.
> 
> Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
> and consolidate with the existing error handling. This also replaces
> the need to release the buffer because failing the buffer with
> XBF_ASYNC drops the current reference.

Yeah, that makes sense. I've lifted the cluster flush error handling
into the callers, even though xfs_iflush() has gone away.

However...

> @@ -3798,6 +3765,13 @@ xfs_iflush_int(
>  	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
>  	ASSERT(iip != NULL && iip->ili_fields != 0);
>  
> +	/*
> +	 * Attach the inode item callback to the buffer. Whether the flush
> +	 * succeeds or not, buffer I/O completion processing is now required to
> +	 * remove the inode from the AIL and release the flush lock.
> +	 */
> +	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> +
>  	/* set *dip = inode's place in the buffer */
>  	dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);

...I'm not convinced this is a valid thing to do at this point. The
inode item has not been set up yet with the correct state that is
associated with the flushing of the inode (e.g. lsn, last_flags,
etc) and so this kinda leaves a landmine in the item IO completion
processing, in that the failure handling cannot rely on any of the
inode log item state to make conditional decisions.

While it's technically not wrong, it just makes me uneasy, as in
future the flush abort code will have to be careful about using
inode state in making decisions, and there are no comments in the
abort code to indicate that the state may be invalid...
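
For example, with this ordering the abort/completion processing can now
run before any of that state is initialised (illustrative sketch; the
field updates are lifted from the current xfs_iflush_int()):

	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);

	/* ... flush work that can now fail and run the callback ... */

	/* flush state is only set up once the flush has succeeded: */
	iip->ili_last_fields = iip->ili_fields;
	iip->ili_fields = 0;
	iip->ili_fsync_fields = 0;
	xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
				&iip->ili_item.li_lsn);

i.e. an abort in that window runs xfs_iflush_done()/xfs_iflush_abort()
against an item whose ili_last_fields and ili_flush_lsn were never set
for this flush.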

/me has chased several subtle issues through this code recently...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush()
  2020-04-17 15:08 ` [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
  2020-04-18  0:27   ` Allison Collins
@ 2020-04-20  3:10   ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:51AM -0400, Brian Foster wrote:
> The shutdown check in xfs_iflush() duplicates checks down in the
> buffer code. If the fs is shut down, xfs_trans_read_buf_map() always
> returns an error and falls into the same error path. Remove the
> unnecessary check along with the warning in xfs_imap_to_bp()
> that generates excessive noise in the log if the fs is shut down.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-17 15:08 ` [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning Brian Foster
@ 2020-04-20  3:19   ` Dave Chinner
  2020-04-20 14:02     ` Brian Foster
  2020-04-20 18:50   ` Allison Collins
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:52AM -0400, Brian Foster wrote:
> At unmount time, XFS emits a warning for every in-core buffer that
> might have undergone a write error. In practice this behavior is
> probably reasonable given that the filesystem is likely short lived
> once I/O errors begin to occur consistently. Under certain test or
> otherwise expected error conditions, this can spam the logs and slow
> down the unmount. Ratelimit the warning to prevent this problem
> while still informing the user that errors have occurred.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 93942d8e35dd..5120fed06075 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
>  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
>  			list_del_init(&bp->b_lru);
>  			if (bp->b_flags & XBF_WRITE_FAIL) {
> -				xfs_alert(btp->bt_mount,
> -"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> +				xfs_alert_ratelimited(btp->bt_mount,
> +"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
> +"Please run xfs_repair to determine the extent of the problem.",
>  					(long long)bp->b_bn);

Hmmmm. I was under the impression that multiple line log messages
were frowned upon because they prevent every output line in the log
being tagged correctly. That's where KERN_CONT came from (i.e. it's
a continuation of a previous log message), but we don't use that
with the XFS logging and hence multi-line log messages are split
into multiple logging calls.

IOWs, this might be better handled just using a static ratelimit
variable here....

Actually, we already have one for xfs_buf_item_push() to limit
warnings about retrying XBF_WRITE_FAIL buffers:

static DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);

Perhaps we should be using the same ratelimit variable here....
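
i.e. something like this (untested sketch; it assumes we stop making
the rate limit state static to xfs_buf_item.c so it can be shared):

	if ((bp->b_flags & XBF_WRITE_FAIL) &&
	    __ratelimit(&xfs_buf_write_fail_rl_state))
		xfs_alert(btp->bt_mount,
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
			(long long)bp->b_bn);

That keeps every output line a single, properly tagged xfs_alert()
call while still bounding the unmount time log noise.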

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush()
  2020-04-17 15:08 ` [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush() Brian Foster
@ 2020-04-20  3:53   ` Dave Chinner
  2020-04-20 14:02     ` Brian Foster
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:53AM -0400, Brian Foster wrote:
> The dquot read/write verifier calls xfs_dqblk_verify() on every
> dquot in the buffer. Remove the duplicate call from
> xfs_qm_dqflush().

Ah, I think there's a bug here - it's not supposed to be a
duplicate....

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_dquot.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index af2c8e5ceea0..73032c18a94a 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1071,7 +1071,6 @@ xfs_qm_dqflush(
>  	struct xfs_buf		*bp;
>  	struct xfs_dqblk	*dqb;
>  	struct xfs_disk_dquot	*ddqp;
> -	xfs_failaddr_t		fa;
>  	int			error;
>  
>  	ASSERT(XFS_DQ_IS_LOCKED(dqp));
> @@ -1116,19 +1115,6 @@ xfs_qm_dqflush(
>  	dqb = bp->b_addr + dqp->q_bufoffset;
>  	ddqp = &dqb->dd_diskdq;
>  
> -	/*
> -	 * A simple sanity check in case we got a corrupted dquot.
> -	 */
> -	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);

So this verifies the on disk dquot ....

> -	if (fa) {
> -		xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS",

...which issues an "in memory corruption" alert on failure...

> -				be32_to_cpu(ddqp->d_id), fa);
> -		xfs_buf_relse(bp);
> -		xfs_dqfunlock(dqp);
> -		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -		return -EFSCORRUPTED;
> -	}
> -
>  	/* This is the only portion of data that needs to persist */
>  	memcpy(ddqp, &dqp->q_core, sizeof(struct xfs_disk_dquot));

.... and on success we immediately overwrite the on-disk copy with
the unchecked in-memory copy of the dquot. 

IOWs, I think that verification call here should be checking the
in-memory dquot core, not the on disk buffer that is about to get
trashed.  i.e. something like this:

-	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
+	fa = xfs_dquot_verify(mp, &dqp->q_core, be32_to_cpu(ddqp->d_id), 0);
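
(If I'm reading the helpers right, xfs_dquot_verify() checks just the
dquot core, while xfs_dqblk_verify() additionally checks the xfs_dqblk
wrapper, i.e. the dd_uuid match on v5 filesystems, which only exists
in the on-disk copy, so the in-memory check wants the former.)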

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/12] xfs: abort consistently on dquot flush failure
  2020-04-17 15:08 ` [PATCH 07/12] xfs: abort consistently on dquot flush failure Brian Foster
@ 2020-04-20  3:54   ` Dave Chinner
  2020-04-20 18:50   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:54AM -0400, Brian Foster wrote:
> The dquot flush handler effectively aborts the dquot flush if the
> filesystem is already shut down, but doesn't actually shut down if
> the flush fails. Update xfs_qm_dqflush() to consistently abort the
> dquot flush and shut down the fs if the flush fails with an
> unexpected error.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Makes sense.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler
  2020-04-17 15:08 ` [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler Brian Foster
@ 2020-04-20  3:58   ` Dave Chinner
  2020-04-20 14:02     ` Brian Foster
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  3:58 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:55AM -0400, Brian Foster wrote:
> The quotaoff intent item push handler unconditionally returns locked
> status because it remains AIL resident until removed by the
> quotaoff end intent. xfsaild_push_item() already returns pinned
> status for items (generally intents) without a push handler. This is
> effectively the same behavior for the purpose of quotaoff, so remove

It's not the same. XFS_ITEM_PINNED results in a log force from the
xfsaild, while XFS_ITEM_LOCKED items are just skipped. So this
change will result in a log force every time the AIL push sees a
quotaoff log item.  Hence I think the code as it stands is correct
as log forces will not speed up the removal of the quotaoff item
from the AIL...
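
That is, from the push return code handling in xfsaild_push()
(paraphrased from the current code):

	case XFS_ITEM_PINNED:
		stuck++;
		ailp->ail_log_flush++;	/* ends up in xfs_log_force() */
		break;
	case XFS_ITEM_LOCKED:
		stuck++;		/* just skip over the item */
		break;

so returning PINNED for an item that a log force cannot help just
generates pointless log IO.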

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/12] xfs: clean up AIL log item removal functions
  2020-04-17 15:08 ` [PATCH 10/12] xfs: clean up AIL log item removal functions Brian Foster
@ 2020-04-20  4:32   ` Dave Chinner
  2020-04-20 14:03     ` Brian Foster
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  4:32 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:57AM -0400, Brian Foster wrote:
> We have two AIL removal functions with slightly different semantics.
> xfs_trans_ail_delete() expects the caller to have the AIL lock and
> for the associated item to be AIL resident. If not, the filesystem
> is shut down. xfs_trans_ail_remove() acquires the AIL lock, checks
> that the item is AIL resident and calls the former if so.
> 
> These semantics lead to confused usage between the two. For example,
> the _remove() variant takes a shutdown parameter to pass to the
> _delete() variant, but immediately returns if the AIL bit is not
> set. This means that _remove() would never shut down if an item is
> not AIL resident, even though it appears that many callers would
> expect it to.
> 
> Make the following changes to clean up both of these functions:
> 
> - Most callers of xfs_trans_ail_delete() acquire the AIL lock just
>   before the call. Update _delete() to acquire the lock and open
>   code the couple of callers that make additional checks under AIL
>   lock.
> - Drop the unnecessary ailp parameter from _delete().
> - Drop the unused shutdown parameter from _remove() and open code
>   the implementation.
> 
> In summary, this leaves a _delete() variant that expects an AIL
> resident item and a _remove() helper that checks the AIL bit. Audit
> the existing callsites for use of the appropriate function and
> update as necessary.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
....

Good start, but...

> @@ -1032,10 +1033,11 @@ xfs_qm_dqflush_done(
>  		goto out;
>  
>  	spin_lock(&ailp->ail_lock);
> -	if (lip->li_lsn == qip->qli_flush_lsn)
> -		/* xfs_trans_ail_delete() drops the AIL lock */
> -		xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
> -	else
> +	if (lip->li_lsn == qip->qli_flush_lsn) {
> +		/* xfs_ail_update_finish() drops the AIL lock */
> +		tail_lsn = xfs_ail_delete_one(ailp, lip);
> +		xfs_ail_update_finish(ailp, tail_lsn);
> +	} else
>  		spin_unlock(&ailp->ail_lock);

This drops the shutdown if the dquot is not in the AIL. It should be
in the AIL, so if it isn't we should be shutting down...
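
i.e. the lsn-matched path probably still needs the not-in-AIL shutdown
check that the old xfs_trans_ail_delete() did for us. A rough sketch:

	spin_lock(&ailp->ail_lock);
	if (lip->li_lsn == qip->qli_flush_lsn) {
		if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
			/* xfs_ail_update_finish() drops the AIL lock */
			tail_lsn = xfs_ail_delete_one(ailp, lip);
			xfs_ail_update_finish(ailp, tail_lsn);
		} else {
			spin_unlock(&ailp->ail_lock);
			xfs_force_shutdown(ailp->ail_mount,
					SHUTDOWN_CORRUPT_INCORE);
		}
	} else
		spin_unlock(&ailp->ail_lock);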

> @@ -872,13 +872,14 @@ xfs_ail_delete_one(
>   */
>  void
>  xfs_trans_ail_delete(
> -	struct xfs_ail		*ailp,
>  	struct xfs_log_item	*lip,
>  	int			shutdown_type)
>  {
> +	struct xfs_ail		*ailp = lip->li_ailp;
>  	struct xfs_mount	*mp = ailp->ail_mount;
>  	xfs_lsn_t		tail_lsn;
>  
> +	spin_lock(&ailp->ail_lock);
>  	if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
>  		spin_unlock(&ailp->ail_lock);
>  		if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> index 9135afdcee9d..7563c78e2997 100644
> --- a/fs/xfs/xfs_trans_priv.h
> +++ b/fs/xfs/xfs_trans_priv.h
> @@ -94,22 +94,23 @@ xfs_trans_ail_update(
>  xfs_lsn_t xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
>  void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
>  			__releases(ailp->ail_lock);
> -void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
> -		int shutdown_type);
> +void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
>  
>  static inline void
>  xfs_trans_ail_remove(
> -	struct xfs_log_item	*lip,
> -	int			shutdown_type)
> +	struct xfs_log_item	*lip)
>  {
>  	struct xfs_ail		*ailp = lip->li_ailp;
> +	xfs_lsn_t		tail_lsn;
>  
>  	spin_lock(&ailp->ail_lock);
> -	/* xfs_trans_ail_delete() drops the AIL lock */
> -	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags))
> -		xfs_trans_ail_delete(ailp, lip, shutdown_type);
> -	else
> +	/* xfs_ail_update_finish() drops the AIL lock */
> +	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
> +		tail_lsn = xfs_ail_delete_one(ailp, lip);
> +		xfs_ail_update_finish(ailp, tail_lsn);
> +	} else {
>  		spin_unlock(&ailp->ail_lock);
> +	}
>  }

This makes xfs_trans_ail_delete() and xfs_trans_ail_remove() almost
identical, except one will shut down if the item is not in the AIL
and the other won't. Wouldn't it be better to get it down to just
one function that does everything, and remove the confusion of which
to use altogether?

void
xfs_trans_ail_delete(
	struct xfs_log_item     *lip,
	int                     shutdown)
{
	struct xfs_ail		*ailp = lip->li_ailp;

	spin_lock(&ailp->ail_lock);
	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
		xfs_lsn_t tail_lsn = xfs_ail_delete_one(ailp, lip);
		xfs_ail_update_finish(ailp, tail_lsn);
		return;
	}
	spin_unlock(&ailp->ail_lock);
	if (!shutdown)
		return;

	/* do shutdown stuff */
}

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 11/12] xfs: remove unused iflush stale parameter
  2020-04-17 15:08 ` [PATCH 11/12] xfs: remove unused iflush stale parameter Brian Foster
@ 2020-04-20  4:34   ` Dave Chinner
  2020-04-20 19:19   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  4:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:58AM -0400, Brian Foster wrote:
> The stale parameter was used to control the now unused shutdown
> parameter of xfs_trans_ail_remove().
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/12] xfs: random buffer write failure errortag
  2020-04-17 15:08 ` [PATCH 12/12] xfs: random buffer write failure errortag Brian Foster
@ 2020-04-20  4:37   ` Dave Chinner
  2020-04-20 14:04     ` Brian Foster
  2020-04-20 22:42   ` Allison Collins
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20  4:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 11:08:59AM -0400, Brian Foster wrote:
> Introduce an error tag to randomly fail async buffer writes. This is
> primarily to facilitate testing of the XFS error configuration
> mechanism.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Isn't this what XFS_ERRTAG_IODONE_IOERR and XFS_RANDOM_IODONE_IOERR
is for?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild
  2020-04-20  2:45   ` Dave Chinner
@ 2020-04-20 13:58     ` Brian Foster
  2020-04-20 22:19       ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-20 13:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 12:45:56PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:48AM -0400, Brian Foster wrote:
> > Flush locked log items whose underlying buffers fail metadata
> > writeback are tagged with a special flag to indicate that the flush
> > lock is already held. This is currently implemented in the type
> > specific ->iop_push() callback, but the processing required for such
> > items is not type specific because we're only doing basic state
> > management on the underlying buffer.
> > 
> > Factor the failed log item handling out of the inode and dquot
> > ->iop_push() callbacks and open code the buffer resubmit helper into
> > a single helper called from xfsaild_push_item(). This provides a
> > generic mechanism for handling failed metadata buffer writeback with
> > a bit less code.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> .....
> > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > index 564253550b75..0c709651a2c6 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -345,6 +345,45 @@ xfs_ail_delete(
> >  	xfs_trans_ail_cursor_clear(ailp, lip);
> >  }
> >  
> > +/*
> > + * Requeue a failed buffer for writeback.
> > + *
> > + * We clear the log item failed state here as well, but we have to be careful
> > + * about reference counts because the only active reference counts on the buffer
> > + * may be the failed log items. Hence if we clear the log item failed state
> > + * before queuing the buffer for IO we can release all active references to
> > + * the buffer and free it, leading to use after free problems in
> > + * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
> > + * order we process them in - the buffer is locked, and we own the buffer list
> > + * so nothing on them is going to change while we are performing this action.
> > + *
> > + * Hence we can safely queue the buffer for IO before we clear the failed log
> > + * item state, therefore  always having an active reference to the buffer and
> > + * avoiding the transient zero-reference state that leads to use-after-free.
> > + */
> > +static inline int
> > +xfsaild_push_failed(
> 
> Bad name IMO. Makes me think it's an action that is taken when an
> item push failed, not an action that resubmits a buffer that had an
> IO failure.
> 
> i.e. "push" doesn't imply IO, and failure to push an item doesn't
> mean there was an IO error - it may be locked, already flushing,
> pinned, etc.
> 

Yeah..

> > +	struct xfs_log_item	*lip,
> > +	struct list_head	*buffer_list)
> > +{
> > +	struct xfs_buf		*bp = lip->li_buf;
> 
> This also assumes that the log item has a buffer associated with it.
> So perhaps "xfsaild_resubmit_failed_buffer()" would be more
> appropriate here.
> 

The buffer pointer is an implementation detail of failed items. It would
be nice to find a way to avoid that. How about xfsaild_resubmit_item()
to be consistent with the parent function (xfsaild_push_item())?

Brian

> Other than that, the code is fine.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/12] xfs: factor out buffer I/O failure simulation code
  2020-04-20  2:48   ` Dave Chinner
@ 2020-04-20 13:58     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 13:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 12:48:40PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:49AM -0400, Brian Foster wrote:
> > We use the same buffer I/O failure simulation code in a few
> > different places. It's not much code, but it's not necessarily
> > self-explanatory. Factor it into a helper and document it in one
> > place.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_buf.c      | 23 +++++++++++++++++++----
> >  fs/xfs/xfs_buf.h      |  1 +
> >  fs/xfs/xfs_buf_item.c | 22 +++-------------------
> >  fs/xfs/xfs_inode.c    |  7 +------
> >  4 files changed, 24 insertions(+), 29 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 9ec3eaf1c618..93942d8e35dd 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -1248,6 +1248,24 @@ xfs_buf_ioerror_alert(
> >  			-bp->b_error);
> >  }
> >  
> > +/*
> > +  * To simulate an I/O failure, the buffer must be locked and held with at least
> 
> Whitespace.
> 

Fixed.

> > + * three references. The LRU reference is dropped by the stale call. The buf
> > + * item reference is dropped via ioend processing. The third reference is owned
> > + * by the caller and is dropped on I/O completion if the buffer is XBF_ASYNC.
> > + */
> > +void
> > +xfs_buf_iofail(
> > +	struct xfs_buf	*bp,
> > +	int		flags)
> > +{
> > +	bp->b_flags |= flags;
> > +	bp->b_flags &= ~XBF_DONE;
> > +	xfs_buf_stale(bp);
> > +	xfs_buf_ioerror(bp, -EIO);
> > +	xfs_buf_ioend(bp);
> > +}
> 
> This function is an IO completion function. Can we call it
> xfs_buf_ioend_fail(), please, to indicate that it both fails and
> completes the IO in progress?
> 

Works for me..

Brian

> 
> Otherwise ok.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/12] xfs: always attach iflush_done and simplify error handling
  2020-04-18  0:07   ` Allison Collins
@ 2020-04-20 13:59     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 13:59 UTC (permalink / raw)
  To: Allison Collins; +Cc: linux-xfs

On Fri, Apr 17, 2020 at 05:07:16PM -0700, Allison Collins wrote:
> On 4/17/20 8:08 AM, Brian Foster wrote:
> > The inode flush code has several layers of error handling between
> > the inode and cluster flushing code. If the inode flush fails before
> > acquiring the backing buffer, the inode flush is aborted. If the
> > cluster flush fails, the current inode flush is aborted and the
> > cluster buffer is failed to handle the initial inode and any others
> > that might have been attached before the error.
> > 
> > Since xfs_iflush() is the only caller of xfs_iflush_cluser(), the
> > error handling between the two can be condensed in the top-level
> > function. If we update xfs_iflush_int() to attach the item
> > completion handler to the buffer first, any errors that occur after
> > the first call to xfs_iflush_int() can be handled with a buffer
> > I/O failure.
> > 
> > Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
> > and consolidate with the existing error handling. This also replaces
> > the need to release the buffer because failing the buffer with
> > XBF_ASYNC drops the current reference.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >   fs/xfs/xfs_inode.c | 94 +++++++++++++++-------------------------------
> >   1 file changed, 30 insertions(+), 64 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index b539ee221ce5..4c9971ec6fa6 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3496,6 +3496,7 @@ xfs_iflush_cluster(
> >   	struct xfs_inode	**cilist;
> >   	struct xfs_inode	*cip;
> >   	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> > +	int			error = 0;
> >   	int			nr_found;
> >   	int			clcount = 0;
> >   	int			i;
> > @@ -3588,11 +3589,10 @@ xfs_iflush_cluster(
> >   		 * re-check that it's dirty before flushing.
> >   		 */
> >   		if (!xfs_inode_clean(cip)) {
> > -			int	error;
> >   			error = xfs_iflush_int(cip, bp);
> >   			if (error) {
> >   				xfs_iunlock(cip, XFS_ILOCK_SHARED);
> > -				goto cluster_corrupt_out;
> > +				goto out_free;
> >   			}
> >   			clcount++;
> >   		} else {
> > @@ -3611,32 +3611,7 @@ xfs_iflush_cluster(
> >   	kmem_free(cilist);
> >   out_put:
> >   	xfs_perag_put(pag);
> > -	return 0;
> > -
> > -
> > -cluster_corrupt_out:
> > -	/*
> > -	 * Corruption detected in the clustering loop.  Invalidate the
> > -	 * inode buffer and shut down the filesystem.
> > -	 */
> > -	rcu_read_unlock();
> Hmm, I can't find where this unlock moved?  Is it not needed anymore?  I
> think I followed the rest of it though.
> 

It's not needed because the whole cluster_corrupt_out path goes away.
out_free is used instead, which also calls rcu_read_unlock() and returns
the error to the caller..
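
For clarity, the exit path after this patch boils down to the
following (labels as in the diff above):

out_free:
	rcu_read_unlock();
	kmem_free(cilist);
out_put:
	xfs_perag_put(pag);
	return error;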

> Reviewed-by: Allison Collins <allison.henderson@oracle.com>
> 

Thanks!

Brian

> Allison
> 
> > -
> > -	/*
> > -	 * We'll always have an inode attached to the buffer for completion
> > -	 * process by the time we are called from xfs_iflush(). Hence we have
> > -	 * always need to do IO completion processing to abort the inodes
> > -	 * attached to the buffer.  handle them just like the shutdown case in
> > -	 * xfs_buf_submit().
> > -	 */
> > -	ASSERT(bp->b_iodone);
> > -	xfs_buf_iofail(bp, XBF_ASYNC);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > -
> > -	/* abort the corrupt inode, as it was not attached to the buffer */
> > -	xfs_iflush_abort(cip, false);
> > -	kmem_free(cilist);
> > -	xfs_perag_put(pag);
> > -	return -EFSCORRUPTED;
> > +	return error;
> >   }
> >   /*
> > @@ -3692,17 +3667,16 @@ xfs_iflush(
> >   	 */
> >   	if (XFS_FORCED_SHUTDOWN(mp)) {
> >   		error = -EIO;
> > -		goto abort_out;
> > +		goto abort;
> >   	}
> >   	/*
> >   	 * Get the buffer containing the on-disk inode. We are doing a try-lock
> > -	 * operation here, so we may get  an EAGAIN error. In that case, we
> > -	 * simply want to return with the inode still dirty.
> > +	 * operation here, so we may get an EAGAIN error. In that case, return
> > +	 * leaving the inode dirty.
> >   	 *
> >   	 * If we get any other error, we effectively have a corruption situation
> > -	 * and we cannot flush the inode, so we treat it the same as failing
> > -	 * xfs_iflush_int().
> > +	 * and we cannot flush the inode. Abort the flush and shut down.
> >   	 */
> >   	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
> >   			       0);
> > @@ -3711,14 +3685,7 @@ xfs_iflush(
> >   		return error;
> >   	}
> >   	if (error)
> > -		goto corrupt_out;
> > -
> > -	/*
> > -	 * First flush out the inode that xfs_iflush was called with.
> > -	 */
> > -	error = xfs_iflush_int(ip, bp);
> > -	if (error)
> > -		goto corrupt_out;
> > +		goto abort;
> >   	/*
> >   	 * If the buffer is pinned then push on the log now so we won't
> > @@ -3728,28 +3695,28 @@ xfs_iflush(
> >   		xfs_log_force(mp, 0);
> >   	/*
> > -	 * inode clustering: try to gather other inodes into this write
> > +	 * Flush the provided inode then attempt to gather others from the
> > +	 * cluster into the write.
> >   	 *
> > -	 * Note: Any error during clustering will result in the filesystem
> > -	 * being shut down and completion callbacks run on the cluster buffer.
> > -	 * As we have already flushed and attached this inode to the buffer,
> > -	 * it has already been aborted and released by xfs_iflush_cluster() and
> > -	 * so we have no further error handling to do here.
> > +	 * Note: Once we attempt to flush an inode, we must run buffer
> > +	 * completion callbacks on any failure. If this fails, simulate an I/O
> > +	 * failure on the buffer and shut down.
> >   	 */
> > -	error = xfs_iflush_cluster(ip, bp);
> > -	if (error)
> > -		return error;
> > +	error = xfs_iflush_int(ip, bp);
> > +	if (!error)
> > +		error = xfs_iflush_cluster(ip, bp);
> > +	if (error) {
> > +		xfs_buf_iofail(bp, XBF_ASYNC);
> > +		goto shutdown;
> > +	}
> >   	*bpp = bp;
> >   	return 0;
> > -corrupt_out:
> > -	if (bp)
> > -		xfs_buf_relse(bp);
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > -abort_out:
> > -	/* abort the corrupt inode, as it was not attached to the buffer */
> > +abort:
> >   	xfs_iflush_abort(ip, false);
> > +shutdown:
> > +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >   	return error;
> >   }
> > @@ -3798,6 +3765,13 @@ xfs_iflush_int(
> >   	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
> >   	ASSERT(iip != NULL && iip->ili_fields != 0);
> > +	/*
> > +	 * Attach the inode item callback to the buffer. Whether the flush
> > +	 * succeeds or not, buffer I/O completion processing is now required to
> > +	 * remove the inode from the AIL and release the flush lock.
> > +	 */
> > +	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> > +
> >   	/* set *dip = inode's place in the buffer */
> >   	dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);
> > @@ -3913,14 +3887,6 @@ xfs_iflush_int(
> >   	xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
> >   				&iip->ili_item.li_lsn);
> > -	/*
> > -	 * Attach the function xfs_iflush_done to the inode's
> > -	 * buffer.  This will remove the inode from the AIL
> > -	 * and unlock the inode's flush lock when the inode is
> > -	 * completely written to disk.
> > -	 */
> > -	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> > -
> >   	/* generate the checksum. */
> >   	xfs_dinode_calc_crc(mp, dip);
> > 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/12] xfs: always attach iflush_done and simplify error handling
  2020-04-20  3:08   ` Dave Chinner
@ 2020-04-20 14:00     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 01:08:52PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:50AM -0400, Brian Foster wrote:
> > The inode flush code has several layers of error handling between
> > the inode and cluster flushing code. If the inode flush fails before
> > acquiring the backing buffer, the inode flush is aborted. If the
> > cluster flush fails, the current inode flush is aborted and the
> > cluster buffer is failed to handle the initial inode and any others
> > that might have been attached before the error.
> > 
> > Since xfs_iflush() is the only caller of xfs_iflush_cluser(), the
> 
> xfs_iflush_cluster()
> 

Fixed.

> > error handling between the two can be condensed in the top-level
> > function. If we update xfs_iflush_int() to attach the item
> > completion handler to the buffer first, any errors that occur after
> > the first call to xfs_iflush_int() can be handled with a buffer
> > I/O failure.
> > 
> > Lift the error handling from xfs_iflush_cluster() into xfs_iflush()
> > and consolidate with the existing error handling. This also replaces
> > the need to release the buffer because failing the buffer with
> > XBF_ASYNC drops the current reference.
> 
> Yeah, that makes sense. I've lifted the cluster flush error handling
> into the callers, even though xfs_iflush() has gone away.
> 
> However...
> 
> > @@ -3798,6 +3765,13 @@ xfs_iflush_int(
> >  	       ip->i_d.di_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
> >  	ASSERT(iip != NULL && iip->ili_fields != 0);
> >  
> > +	/*
> > +	 * Attach the inode item callback to the buffer. Whether the flush
> > +	 * succeeds or not, buffer I/O completion processing is now required to
> > +	 * remove the inode from the AIL and release the flush lock.
> > +	 */
> > +	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);
> > +
> >  	/* set *dip = inode's place in the buffer */
> >  	dip = xfs_buf_offset(bp, ip->i_imap.im_boffset);
> 
> ...I'm not convinced this is a valid thing to do at this point. The
> inode item has not been set up yet with the correct state that is
> associated with the flushing of the inode (e.g. lsn, last_flags,
> etc) and so this kinda leaves a landmine in the item IO completion
> processing in that failure cannot rely on any of the inode log item
> state to make condition decisions.
> 

It is a bit ugly. I moved it just because it seemed like it could
facilitate enough simplification to be worthwhile. It's debatable
whether what ultimately fell out of that is worthwhile, of course. TBH,
I find the whole I/O approach to the error sequence rather rickety, even
though it's currently necessary, so I'm sympathetic to the argument of
not risking the common code path in service to the uncommon error path.

We could consider other tweaks, like only attaching the callback once
the flush is imminent and leaving the error checks at the end of the
function (with documentation to explain that we're shutting down,
etc.), but I'd have to think about that some more. It could be best
just to leave this code as is if you
anticipate it going away relatively soon. This whole series was just
intended to be quick hit cleanups anyways...
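
Roughly what I mean by the above - purely illustrative and untested:

	/*
	 * All verification checks have passed and the flush is now
	 * imminent. Attach the completion handler; nothing below this
	 * point is allowed to fail, so I/O completion processing will
	 * always run to remove the inode from the AIL and release the
	 * flush lock.
	 */
	xfs_buf_attach_iodone(bp, xfs_iflush_done, &iip->ili_item);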

Brian

> While it's technically not wrong, it just makes me uneasy, as in
> future the flush abort code will have to be careful about using
> inode state in making decisions, and there's not comments in the
> abort code to indicate that the state may be invalid...
> 
> /me has chased several subtle issues through this code recently...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-20  3:19   ` Dave Chinner
@ 2020-04-20 14:02     ` Brian Foster
  2020-04-20 22:23       ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 01:19:59PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:52AM -0400, Brian Foster wrote:
> > At unmount time, XFS emits a warning for every in-core buffer that
> > might have undergone a write error. In practice this behavior is
> > probably reasonable given that the filesystem is likely short lived
> > once I/O errors begin to occur consistently. Under certain test or
> > otherwise expected error conditions, this can spam the logs and slow
> > down the unmount. Ratelimit the warning to prevent this problem
> > while still informing the user that errors have occurred.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_buf.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 93942d8e35dd..5120fed06075 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
> >  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> >  			list_del_init(&bp->b_lru);
> >  			if (bp->b_flags & XBF_WRITE_FAIL) {
> > -				xfs_alert(btp->bt_mount,
> > -"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> > +				xfs_alert_ratelimited(btp->bt_mount,
> > +"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
> > +"Please run xfs_repair to determine the extent of the problem.",
> >  					(long long)bp->b_bn);
> 
> Hmmmm. I was under the impression that multiple line log messages
> were frowned upon because they prevent every output line in the log
> being tagged correctly. That's where KERN_CONT came from (i.e. it's
> a continuation of a previous log message), but we don't use that
> with the XFS logging and hence multi-line log messages are split
> into multiple logging calls.
> 

For that exact reason I debated combining these into a single line for
about a second, then just went with this because I didn't think it
mattered that much.

> IOWs, this might be better handled just using a static ratelimit
> variable here....
> 
> Actually, we already have one for xfs_buf_item_push() to limit
> warnings about retrying XBF_WRITE_FAIL buffers:
> 
> static DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);
> 
> Perhaps we should be using the same ratelimit variable here....
> 

IIRC that was static in another file, but we can centralize (and perhaps
generalize..) it somewhere if that is preferred..
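
e.g. a minimal version would just export the existing state (untested
sketch, declaration placement hypothetical):

/* xfs_buf_item.c */
DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);

/* xfs_buf_item.h */
extern struct ratelimit_state xfs_buf_write_fail_rl_state;

/* xfs_wait_buftarg(): reuse the same ratelimit for the unmount alert */
	if (bp->b_flags & XBF_WRITE_FAIL &&
	    __ratelimit(&xfs_buf_write_fail_rl_state)) {
		xfs_alert(btp->bt_mount,
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
			(long long)bp->b_bn);
	}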

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush()
  2020-04-20  3:53   ` Dave Chinner
@ 2020-04-20 14:02     ` Brian Foster
  2020-04-20 22:31       ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 01:53:22PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:53AM -0400, Brian Foster wrote:
> > The dquot read/write verifier calls xfs_dqblk_verify() on every
> > dquot in the buffer. Remove the duplicate call from
> > xfs_qm_dqflush().
> 
> Ah, I think there's a bug here - it's not supposed to be a
> duplicate....
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_dquot.c | 14 --------------
> >  1 file changed, 14 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index af2c8e5ceea0..73032c18a94a 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1071,7 +1071,6 @@ xfs_qm_dqflush(
> >  	struct xfs_buf		*bp;
> >  	struct xfs_dqblk	*dqb;
> >  	struct xfs_disk_dquot	*ddqp;
> > -	xfs_failaddr_t		fa;
> >  	int			error;
> >  
> >  	ASSERT(XFS_DQ_IS_LOCKED(dqp));
> > @@ -1116,19 +1115,6 @@ xfs_qm_dqflush(
> >  	dqb = bp->b_addr + dqp->q_bufoffset;
> >  	ddqp = &dqb->dd_diskdq;
> >  
> > -	/*
> > -	 * A simple sanity check in case we got a corrupted dquot.
> > -	 */
> > -	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
> 
> So this verifies the on disk dquot ....
> 
> > -	if (fa) {
> > -		xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS",
> 
> ...which issues an "in memory corruption" alert on failure...
> 
> > -				be32_to_cpu(ddqp->d_id), fa);
> > -		xfs_buf_relse(bp);
> > -		xfs_dqfunlock(dqp);
> > -		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > -		return -EFSCORRUPTED;
> > -	}
> > -
> >  	/* This is the only portion of data that needs to persist */
> >  	memcpy(ddqp, &dqp->q_core, sizeof(struct xfs_disk_dquot));
> 
> .... and on success we immediately overwrite the on-disk copy with
> the unchecked in-memory copy of the dquot. 
> 
> IOWs, I think that verification call here should be checking the
> in-memory dquot core, not the on disk buffer that is about to get
> trashed.  i.e. something like this:
> 
> -	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
> +	fa = xfs_dquot_verify(mp, &dqp->q_core, be32_to_cpu(ddqp->d_id), 0);
> 

Isn't this still essentially duplicated by the write verifier? I don't
feel strongly about changing it as above vs. removing it, but it does
still seem unnecessary to me..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler
  2020-04-20  3:58   ` Dave Chinner
@ 2020-04-20 14:02     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 01:58:58PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:55AM -0400, Brian Foster wrote:
> > The quotaoff intent item push handler unconditionally returns locked
> > status because it remains AIL resident until removed by the
> > quotaoff end intent. xfsaild_push_item() already returns pinned
> > status for items (generally intents) without a push handler. This is
> > effectively the same behavior for the purpose of quotaoff, so remove
> 
> It's not the same. XFS_ITEM_PINNED results in a log force from the
> xfsaild, while XFS_ITEM_LOCKED items are just skipped. So this
> change will result in a log force every time the AIL push sees a
> quotaoff log item.  Hence I think the code as it stands is correct
> as log forces will not speed up the removal of the quotaoff item
> from the AIL...
> 

Ah right.. I was thinking we had a count heuristic there but we
potentially force the log once we hit at least one pinned item. I think
this same thing might have come up when this was first refactored. I'll
drop this one..
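
For reference (quoting the push loop from memory, so details may be
off), the two return codes are handled like this:

	case XFS_ITEM_PINNED:
		XFS_STATS_INC(mp, xs_push_ail_pinned);
		trace_xfs_ail_pinned(lip);
		stuck++;
		ailp->ail_log_flush++;
		break;
	case XFS_ITEM_LOCKED:
		XFS_STATS_INC(mp, xs_push_ail_locked);
		trace_xfs_ail_locked(lip);
		stuck++;
		break;

... and a nonzero ->ail_log_flush triggers a sync log force at the
start of the next push, so returning PINNED for a quotaoff item would
just add useless log forces.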

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/12] xfs: clean up AIL log item removal functions
  2020-04-20  4:32   ` Dave Chinner
@ 2020-04-20 14:03     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 02:32:33PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:57AM -0400, Brian Foster wrote:
> > We have two AIL removal functions with slightly different semantics.
> > xfs_trans_ail_delete() expects the caller to have the AIL lock and
> > for the associated item to be AIL resident. If not, the filesystem
> > is shut down. xfs_trans_ail_remove() acquires the AIL lock, checks
> > that the item is AIL resident and calls the former if so.
> > 
> > These semantics lead to confused usage between the two. For example,
> > the _remove() variant takes a shutdown parameter to pass to the
> > _delete() variant, but immediately returns if the AIL bit is not
> > set. This means that _remove() would never shut down if an item is
> > not AIL resident, even though it appears that many callers would
> > expect it to.
> > 
> > Make the following changes to clean up both of these functions:
> > 
> > - Most callers of xfs_trans_ail_delete() acquire the AIL lock just
> >   before the call. Update _delete() to acquire the lock and open
> >   code the couple of callers that make additional checks under AIL
> >   lock.
> > - Drop the unnecessary ailp parameter from _delete().
> > - Drop the unused shutdown parameter from _remove() and open code
> >   the implementation.
> > 
> > In summary, this leaves a _delete() variant that expects an AIL
> > resident item and a _remove() helper that checks the AIL bit. Audit
> > the existing callsites for use of the appropriate function and
> > update as necessary.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> ....
> 
> Good start, but...
> 
> > @@ -1032,10 +1033,11 @@ xfs_qm_dqflush_done(
> >  		goto out;
> >  
> >  	spin_lock(&ailp->ail_lock);
> > -	if (lip->li_lsn == qip->qli_flush_lsn)
> > -		/* xfs_trans_ail_delete() drops the AIL lock */
> > -		xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
> > -	else
> > +	if (lip->li_lsn == qip->qli_flush_lsn) {
> > +		/* xfs_ail_update_finish() drops the AIL lock */
> > +		tail_lsn = xfs_ail_delete_one(ailp, lip);
> > +		xfs_ail_update_finish(ailp, tail_lsn);
> > +	} else
> >  		spin_unlock(&ailp->ail_lock);
> 
> This drops the shutdown if the dquot is not in the AIL. It should be
> in the AIL, so if it isn't we should be shutting down...
> 

It might not be in the AIL if we're in quotacheck because it does
everything in-core.

> > @@ -872,13 +872,14 @@ xfs_ail_delete_one(
> >   */
> >  void
> >  xfs_trans_ail_delete(
> > -	struct xfs_ail		*ailp,
> >  	struct xfs_log_item	*lip,
> >  	int			shutdown_type)
> >  {
> > +	struct xfs_ail		*ailp = lip->li_ailp;
> >  	struct xfs_mount	*mp = ailp->ail_mount;
> >  	xfs_lsn_t		tail_lsn;
> >  
> > +	spin_lock(&ailp->ail_lock);
> >  	if (!test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
> >  		spin_unlock(&ailp->ail_lock);
> >  		if (!XFS_FORCED_SHUTDOWN(mp)) {
> > diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> > index 9135afdcee9d..7563c78e2997 100644
> > --- a/fs/xfs/xfs_trans_priv.h
> > +++ b/fs/xfs/xfs_trans_priv.h
> > @@ -94,22 +94,23 @@ xfs_trans_ail_update(
> >  xfs_lsn_t xfs_ail_delete_one(struct xfs_ail *ailp, struct xfs_log_item *lip);
> >  void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
> >  			__releases(ailp->ail_lock);
> > -void xfs_trans_ail_delete(struct xfs_ail *ailp, struct xfs_log_item *lip,
> > -		int shutdown_type);
> > +void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
> >  
> >  static inline void
> >  xfs_trans_ail_remove(
> > -	struct xfs_log_item	*lip,
> > -	int			shutdown_type)
> > +	struct xfs_log_item	*lip)
> >  {
> >  	struct xfs_ail		*ailp = lip->li_ailp;
> > +	xfs_lsn_t		tail_lsn;
> >  
> >  	spin_lock(&ailp->ail_lock);
> > -	/* xfs_trans_ail_delete() drops the AIL lock */
> > -	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags))
> > -		xfs_trans_ail_delete(ailp, lip, shutdown_type);
> > -	else
> > +	/* xfs_ail_update_finish() drops the AIL lock */
> > +	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
> > +		tail_lsn = xfs_ail_delete_one(ailp, lip);
> > +		xfs_ail_update_finish(ailp, tail_lsn);
> > +	} else {
> >  		spin_unlock(&ailp->ail_lock);
> > +	}
> >  }
> 
> This makes xfs_trans_ail_delete() and xfs_trans_ail_remove() almost
> identical, except one will shutdown if the item is not in the AIL
> and the other won't. Wouldn't it be better to get it down to just
> one function that does everything, and remove the confusion of which
> to use altogether?
> 

Yes, I was thinking about doing this when working on this patch, but
decided it was easier to fix up both functions first and then consider
combining them in a separate step... and then never got back to it.
That might have been before I ended up open-coding some of the other
call sites too, so the end result wasn't really clear to me.

Regardless, I'll take another look and fold that change in if it makes
sense..

Brian

> void
> xfs_trans_ail_delete(
> 	struct xfs_log_item     *lip,
> 	int                     shutdown)
> {
> 	struct xfs_ail		*ailp = lip->li_ailp;
> 
> 	spin_lock(&ailp->ail_lock);
> 	if (test_bit(XFS_LI_IN_AIL, &lip->li_flags)) {
> 		xfs_lsn_t tail_lsn = xfs_ail_delete_one(ailp, lip);
> 		xfs_ail_update_finish(ailp, tail_lsn);
> 		return;
> 	}
> 	spin_unlock(&ailp->ail_lock);
> 	if (!shutdown)
> 		return;
> 
> 	/* do shutdown stuff */
> }
> 
> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/12] xfs: random buffer write failure errortag
  2020-04-20  4:37   ` Dave Chinner
@ 2020-04-20 14:04     ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 02:37:17PM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:59AM -0400, Brian Foster wrote:
> > Introduce an error tag to randomly fail async buffer writes. This is
> > primarily to facilitate testing of the XFS error configuration
> > mechanism.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> 
> Isn't this what XFS_ERRTAG_IODONE_IOERR and XFS_RANDOM_IODONE_IOERR
> is for?
> 

That one triggers log I/O errors, which is an imminent shutdown error
scenario. I'm not aware of a clean way to combine the two, though the
above could probably use a better name. TBH, I wasn't sure about the
name for the buffer one either. I wonder if XFS_ERRTAG_LOG_IOERROR and
XFS_ERRTAG_METADATA_IOERROR would be more usable..? That's assuming
we're Ok with changing existing error tag names. The IODONE one appears
to have been around forever...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/12] xfs: flush related error handling cleanups
  2020-04-19 22:53 ` [PATCH 00/12] xfs: flush related error handling cleanups Dave Chinner
@ 2020-04-20 14:06   ` Brian Foster
  2020-04-20 22:53     ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Brian Foster @ 2020-04-20 14:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 08:53:06AM +1000, Dave Chinner wrote:
> On Fri, Apr 17, 2020 at 11:08:47AM -0400, Brian Foster wrote:
> > Hi all,
> > 
> > This actually started as what I intended to be a cleanup of xfsaild
> > error handling and the fact that unexpected errors are kind of lost in
> > the ->iop_push() handlers of flushable log items. Some discussion with
> > Dave on that is available here[1]. I was thinking of genericizing the
> > behavior, but I'm not so sure that is possible now given the error
> > handling requirements of the associated items.
> > 
> > While thinking through that, I ended up incorporating various cleanups
> > in the somewhat confusing and erratic error handling on the periphery of
> > xfsaild, such as the flush handlers. Most of these are straightforward
> > cleanups except for patch 9, which I think requires careful review and
> > is of debatable value. I have used patch 12 to run an hour or so of
> > highly concurrent fsstress load against it and will execute a longer run
> > over the weekend now that fstests has completed.
> > 
> > Thoughts, reviews, flames appreciated.
> 
> I'll need to do something thinking on this patchset - I have a
> patchset that touches a lot of the same code I'm working on right
> now to pin inode cluster buffers in memory when the inode is dirtied
> so we don't get RMW cycles in AIL flushing.
> 
> That code gets rid of xfs_iflush() completely, removes dirty inodes
> from the AIL and tracks only ordered cluster buffers in the AIL for
> inode writeback (i.e. reduces AIL tracked log items by up to 30x).
> It also only does inode writeback from the ordered cluster buffers.
> 

Ok. I could see that being reason enough to drop the iflush iodone
patch, given that it depends on a bit of a rework/hack. A cleaner
solution requires more thought and it might not be worth the time if the
code is going away. Most of the rest are straightforward cleanups,
though, so I wouldn't expect complex conflict resolution. It's hard to say
for sure without seeing the code, of course..

> The idea behind this is to make inode flushing completely
> non-blocking, and to simplify inode cluster flushing to simply iterate
> all the dirty inodes attached to the buffer. This gets rid of radix
> tree lookups and races with reclaim, and gets rid of having to
> special case a locked inode in the cluster iteration code.
> 

Sounds interesting, but it's not really clear to me what the general
flushing dynamic looks like in this model. I.e., you mention
xfs_iflush() goes away, but cluster flushing still exists in some form,
so I can't really tell if xfs_iflush() going away is tied to a
functional change or primarily a refactoring/cleanup. Anyways, no need
to go into the weeds if the code will eventually clarify..

> I was looking at this as the model to then apply to dquot flushing,
> too, because it currently does not have cluster flushing, and hence
> flushes dquots individually, even though there can be multiple dirty
> dquots per buffer. Some of this patchset moves the dquot flushing a
> bit closer to the inode code, so those parts are going to be useful
> regardless of everything else....
> 

Makes sense.

> Do you have a git tree I could pull this from to see how bad the
> conflicts are?
> 

I don't have a public tree. I suppose I could look into getting
kernel.org access if somebody could point me in the right direction for
that. :) In the meantime I could make a private tree accessible to you
directly if that's helpful..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-17 15:08 ` [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning Brian Foster
  2020-04-20  3:19   ` Dave Chinner
@ 2020-04-20 18:50   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-20 18:50 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> At unmount time, XFS emits a warning for every in-core buffer that
> might have undergone a write error. In practice this behavior is
> probably reasonable given that the filesystem is likely short lived
> once I/O errors begin to occur consistently. Under certain test or
> otherwise expected error conditions, this can spam the logs and slow
> down the unmount. Ratelimit the warning to prevent this problem
> while still informing the user that errors have occurred.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, other than the line return Dave pointed out, looks ok to me:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_buf.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 93942d8e35dd..5120fed06075 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
>   			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
>   			list_del_init(&bp->b_lru);
>   			if (bp->b_flags & XBF_WRITE_FAIL) {
> -				xfs_alert(btp->bt_mount,
> -"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> +				xfs_alert_ratelimited(btp->bt_mount,
> +"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
> +"Please run xfs_repair to determine the extent of the problem.",
>   					(long long)bp->b_bn);
> -				xfs_alert(btp->bt_mount,
> -"Please run xfs_repair to determine the extent of the problem.");
>   			}
>   			xfs_buf_rele(bp);
>   		}
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/12] xfs: abort consistently on dquot flush failure
  2020-04-17 15:08 ` [PATCH 07/12] xfs: abort consistently on dquot flush failure Brian Foster
  2020-04-20  3:54   ` Dave Chinner
@ 2020-04-20 18:50   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-20 18:50 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> The dquot flush handler effectively aborts the dquot flush if the
> filesystem is already shut down, but doesn't actually shut down if
> the flush fails. Update xfs_qm_dqflush() to consistently abort the
> dquot flush and shutdown the fs if the flush fails with an
> unexpected error.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, makes sense:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_dquot.c | 27 ++++++++-------------------
>   1 file changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 73032c18a94a..41750f797861 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1068,6 +1068,7 @@ xfs_qm_dqflush(
>   	struct xfs_buf		**bpp)
>   {
>   	struct xfs_mount	*mp = dqp->q_mount;
> +	struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
>   	struct xfs_buf		*bp;
>   	struct xfs_dqblk	*dqb;
>   	struct xfs_disk_dquot	*ddqp;
> @@ -1082,32 +1083,16 @@ xfs_qm_dqflush(
>   
>   	xfs_qm_dqunpin_wait(dqp);
>   
> -	/*
> -	 * This may have been unpinned because the filesystem is shutting
> -	 * down forcibly. If that's the case we must not write this dquot
> -	 * to disk, because the log record didn't make it to disk.
> -	 *
> -	 * We also have to remove the log item from the AIL in this case,
> -	 * as we wait for an emptry AIL as part of the unmount process.
> -	 */
> -	if (XFS_FORCED_SHUTDOWN(mp)) {
> -		struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
> -		dqp->dq_flags &= ~XFS_DQ_DIRTY;
> -
> -		xfs_trans_ail_remove(lip, SHUTDOWN_CORRUPT_INCORE);
> -
> -		error = -EIO;
> -		goto out_unlock;
> -	}
> -
>   	/*
>   	 * Get the buffer containing the on-disk dquot
>   	 */
>   	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dqp->q_blkno,
>   				   mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK,
>   				   &bp, &xfs_dquot_buf_ops);
> -	if (error)
> +	if (error == -EAGAIN)
>   		goto out_unlock;
> +	if (error)
> +		goto out_abort;
>   
>   	/*
>   	 * Calculate the location of the dquot inside the buffer.
> @@ -1161,6 +1146,10 @@ xfs_qm_dqflush(
>   	*bpp = bp;
>   	return 0;
>   
> +out_abort:
> +	dqp->dq_flags &= ~XFS_DQ_DIRTY;
> +	xfs_trans_ail_remove(lip, SHUTDOWN_CORRUPT_INCORE);
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>   out_unlock:
>   	xfs_dqfunlock(dqp);
>   	return error;
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 11/12] xfs: remove unused iflush stale parameter
  2020-04-17 15:08 ` [PATCH 11/12] xfs: remove unused iflush stale parameter Brian Foster
  2020-04-20  4:34   ` Dave Chinner
@ 2020-04-20 19:19   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-20 19:19 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> The stale parameter was used to control the now unused shutdown
> parameter of xfs_trans_ail_remove().
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Looks ok to me
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/xfs_icache.c     | 2 +-
>   fs/xfs/xfs_inode.c      | 2 +-
>   fs/xfs/xfs_inode_item.c | 7 +++----
>   fs/xfs/xfs_inode_item.h | 2 +-
>   4 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index a7be7a9e5c1a..9088716465e7 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1118,7 +1118,7 @@ xfs_reclaim_inode(
>   	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
>   		xfs_iunpin_wait(ip);
>   		/* xfs_iflush_abort() drops the flush lock */
> -		xfs_iflush_abort(ip, false);
> +		xfs_iflush_abort(ip);
>   		goto reclaim;
>   	}
>   	if (xfs_ipincount(ip)) {
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 98ee1b10d1b0..502ffe857b12 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3701,7 +3701,7 @@ xfs_iflush(
>   	return 0;
>   
>   abort:
> -	xfs_iflush_abort(ip, false);
> +	xfs_iflush_abort(ip);
>   shutdown:
>   	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>   	return error;
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index f8dd9bb8c851..b8cbd0c61ce0 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -757,10 +757,9 @@ xfs_iflush_done(
>    */
>   void
>   xfs_iflush_abort(
> -	xfs_inode_t		*ip,
> -	bool			stale)
> +	struct xfs_inode		*ip)
>   {
> -	xfs_inode_log_item_t	*iip = ip->i_itemp;
> +	struct xfs_inode_log_item	*iip = ip->i_itemp;
>   
>   	if (iip) {
>   		xfs_trans_ail_remove(&iip->ili_item);
> @@ -788,7 +787,7 @@ xfs_istale_done(
>   	struct xfs_buf		*bp,
>   	struct xfs_log_item	*lip)
>   {
> -	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
> +	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
>   }
>   
>   /*
> diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
> index 07a60e74c39c..a68c114b79a0 100644
> --- a/fs/xfs/xfs_inode_item.h
> +++ b/fs/xfs/xfs_inode_item.h
> @@ -34,7 +34,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
>   extern void xfs_inode_item_destroy(struct xfs_inode *);
>   extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
>   extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
> -extern void xfs_iflush_abort(struct xfs_inode *, bool);
> +extern void xfs_iflush_abort(struct xfs_inode *);
>   extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
>   					 struct xfs_inode_log_format *);
>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild
  2020-04-20 13:58     ` Brian Foster
@ 2020-04-20 22:19       ` Dave Chinner
  0 siblings, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20 22:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 09:58:25AM -0400, Brian Foster wrote:
> On Mon, Apr 20, 2020 at 12:45:56PM +1000, Dave Chinner wrote:
> > On Fri, Apr 17, 2020 at 11:08:48AM -0400, Brian Foster wrote:
> > > Flush locked log items whose underlying buffers fail metadata
> > > writeback are tagged with a special flag to indicate that the flush
> > > lock is already held. This is currently implemented in the type
> > > specific ->iop_push() callback, but the processing required for such
> > > items is not type specific because we're only doing basic state
> > > management on the underlying buffer.
> > > 
> > > Factor the failed log item handling out of the inode and dquot
> > > ->iop_push() callbacks and open code the buffer resubmit helper into
> > > a single helper called from xfsaild_push_item(). This provides a
> > > generic mechanism for handling failed metadata buffer writeback with
> > > a bit less code.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > .....
> > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > > index 564253550b75..0c709651a2c6 100644
> > > --- a/fs/xfs/xfs_trans_ail.c
> > > +++ b/fs/xfs/xfs_trans_ail.c
> > > @@ -345,6 +345,45 @@ xfs_ail_delete(
> > >  	xfs_trans_ail_cursor_clear(ailp, lip);
> > >  }
> > >  
> > > +/*
> > > + * Requeue a failed buffer for writeback.
> > > + *
> > > + * We clear the log item failed state here as well, but we have to be careful
> > > + * about reference counts because the only active reference counts on the buffer
> > > + * may be the failed log items. Hence if we clear the log item failed state
> > > + * before queuing the buffer for IO we can release all active references to
> > > + * the buffer and free it, leading to use after free problems in
> > > + * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
> > > + * order we process them in - the buffer is locked, and we own the buffer list
> > > + * so nothing on them is going to change while we are performing this action.
> > > + *
> > > + * Hence we can safely queue the buffer for IO before we clear the failed log
> > > + * item state, therefore  always having an active reference to the buffer and
> > > + * avoiding the transient zero-reference state that leads to use-after-free.
> > > + */
> > > +static inline int
> > > +xfsaild_push_failed(
> > 
> > Bad name IMO. Makes me think it's an action that is taken when an
> > item push failed, not an action that resubmits a buffer that had an
> > IO failure.
> > 
> > i.e. "push" doesn't imply IO, and failure to push an item doesn't
> > mean there was an IO error - it may be locked, already flushing,
> > pinned, etc.
> > 
> 
> Yeah..
> 
> > > +	struct xfs_log_item	*lip,
> > > +	struct list_head	*buffer_list)
> > > +{
> > > +	struct xfs_buf		*bp = lip->li_buf;
> > 
> > This also assumes that the log item has a buffer associated with it.
> > So perhaps "xfsaild_resubmit_failed_buffer()" would be more
> > approriate here.
> > 
> 
> The buffer pointer is an implementation detail of failed items. It would
> be nice to find a way to avoid that. How about xfsaild_resubmit_item()
> to be consistent with the parent function (xfsaild_push_item())?

That works, too. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-20 14:02     ` Brian Foster
@ 2020-04-20 22:23       ` Dave Chinner
  2020-04-21 12:13         ` Brian Foster
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2020-04-20 22:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 10:02:05AM -0400, Brian Foster wrote:
> On Mon, Apr 20, 2020 at 01:19:59PM +1000, Dave Chinner wrote:
> > On Fri, Apr 17, 2020 at 11:08:52AM -0400, Brian Foster wrote:
> > > At unmount time, XFS emits a warning for every in-core buffer that
> > > might have undergone a write error. In practice this behavior is
> > > probably reasonable given that the filesystem is likely short lived
> > > once I/O errors begin to occur consistently. Under certain test or
> > > otherwise expected error conditions, this can spam the logs and slow
> > > down the unmount. Ratelimit the warning to prevent this problem
> > > while still informing the user that errors have occurred.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_buf.c | 7 +++----
> > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 93942d8e35dd..5120fed06075 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
> > >  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> > >  			list_del_init(&bp->b_lru);
> > >  			if (bp->b_flags & XBF_WRITE_FAIL) {
> > > -				xfs_alert(btp->bt_mount,
> > > -"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> > > +				xfs_alert_ratelimited(btp->bt_mount,
> > > +"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
> > > +"Please run xfs_repair to determine the extent of the problem.",
> > >  					(long long)bp->b_bn);
> > 
> > Hmmmm. I was under the impression that multiple line log messages
> > were frowned upon because they prevent every output line in the log
> > being tagged correctly. That's where KERN_CONT came from (i.e. it's
> > a continuation of a previous log message), but we don't use that
> > with the XFS logging and hence multi-line log messages are split
> > into multiple logging calls.
> > 
> 
> I debated combining these into a single line for that exact reason for
> about a second and then just went with this because I didn't think it
> mattered that much.

It doesn't matter to us, but it does matter to those people who want
their log entries correctly tagged for their classification
engines...

> > IOWs, this might be better handled just using a static ratelimit
> > variable here....
> > 
> > Actually, we already have one for xfs_buf_item_push() to limit
> > warnings about retrying XBF_WRITE_FAIL buffers:
> > 
> > static DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);
> > 
> > Perhaps we should be using the same ratelimit variable here....
> > 
> 
> IIRC that was static in another file, but we can centralize (and perhaps
> generalize..) it somewhere if that is preferred..

I think it makes sense to have all the buffer write fail
messages ratelimited under the same variable - once it starts
spewing messages, we should limit them all the same way...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush()
  2020-04-20 14:02     ` Brian Foster
@ 2020-04-20 22:31       ` Dave Chinner
  0 siblings, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20 22:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 10:02:21AM -0400, Brian Foster wrote:
> On Mon, Apr 20, 2020 at 01:53:22PM +1000, Dave Chinner wrote:
> > On Fri, Apr 17, 2020 at 11:08:53AM -0400, Brian Foster wrote:
> > > The dquot read/write verifier calls xfs_dqblk_verify() on every
> > > dquot in the buffer. Remove the duplicate call from
> > > xfs_qm_dqflush().
> > 
> > Ah, I think there's a bug here - it's not supposed to be a
> > duplicate....
> > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_dquot.c | 14 --------------
> > >  1 file changed, 14 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > index af2c8e5ceea0..73032c18a94a 100644
> > > --- a/fs/xfs/xfs_dquot.c
> > > +++ b/fs/xfs/xfs_dquot.c
> > > @@ -1071,7 +1071,6 @@ xfs_qm_dqflush(
> > >  	struct xfs_buf		*bp;
> > >  	struct xfs_dqblk	*dqb;
> > >  	struct xfs_disk_dquot	*ddqp;
> > > -	xfs_failaddr_t		fa;
> > >  	int			error;
> > >  
> > >  	ASSERT(XFS_DQ_IS_LOCKED(dqp));
> > > @@ -1116,19 +1115,6 @@ xfs_qm_dqflush(
> > >  	dqb = bp->b_addr + dqp->q_bufoffset;
> > >  	ddqp = &dqb->dd_diskdq;
> > >  
> > > -	/*
> > > -	 * A simple sanity check in case we got a corrupted dquot.
> > > -	 */
> > > -	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
> > 
> > So this verifies the on disk dquot ....
> > 
> > > -	if (fa) {
> > > -		xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS",
> > 
> > ...which issues an "in memory corruption" alert on failure...
> > 
> > > -				be32_to_cpu(ddqp->d_id), fa);
> > > -		xfs_buf_relse(bp);
> > > -		xfs_dqfunlock(dqp);
> > > -		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > -		return -EFSCORRUPTED;
> > > -	}
> > > -
> > >  	/* This is the only portion of data that needs to persist */
> > >  	memcpy(ddqp, &dqp->q_core, sizeof(struct xfs_disk_dquot));
> > 
> > .... and on success we immediately overwrite the on-disk copy with
> > the unchecked in-memory copy of the dquot. 
> > 
> > IOWs, I think that verification call here should be checking the
> > in-memory dquot core, not the on disk buffer that is about to get
> > trashed.  i.e. something like this:
> > 
> > -	fa = xfs_dqblk_verify(mp, dqb, be32_to_cpu(ddqp->d_id), 0);
> > +	fa = xfs_dquot_verify(mp, &dqp->q_core, be32_to_cpu(ddqp->d_id), 0);
> > 
> 
> Isn't this still essentially duplicated by the write verifier? I don't
> feel strongly about changing it as above vs. removing it, but it does
> still seem unnecessary to me..

It's no different to the xfs_iflush_int() code that runs a heap of
checks on the in-memory inode before it is flushed to the backing
buffer. That uses a combination of open coded checks (for error
injection) and verifier functions (e.g. fork checking), so this
really isn't that unusual.

Realistically, it's better to catch the corruption as early as
possible - if we catch it here we know we corrupted the in-memory
dquot. However, if the write verifier catches it we have no idea
exactly when the corruption occurred, or whether it was a result of
a code problem or an external memory corruption in memory we haven't
modified at all...

IOWs the two checks are intended to catch very different classes of
in-memory corruptions, so they really aren't redundant or
unnecessary at all...
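
i.e. keep a check ahead of the memcpy(), but point it at the in-memory
copy - something like this (compile-tested only in my head):

	/* sanity check the in-memory dquot before it overwrites disk */
	fa = xfs_dquot_verify(mp, &dqp->q_core,
			be32_to_cpu(dqp->q_core.d_id), 0);
	if (fa) {
		xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS",
				be32_to_cpu(dqp->q_core.d_id), fa);
		xfs_buf_relse(bp);
		xfs_dqfunlock(dqp);
		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
		return -EFSCORRUPTED;
	}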

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/12] xfs: random buffer write failure errortag
  2020-04-17 15:08 ` [PATCH 12/12] xfs: random buffer write failure errortag Brian Foster
  2020-04-20  4:37   ` Dave Chinner
@ 2020-04-20 22:42   ` Allison Collins
  1 sibling, 0 replies; 48+ messages in thread
From: Allison Collins @ 2020-04-20 22:42 UTC (permalink / raw)
  To: Brian Foster, linux-xfs



On 4/17/20 8:08 AM, Brian Foster wrote:
> Introduce an error tag to randomly fail async buffer writes. This is
> primarily to facilitate testing of the XFS error configuration
> mechanism.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
Ok, I think it looks ok:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

> ---
>   fs/xfs/libxfs/xfs_errortag.h | 4 +++-
>   fs/xfs/xfs_buf.c             | 6 ++++++
>   fs/xfs/xfs_error.c           | 3 +++
>   3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 79e6c4fb1d8a..2486dab19023 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -55,7 +55,8 @@
>   #define XFS_ERRTAG_FORCE_SCRUB_REPAIR			32
>   #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
>   #define XFS_ERRTAG_IUNLINK_FALLBACK			34
> -#define XFS_ERRTAG_MAX					35
> +#define XFS_ERRTAG_BUF_IOERROR				35
> +#define XFS_ERRTAG_MAX					36
>   
>   /*
>    * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> @@ -95,5 +96,6 @@
>   #define XFS_RANDOM_FORCE_SCRUB_REPAIR			1
>   #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
>   #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
> +#define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
>   
>   #endif /* __XFS_ERRORTAG_H_ */
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 5120fed06075..a305db779156 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1289,6 +1289,12 @@ xfs_buf_bio_end_io(
>   	struct bio		*bio)
>   {
>   	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> +	struct xfs_mount	*mp = bp->b_mount;
> +
> +	if (!bio->bi_status &&
> +	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> +	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BUF_IOERROR))
> +		bio->bi_status = errno_to_blk_status(-EIO);
>   
>   	/*
>   	 * don't overwrite existing errors - otherwise we can lose errors on
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index a21e9cc6516a..7f6e20899473 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -53,6 +53,7 @@ static unsigned int xfs_errortag_random_default[] = {
>   	XFS_RANDOM_FORCE_SCRUB_REPAIR,
>   	XFS_RANDOM_FORCE_SUMMARY_RECALC,
>   	XFS_RANDOM_IUNLINK_FALLBACK,
> +	XFS_RANDOM_BUF_IOERROR,
>   };
>   
>   struct xfs_errortag_attr {
> @@ -162,6 +163,7 @@ XFS_ERRORTAG_ATTR_RW(buf_lru_ref,	XFS_ERRTAG_BUF_LRU_REF);
>   XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
>   XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
>   XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
> +XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
>   
>   static struct attribute *xfs_errortag_attrs[] = {
>   	XFS_ERRORTAG_ATTR_LIST(noerror),
> @@ -199,6 +201,7 @@ static struct attribute *xfs_errortag_attrs[] = {
>   	XFS_ERRORTAG_ATTR_LIST(force_repair),
>   	XFS_ERRORTAG_ATTR_LIST(bad_summary),
>   	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
> +	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
>   	NULL,
>   };
>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/12] xfs: flush related error handling cleanups
  2020-04-20 14:06   ` Brian Foster
@ 2020-04-20 22:53     ` Dave Chinner
  0 siblings, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2020-04-20 22:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Apr 20, 2020 at 10:06:04AM -0400, Brian Foster wrote:
> On Mon, Apr 20, 2020 at 08:53:06AM +1000, Dave Chinner wrote:
> > On Fri, Apr 17, 2020 at 11:08:47AM -0400, Brian Foster wrote:
> > > Hi all,
> > > 
> > > This actually started as what I intended to be a cleanup of xfsaild
> > > error handling and the fact that unexpected errors are kind of lost in
> > > the ->iop_push() handlers of flushable log items. Some discussion with
> > > Dave on that is available here[1]. I was thinking of genericizing the
> > > behavior, but I'm not so sure that is possible now given the error
> > > handling requirements of the associated items.
> > > 
> > > While thinking through that, I ended up incorporating various cleanups
> > > in the somewhat confusing and erratic error handling on the periphery of
> > > xfsaild, such as the flush handlers. Most of these are straightforward
> > > cleanups except for patch 9, which I think requires careful review and
> > > is of debatable value. I have used patch 12 to run an hour or so of
> > > highly concurrent fsstress load against it and will execute a longer run
> > > over the weekend now that fstests has completed.
> > > 
> > > Thoughts, reviews, flames appreciated.
> > 
> > I'll need to do something thinking on this patchset - I have a
> > patchset that touches a lot of the same code I'm working on right
> > now to pin inode cluster buffers in memory when the inode is dirtied
> > so we don't get RMW cycles in AIL flushing.
> > 
> > That code gets rid of xfs_iflush() completely, removes dirty inodes
> > from the AIL and tracks only ordered cluster buffers in the AIL for
> > inode writeback (i.e. reduces AIL tracked log items by up to 30x).
> > It also only does inode writeback from the ordered cluster buffers.
> > 
> 
> Ok. I could see that being reason enough to drop the iflush iodone
> patch, given that it depends on a bit of a rework/hack. A cleaner
> solution requires more thought and it might not be worth the time if the
> code is going away. Most of the rest are straightforward cleanups though
> so I wouldn't expect complex conflict resolution. It's hard to say
> for sure without seeing the code, of course..

Yeah, now I've been though most of it there isn't a huge impact on
my patchset. Mainly just the conflicts in the mods to xfs_iflush and
friends.

> > The idea behind this is to make inode flushing completely
> > non-blocking, and to simply inode cluster flushing to simply iterate
> > all the dirty inodes attached to the buffer. This gets rid of radix
> > tree lookups and races with reclaim, and gets rid of having to
> > special case a locked inode in the cluster iteration code.
> > 
> 
> Sounds interesting, but it's not really clear to me what the general
> flushing dynamic looks like in this model. I.e., you mention
> xfs_iflush() goes away, but cluster flushing still exists in some form,
> so I can't really tell if xfs_iflush() going away is tied to a
> functional change or primarily a refactoring/cleanup. Anyways, no need
> to go into the weeds if the code will eventually clarify..

It's primarily a clean-up to try to reduce AIL pushing overhead as
I'm regularly seeing the xfsaild CPU bound trying to push inodes
that are already on their way to disk. So I'm trying to reduce
cluster flushing to be driven by a buffer item push rather than by
pushing repeatedly on every inode item that is attached to the
buffer. 

> > Do you have a git tree I could pull this from to see how bad the
> > conflicts are?
> > 
> 
> I don't have a public tree. I suppose I could look into getting
> kernel.org access if somebody could point me in the right
> direction for that. :) In the meantime I could make a private tree
> accessible to you directly if that's helpful..

Send a request for an account and git tree to helpdesk@kernel.org
and cc Darrick, Eric and myself so we can ACK the request.

Details here:

https://korg.wiki.kernel.org/userdoc/accounts

and all the userdoc is here:

https://korg.wiki.kernel.org/start

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning
  2020-04-20 22:23       ` Dave Chinner
@ 2020-04-21 12:13         ` Brian Foster
  0 siblings, 0 replies; 48+ messages in thread
From: Brian Foster @ 2020-04-21 12:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Apr 21, 2020 at 08:23:32AM +1000, Dave Chinner wrote:
> On Mon, Apr 20, 2020 at 10:02:05AM -0400, Brian Foster wrote:
> > On Mon, Apr 20, 2020 at 01:19:59PM +1000, Dave Chinner wrote:
> > > On Fri, Apr 17, 2020 at 11:08:52AM -0400, Brian Foster wrote:
> > > > At unmount time, XFS emits a warning for every in-core buffer that
> > > > might have undergone a write error. In practice this behavior is
> > > > probably reasonable given that the filesystem is likely short lived
> > > > once I/O errors begin to occur consistently. Under certain test or
> > > > otherwise expected error conditions, this can spam the logs and slow
> > > > down the unmount. Ratelimit the warning to prevent this problem
> > > > while still informing the user that errors have occurred.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_buf.c | 7 +++----
> > > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > index 93942d8e35dd..5120fed06075 100644
> > > > --- a/fs/xfs/xfs_buf.c
> > > > +++ b/fs/xfs/xfs_buf.c
> > > > @@ -1685,11 +1685,10 @@ xfs_wait_buftarg(
> > > >  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> > > >  			list_del_init(&bp->b_lru);
> > > >  			if (bp->b_flags & XBF_WRITE_FAIL) {
> > > > -				xfs_alert(btp->bt_mount,
> > > > -"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> > > > +				xfs_alert_ratelimited(btp->bt_mount,
> > > > +"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
> > > > +"Please run xfs_repair to determine the extent of the problem.",
> > > >  					(long long)bp->b_bn);
> > > 
> > > Hmmmm. I was under the impression that multi-line log messages
> > > were frowned upon because they prevent every output line in the
> > > log from being tagged correctly. That's where KERN_CONT came from (i.e. it's
> > > a continuation of a previous log message), but we don't use that
> > > with the XFS logging and hence multi-line log messages are split
> > > into multiple logging calls.
> > > 
> > 
> > I debated combining these into a single line for that exact reason for
> > about a second and then just went with this because I didn't think it
> > mattered that much.
> 
> It doesn't matter to us, but it does matter to those people who want
> their log entries correctly tagged for their classification
> engines...
> 

Makes sense, though I am a bit curious whether it would be categorized
correctly even when fixed up, or whether something like a single long
line would be preferred over two. *shrug*
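
To illustrate the tagging concern, the difference is roughly between
the two forms below (sketch only, reusing the message from the patch
above; mp stands in for btp->bt_mount):

/*
 * One call, two lines: only the first line carries the "XFS (dev):"
 * prefix, so log classifiers can't attribute the second line.
 */
xfs_alert_ratelimited(mp,
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n"
"Please run xfs_repair to determine the extent of the problem.",
	(long long)bp->b_bn);

/*
 * Two calls: each line is emitted as its own fully tagged log entry.
 * Note that each *_ratelimited macro call site carries its own static
 * ratelimit state, which is what the discussion below ends up
 * consolidating.
 */
xfs_alert_ratelimited(mp,
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
	(long long)bp->b_bn);
xfs_alert_ratelimited(mp,
"Please run xfs_repair to determine the extent of the problem.");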

> > > IOWs, this might be better handled just using a static ratelimit
> > > variable here....
> > > 
> > > Actually, we already have one for xfs_buf_item_push() to limit
> > > warnings about retrying XBF_WRITE_FAIL buffers:
> > > 
> > > static DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10);
> > > 
> > > Perhaps we should be using the same ratelimit variable here....
> > > 
> > 
> > IIRC that was static in another file, but we can centralize (and perhaps
> > generalize..) it somewhere if that is preferred..
> 
> I think it makes sense to have all the buffer write fail
> messages ratelimited under the same variable - once it starts
> spewing messages, we should limit them all the same way...
> 

Yeah. I actually ended up sticking the ratelimit in the buftarg as it
comes off a bit cleaner than a global and I don't think there's much of
a practical difference in having a per-target limit.
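
Roughly, assuming a ratelimit_state embedded in the buftarg (the field
name and placement here are illustrative, not necessarily what the
next revision will use):

/* sketch: per-target ratelimit shared by buffer write error alerts */
struct xfs_buftarg {
	/* ... existing fields ... */
	struct ratelimit_state	bt_ioerror_rl;	/* illustrative name */
};

/* at buftarg allocation time */
ratelimit_state_init(&btp->bt_ioerror_rl, 30 * HZ, 10);

/* at each alert site, e.g. in xfs_wait_buftarg() */
if (__ratelimit(&btp->bt_ioerror_rl))
	xfs_alert(btp->bt_mount,
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
		(long long)bp->b_bn);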

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 



end of thread

Thread overview: 48+ messages
2020-04-17 15:08 [PATCH 00/12] xfs: flush related error handling cleanups Brian Foster
2020-04-17 15:08 ` [PATCH 01/12] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
2020-04-17 22:37   ` Allison Collins
2020-04-20  2:45   ` Dave Chinner
2020-04-20 13:58     ` Brian Foster
2020-04-20 22:19       ` Dave Chinner
2020-04-17 15:08 ` [PATCH 02/12] xfs: factor out buffer I/O failure simulation code Brian Foster
2020-04-17 22:37   ` Allison Collins
2020-04-20  2:48   ` Dave Chinner
2020-04-20 13:58     ` Brian Foster
2020-04-17 15:08 ` [PATCH 03/12] xfs: always attach iflush_done and simplify error handling Brian Foster
2020-04-18  0:07   ` Allison Collins
2020-04-20 13:59     ` Brian Foster
2020-04-20  3:08   ` Dave Chinner
2020-04-20 14:00     ` Brian Foster
2020-04-17 15:08 ` [PATCH 04/12] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
2020-04-18  0:27   ` Allison Collins
2020-04-20  3:10   ` Dave Chinner
2020-04-17 15:08 ` [PATCH 05/12] xfs: ratelimit unmount time per-buffer I/O error warning Brian Foster
2020-04-20  3:19   ` Dave Chinner
2020-04-20 14:02     ` Brian Foster
2020-04-20 22:23       ` Dave Chinner
2020-04-21 12:13         ` Brian Foster
2020-04-20 18:50   ` Allison Collins
2020-04-17 15:08 ` [PATCH 06/12] xfs: remove duplicate verification from xfs_qm_dqflush() Brian Foster
2020-04-20  3:53   ` Dave Chinner
2020-04-20 14:02     ` Brian Foster
2020-04-20 22:31       ` Dave Chinner
2020-04-17 15:08 ` [PATCH 07/12] xfs: abort consistently on dquot flush failure Brian Foster
2020-04-20  3:54   ` Dave Chinner
2020-04-20 18:50   ` Allison Collins
2020-04-17 15:08 ` [PATCH 08/12] xfs: remove unnecessary quotaoff intent item push handler Brian Foster
2020-04-20  3:58   ` Dave Chinner
2020-04-20 14:02     ` Brian Foster
2020-04-17 15:08 ` [PATCH 09/12] xfs: elide the AIL lock on log item failure tracking Brian Foster
2020-04-17 15:08 ` [PATCH 10/12] xfs: clean up AIL log item removal functions Brian Foster
2020-04-20  4:32   ` Dave Chinner
2020-04-20 14:03     ` Brian Foster
2020-04-17 15:08 ` [PATCH 11/12] xfs: remove unused iflush stale parameter Brian Foster
2020-04-20  4:34   ` Dave Chinner
2020-04-20 19:19   ` Allison Collins
2020-04-17 15:08 ` [PATCH 12/12] xfs: random buffer write failure errortag Brian Foster
2020-04-20  4:37   ` Dave Chinner
2020-04-20 14:04     ` Brian Foster
2020-04-20 22:42   ` Allison Collins
2020-04-19 22:53 ` [PATCH 00/12] xfs: flush related error handling cleanups Dave Chinner
2020-04-20 14:06   ` Brian Foster
2020-04-20 22:53     ` Dave Chinner
