All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/4] xfs: small fixes for 5.12
@ 2021-03-07 20:25 Darrick J. Wong
  2021-03-07 20:25 ` [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:25 UTC (permalink / raw)
  To: djwong
  Cc: Christoph Hellwig, Christian Brauner, linux-xfs, hch, dchinner,
	christian.brauner

Hi all,

Here's a handful of bug fixes for 5.12.  The first one fixes a bug in
quota accounting on idmapped mounts; the second avoids a buffer deadlock
in bulkstat/inumbers if the inobt is corrupt; and the third fixes a
mount hang if mount fails after creating (or otherwise dirtying) the
quota inodes.

v2: remove unnecessary freeze protection from fsmap, refactor shared
    parts of unmount code, tweaks to the itable deadlock detection

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes-5.12
---
 fs/xfs/xfs_fsmap.c   |   14 +++----
 fs/xfs/xfs_inode.c   |   14 ++++---
 fs/xfs/xfs_itable.c  |   40 ++++++++++++++++++--
 fs/xfs/xfs_iwalk.c   |   32 ++++++++++++++--
 fs/xfs/xfs_mount.c   |  100 +++++++++++++++++++++++++++-----------------------
 fs/xfs/xfs_symlink.c |    3 +-
 6 files changed, 132 insertions(+), 71 deletions(-)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped
  2021-03-07 20:25 [PATCHSET v2 0/4] xfs: small fixes for 5.12 Darrick J. Wong
@ 2021-03-07 20:25 ` Darrick J. Wong
  2021-03-07 22:28   ` Dave Chinner
  2021-03-07 20:25 ` [PATCH 2/4] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:25 UTC (permalink / raw)
  To: djwong
  Cc: Christoph Hellwig, Christian Brauner, linux-xfs, hch, dchinner,
	christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

Nowadays, we indirectly use the idmap-aware helper functions in the VFS
to set the initial uid and gid of a file being created.  Unfortunately,
we didn't convert the quota code, which means we attach the wrong dquots
to files created on an idmapped mount.

Fixes: f736d93d76d3 ("xfs: support idmapped mounts")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 fs/xfs/xfs_inode.c   |   14 ++++++++------
 fs/xfs/xfs_symlink.c |    3 ++-
 2 files changed, 10 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 46a861d55e48..f93370bd7b1e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1007,9 +1007,10 @@ xfs_create(
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, current_fsuid(), current_fsgid(), prid,
-					XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
-					&udqp, &gdqp, &pdqp);
+	error = xfs_qm_vop_dqalloc(dp, fsuid_into_mnt(mnt_userns),
+			fsgid_into_mnt(mnt_userns), prid,
+			XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
+			&udqp, &gdqp, &pdqp);
 	if (error)
 		return error;
 
@@ -1157,9 +1158,10 @@ xfs_create_tmpfile(
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, current_fsuid(), current_fsgid(), prid,
-				XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
-				&udqp, &gdqp, &pdqp);
+	error = xfs_qm_vop_dqalloc(dp, fsuid_into_mnt(mnt_userns),
+			fsgid_into_mnt(mnt_userns), prid,
+			XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
+			&udqp, &gdqp, &pdqp);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 1379013d74b8..7f368b10ded1 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -182,7 +182,8 @@ xfs_symlink(
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, current_fsuid(), current_fsgid(), prid,
+	error = xfs_qm_vop_dqalloc(dp, fsuid_into_mnt(mnt_userns),
+			fsgid_into_mnt(mnt_userns), prid,
 			XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
 			&udqp, &gdqp, &pdqp);
 	if (error)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] xfs: avoid buffer deadlocks when walking fs inodes
  2021-03-07 20:25 [PATCHSET v2 0/4] xfs: small fixes for 5.12 Darrick J. Wong
  2021-03-07 20:25 ` [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped Darrick J. Wong
@ 2021-03-07 20:25 ` Darrick J. Wong
  2021-03-07 20:36   ` [PATCH v2.1 " Darrick J. Wong
  2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
  2021-03-07 20:26 ` [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP Darrick J. Wong
  3 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:25 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, dchinner, christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

When we're servicing an INUMBERS or BULKSTAT request or running
quotacheck, grab an empty transaction so that we can use its inherent
recursive buffer locking abilities to detect inode btree cycles without
hitting ABBA buffer deadlocks.

Found by fuzzing an inode btree pointer to introduce a cycle into the
tree (xfs/365).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_itable.c |   40 ++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iwalk.c  |   32 +++++++++++++++++++++++++++-----
 2 files changed, 63 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index ca310a125d1e..0cc19135e827 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -19,6 +19,7 @@
 #include "xfs_error.h"
 #include "xfs_icache.h"
 #include "xfs_health.h"
+#include "xfs_trans.h"
 
 /*
  * Bulk Stat
@@ -166,6 +167,7 @@ xfs_bulkstat_one(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error;
 
 	ASSERT(breq->icount == 1);
@@ -175,9 +177,18 @@ xfs_bulkstat_one(
 	if (!bc.buf)
 		return -ENOMEM;
 
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
+
 	error = xfs_bulkstat_one_int(breq->mp, breq->mnt_userns, NULL,
 				     breq->startino, &bc);
-
+	xfs_trans_cancel(tp);
+out:
 	kmem_free(bc.buf);
 
 	/*
@@ -241,6 +252,7 @@ xfs_bulkstat(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error;
 
 	if (breq->mnt_userns != &init_user_ns) {
@@ -256,9 +268,18 @@ xfs_bulkstat(
 	if (!bc.buf)
 		return -ENOMEM;
 
-	error = xfs_iwalk(breq->mp, NULL, breq->startino, breq->flags,
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
+
+	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
 			xfs_bulkstat_iwalk, breq->icount, &bc);
-
+	xfs_trans_cancel(tp);
+out:
 	kmem_free(bc.buf);
 
 	/*
@@ -371,13 +392,24 @@ xfs_inumbers(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error = 0;
 
 	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
 		return 0;
 
-	error = xfs_inobt_walk(breq->mp, NULL, breq->startino, breq->flags,
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
+
+	error = xfs_inobt_walk(breq->mp, tp, breq->startino, breq->flags,
 			xfs_inumbers_walk, breq->icount, &ic);
+	xfs_trans_cancel(tp);
+out:
 
 	/*
 	 * We found some inode groups, so clear the error status and return
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index c4a340f1f1e1..e1e889f3647f 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -81,6 +81,9 @@ struct xfs_iwalk_ag {
 
 	/* Skip empty inobt records? */
 	unsigned int			skip_empty:1;
+
+	/* Drop the (hopefully empty) transaction when calling iwalk_fn. */
+	unsigned int			drop_trans:1;
 };
 
 /*
@@ -351,7 +354,6 @@ xfs_iwalk_run_callbacks(
 	int				*has_more)
 {
 	struct xfs_mount		*mp = iwag->mp;
-	struct xfs_trans		*tp = iwag->tp;
 	struct xfs_inobt_rec_incore	*irec;
 	xfs_agino_t			next_agino;
 	int				error;
@@ -361,10 +363,15 @@ xfs_iwalk_run_callbacks(
 	ASSERT(iwag->nr_recs > 0);
 
 	/* Delete cursor but remember the last record we cached... */
-	xfs_iwalk_del_inobt(tp, curpp, agi_bpp, 0);
+	xfs_iwalk_del_inobt(iwag->tp, curpp, agi_bpp, 0);
 	irec = &iwag->recs[iwag->nr_recs - 1];
 	ASSERT(next_agino >= irec->ir_startino + XFS_INODES_PER_CHUNK);
 
+	if (iwag->drop_trans) {
+		xfs_trans_cancel(iwag->tp);
+		iwag->tp = NULL;
+	}
+
 	error = xfs_iwalk_ag_recs(iwag);
 	if (error)
 		return error;
@@ -375,8 +382,14 @@ xfs_iwalk_run_callbacks(
 	if (!has_more)
 		return 0;
 
+	if (iwag->drop_trans) {
+		error = xfs_trans_alloc_empty(mp, &iwag->tp);
+		if (error)
+			return error;
+	}
+
 	/* ...and recreate the cursor just past where we left off. */
-	error = xfs_inobt_cur(mp, tp, agno, XFS_BTNUM_INO, curpp, agi_bpp);
+	error = xfs_inobt_cur(mp, iwag->tp, agno, XFS_BTNUM_INO, curpp, agi_bpp);
 	if (error)
 		return error;
 
@@ -389,7 +402,6 @@ xfs_iwalk_ag(
 	struct xfs_iwalk_ag		*iwag)
 {
 	struct xfs_mount		*mp = iwag->mp;
-	struct xfs_trans		*tp = iwag->tp;
 	struct xfs_buf			*agi_bp = NULL;
 	struct xfs_btree_cur		*cur = NULL;
 	xfs_agnumber_t			agno;
@@ -469,7 +481,7 @@ xfs_iwalk_ag(
 	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
 
 out:
-	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
+	xfs_iwalk_del_inobt(iwag->tp, &cur, &agi_bp, error);
 	return error;
 }
 
@@ -594,8 +606,18 @@ xfs_iwalk_ag_work(
 	error = xfs_iwalk_alloc(iwag);
 	if (error)
 		goto out;
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(mp, &iwag->tp);
+	if (error)
+		goto out;
+	iwag->drop_trans = 1;
 
 	error = xfs_iwalk_ag(iwag);
+	if (iwag->tp)
+		xfs_trans_cancel(iwag->tp);
 	xfs_iwalk_free(iwag);
 out:
 	kmem_free(iwag);


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-07 20:25 [PATCHSET v2 0/4] xfs: small fixes for 5.12 Darrick J. Wong
  2021-03-07 20:25 ` [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped Darrick J. Wong
  2021-03-07 20:25 ` [PATCH 2/4] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
@ 2021-03-07 20:25 ` Darrick J. Wong
  2021-03-07 23:01   ` Dave Chinner
                     ` (2 more replies)
  2021-03-07 20:26 ` [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP Darrick J. Wong
  3 siblings, 3 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:25 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, dchinner, christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_mount.c |  100 ++++++++++++++++++++++++++++------------------------
 1 file changed, 54 insertions(+), 46 deletions(-)


diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 52370d0a3f43..556ce373145f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -634,6 +634,57 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Force the log contents and checkpoint them into the filesystem, the reclaim
+ * inodes in preparation to unmount.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	/*
+	 * We can potentially deadlock here if we have an inode cluster
+	 * that has been freed has its buffer still pinned in memory because
+	 * the transaction is still sitting in a iclog. The stale inodes
+	 * on that buffer will be pinned to the buffer until the
+	 * transaction hits the disk and the callbacks run. Pushing the AIL will
+	 * skip the stale inodes and may never see the pinned buffer, so
+	 * nothing will push out the iclog and unpin the buffer. Hence we
+	 * need to force the log here to ensure all items are flushed into the
+	 * AIL before we go any further.
+	 */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
+	/*
+	 * Wait for all busy extents to be freed, including completion of
+	 * any discard operation.
+	 */
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	/*
+	 * We now need to tell the world we are unmounting. This will allow
+	 * us to detect that the filesystem is going away and we should error
+	 * out anything that we have been retrying in the background. This will
+	 * prevent neverending retries in AIL pushing from hanging the unmount.
+	 */
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	/*
+	 * Flush all pending changes from the AIL.
+	 */
+	xfs_ail_push_all_sync(mp->m_ail);
+
+	/*
+	 * Reclaim all inodes. At this point there should be no dirty inodes and
+	 * none should be pinned or locked. Stop background inode reclaim here
+	 * if it is still running.
+	 */
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1008,7 +1059,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1018,11 +1069,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
  out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
  out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1063,47 +1111,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will be pinned to the buffer until the
-	 * transaction hits the disk and the callbacks run. Pushing the AIL will
-	 * skip the stale inodes and may never see the pinned buffer, so
-	 * nothing will push out the iclog and unpin the buffer. Hence we
-	 * need to force the log here to ensure all items are flushed into the
-	 * AIL before we go any further.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * Reclaim all inodes. At this point there should be no dirty inodes and
-	 * none should be pinned or locked. Stop background inode reclaim here
-	 * if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP
  2021-03-07 20:25 [PATCHSET v2 0/4] xfs: small fixes for 5.12 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
@ 2021-03-07 20:26 ` Darrick J. Wong
  2021-03-07 23:05   ` Dave Chinner
  2021-03-08  7:55   ` Christoph Hellwig
  3 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:26 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, dchinner, christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

A recent log refactoring patchset from Brian Foster relaxed fsfreeze
behavior with regards to the buffer cache -- now freeze only waits for
pending buffer IO to finish, and does not try to drain the buffer cache
LRU.  As a result, fsfreeze should no longer stall indefinitely while
fsmap runs.  Drop the sb_start_write calls around fsmap invocations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsmap.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 9ce5e7d5bf8f..34f2b971ce43 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -904,14 +904,6 @@ xfs_getfsmap(
 	info.fsmap_recs = fsmap_recs;
 	info.head = head;
 
-	/*
-	 * If fsmap runs concurrently with a scrub, the freeze can be delayed
-	 * indefinitely as we walk the rmapbt and iterate over metadata
-	 * buffers.  Freeze quiesces the log (which waits for the buffer LRU to
-	 * be emptied) and that won't happen while we're reading buffers.
-	 */
-	sb_start_write(mp->m_super);
-
 	/* For each device we support... */
 	for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
 		/* Is this device within the range the user asked for? */
@@ -934,6 +926,11 @@ xfs_getfsmap(
 		if (handlers[i].dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
 
+		/*
+		 * Grab an empty transaction so that we can use its recursive
+		 * buffer locking abilities to detect cycles in the rmapbt
+		 * without deadlocking.
+		 */
 		error = xfs_trans_alloc_empty(mp, &tp);
 		if (error)
 			break;
@@ -951,7 +948,6 @@ xfs_getfsmap(
 
 	if (tp)
 		xfs_trans_cancel(tp);
-	sb_end_write(mp->m_super);
 	head->fmh_oflags = FMH_OF_DEV_T;
 	return error;
 }


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2.1 2/4] xfs: avoid buffer deadlocks when walking fs inodes
  2021-03-07 20:25 ` [PATCH 2/4] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
@ 2021-03-07 20:36   ` Darrick J. Wong
  2021-03-07 22:37     ` Dave Chinner
  2021-03-08  7:56     ` Christoph Hellwig
  0 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-07 20:36 UTC (permalink / raw)
  To: linux-xfs, hch, dchinner, christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

When we're servicing an INUMBERS or BULKSTAT request or running
quotacheck, grab an empty transaction so that we can use its inherent
recursive buffer locking abilities to detect inode btree cycles without
hitting ABBA buffer deadlocks.

Found by fuzzing an inode btree pointer to introduce a cycle into the
tree (xfs/365).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2.1: actually pass tp in the bulkstat_single case
---
 fs/xfs/xfs_itable.c |   42 +++++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_iwalk.c  |   32 +++++++++++++++++++++++++++-----
 2 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index ca310a125d1e..326495c8910e 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -19,6 +19,7 @@
 #include "xfs_error.h"
 #include "xfs_icache.h"
 #include "xfs_health.h"
+#include "xfs_trans.h"
 
 /*
  * Bulk Stat
@@ -166,6 +167,7 @@ xfs_bulkstat_one(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error;
 
 	ASSERT(breq->icount == 1);
@@ -175,9 +177,18 @@ xfs_bulkstat_one(
 	if (!bc.buf)
 		return -ENOMEM;
 
-	error = xfs_bulkstat_one_int(breq->mp, breq->mnt_userns, NULL,
-				     breq->startino, &bc);
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
 
+	error = xfs_bulkstat_one_int(breq->mp, breq->mnt_userns, tp,
+			breq->startino, &bc);
+	xfs_trans_cancel(tp);
+out:
 	kmem_free(bc.buf);
 
 	/*
@@ -241,6 +252,7 @@ xfs_bulkstat(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error;
 
 	if (breq->mnt_userns != &init_user_ns) {
@@ -256,9 +268,18 @@ xfs_bulkstat(
 	if (!bc.buf)
 		return -ENOMEM;
 
-	error = xfs_iwalk(breq->mp, NULL, breq->startino, breq->flags,
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
+
+	error = xfs_iwalk(breq->mp, tp, breq->startino, breq->flags,
 			xfs_bulkstat_iwalk, breq->icount, &bc);
-
+	xfs_trans_cancel(tp);
+out:
 	kmem_free(bc.buf);
 
 	/*
@@ -371,13 +392,24 @@ xfs_inumbers(
 		.formatter	= formatter,
 		.breq		= breq,
 	};
+	struct xfs_trans	*tp;
 	int			error = 0;
 
 	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
 		return 0;
 
-	error = xfs_inobt_walk(breq->mp, NULL, breq->startino, breq->flags,
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(breq->mp, &tp);
+	if (error)
+		goto out;
+
+	error = xfs_inobt_walk(breq->mp, tp, breq->startino, breq->flags,
 			xfs_inumbers_walk, breq->icount, &ic);
+	xfs_trans_cancel(tp);
+out:
 
 	/*
 	 * We found some inode groups, so clear the error status and return
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index c4a340f1f1e1..e1e889f3647f 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -81,6 +81,9 @@ struct xfs_iwalk_ag {
 
 	/* Skip empty inobt records? */
 	unsigned int			skip_empty:1;
+
+	/* Drop the (hopefully empty) transaction when calling iwalk_fn. */
+	unsigned int			drop_trans:1;
 };
 
 /*
@@ -351,7 +354,6 @@ xfs_iwalk_run_callbacks(
 	int				*has_more)
 {
 	struct xfs_mount		*mp = iwag->mp;
-	struct xfs_trans		*tp = iwag->tp;
 	struct xfs_inobt_rec_incore	*irec;
 	xfs_agino_t			next_agino;
 	int				error;
@@ -361,10 +363,15 @@ xfs_iwalk_run_callbacks(
 	ASSERT(iwag->nr_recs > 0);
 
 	/* Delete cursor but remember the last record we cached... */
-	xfs_iwalk_del_inobt(tp, curpp, agi_bpp, 0);
+	xfs_iwalk_del_inobt(iwag->tp, curpp, agi_bpp, 0);
 	irec = &iwag->recs[iwag->nr_recs - 1];
 	ASSERT(next_agino >= irec->ir_startino + XFS_INODES_PER_CHUNK);
 
+	if (iwag->drop_trans) {
+		xfs_trans_cancel(iwag->tp);
+		iwag->tp = NULL;
+	}
+
 	error = xfs_iwalk_ag_recs(iwag);
 	if (error)
 		return error;
@@ -375,8 +382,14 @@ xfs_iwalk_run_callbacks(
 	if (!has_more)
 		return 0;
 
+	if (iwag->drop_trans) {
+		error = xfs_trans_alloc_empty(mp, &iwag->tp);
+		if (error)
+			return error;
+	}
+
 	/* ...and recreate the cursor just past where we left off. */
-	error = xfs_inobt_cur(mp, tp, agno, XFS_BTNUM_INO, curpp, agi_bpp);
+	error = xfs_inobt_cur(mp, iwag->tp, agno, XFS_BTNUM_INO, curpp, agi_bpp);
 	if (error)
 		return error;
 
@@ -389,7 +402,6 @@ xfs_iwalk_ag(
 	struct xfs_iwalk_ag		*iwag)
 {
 	struct xfs_mount		*mp = iwag->mp;
-	struct xfs_trans		*tp = iwag->tp;
 	struct xfs_buf			*agi_bp = NULL;
 	struct xfs_btree_cur		*cur = NULL;
 	xfs_agnumber_t			agno;
@@ -469,7 +481,7 @@ xfs_iwalk_ag(
 	error = xfs_iwalk_run_callbacks(iwag, agno, &cur, &agi_bp, &has_more);
 
 out:
-	xfs_iwalk_del_inobt(tp, &cur, &agi_bp, error);
+	xfs_iwalk_del_inobt(iwag->tp, &cur, &agi_bp, error);
 	return error;
 }
 
@@ -594,8 +606,18 @@ xfs_iwalk_ag_work(
 	error = xfs_iwalk_alloc(iwag);
 	if (error)
 		goto out;
+	/*
+	 * Grab an empty transaction so that we can use its recursive buffer
+	 * locking abilities to detect cycles in the inobt without deadlocking.
+	 */
+	error = xfs_trans_alloc_empty(mp, &iwag->tp);
+	if (error)
+		goto out;
+	iwag->drop_trans = 1;
 
 	error = xfs_iwalk_ag(iwag);
+	if (iwag->tp)
+		xfs_trans_cancel(iwag->tp);
 	xfs_iwalk_free(iwag);
 out:
 	kmem_free(iwag);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped
  2021-03-07 20:25 ` [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped Darrick J. Wong
@ 2021-03-07 22:28   ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2021-03-07 22:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Christian Brauner, linux-xfs, dchinner

On Sun, Mar 07, 2021 at 12:25:46PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Nowadays, we indirectly use the idmap-aware helper functions in the VFS
> to set the initial uid and gid of a file being created.  Unfortunately,
> we didn't convert the quota code, which means we attach the wrong dquots
> to files created on an idmapped mount.
> 
> Fixes: f736d93d76d3 ("xfs: support idmapped mounts")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
>  fs/xfs/xfs_inode.c   |   14 ++++++++------
>  fs/xfs/xfs_symlink.c |    3 ++-
>  2 files changed, 10 insertions(+), 7 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2.1 2/4] xfs: avoid buffer deadlocks when walking fs inodes
  2021-03-07 20:36   ` [PATCH v2.1 " Darrick J. Wong
@ 2021-03-07 22:37     ` Dave Chinner
  2021-03-08  3:56       ` Darrick J. Wong
  2021-03-08  7:56     ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2021-03-07 22:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 12:36:38PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When we're servicing an INUMBERS or BULKSTAT request or running
> quotacheck, grab an empty transaction so that we can use its inherent
> recursive buffer locking abilities to detect inode btree cycles without
> hitting ABBA buffer deadlocks.
> 
> Found by fuzzing an inode btree pointer to introduce a cycle into the
> tree (xfs/365).
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2.1: actually pass tp in the bulkstat_single case
> ---
>  fs/xfs/xfs_itable.c |   42 +++++++++++++++++++++++++++++++++++++-----
>  fs/xfs/xfs_iwalk.c  |   32 +++++++++++++++++++++++++++-----
>  2 files changed, 64 insertions(+), 10 deletions(-)

Looks ok, but I can't help but wonder if this case should flag
corruption if lock recursion does actually occur...

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
@ 2021-03-07 23:01   ` Dave Chinner
  2021-03-08  4:47     ` Darrick J. Wong
  2021-03-08  4:48   ` [PATCH v2.1 " Darrick J. Wong
  2021-03-08  7:53   ` [PATCH " Christoph Hellwig
  2 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2021-03-07 23:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 12:25:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If we allocate quota inodes in the process of mounting a filesystem but
> then decide to abort the mount, it's possible that the quota inodes are
> sitting around pinned by the log.  Now that inode reclaim relies on the
> AIL to flush inodes, we have to force the log and push the AIL in
> between releasing the quota inodes and kicking off reclaim to tear down
> all the incore inodes.  Do this by extracting the bits we need from the
> unmount path and reusing them.
> 
> This was originally found during a fuzz test of metadata directories
> (xfs/1546), but the actual symptom was that reclaim hung up on the quota
> inodes.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_mount.c |  100 ++++++++++++++++++++++++++++------------------------
>  1 file changed, 54 insertions(+), 46 deletions(-)

Seems reasonable.

> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 52370d0a3f43..556ce373145f 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -634,6 +634,57 @@ xfs_check_summary_counts(
>  	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
>  }
>  
> +/*
> + * Force the log contents and checkpoint them into the filesystem, the reclaim
> + * inodes in preparation to unmount.

"then reclaim"

Ignoring the typo, the comment doesn't add anything useful - you're
saying what the function does, not why. I'd prefer you lift all the
comments in the code up into the header, explaining why each step
is needed/taken. Something like:

/*
 * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
 * internal inode structures can be sitting in the CIL and AIL at this point, so
 * we need to unpin them, write them back and/or reclaim them before unmount can
 * proceed.
 *
 * An inode cluster that has been freed can have its buffer still pinned in
 * memory because the transaction is still sitting in a iclog. The stale inodes
 * on that buffer will be pinned to the buffer until the transaction hits the
 * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
 * may never see the pinned buffer, so nothing will push out the iclog and unpin
 * the buffer.
 *
 * Hence we need to force the log to unpin everything first. However, log forces
 * don't wait for the discards they issue to complete, so we have to explicitly
 * wait for them to complete here as well.
 *
 * Then we can tell the world we are unmounting so that error handling knows
 * that the filesystem is going away and we should error out anything that we
 * have been retrying in the background.  This will prevent never-ending retries
 * in AIL pushing from hanging the unmount.
 *
 * Finally, we can push the AIL to clean all the remaining dirty objects, then
 * reclaim the remaining inodes that are still in memory at this point in
 * time.
 */
static void
xfs_unmount_flush_inodes(
	struct xfs_mount	*mp)
{
	xfs_log_force(mp, XFS_LOG_SYNC);
	xfs_extent_busy_wait_all(mp);
	flush_workqueue(xfs_discard_wq);

	mp->m_flags |= XFS_MOUNT_UNMOUNTING;

	xfs_ail_push_all_sync(mp->m_ail);
	cancel_delayed_work_sync(&mp->m_reclaim_work);
	xfs_reclaim_inodes(mp);
	xfs_health_unmount(mp);
}

Everything else looks fine.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP
  2021-03-07 20:26 ` [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP Darrick J. Wong
@ 2021-03-07 23:05   ` Dave Chinner
  2021-03-08  4:45     ` Darrick J. Wong
  2021-03-08  7:55   ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2021-03-07 23:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 12:26:02PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> A recent log refactoring patchset from Brian Foster relaxed fsfreeze
> behavior with regards to the buffer cache -- now freeze only waits for
> pending buffer IO to finish, and does not try to drain the buffer cache
> LRU.  As a result, fsfreeze should no longer stall indefinitely while
> fsmap runs.  Drop the sb_start_write calls around fsmap invocations.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_fsmap.c |   14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> index 9ce5e7d5bf8f..34f2b971ce43 100644
> --- a/fs/xfs/xfs_fsmap.c
> +++ b/fs/xfs/xfs_fsmap.c
> @@ -904,14 +904,6 @@ xfs_getfsmap(
>  	info.fsmap_recs = fsmap_recs;
>  	info.head = head;
>  
> -	/*
> -	 * If fsmap runs concurrently with a scrub, the freeze can be delayed
> -	 * indefinitely as we walk the rmapbt and iterate over metadata
> -	 * buffers.  Freeze quiesces the log (which waits for the buffer LRU to
> -	 * be emptied) and that won't happen while we're reading buffers.
> -	 */
> -	sb_start_write(mp->m_super);
> -
>  	/* For each device we support... */
>  	for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
>  		/* Is this device within the range the user asked for? */
> @@ -934,6 +926,11 @@ xfs_getfsmap(
>  		if (handlers[i].dev > head->fmh_keys[0].fmr_device)
>  			memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
>  
> +		/*
> +		 * Grab an empty transaction so that we can use its recursive
> +		 * buffer locking abilities to detect cycles in the rmapbt
> +		 * without deadlocking.
> +		 */
>  		error = xfs_trans_alloc_empty(mp, &tp);
>  		if (error)
>  			break;

Took me a moment to work out that this is just adding a comment
because it wasn't mentioned in the commit log. Somewhat unrelated to
the bug fix but it's harmless so I don't see any need for you to
do any extra work to respin this patch to remove it.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2.1 2/4] xfs: avoid buffer deadlocks when walking fs inodes
  2021-03-07 22:37     ` Dave Chinner
@ 2021-03-08  3:56       ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-08  3:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Mon, Mar 08, 2021 at 09:37:03AM +1100, Dave Chinner wrote:
> On Sun, Mar 07, 2021 at 12:36:38PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > When we're servicing an INUMBERS or BULKSTAT request or running
> > quotacheck, grab an empty transaction so that we can use its inherent
> > recursive buffer locking abilities to detect inode btree cycles without
> > hitting ABBA buffer deadlocks.
> > 
> > Found by fuzzing an inode btree pointer to introduce a cycle into the
> > tree (xfs/365).
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > v2.1: actually pass tp in the bulkstat_single case
> > ---
> >  fs/xfs/xfs_itable.c |   42 +++++++++++++++++++++++++++++++++++++-----
> >  fs/xfs/xfs_iwalk.c  |   32 +++++++++++++++++++++++++++-----
> >  2 files changed, 64 insertions(+), 10 deletions(-)
> 
> Looks ok, but I can't help but wonder if this case should flag
> corruption if lock recursion does actually occur...

Hm.  Scrub will, since it checks the level and block number of every
pointer it follows.  IIRC the rest of the btree code isn't as paranoid
about things like that.

> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Anyway, thanks for the reviews.

--D

> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP
  2021-03-07 23:05   ` Dave Chinner
@ 2021-03-08  4:45     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-08  4:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Mon, Mar 08, 2021 at 10:05:55AM +1100, Dave Chinner wrote:
> On Sun, Mar 07, 2021 at 12:26:02PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > A recent log refactoring patchset from Brian Foster relaxed fsfreeze
> > behavior with regards to the buffer cache -- now freeze only waits for
> > pending buffer IO to finish, and does not try to drain the buffer cache
> > LRU.  As a result, fsfreeze should no longer stall indefinitely while
> > fsmap runs.  Drop the sb_start_write calls around fsmap invocations.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_fsmap.c |   14 +++++---------
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> > index 9ce5e7d5bf8f..34f2b971ce43 100644
> > --- a/fs/xfs/xfs_fsmap.c
> > +++ b/fs/xfs/xfs_fsmap.c
> > @@ -904,14 +904,6 @@ xfs_getfsmap(
> >  	info.fsmap_recs = fsmap_recs;
> >  	info.head = head;
> >  
> > -	/*
> > -	 * If fsmap runs concurrently with a scrub, the freeze can be delayed
> > -	 * indefinitely as we walk the rmapbt and iterate over metadata
> > -	 * buffers.  Freeze quiesces the log (which waits for the buffer LRU to
> > -	 * be emptied) and that won't happen while we're reading buffers.
> > -	 */
> > -	sb_start_write(mp->m_super);
> > -
> >  	/* For each device we support... */
> >  	for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
> >  		/* Is this device within the range the user asked for? */
> > @@ -934,6 +926,11 @@ xfs_getfsmap(
> >  		if (handlers[i].dev > head->fmh_keys[0].fmr_device)
> >  			memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
> >  
> > +		/*
> > +		 * Grab an empty transaction so that we can use its recursive
> > +		 * buffer locking abilities to detect cycles in the rmapbt
> > +		 * without deadlocking.
> > +		 */
> >  		error = xfs_trans_alloc_empty(mp, &tp);
> >  		if (error)
> >  			break;
> 
> Took me a moment to work out that this is just adding a comment
> because it wasn't mentioned in the commit log. Somewhat unrelated to
> the bug fix but it's harmless so I don't see any need for you to
> do any extra work to respin this patch to remove it.

I'll add a sentence to the commit message explaining why we're adding a
seemingly random (but related!) comment:

"While we're cleaning things, add a comment to the xfs_trans_alloc_empty
call explaining why we're running around with empty transactions."

--D

> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-07 23:01   ` Dave Chinner
@ 2021-03-08  4:47     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-08  4:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Mon, Mar 08, 2021 at 10:01:23AM +1100, Dave Chinner wrote:
> On Sun, Mar 07, 2021 at 12:25:57PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If we allocate quota inodes in the process of mounting a filesystem but
> > then decide to abort the mount, it's possible that the quota inodes are
> > sitting around pinned by the log.  Now that inode reclaim relies on the
> > AIL to flush inodes, we have to force the log and push the AIL in
> > between releasing the quota inodes and kicking off reclaim to tear down
> > all the incore inodes.  Do this by extracting the bits we need from the
> > unmount path and reusing them.
> > 
> > This was originally found during a fuzz test of metadata directories
> > (xfs/1546), but the actual symptom was that reclaim hung up on the quota
> > inodes.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_mount.c |  100 ++++++++++++++++++++++++++++------------------------
> >  1 file changed, 54 insertions(+), 46 deletions(-)
> 
> Seems reasonable.
> 
> > 
> > 
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 52370d0a3f43..556ce373145f 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -634,6 +634,57 @@ xfs_check_summary_counts(
> >  	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
> >  }
> >  
> > +/*
> > + * Force the log contents and checkpoint them into the filesystem, the reclaim
> > + * inodes in preparation to unmount.
> 
> "then reclaim"
> 
> Ignoring the typo, the comment doesn't add anything useful - you're
> saying what the function does, not why. I'd prefer you lift all the
> comments in the code up into the header, explaining why each step
> is needed/taken. Something like:
> 
> /*
>  * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
>  * internal inode structures can be sitting in the CIL and AIL at this point, so
>  * we need to unpin them, write them back and/or reclaim them before unmount can
>  * proceed.
>  *
>  * An inode cluster that has been freed can have its buffer still pinned in
>  * memory because the transaction is still sitting in a iclog. The stale inodes
>  * on that buffer will be pinned to the buffer until the transaction hits the
>  * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
>  * may never see the pinned buffer, so nothing will push out the iclog and unpin
>  * the buffer.
>  *
>  * Hence we need to force the log to unpin everything first. However, log forces
>  * don't wait for the discards they issue to complete, so we have to explicitly
>  * wait for them to complete here as well.
>  *
>  * Then we can tell the world we are unmounting so that error handling knows
>  * that the filesystem is going away and we should error out anything that we
>  * have been retrying in the background.  This will prevent never-ending retries
>  * in AIL pushing from hanging the unmount.
>  *
>  * Finally, we can push the AIL to clean all the remaining dirty objects, then
>  * reclaim the remaining inodes that are still in memory at this point in
>  * time.
>  */

Ok.  Seems good to me.

--D

> static void
> xfs_unmount_flush_inodes(
> 	struct xfs_mount	*mp)
> {
> 	xfs_log_force(mp, XFS_LOG_SYNC);
> 	xfs_extent_busy_wait_all(mp);
> 	flush_workqueue(xfs_discard_wq);
> 
> 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
> 
> 	xfs_ail_push_all_sync(mp->m_ail);
> 	cancel_delayed_work_sync(&mp->m_reclaim_work);
> 	xfs_reclaim_inodes(mp);
> 	xfs_health_unmount(mp);
> }
> 
> Everything else looks fine.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2.1 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
  2021-03-07 23:01   ` Dave Chinner
@ 2021-03-08  4:48   ` Darrick J. Wong
  2021-03-08  9:16     ` Christoph Hellwig
  2021-03-08 23:42     ` Dave Chinner
  2021-03-08  7:53   ` [PATCH " Christoph Hellwig
  2 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2021-03-08  4:48 UTC (permalink / raw)
  To: linux-xfs, hch, dchinner, christian.brauner

From: Darrick J. Wong <djwong@kernel.org>

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them; as an added bonus, failed writes during
a failed mount will not retry forever.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v2.1: consolidate the comments.
---
 fs/xfs/xfs_mount.c |   90 +++++++++++++++++++++++++---------------------------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 52370d0a3f43..1c97b155a8ee 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -634,6 +634,47 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
+ * internal inode structures can be sitting in the CIL and AIL at this point,
+ * so we need to unpin them, write them back and/or reclaim them before unmount
+ * can proceed.
+ *
+ * An inode cluster that has been freed can have its buffer still pinned in
+ * memory because the transaction is still sitting in a iclog. The stale inodes
+ * on that buffer will be pinned to the buffer until the transaction hits the
+ * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
+ * may never see the pinned buffer, so nothing will push out the iclog and
+ * unpin the buffer.
+ *
+ * Hence we need to force the log to unpin everything first. However, log
+ * forces don't wait for the discards they issue to complete, so we have to
+ * explicitly wait for them to complete here as well.
+ *
+ * Then we can tell the world we are unmounting so that error handling knows
+ * that the filesystem is going away and we should error out anything that we
+ * have been retrying in the background.  This will prevent never-ending
+ * retries in AIL pushing from hanging the unmount.
+ *
+ * Finally, we can push the AIL to clean all the remaining dirty objects, then
+ * reclaim the remaining inodes that are still in memory at this point in time.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	xfs_ail_push_all_sync(mp->m_ail);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1008,7 +1049,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1018,11 +1059,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
  out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
  out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1063,47 +1101,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will be pinned to the buffer until the
-	 * transaction hits the disk and the callbacks run. Pushing the AIL will
-	 * skip the stale inodes and may never see the pinned buffer, so
-	 * nothing will push out the iclog and unpin the buffer. Hence we
-	 * need to force the log here to ensure all items are flushed into the
-	 * AIL before we go any further.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * Reclaim all inodes. At this point there should be no dirty inodes and
-	 * none should be pinned or locked. Stop background inode reclaim here
-	 * if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
  2021-03-07 23:01   ` Dave Chinner
  2021-03-08  4:48   ` [PATCH v2.1 " Darrick J. Wong
@ 2021-03-08  7:53   ` Christoph Hellwig
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-08  7:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 12:25:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If we allocate quota inodes in the process of mounting a filesystem but
> then decide to abort the mount, it's possible that the quota inodes are
> sitting around pinned by the log.  Now that inode reclaim relies on the
> AIL to flush inodes, we have to force the log and push the AIL in
> between releasing the quota inodes and kicking off reclaim to tear down
> all the incore inodes.  Do this by extracting the bits we need from the
> unmount path and reusing them.
> 
> This was originally found during a fuzz test of metadata directories
> (xfs/1546), but the actual symptom was that reclaim hung up on the quota
> inodes.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP
  2021-03-07 20:26 ` [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP Darrick J. Wong
  2021-03-07 23:05   ` Dave Chinner
@ 2021-03-08  7:55   ` Christoph Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-08  7:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 12:26:02PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> A recent log refactoring patchset from Brian Foster relaxed fsfreeze
> behavior with regards to the buffer cache -- now freeze only waits for
> pending buffer IO to finish, and does not try to drain the buffer cache
> LRU.  As a result, fsfreeze should no longer stall indefinitely while
> fsmap runs.  Drop the sb_start_write calls around fsmap invocations.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2.1 2/4] xfs: avoid buffer deadlocks when walking fs inodes
  2021-03-07 20:36   ` [PATCH v2.1 " Darrick J. Wong
  2021-03-07 22:37     ` Dave Chinner
@ 2021-03-08  7:56     ` Christoph Hellwig
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-08  7:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2.1 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-08  4:48   ` [PATCH v2.1 " Darrick J. Wong
@ 2021-03-08  9:16     ` Christoph Hellwig
  2021-03-08 23:42     ` Dave Chinner
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2021-03-08  9:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2.1 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount
  2021-03-08  4:48   ` [PATCH v2.1 " Darrick J. Wong
  2021-03-08  9:16     ` Christoph Hellwig
@ 2021-03-08 23:42     ` Dave Chinner
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2021-03-08 23:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, christian.brauner

On Sun, Mar 07, 2021 at 08:48:37PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If we allocate quota inodes in the process of mounting a filesystem but
> then decide to abort the mount, it's possible that the quota inodes are
> sitting around pinned by the log.  Now that inode reclaim relies on the
> AIL to flush inodes, we have to force the log and push the AIL in
> between releasing the quota inodes and kicking off reclaim to tear down
> all the incore inodes.  Do this by extracting the bits we need from the
> unmount path and reusing them; as an added bonus, failed writes during
> a failed mount will not retry forever.
> 
> This was originally found during a fuzz test of metadata directories
> (xfs/1546), but the actual symptom was that reclaim hung up on the quota
> inodes.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> v2.1: consolidate the comments.
> ---
>  fs/xfs/xfs_mount.c |   90 +++++++++++++++++++++++++---------------------------
>  1 file changed, 44 insertions(+), 46 deletions(-)

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-03-08 23:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-07 20:25 [PATCHSET v2 0/4] xfs: small fixes for 5.12 Darrick J. Wong
2021-03-07 20:25 ` [PATCH 1/4] xfs: fix quota accounting when a mount is idmapped Darrick J. Wong
2021-03-07 22:28   ` Dave Chinner
2021-03-07 20:25 ` [PATCH 2/4] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
2021-03-07 20:36   ` [PATCH v2.1 " Darrick J. Wong
2021-03-07 22:37     ` Dave Chinner
2021-03-08  3:56       ` Darrick J. Wong
2021-03-08  7:56     ` Christoph Hellwig
2021-03-07 20:25 ` [PATCH 3/4] xfs: force log and push AIL to clear pinned inodes when aborting mount Darrick J. Wong
2021-03-07 23:01   ` Dave Chinner
2021-03-08  4:47     ` Darrick J. Wong
2021-03-08  4:48   ` [PATCH v2.1 " Darrick J. Wong
2021-03-08  9:16     ` Christoph Hellwig
2021-03-08 23:42     ` Dave Chinner
2021-03-08  7:53   ` [PATCH " Christoph Hellwig
2021-03-07 20:26 ` [PATCH 4/4] xfs: drop freeze protection when running GETFSMAP Darrick J. Wong
2021-03-07 23:05   ` Dave Chinner
2021-03-08  4:45     ` Darrick J. Wong
2021-03-08  7:55   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.