linux-xfs.vger.kernel.org archive mirror
* [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup
@ 2021-01-11 23:23 Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

Currently, we treat the garbage collection of post-EOF preallocations
and copy-on-write preallocations as totally separate tasks -- different
incore inode tags, different workqueues, etc.  This is wasteful of radix
tree tags and workqueue resources since we effectively have parallel
code paths to do the same thing.

Therefore, consolidate both functions under one radix tree bit and one
workqueue function that scans an inode for both things at the same time.
At the end of the series we make the scanning per-AG instead of per-fs
so that the scanning can run in parallel.  This reduces locking
contention from background threads and will free up a radix tree bit for
the deferred inode inactivation series.

v2: clean up and rebase against 5.11.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=eofblocks-consolidation-5.12
---
 fs/xfs/scrub/common.c  |    4 -
 fs/xfs/xfs_bmap_util.c |  109 +++++++++-------------
 fs/xfs/xfs_buf.c       |   34 +++++++
 fs/xfs/xfs_buf.h       |    1 
 fs/xfs/xfs_globals.c   |    7 +
 fs/xfs/xfs_icache.c    |  240 +++++++++++++++++++++++-------------------------
 fs/xfs/xfs_icache.h    |   13 +--
 fs/xfs/xfs_inode.c     |   36 +++++++
 fs/xfs/xfs_inode.h     |    2 
 fs/xfs/xfs_iwalk.c     |    2 
 fs/xfs/xfs_linux.h     |    3 -
 fs/xfs/xfs_mount.c     |   44 +++++++++
 fs/xfs/xfs_mount.h     |   10 +-
 fs/xfs/xfs_pwork.c     |   17 +--
 fs/xfs/xfs_pwork.h     |    2 
 fs/xfs/xfs_super.c     |   43 ++++++---
 fs/xfs/xfs_sysctl.c    |   15 +--
 fs/xfs/xfs_sysctl.h    |    3 -
 fs/xfs/xfs_trace.h     |    6 -
 19 files changed, 341 insertions(+), 250 deletions(-)



* [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 14:49   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Increase the default parallelism level for pwork clients so that we can
take advantage of computers with many CPUs and highly parallel storage.
The posteof/cowblocks cleanup series will use the functionality
presented in this patch to constrain the number of background per-ag gc
threads to our best estimate of the amount of parallelism that the
filesystem can sustain.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf.c   |   34 ++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf.h   |    1 +
 fs/xfs/xfs_iwalk.c |    2 +-
 fs/xfs/xfs_mount.c |   39 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_pwork.c |   17 +++++------------
 fs/xfs/xfs_pwork.h |    2 +-
 7 files changed, 82 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f8400bbd6473..10d05c4522c9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2384,3 +2384,37 @@ xfs_verify_magic16(
 		return false;
 	return dmagic == bp->b_ops->magic16[idx];
 }
+
+/* Estimate the amount of parallelism available for a given device. */
+unsigned int
+xfs_buftarg_guess_threads(
+	struct xfs_buftarg	*btp)
+{
+	int			iomin;
+	int			ioopt;
+
+	/*
+	 * The device tells us that it is non-rotational, and we take that to
+	 * mean there are no moving parts and that the device can handle all
+	 * the CPUs throwing IO requests at it.
+	 */
+	if (blk_queue_nonrot(btp->bt_bdev->bd_disk->queue))
+		return num_online_cpus();
+
+	/*
+	 * The device has a preferred and minimum IO size that suggest a RAID
+	 * setup, so infer the number of disks and assume that the parallelism
+	 * is equal to the disk count.
+	 */
+	iomin = bdev_io_min(btp->bt_bdev);
+	ioopt = bdev_io_opt(btp->bt_bdev);
+	if (iomin > 0 && ioopt > iomin)
+		return ioopt / iomin;
+
+	/*
+	 * The device did not indicate that it has any capabilities beyond that
+	 * of a rotating disk with a single drive head, so we estimate no
+	 * parallelism at all.
+	 */
+	return 1;
+}
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 5d91a31298a4..fb0e0d89962c 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -349,6 +349,7 @@ extern xfs_buftarg_t *xfs_alloc_buftarg(struct xfs_mount *,
 extern void xfs_free_buftarg(struct xfs_buftarg *);
 extern void xfs_wait_buftarg(xfs_buftarg_t *);
 extern int xfs_setsize_buftarg(xfs_buftarg_t *, unsigned int);
+unsigned int xfs_buftarg_guess_threads(struct xfs_buftarg *btp);
 
 #define xfs_getsize_buftarg(buftarg)	block_size((buftarg)->bt_bdev)
 #define xfs_readonly_buftarg(buftarg)	bdev_read_only((buftarg)->bt_bdev)
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index eae3aff9bc97..2ab07d58c901 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -624,7 +624,7 @@ xfs_iwalk_threaded(
 	ASSERT(agno < mp->m_sb.sb_agcount);
 	ASSERT(!(flags & ~XFS_IWALK_FLAGS_ALL));
 
-	nr_threads = xfs_pwork_guess_datadev_parallelism(mp);
+	nr_threads = xfs_pwork_guess_threads(mp);
 	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk",
 			nr_threads);
 	if (error)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 7110507a2b6b..1e974106e58c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1358,3 +1358,42 @@ xfs_mod_delalloc(
 	percpu_counter_add_batch(&mp->m_delalloc_blks, delta,
 			XFS_DELALLOC_BATCH);
 }
+
+/*
+ * Estimate the amount of parallelism that is available for metadata operations
+ * on this filesystem.
+ */
+unsigned int
+xfs_guess_metadata_threads(
+	struct xfs_mount	*mp)
+{
+	unsigned int		threads;
+
+	/*
+	 * Estimate the amount of parallelism for metadata operations from the
+	 * least capable of the two devices that handle metadata.  Cap that
+	 * estimate to the number of AGs to avoid unnecessary lock contention.
+	 */
+	threads = xfs_buftarg_guess_threads(mp->m_ddev_targp);
+	if (mp->m_logdev_targp != mp->m_ddev_targp)
+		threads = min(xfs_buftarg_guess_threads(mp->m_logdev_targp),
+			      threads);
+	threads = min(mp->m_sb.sb_agcount, threads);
+
+	/* If the storage told us it has fancy capabilities, we're done. */
+	if (threads > 1)
+		goto clamp;
+
+	/*
+	 * Metadata storage did not even hint that it has any parallel
+	 * capability.  If the filesystem was formatted with a stripe unit and
+	 * width, we'll treat that as evidence of a RAID setup and estimate
+	 * the number of disks.
+	 */
+	if (mp->m_sb.sb_unit > 0 && mp->m_sb.sb_width > mp->m_sb.sb_unit)
+		threads = mp->m_sb.sb_width / mp->m_sb.sb_unit;
+
+clamp:
+	/* Don't return an estimate larger than the CPU count. */
+	return min(num_online_cpus(), threads);
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index dfa429b77ee2..70f6c68c795f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -426,5 +426,6 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 void xfs_force_summary_recalc(struct xfs_mount *mp);
 void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta);
+unsigned int xfs_guess_metadata_threads(struct xfs_mount *mp);
 
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
index b03333f1c84a..5f1a5e575a48 100644
--- a/fs/xfs/xfs_pwork.c
+++ b/fs/xfs/xfs_pwork.c
@@ -118,19 +118,12 @@ xfs_pwork_poll(
 		touch_softlockup_watchdog();
 }
 
-/*
- * Return the amount of parallelism that the data device can handle, or 0 for
- * no limit.
- */
+/* Estimate how many threads we need for a parallel work queue. */
 unsigned int
-xfs_pwork_guess_datadev_parallelism(
+xfs_pwork_guess_threads(
 	struct xfs_mount	*mp)
 {
-	struct xfs_buftarg	*btp = mp->m_ddev_targp;
-
-	/*
-	 * For now we'll go with the most conservative setting possible,
-	 * which is two threads for an SSD and 1 thread everywhere else.
-	 */
-	return blk_queue_nonrot(btp->bt_bdev->bd_disk->queue) ? 2 : 1;
+	/* pwork queues are not unbounded, so we have to abide WQ_MAX_ACTIVE. */
+	return min_t(unsigned int, xfs_guess_metadata_threads(mp),
+			WQ_MAX_ACTIVE);
 }
diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
index 8133124cf3bb..f402920f7061 100644
--- a/fs/xfs/xfs_pwork.h
+++ b/fs/xfs/xfs_pwork.h
@@ -56,6 +56,6 @@ int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
 void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
 int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
 void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
-unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
+unsigned int xfs_pwork_guess_threads(struct xfs_mount *mp);
 
 #endif /* __XFS_PWORK_H__ */



* [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 14:57   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Refactor the part of _free_eofblocks that decides if it's really going
to truncate post-EOF blocks into a separate helper function.  The
upcoming deferred inode inactivation patch requires us to be able to
decide this prior to actual inactivation.  No functional changes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_bmap_util.c |  109 +++++++++++++++++++++---------------------------
 fs/xfs/xfs_inode.c     |   36 ++++++++++++++++
 fs/xfs/xfs_inode.h     |    2 +
 3 files changed, 85 insertions(+), 62 deletions(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 602011f06fb6..16e996ae0ff3 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -637,78 +637,63 @@ xfs_free_eofblocks(
 	struct xfs_inode	*ip)
 {
 	struct xfs_trans	*tp;
-	int			error;
-	xfs_fileoff_t		end_fsb;
-	xfs_fileoff_t		last_fsb;
-	xfs_filblks_t		map_len;
-	int			nimaps;
-	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
-
-	/*
-	 * Figure out if there are any blocks beyond the end
-	 * of the file.  If not, then there is nothing to do.
-	 */
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
-	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
-	if (last_fsb <= end_fsb)
-		return 0;
-	map_len = last_fsb - end_fsb;
-
-	nimaps = 1;
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	bool			has;
+	int			error;
 
 	/*
 	 * If there are blocks after the end of file, truncate the file to its
 	 * current size to free them up.
 	 */
-	if (!error && (nimaps != 0) &&
-	    (imap.br_startblock != HOLESTARTBLOCK ||
-	     ip->i_delayed_blks)) {
-		/*
-		 * Attach the dquots to the inode up front.
-		 */
-		error = xfs_qm_dqattach(ip);
-		if (error)
-			return error;
+	error = xfs_has_eofblocks(ip, &has);
+	if (error || !has)
+		return error;
 
-		/* wait on dio to ensure i_size has settled */
-		inode_dio_wait(VFS_I(ip));
+	/*
+	 * Attach the dquots to the inode up front.
+	 */
+	error = xfs_qm_dqattach(ip);
+	if (error)
+		return error;
 
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
-				&tp);
-		if (error) {
-			ASSERT(XFS_FORCED_SHUTDOWN(mp));
-			return error;
-		}
+	/* wait on dio to ensure i_size has settled */
+	inode_dio_wait(VFS_I(ip));
 
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		xfs_trans_ijoin(tp, ip, 0);
-
-		/*
-		 * Do not update the on-disk file size.  If we update the
-		 * on-disk file size and then the system crashes before the
-		 * contents of the file are flushed to disk then the files
-		 * may be full of holes (ie NULL files bug).
-		 */
-		error = xfs_itruncate_extents_flags(&tp, ip, XFS_DATA_FORK,
-					XFS_ISIZE(ip), XFS_BMAPI_NODISCARD);
-		if (error) {
-			/*
-			 * If we get an error at this point we simply don't
-			 * bother truncating the file.
-			 */
-			xfs_trans_cancel(tp);
-		} else {
-			error = xfs_trans_commit(tp);
-			if (!error)
-				xfs_inode_clear_eofblocks_tag(ip);
-		}
-
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	if (error) {
+		ASSERT(XFS_FORCED_SHUTDOWN(mp));
+		return error;
 	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+
+	/*
+	 * Do not update the on-disk file size.  If we update the
+	 * on-disk file size and then the system crashes before the
+	 * contents of the file are flushed to disk then the files
+	 * may be full of holes (ie NULL files bug).
+	 */
+	error = xfs_itruncate_extents_flags(&tp, ip, XFS_DATA_FORK,
+				XFS_ISIZE(ip), XFS_BMAPI_NODISCARD);
+	if (error)
+		goto err_cancel;
+
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_unlock;
+
+	xfs_inode_clear_eofblocks_tag(ip);
+	goto out_unlock;
+
+err_cancel:
+	/*
+	 * If we get an error at this point we simply don't
+	 * bother truncating the file.
+	 */
+	xfs_trans_cancel(tp);
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0e6cc33a33ad..d30b49cd60d7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3711,3 +3711,39 @@ xfs_iunlock2_io_mmap(
 	if (!same_inode)
 		inode_unlock(VFS_I(ip1));
 }
+
+/*
+ * Decide if this inode has post-EOF blocks.  The caller is responsible
+ * for knowing / caring about the PREALLOC/APPEND flags.
+ */
+int
+xfs_has_eofblocks(
+	struct xfs_inode	*ip,
+	bool			*has)
+{
+	struct xfs_bmbt_irec	imap;
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		end_fsb;
+	xfs_fileoff_t		last_fsb;
+	xfs_filblks_t		map_len;
+	int			nimaps;
+	int			error;
+
+	*has = false;
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
+	if (last_fsb <= end_fsb)
+		return 0;
+	map_len = last_fsb - end_fsb;
+
+	nimaps = 1;
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+	if (error || nimaps == 0)
+		return error;
+
+	*has = imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks;
+	return 0;
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index eca333f5f715..5295791be4e2 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -468,6 +468,8 @@ extern struct kmem_zone	*xfs_inode_zone;
 /* The default CoW extent size hint. */
 #define XFS_DEFAULT_COWEXTSZ_HINT 32
 
+int xfs_has_eofblocks(struct xfs_inode *ip, bool *has);
+
 int xfs_iunlink_init(struct xfs_perag *pag);
 void xfs_iunlink_destroy(struct xfs_perag *pag);
 



* [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 14:59   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The clearing of posteof blocks and cowblocks serves the same purpose:
removing speculative block preallocations from inactive files.  We don't
need to burn two radix tree tags on this, so combine them into one.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |  104 +++++++++++++++++++++++++--------------------------
 fs/xfs/xfs_icache.h |    4 +-
 fs/xfs/xfs_trace.h  |    6 +--
 3 files changed, 54 insertions(+), 60 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index c3867b25e362..da833f6e1fd9 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -943,7 +943,7 @@ xfs_queue_eofblocks(
 	struct xfs_mount *mp)
 {
 	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_EOFBLOCKS_TAG))
+	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_BLOCK_GC_TAG))
 		queue_delayed_work(mp->m_eofblocks_workqueue,
 				   &mp->m_eofblocks_work,
 				   msecs_to_jiffies(xfs_eofb_secs * 1000));
@@ -990,7 +990,7 @@ xfs_queue_cowblocks(
 	struct xfs_mount *mp)
 {
 	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_COWBLOCKS_TAG))
+	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_BLOCK_GC_TAG))
 		queue_delayed_work(mp->m_eofblocks_workqueue,
 				   &mp->m_cowblocks_work,
 				   msecs_to_jiffies(xfs_cowb_secs * 1000));
@@ -1388,6 +1388,9 @@ xfs_inode_free_eofblocks(
 
 	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
 
+	if (!xfs_iflags_test(ip, XFS_IEOFBLOCKS))
+		return 0;
+
 	if (!xfs_can_free_eofblocks(ip, false)) {
 		/* inode could be preallocated or append-only */
 		trace_xfs_inode_free_eofblocks_invalid(ip);
@@ -1427,7 +1430,7 @@ xfs_icache_free_eofblocks(
 	struct xfs_eofblocks	*eofb)
 {
 	return __xfs_inode_walk(mp, 0, xfs_inode_free_eofblocks, eofb,
-			XFS_ICI_EOFBLOCKS_TAG);
+			XFS_ICI_BLOCK_GC_TAG);
 }
 
 /*
@@ -1506,61 +1509,48 @@ xfs_inode_free_blocks(
 	return xfs_blockgc_scan(mp, &eofb);
 }
 
-static inline unsigned long
-xfs_iflag_for_tag(
-	int		tag)
-{
-	switch (tag) {
-	case XFS_ICI_EOFBLOCKS_TAG:
-		return XFS_IEOFBLOCKS;
-	case XFS_ICI_COWBLOCKS_TAG:
-		return XFS_ICOWBLOCKS;
-	default:
-		ASSERT(0);
-		return 0;
-	}
-}
-
 static void
 __xfs_inode_set_blocks_tag(
-	xfs_inode_t	*ip,
-	void		(*execute)(struct xfs_mount *mp),
-	void		(*set_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
-				  int error, unsigned long caller_ip),
-	int		tag)
+	struct xfs_inode	*ip,
+	void			(*execute)(struct xfs_mount *mp),
+	unsigned long		iflag)
 {
-	struct xfs_mount *mp = ip->i_mount;
-	struct xfs_perag *pag;
-	int tagged;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_perag	*pag;
+	int			tagged;
+
+	ASSERT((iflag & ~(XFS_IEOFBLOCKS | XFS_ICOWBLOCKS)) == 0);
 
 	/*
 	 * Don't bother locking the AG and looking up in the radix trees
 	 * if we already know that we have the tag set.
 	 */
-	if (ip->i_flags & xfs_iflag_for_tag(tag))
+	if (ip->i_flags & iflag)
 		return;
 	spin_lock(&ip->i_flags_lock);
-	ip->i_flags |= xfs_iflag_for_tag(tag);
+	ip->i_flags |= iflag;
 	spin_unlock(&ip->i_flags_lock);
 
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 	spin_lock(&pag->pag_ici_lock);
 
-	tagged = radix_tree_tagged(&pag->pag_ici_root, tag);
+	tagged = radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCK_GC_TAG);
 	radix_tree_tag_set(&pag->pag_ici_root,
-			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
+			   XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			   XFS_ICI_BLOCK_GC_TAG);
 	if (!tagged) {
-		/* propagate the eofblocks tag up into the perag radix tree */
+		/* propagate the blockgc tag up into the perag radix tree */
 		spin_lock(&ip->i_mount->m_perag_lock);
 		radix_tree_tag_set(&ip->i_mount->m_perag_tree,
 				   XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				   tag);
+				   XFS_ICI_BLOCK_GC_TAG);
 		spin_unlock(&ip->i_mount->m_perag_lock);
 
 		/* kick off background trimming */
 		execute(ip->i_mount);
 
-		set_tp(ip->i_mount, pag->pag_agno, -1, _RET_IP_);
+		trace_xfs_perag_set_blockgc(ip->i_mount, pag->pag_agno, -1,
+				_RET_IP_);
 	}
 
 	spin_unlock(&pag->pag_ici_lock);
@@ -1573,37 +1563,43 @@ xfs_inode_set_eofblocks_tag(
 {
 	trace_xfs_inode_set_eofblocks_tag(ip);
 	return __xfs_inode_set_blocks_tag(ip, xfs_queue_eofblocks,
-			trace_xfs_perag_set_eofblocks,
-			XFS_ICI_EOFBLOCKS_TAG);
+			XFS_IEOFBLOCKS);
 }
 
 static void
 __xfs_inode_clear_blocks_tag(
-	xfs_inode_t	*ip,
-	void		(*clear_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
-				    int error, unsigned long caller_ip),
-	int		tag)
+	struct xfs_inode	*ip,
+	unsigned long		iflag)
 {
-	struct xfs_mount *mp = ip->i_mount;
-	struct xfs_perag *pag;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_perag	*pag;
+	bool			clear_tag;
+
+	ASSERT((iflag & ~(XFS_IEOFBLOCKS | XFS_ICOWBLOCKS)) == 0);
 
 	spin_lock(&ip->i_flags_lock);
-	ip->i_flags &= ~xfs_iflag_for_tag(tag);
+	ip->i_flags &= ~iflag;
+	clear_tag = (ip->i_flags & (XFS_IEOFBLOCKS | XFS_ICOWBLOCKS)) == 0;
 	spin_unlock(&ip->i_flags_lock);
 
+	if (!clear_tag)
+		return;
+
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
 	spin_lock(&pag->pag_ici_lock);
 
 	radix_tree_tag_clear(&pag->pag_ici_root,
-			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino), tag);
-	if (!radix_tree_tagged(&pag->pag_ici_root, tag)) {
-		/* clear the eofblocks tag from the perag radix tree */
+			     XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino),
+			     XFS_ICI_BLOCK_GC_TAG);
+	if (!radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCK_GC_TAG)) {
+		/* clear the blockgc tag from the perag radix tree */
 		spin_lock(&ip->i_mount->m_perag_lock);
 		radix_tree_tag_clear(&ip->i_mount->m_perag_tree,
 				     XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
-				     tag);
+				     XFS_ICI_BLOCK_GC_TAG);
 		spin_unlock(&ip->i_mount->m_perag_lock);
-		clear_tp(ip->i_mount, pag->pag_agno, -1, _RET_IP_);
+		trace_xfs_perag_clear_blockgc(ip->i_mount, pag->pag_agno, -1,
+				_RET_IP_);
 	}
 
 	spin_unlock(&pag->pag_ici_lock);
@@ -1615,8 +1611,7 @@ xfs_inode_clear_eofblocks_tag(
 	xfs_inode_t	*ip)
 {
 	trace_xfs_inode_clear_eofblocks_tag(ip);
-	return __xfs_inode_clear_blocks_tag(ip,
-			trace_xfs_perag_clear_eofblocks, XFS_ICI_EOFBLOCKS_TAG);
+	return __xfs_inode_clear_blocks_tag(ip, XFS_IEOFBLOCKS);
 }
 
 /*
@@ -1674,6 +1669,9 @@ xfs_inode_free_cowblocks(
 
 	wait = eofb && (eofb->eof_flags & XFS_EOF_FLAGS_SYNC);
 
+	if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
+		return 0;
+
 	if (!xfs_prep_free_cowblocks(ip))
 		return 0;
 
@@ -1715,7 +1713,7 @@ xfs_icache_free_cowblocks(
 	struct xfs_eofblocks	*eofb)
 {
 	return __xfs_inode_walk(mp, 0, xfs_inode_free_cowblocks, eofb,
-			XFS_ICI_COWBLOCKS_TAG);
+			XFS_ICI_BLOCK_GC_TAG);
 }
 
 void
@@ -1724,8 +1722,7 @@ xfs_inode_set_cowblocks_tag(
 {
 	trace_xfs_inode_set_cowblocks_tag(ip);
 	return __xfs_inode_set_blocks_tag(ip, xfs_queue_cowblocks,
-			trace_xfs_perag_set_cowblocks,
-			XFS_ICI_COWBLOCKS_TAG);
+			XFS_ICOWBLOCKS);
 }
 
 void
@@ -1733,8 +1730,7 @@ xfs_inode_clear_cowblocks_tag(
 	xfs_inode_t	*ip)
 {
 	trace_xfs_inode_clear_cowblocks_tag(ip);
-	return __xfs_inode_clear_blocks_tag(ip,
-			trace_xfs_perag_clear_cowblocks, XFS_ICI_COWBLOCKS_TAG);
+	return __xfs_inode_clear_blocks_tag(ip, XFS_ICOWBLOCKS);
 }
 
 /* Disable post-EOF and CoW block auto-reclamation. */
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 2f230d273a54..fcee10dbfbe9 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -23,8 +23,8 @@ struct xfs_eofblocks {
 #define XFS_ICI_NO_TAG		(-1)	/* special flag for an untagged lookup
 					   in xfs_inode_walk */
 #define XFS_ICI_RECLAIM_TAG	0	/* inode is to be reclaimed */
-#define XFS_ICI_EOFBLOCKS_TAG	1	/* inode has blocks beyond EOF */
-#define XFS_ICI_COWBLOCKS_TAG	2	/* inode can have cow blocks to gc */
+/* Inode has speculative preallocations (posteof or cow) to clean. */
+#define XFS_ICI_BLOCK_GC_TAG	1
 
 /*
  * Flags for xfs_iget()
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 9a29a5e18711..6bdd3eee2462 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -155,10 +155,8 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_put);
 DEFINE_PERAG_REF_EVENT(xfs_perag_set_reclaim);
 DEFINE_PERAG_REF_EVENT(xfs_perag_clear_reclaim);
-DEFINE_PERAG_REF_EVENT(xfs_perag_set_eofblocks);
-DEFINE_PERAG_REF_EVENT(xfs_perag_clear_eofblocks);
-DEFINE_PERAG_REF_EVENT(xfs_perag_set_cowblocks);
-DEFINE_PERAG_REF_EVENT(xfs_perag_clear_cowblocks);
+DEFINE_PERAG_REF_EVENT(xfs_perag_set_blockgc);
+DEFINE_PERAG_REF_EVENT(xfs_perag_clear_blockgc);
 
 DECLARE_EVENT_CLASS(xfs_ag_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),



* [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
                   ` (2 preceding siblings ...)
  2021-01-11 23:23 ` [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 15:04   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Remove the separate cowblocks work items and knob so that we can control
and run everything from a single blockgc work queue.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_globals.c |    7 ++--
 fs/xfs/xfs_icache.c  |   80 ++++++++++++++------------------------------------
 fs/xfs/xfs_icache.h  |    5 +--
 fs/xfs/xfs_linux.h   |    3 +-
 fs/xfs/xfs_mount.h   |    6 +---
 fs/xfs/xfs_super.c   |   11 +++----
 fs/xfs/xfs_sysctl.c  |   15 ++-------
 fs/xfs/xfs_sysctl.h  |    3 +-
 8 files changed, 39 insertions(+), 91 deletions(-)


diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
index fa55ab8b8d80..f62fa652c2fd 100644
--- a/fs/xfs/xfs_globals.c
+++ b/fs/xfs/xfs_globals.c
@@ -8,8 +8,8 @@
 /*
  * Tunable XFS parameters.  xfs_params is required even when CONFIG_SYSCTL=n,
  * other XFS code uses these values.  Times are measured in centisecs (i.e.
- * 100ths of a second) with the exception of eofb_timer and cowb_timer, which
- * are measured in seconds.
+ * 100ths of a second) with the exception of blockgc_timer, which is measured
+ * in seconds.
  */
 xfs_param_t xfs_params = {
 			  /*	MIN		DFLT		MAX	*/
@@ -28,8 +28,7 @@ xfs_param_t xfs_params = {
 	.rotorstep	= {	1,		1,		255	},
 	.inherit_nodfrg	= {	0,		1,		1	},
 	.fstrm_timer	= {	1,		30*100,		3600*100},
-	.eofb_timer	= {	1,		300,		3600*24},
-	.cowb_timer	= {	1,		1800,		3600*24},
+	.blockgc_timer	= {	1,		300,		3600*24},
 };
 
 struct xfs_globals xfs_globals = {
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index da833f6e1fd9..92fcec349054 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -935,18 +935,18 @@ xfs_inode_walk(
 }
 
 /*
- * Background scanning to trim post-EOF preallocated space. This is queued
- * based on the 'speculative_prealloc_lifetime' tunable (5m by default).
+ * Background scanning to trim preallocated space. This is queued based on the
+ * 'speculative_prealloc_lifetime' tunable (5m by default).
  */
-void
-xfs_queue_eofblocks(
-	struct xfs_mount *mp)
+static void
+xfs_queue_blockgc(
+	struct xfs_mount	*mp)
 {
 	rcu_read_lock();
 	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_BLOCK_GC_TAG))
-		queue_delayed_work(mp->m_eofblocks_workqueue,
-				   &mp->m_eofblocks_work,
-				   msecs_to_jiffies(xfs_eofb_secs * 1000));
+		queue_delayed_work(mp->m_blockgc_workqueue,
+				   &mp->m_blockgc_work,
+				   msecs_to_jiffies(xfs_blockgc_secs * 1000));
 	rcu_read_unlock();
 }
 
@@ -965,51 +965,22 @@ xfs_blockgc_scan(
 	return xfs_icache_free_cowblocks(mp, eofb);
 }
 
+/* Background worker that trims preallocated space. */
 void
-xfs_eofblocks_worker(
-	struct work_struct *work)
+xfs_blockgc_worker(
+	struct work_struct	*work)
 {
-	struct xfs_mount *mp = container_of(to_delayed_work(work),
-				struct xfs_mount, m_eofblocks_work);
+	struct xfs_mount	*mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_blockgc_work);
+	int			error;
 
 	if (!sb_start_write_trylock(mp->m_super))
 		return;
-	xfs_icache_free_eofblocks(mp, NULL);
+	error = xfs_blockgc_scan(mp, NULL);
+	if (error)
+		xfs_info(mp, "preallocation gc worker failed, err=%d", error);
 	sb_end_write(mp->m_super);
-
-	xfs_queue_eofblocks(mp);
-}
-
-/*
- * Background scanning to trim preallocated CoW space. This is queued
- * based on the 'speculative_cow_prealloc_lifetime' tunable (5m by default).
- * (We'll just piggyback on the post-EOF prealloc space workqueue.)
- */
-void
-xfs_queue_cowblocks(
-	struct xfs_mount *mp)
-{
-	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_BLOCK_GC_TAG))
-		queue_delayed_work(mp->m_eofblocks_workqueue,
-				   &mp->m_cowblocks_work,
-				   msecs_to_jiffies(xfs_cowb_secs * 1000));
-	rcu_read_unlock();
-}
-
-void
-xfs_cowblocks_worker(
-	struct work_struct *work)
-{
-	struct xfs_mount *mp = container_of(to_delayed_work(work),
-				struct xfs_mount, m_cowblocks_work);
-
-	if (!sb_start_write_trylock(mp->m_super))
-		return;
-	xfs_icache_free_cowblocks(mp, NULL);
-	sb_end_write(mp->m_super);
-
-	xfs_queue_cowblocks(mp);
+	xfs_queue_blockgc(mp);
 }
 
 /*
@@ -1512,7 +1483,6 @@ xfs_inode_free_blocks(
 static void
 __xfs_inode_set_blocks_tag(
 	struct xfs_inode	*ip,
-	void			(*execute)(struct xfs_mount *mp),
 	unsigned long		iflag)
 {
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1547,7 +1517,7 @@ __xfs_inode_set_blocks_tag(
 		spin_unlock(&ip->i_mount->m_perag_lock);
 
 		/* kick off background trimming */
-		execute(ip->i_mount);
+		xfs_queue_blockgc(ip->i_mount);
 
 		trace_xfs_perag_set_blockgc(ip->i_mount, pag->pag_agno, -1,
 				_RET_IP_);
@@ -1562,8 +1532,7 @@ xfs_inode_set_eofblocks_tag(
 	xfs_inode_t	*ip)
 {
 	trace_xfs_inode_set_eofblocks_tag(ip);
-	return __xfs_inode_set_blocks_tag(ip, xfs_queue_eofblocks,
-			XFS_IEOFBLOCKS);
+	return __xfs_inode_set_blocks_tag(ip, XFS_IEOFBLOCKS);
 }
 
 static void
@@ -1721,8 +1690,7 @@ xfs_inode_set_cowblocks_tag(
 	xfs_inode_t	*ip)
 {
 	trace_xfs_inode_set_cowblocks_tag(ip);
-	return __xfs_inode_set_blocks_tag(ip, xfs_queue_cowblocks,
-			XFS_ICOWBLOCKS);
+	return __xfs_inode_set_blocks_tag(ip, XFS_ICOWBLOCKS);
 }
 
 void
@@ -1738,8 +1706,7 @@ void
 xfs_stop_block_reaping(
 	struct xfs_mount	*mp)
 {
-	cancel_delayed_work_sync(&mp->m_eofblocks_work);
-	cancel_delayed_work_sync(&mp->m_cowblocks_work);
+	cancel_delayed_work_sync(&mp->m_blockgc_work);
 }
 
 /* Enable post-EOF and CoW block auto-reclamation. */
@@ -1747,6 +1714,5 @@ void
 xfs_start_block_reaping(
 	struct xfs_mount	*mp)
 {
-	xfs_queue_eofblocks(mp);
-	xfs_queue_cowblocks(mp);
+	xfs_queue_blockgc(mp);
 }
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index fcee10dbfbe9..4ddb2c6de18b 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -55,14 +55,11 @@ int xfs_inode_free_blocks(struct xfs_mount *mp, bool sync);
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
-void xfs_eofblocks_worker(struct work_struct *);
-void xfs_queue_eofblocks(struct xfs_mount *);
+void xfs_blockgc_worker(struct work_struct *work);
 
 void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
-void xfs_cowblocks_worker(struct work_struct *);
-void xfs_queue_cowblocks(struct xfs_mount *);
 
 int xfs_inode_walk(struct xfs_mount *mp,
 	int (*execute)(struct xfs_inode *ip, void *args),
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 5b7a1e201559..af6be9b9ccdf 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -98,8 +98,7 @@ typedef __u32			xfs_nlink_t;
 #define xfs_rotorstep		xfs_params.rotorstep.val
 #define xfs_inherit_nodefrag	xfs_params.inherit_nodfrg.val
 #define xfs_fstrm_centisecs	xfs_params.fstrm_timer.val
-#define xfs_eofb_secs		xfs_params.eofb_timer.val
-#define xfs_cowb_secs		xfs_params.cowb_timer.val
+#define xfs_blockgc_secs	xfs_params.blockgc_timer.val
 
 #define current_cpu()		(raw_smp_processor_id())
 #define current_set_flags_nested(sp, f)		\
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 70f6c68c795f..d9f32102514e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -93,7 +93,7 @@ typedef struct xfs_mount {
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
 	struct workqueue_struct	*m_reclaim_workqueue;
-	struct workqueue_struct *m_eofblocks_workqueue;
+	struct workqueue_struct *m_blockgc_workqueue;
 	struct workqueue_struct	*m_sync_workqueue;
 
 	int			m_bsize;	/* fs logical block size */
@@ -177,9 +177,7 @@ typedef struct xfs_mount {
 	uint64_t		m_resblks_avail;/* available reserved blocks */
 	uint64_t		m_resblks_save;	/* reserved blks @ remount,ro */
 	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
-	struct delayed_work	m_eofblocks_work; /* background eof blocks
-						     trimming */
-	struct delayed_work	m_cowblocks_work; /* background cow blocks
+	struct delayed_work	m_blockgc_work; /* background prealloc blocks
 						     trimming */
 	struct xfs_kobj		m_kobj;
 	struct xfs_kobj		m_error_kobj;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 813be879a5e5..7fb024f96964 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -515,9 +515,9 @@ xfs_init_mount_workqueues(
 	if (!mp->m_reclaim_workqueue)
 		goto out_destroy_cil;
 
-	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
+	mp->m_blockgc_workqueue = alloc_workqueue("xfs-blockgc/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
-	if (!mp->m_eofblocks_workqueue)
+	if (!mp->m_blockgc_workqueue)
 		goto out_destroy_reclaim;
 
 	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
@@ -528,7 +528,7 @@ xfs_init_mount_workqueues(
 	return 0;
 
 out_destroy_eofb:
-	destroy_workqueue(mp->m_eofblocks_workqueue);
+	destroy_workqueue(mp->m_blockgc_workqueue);
 out_destroy_reclaim:
 	destroy_workqueue(mp->m_reclaim_workqueue);
 out_destroy_cil:
@@ -546,7 +546,7 @@ xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
 	destroy_workqueue(mp->m_sync_workqueue);
-	destroy_workqueue(mp->m_eofblocks_workqueue);
+	destroy_workqueue(mp->m_blockgc_workqueue);
 	destroy_workqueue(mp->m_reclaim_workqueue);
 	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
@@ -1872,8 +1872,7 @@ static int xfs_init_fs_context(
 	mutex_init(&mp->m_growlock);
 	INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
-	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
-	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
+	INIT_DELAYED_WORK(&mp->m_blockgc_work, xfs_blockgc_worker);
 	mp->m_kobj.kobject.kset = xfs_kset;
 	/*
 	 * We don't create the finobt per-ag space reservation until after log
diff --git a/fs/xfs/xfs_sysctl.c b/fs/xfs/xfs_sysctl.c
index fac9de7ee6d0..145e06c47744 100644
--- a/fs/xfs/xfs_sysctl.c
+++ b/fs/xfs/xfs_sysctl.c
@@ -194,21 +194,12 @@ static struct ctl_table xfs_table[] = {
 	},
 	{
 		.procname	= "speculative_prealloc_lifetime",
-		.data		= &xfs_params.eofb_timer.val,
+		.data		= &xfs_params.blockgc_timer.val,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &xfs_params.eofb_timer.min,
-		.extra2		= &xfs_params.eofb_timer.max,
-	},
-	{
-		.procname	= "speculative_cow_prealloc_lifetime",
-		.data		= &xfs_params.cowb_timer.val,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &xfs_params.cowb_timer.min,
-		.extra2		= &xfs_params.cowb_timer.max,
+		.extra1		= &xfs_params.blockgc_timer.min,
+		.extra2		= &xfs_params.blockgc_timer.max,
 	},
 	/* please keep this the last entry */
 #ifdef CONFIG_PROC_FS
diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
index 8abf4640f1d5..7692e76ead33 100644
--- a/fs/xfs/xfs_sysctl.h
+++ b/fs/xfs/xfs_sysctl.h
@@ -35,8 +35,7 @@ typedef struct xfs_param {
 	xfs_sysctl_val_t rotorstep;	/* inode32 AG rotoring control knob */
 	xfs_sysctl_val_t inherit_nodfrg;/* Inherit the "nodefrag" inode flag. */
 	xfs_sysctl_val_t fstrm_timer;	/* Filestream dir-AG assoc'n timeout. */
-	xfs_sysctl_val_t eofb_timer;	/* Interval between eofb scan wakeups */
-	xfs_sysctl_val_t cowb_timer;	/* Interval between cowb scan wakeups */
+	xfs_sysctl_val_t blockgc_timer;	/* Interval between blockgc scans */
 } xfs_param_t;
 
 /*


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
                   ` (3 preceding siblings ...)
  2021-01-11 23:23 ` [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 15:06   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 6/7] xfs: rename block gc start and stop functions Darrick J. Wong
  2021-01-11 23:23 ` [PATCH 7/7] xfs: parallelize block preallocation garbage collection Darrick J. Wong
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Perform background block preallocation gc scans more efficiently by
walking the incore inode tree once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |   27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 92fcec349054..4f68375cf873 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -31,6 +31,9 @@
  */
 #define XFS_INODE_WALK_INEW_WAIT	0x1	/* wait on new inodes */
 
+STATIC int xfs_inode_free_eofblocks(struct xfs_inode *ip, void *args);
+STATIC int xfs_inode_free_cowblocks(struct xfs_inode *ip, void *args);
+
 /*
  * Allocate and initialise an xfs_inode.
  */
@@ -950,19 +953,29 @@ xfs_queue_blockgc(
 	rcu_read_unlock();
 }
 
+/* Scan one incore inode for block preallocations that we can remove. */
+static int
+xfs_blockgc_scan_inode(
+	struct xfs_inode	*ip,
+	void			*args)
+{
+	int			error;
+
+	error = xfs_inode_free_eofblocks(ip, args);
+	if (error && error != -EAGAIN)
+		return error;
+
+	return xfs_inode_free_cowblocks(ip, args);
+}
+
 /* Scan all incore inodes for block preallocations that we can remove. */
 static inline int
 xfs_blockgc_scan(
 	struct xfs_mount	*mp,
 	struct xfs_eofblocks	*eofb)
 {
-	int			error;
-
-	error = xfs_icache_free_eofblocks(mp, eofb);
-	if (error && error != -EAGAIN)
-		return error;
-
-	return xfs_icache_free_cowblocks(mp, eofb);
+	return __xfs_inode_walk(mp, 0, xfs_blockgc_scan_inode, eofb,
+			XFS_ICI_BLOCK_GC_TAG);
 }
 
 /* Background worker that trims preallocated space. */



* [PATCH 6/7] xfs: rename block gc start and stop functions
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
                   ` (4 preceding siblings ...)
  2021-01-11 23:23 ` [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 15:07   ` Christoph Hellwig
  2021-01-11 23:23 ` [PATCH 7/7] xfs: parallelize block preallocation garbage collection Darrick J. Wong
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Shorten the names of the two functions that start and stop block
preallocation garbage collection and move them up to the other blockgc
functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |    4 ++--
 fs/xfs/xfs_icache.c   |   32 ++++++++++++++++----------------
 fs/xfs/xfs_icache.h   |    4 ++--
 fs/xfs/xfs_mount.c    |    2 +-
 fs/xfs/xfs_super.c    |    8 ++++----
 5 files changed, 25 insertions(+), 25 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 8ea6d4aa3f55..53456f3de881 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -888,7 +888,7 @@ xchk_stop_reaping(
 	struct xfs_scrub	*sc)
 {
 	sc->flags |= XCHK_REAPING_DISABLED;
-	xfs_stop_block_reaping(sc->mp);
+	xfs_blockgc_stop(sc->mp);
 }
 
 /* Restart background reaping of resources. */
@@ -896,6 +896,6 @@ void
 xchk_start_reaping(
 	struct xfs_scrub	*sc)
 {
-	xfs_start_block_reaping(sc->mp);
+	xfs_blockgc_start(sc->mp);
 	sc->flags &= ~XCHK_REAPING_DISABLED;
 }
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 4f68375cf873..d1179f3e9d1f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -996,6 +996,22 @@ xfs_blockgc_worker(
 	xfs_queue_blockgc(mp);
 }
 
+/* Disable post-EOF and CoW block auto-reclamation. */
+void
+xfs_blockgc_stop(
+	struct xfs_mount	*mp)
+{
+	cancel_delayed_work_sync(&mp->m_blockgc_work);
+}
+
+/* Enable post-EOF and CoW block auto-reclamation. */
+void
+xfs_blockgc_start(
+	struct xfs_mount	*mp)
+{
+	xfs_queue_blockgc(mp);
+}
+
 /*
  * Grab the inode for reclaim exclusively.
  *
@@ -1713,19 +1729,3 @@ xfs_inode_clear_cowblocks_tag(
 	trace_xfs_inode_clear_cowblocks_tag(ip);
 	return __xfs_inode_clear_blocks_tag(ip, XFS_ICOWBLOCKS);
 }
-
-/* Disable post-EOF and CoW block auto-reclamation. */
-void
-xfs_stop_block_reaping(
-	struct xfs_mount	*mp)
-{
-	cancel_delayed_work_sync(&mp->m_blockgc_work);
-}
-
-/* Enable post-EOF and CoW block auto-reclamation. */
-void
-xfs_start_block_reaping(
-	struct xfs_mount	*mp)
-{
-	xfs_queue_blockgc(mp);
-}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 4ddb2c6de18b..5bcea900bc30 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -68,7 +68,7 @@ int xfs_inode_walk(struct xfs_mount *mp,
 int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 				  xfs_ino_t ino, bool *inuse);
 
-void xfs_stop_block_reaping(struct xfs_mount *mp);
-void xfs_start_block_reaping(struct xfs_mount *mp);
+void xfs_blockgc_stop(struct xfs_mount *mp);
+void xfs_blockgc_start(struct xfs_mount *mp);
 
 #endif
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 1e974106e58c..c3144cf5febe 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1054,7 +1054,7 @@ xfs_unmountfs(
 	uint64_t		resblks;
 	int			error;
 
-	xfs_stop_block_reaping(mp);
+	xfs_blockgc_stop(mp);
 	xfs_fs_unreserve_ag_blocks(mp);
 	xfs_qm_unmount_quotas(mp);
 	xfs_rtunmount_inodes(mp);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7fb024f96964..f72c1f473025 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -920,7 +920,7 @@ xfs_fs_freeze(
 	 * set a GFP_NOFS context here to avoid recursion deadlocks.
 	 */
 	flags = memalloc_nofs_save();
-	xfs_stop_block_reaping(mp);
+	xfs_blockgc_stop(mp);
 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
 	ret = xfs_sync_sb(mp, true);
@@ -936,7 +936,7 @@ xfs_fs_unfreeze(
 
 	xfs_restore_resvblks(mp);
 	xfs_log_work_queue(mp);
-	xfs_start_block_reaping(mp);
+	xfs_blockgc_start(mp);
 	return 0;
 }
 
@@ -1720,7 +1720,7 @@ xfs_remount_rw(
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 		return error;
 	}
-	xfs_start_block_reaping(mp);
+	xfs_blockgc_start(mp);
 
 	/* Create the per-AG metadata reservation pool .*/
 	error = xfs_fs_reserve_ag_blocks(mp);
@@ -1740,7 +1740,7 @@ xfs_remount_ro(
 	 * Cancel background eofb scanning so it cannot race with the final
 	 * log force+buftarg wait and deadlock the remount.
 	 */
-	xfs_stop_block_reaping(mp);
+	xfs_blockgc_stop(mp);
 
 	/* Get rid of any leftover CoW reservations... */
 	error = xfs_icache_free_cowblocks(mp, NULL);



* [PATCH 7/7] xfs: parallelize block preallocation garbage collection
  2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
                   ` (5 preceding siblings ...)
  2021-01-11 23:23 ` [PATCH 6/7] xfs: rename block gc start and stop functions Darrick J. Wong
@ 2021-01-11 23:23 ` Darrick J. Wong
  2021-01-13 15:09   ` Christoph Hellwig
  6 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-11 23:23 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Split the block preallocation garbage collection work into per-AG work
items so that we can take advantage of parallelization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c |   47 +++++++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_mount.c  |    3 +++
 fs/xfs/xfs_mount.h  |    5 +++--
 fs/xfs/xfs_super.c  |   26 ++++++++++++++++++++++++--
 4 files changed, 63 insertions(+), 18 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index d1179f3e9d1f..e78724440d87 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -943,12 +943,12 @@ xfs_inode_walk(
  */
 static void
 xfs_queue_blockgc(
-	struct xfs_mount	*mp)
+	struct xfs_perag	*pag)
 {
 	rcu_read_lock();
-	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_BLOCK_GC_TAG))
-		queue_delayed_work(mp->m_blockgc_workqueue,
-				   &mp->m_blockgc_work,
+	if (radix_tree_tagged(&pag->pag_ici_root, XFS_ICI_BLOCK_GC_TAG))
+		queue_delayed_work(pag->pag_mount->m_blockgc_workqueue,
+				   &pag->pag_blockgc_work,
 				   msecs_to_jiffies(xfs_blockgc_secs * 1000));
 	rcu_read_unlock();
 }
@@ -983,25 +983,40 @@ void
 xfs_blockgc_worker(
 	struct work_struct	*work)
 {
-	struct xfs_mount	*mp = container_of(to_delayed_work(work),
-					struct xfs_mount, m_blockgc_work);
+	struct xfs_perag	*pag = container_of(to_delayed_work(work),
+					struct xfs_perag, pag_blockgc_work);
 	int			error;
 
-	if (!sb_start_write_trylock(mp->m_super))
+	if (!sb_start_write_trylock(pag->pag_mount->m_super))
 		return;
-	error = xfs_blockgc_scan(mp, NULL);
+
+	error = xfs_inode_walk_ag(pag, 0, xfs_blockgc_scan_inode, NULL,
+			XFS_ICI_BLOCK_GC_TAG);
 	if (error)
-		xfs_info(mp, "preallocation gc worker failed, err=%d", error);
-	sb_end_write(mp->m_super);
-	xfs_queue_blockgc(mp);
+		xfs_info(pag->pag_mount,
+				"AG %u preallocation gc worker failed, err=%d",
+				pag->pag_agno, error);
+	sb_end_write(pag->pag_mount->m_super);
+	xfs_queue_blockgc(pag);
 }
 
+#define for_each_perag_tag(mp, next_agno, pag, tag) \
+	for ((next_agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \
+	     (pag) != NULL; \
+	     (next_agno) = (pag)->pag_agno + 1, \
+	     xfs_perag_put(pag), \
+	     (pag) = xfs_perag_get_tag((mp), (next_agno), (tag)))
+
 /* Disable post-EOF and CoW block auto-reclamation. */
 void
 xfs_blockgc_stop(
 	struct xfs_mount	*mp)
 {
-	cancel_delayed_work_sync(&mp->m_blockgc_work);
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+
+	for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCK_GC_TAG)
+		cancel_delayed_work_sync(&pag->pag_blockgc_work);
 }
 
 /* Enable post-EOF and CoW block auto-reclamation. */
@@ -1009,7 +1024,11 @@ void
 xfs_blockgc_start(
 	struct xfs_mount	*mp)
 {
-	xfs_queue_blockgc(mp);
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+
+	for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCK_GC_TAG)
+		xfs_queue_blockgc(pag);
 }
 
 /*
@@ -1546,7 +1565,7 @@ __xfs_inode_set_blocks_tag(
 		spin_unlock(&ip->i_mount->m_perag_lock);
 
 		/* kick off background trimming */
-		xfs_queue_blockgc(ip->i_mount);
+		xfs_queue_blockgc(pag);
 
 		trace_xfs_perag_set_blockgc(ip->i_mount, pag->pag_agno, -1,
 				_RET_IP_);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index c3144cf5febe..ed96b48afb0a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -126,6 +126,7 @@ __xfs_free_perag(
 {
 	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
 
+	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
 	ASSERT(atomic_read(&pag->pag_ref) == 0);
 	kmem_free(pag);
 }
@@ -146,6 +147,7 @@ xfs_free_perag(
 		spin_unlock(&mp->m_perag_lock);
 		ASSERT(pag);
 		ASSERT(atomic_read(&pag->pag_ref) == 0);
+		cancel_delayed_work_sync(&pag->pag_blockgc_work);
 		xfs_iunlink_destroy(pag);
 		xfs_buf_hash_destroy(pag);
 		call_rcu(&pag->rcu_head, __xfs_free_perag);
@@ -201,6 +203,7 @@ xfs_initialize_perag(
 		pag->pag_agno = index;
 		pag->pag_mount = mp;
 		spin_lock_init(&pag->pag_ici_lock);
+		INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
 		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 
 		error = xfs_buf_hash_init(pag);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d9f32102514e..2bb2b5704805 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -177,8 +177,6 @@ typedef struct xfs_mount {
 	uint64_t		m_resblks_avail;/* available reserved blocks */
 	uint64_t		m_resblks_save;	/* reserved blks @ remount,ro */
 	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
-	struct delayed_work	m_blockgc_work; /* background prealloc blocks
-						     trimming */
 	struct xfs_kobj		m_kobj;
 	struct xfs_kobj		m_error_kobj;
 	struct xfs_kobj		m_error_meta_kobj;
@@ -367,6 +365,9 @@ typedef struct xfs_perag {
 	/* Blocks reserved for the reverse mapping btree. */
 	struct xfs_ag_resv	pag_rmapbt_resv;
 
+	/* background prealloc block trimming */
+	struct delayed_work	pag_blockgc_work;
+
 	/* reference count */
 	uint8_t			pagf_refcount_level;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f72c1f473025..9624e8a08509 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -516,7 +516,8 @@ xfs_init_mount_workqueues(
 		goto out_destroy_cil;
 
 	mp->m_blockgc_workqueue = alloc_workqueue("xfs-blockgc/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
+			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0,
+			mp->m_super->s_id);
 	if (!mp->m_blockgc_workqueue)
 		goto out_destroy_reclaim;
 
@@ -1387,6 +1388,26 @@ xfs_fs_validate_params(
 	return 0;
 }
 
+/*
+ * Constrain the number of threads that we start for background work.  This
+ * is the estimated parallelism of the filesystem capped to the unbound work
+ * queue maximum.
+ *
+ * We can't set this when we allocate the workqueues because the thread count
+ * derives from AG count, and we can't know that until we're far enough through
+ * setup to read the superblock, which requires functioning workqueues.
+ */
+static inline void
+xfs_configure_background_workqueues(
+	struct xfs_mount	*mp)
+{
+	unsigned int		threads = xfs_guess_metadata_threads(mp);
+	unsigned int		max_active;
+
+	max_active = min_t(unsigned int, threads, WQ_UNBOUND_MAX_ACTIVE);
+	workqueue_set_max_active(mp->m_blockgc_workqueue, max_active);
+}
+
 static int
 xfs_fs_fill_super(
 	struct super_block	*sb,
@@ -1452,6 +1473,8 @@ xfs_fs_fill_super(
 	if (error)
 		goto out_free_sb;
 
+	xfs_configure_background_workqueues(mp);
+
 	error = xfs_setup_devices(mp);
 	if (error)
 		goto out_free_sb;
@@ -1872,7 +1895,6 @@ static int xfs_init_fs_context(
 	mutex_init(&mp->m_growlock);
 	INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
-	INIT_DELAYED_WORK(&mp->m_blockgc_work, xfs_blockgc_worker);
 	mp->m_kobj.kobject.kset = xfs_kset;
 	/*
 	 * We don't create the finobt per-ag space reservation until after log



* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-11 23:23 ` [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
@ 2021-01-13 14:49   ` Christoph Hellwig
  2021-01-14 21:32     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +/* Estimate the amount of parallelism available for a given device. */
> +unsigned int
> +xfs_buftarg_guess_threads(
> +	struct xfs_buftarg	*btp)
> +{
> +	int			iomin;
> +	int			ioopt;
> +
> +	/*
> +	 * The device tells us that it is non-rotational, and we take that to
> +	 * mean there are no moving parts and that the device can handle all
> +	 * the CPUs throwing IO requests at it.
> +	 */
> +	if (blk_queue_nonrot(btp->bt_bdev->bd_disk->queue))
> +		return num_online_cpus();
> +
> +	/*
> +	 * The device has a preferred and minimum IO size that suggest a RAID
> +	 * setup, so infer the number of disks and assume that the parallelism
> +	 * is equal to the disk count.
> +	 */
> +	iomin = bdev_io_min(btp->bt_bdev);
> +	ioopt = bdev_io_opt(btp->bt_bdev);
> +	if (iomin > 0 && ioopt > iomin)
> +		return ioopt / iomin;
> +
> +	/*
> +	 * The device did not indicate that it has any capabilities beyond that
> +	 * of a rotating disk with a single drive head, so we estimate no
> +	 * parallelism at all.
> +	 */
> +	return 1;
> +}

Why is this in xfs_buf.c despite having nothing to do with the buffer
cache?

Also I think we need some sort of manual override in case the guess is
wrong.


* Re: [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks
  2021-01-11 23:23 ` [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
@ 2021-01-13 14:57   ` Christoph Hellwig
  2021-01-14 22:49     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:28PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Refactor the part of _free_eofblocks that decides if it's really going
> to truncate post-EOF blocks into a separate helper function.  The
> upcoming deferred inode inactivation patch requires us to be able to
> decide this prior to actual inactivation.  No functionality changes.

Is there any specific reason why the new xfs_has_eofblocks helper is in
xfs_inode.c?  That just makes following the logic a little harder.

> +
> +/*
> + * Decide if this inode has post-EOF blocks.  The caller is responsible
> + * for knowing / caring about the PREALLOC/APPEND flags.
> + */
> +int
> +xfs_has_eofblocks(
> +	struct xfs_inode	*ip,
> +	bool			*has)
> +{
> +	struct xfs_bmbt_irec	imap;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fileoff_t		end_fsb;
> +	xfs_fileoff_t		last_fsb;
> +	xfs_filblks_t		map_len;
> +	int			nimaps;
> +	int			error;
> +
> +	*has = false;
> +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> +	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> +	if (last_fsb <= end_fsb)
> +		return 0;

Where does this strange magic come from?

> +	map_len = last_fsb - end_fsb;
> +
> +	nimaps = 1;
> +	xfs_ilock(ip, XFS_ILOCK_SHARED);
> +	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
> +	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +
> +	if (error || nimaps == 0)
> +		return error;
> +
> +	*has = imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks;
> +	return 0;

I think this logic could be simplified at lot by using xfs_iext_lookup
directly. Something like:

	*has = false;

	xfs_ilock(ip, XFS_ILOCK_SHARED);
	if (ip->i_delayed_blks) {
		*has = true;
		goto out_unlock;
	}
	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
		if (error)
			goto out_unlock;
        }
	if (xfs_iext_lookup_extent(ip, &ip->i_df, end_fsb, &icur, &imap))
		*has = true;
out_unlock:
	xfs_iunlock(ip, XFS_ILOCK_SHARED);
	return error;
		


* Re: [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags
  2021-01-11 23:23 ` [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags Darrick J. Wong
@ 2021-01-13 14:59   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 14:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:34PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The clearing of posteof blocks and cowblocks serve the same purpose:
> removing speculative block preallocations from inactive files.  We don't
> need to burn two radix tree tags on this, so combine them into one.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers
  2021-01-11 23:23 ` [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers Darrick J. Wong
@ 2021-01-13 15:04   ` Christoph Hellwig
  2021-01-13 23:53     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 15:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:40PM -0800, Darrick J. Wong wrote:
> + * 100ths of a second) with the exception of blockgc_timer, which is measured
> + * in seconds.
>   */
>  xfs_param_t xfs_params = {
>  			  /*	MIN		DFLT		MAX	*/
> @@ -28,8 +28,7 @@ xfs_param_t xfs_params = {
>  	.rotorstep	= {	1,		1,		255	},
>  	.inherit_nodfrg	= {	0,		1,		1	},
>  	.fstrm_timer	= {	1,		30*100,		3600*100},
> -	.eofb_timer	= {	1,		300,		3600*24},
> -	.cowb_timer	= {	1,		1800,		3600*24},
> +	.blockgc_timer	= {	1,		300,		3600*24},

Renaming this is going to break existing scripts.  We could either kill off the
COW timer as it is relatively recent, or we could keep both and use the minimum.
But removing both and picking an entirely new name seems a little dangerous.

Otherwise this looks sane to me.


* Re: [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan
  2021-01-11 23:23 ` [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan Darrick J. Wong
@ 2021-01-13 15:06   ` Christoph Hellwig
  2021-01-13 20:41     ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 15:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:46PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Perform background block preallocation gc scans more efficiently by
> walking the incore inode tree once.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good.  If you could find a way to avoid the forward declarations
I'd be even more happy :)

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/7] xfs: rename block gc start and stop functions
  2021-01-11 23:23 ` [PATCH 6/7] xfs: rename block gc start and stop functions Darrick J. Wong
@ 2021-01-13 15:07   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 15:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:52PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Shorten the names of the two functions that start and stop block
> preallocation garbage collection and move them up to the other blockgc
> functions.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 7/7] xfs: parallelize block preallocation garbage collection
  2021-01-11 23:23 ` [PATCH 7/7] xfs: parallelize block preallocation garbage collection Darrick J. Wong
@ 2021-01-13 15:09   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-13 15:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jan 11, 2021 at 03:23:59PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Split the block preallocation garbage collection work into per-AG work
> items so that we can take advantage of parallelization.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan
  2021-01-13 15:06   ` Christoph Hellwig
@ 2021-01-13 20:41     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-13 20:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 04:06:32PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 11, 2021 at 03:23:46PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Perform background block preallocation gc scans more efficiently by
> > walking the incore inode tree once.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Looks good.  If you could find a way to avoid the forward declarations
> I'd be even more happy :)

They went away as a part of the massive rework stemming from Dave's
"simple" review comment in the previous series about moving the ENOSPC
retry loops into the quota reservation functions.

Granted, the double series has ballooned from 13 to 28 patches as I had
to clean up even more quota code... :(

--D

> Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers
  2021-01-13 15:04   ` Christoph Hellwig
@ 2021-01-13 23:53     ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-13 23:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 04:04:30PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 11, 2021 at 03:23:40PM -0800, Darrick J. Wong wrote:
> > + * 100ths of a second) with the exception of blockgc_timer, which is measured
> > + * in seconds.
> >   */
> >  xfs_param_t xfs_params = {
> >  			  /*	MIN		DFLT		MAX	*/
> > @@ -28,8 +28,7 @@ xfs_param_t xfs_params = {
> >  	.rotorstep	= {	1,		1,		255	},
> >  	.inherit_nodfrg	= {	0,		1,		1	},
> >  	.fstrm_timer	= {	1,		30*100,		3600*100},
> > -	.eofb_timer	= {	1,		300,		3600*24},
> > -	.cowb_timer	= {	1,		1800,		3600*24},
> > +	.blockgc_timer	= {	1,		300,		3600*24},
> 
> Renaming this is going to break existing scripts.  We could either kill off the
> COW timer as it is relatively recent, or we could keep both and use the minimum.
> But removing both and picking an entirely new name seems a little dangerous.

"blockgc_timer" is the internal variable.  The xfs_sysctl.c changes at
the bottom of the patch erase speculative_cow_prealloc_lifetime, but the
older knob remains unchanged.  See xfs_sysctl.c:

	{
		.procname	= "speculative_prealloc_lifetime",
		.data		= &xfs_params.blockgc_timer.val,

And from the test system:

# ls -l /proc/sys/fs/xfs/
total 0
-rw-r--r-- 1 root root 0 Jan 13 15:46 error_level
-rw-r--r-- 1 root root 0 Jan 13 15:46 filestream_centisecs
-rw-r--r-- 1 root root 0 Jan 13 15:46 inherit_noatime
-rw-r--r-- 1 root root 0 Jan 13 15:46 inherit_nodefrag
-rw-r--r-- 1 root root 0 Jan 13 15:46 inherit_nodump
-rw-r--r-- 1 root root 0 Jan 13 15:46 inherit_nosymlinks
-rw-r--r-- 1 root root 0 Jan 13 15:46 inherit_sync
-rw-r--r-- 1 root root 0 Jan 13 15:46 irix_sgid_inherit
-rw-r--r-- 1 root root 0 Jan 13 15:46 irix_symlink_mode
-rw-r--r-- 1 root root 0 Jan 13 15:46 panic_mask
-rw-r--r-- 1 root root 0 Jan 13 15:46 rotorstep
-rw-r--r-- 1 root root 0 Jan 13 15:46 speculative_prealloc_lifetime
-rw-r--r-- 1 root root 0 Jan 13 15:46 stats_clear
-rw-r--r-- 1 root root 0 Jan 13 15:46 xfssyncd_centisecs

--D

> 
> Otherwise this looks sane to me.

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-13 14:49   ` Christoph Hellwig
@ 2021-01-14 21:32     ` Darrick J. Wong
  2021-01-14 22:38       ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-14 21:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 03:49:32PM +0100, Christoph Hellwig wrote:
> > +/* Estimate the amount of parallelism available for a given device. */
> > +unsigned int
> > +xfs_buftarg_guess_threads(
> > +	struct xfs_buftarg	*btp)
> > +{
> > +	int			iomin;
> > +	int			ioopt;
> > +
> > +	/*
> > +	 * The device tells us that it is non-rotational, and we take that to
> > +	 * mean there are no moving parts and that the device can handle all
> > +	 * the CPUs throwing IO requests at it.
> > +	 */
> > +	if (blk_queue_nonrot(btp->bt_bdev->bd_disk->queue))
> > +		return num_online_cpus();
> > +
> > +	/*
> > +	 * The device has a preferred and minimum IO size that suggest a RAID
> > +	 * setup, so infer the number of disks and assume that the parallelism
> > +	 * is equal to the disk count.
> > +	 */
> > +	iomin = bdev_io_min(btp->bt_bdev);
> > +	ioopt = bdev_io_opt(btp->bt_bdev);
> > +	if (iomin > 0 && ioopt > iomin)
> > +		return ioopt / iomin;
> > +
> > +	/*
> > +	 * The device did not indicate that it has any capabilities beyond that
> > +	 * of a rotating disk with a single drive head, so we estimate no
> > +	 * parallelism at all.
> > +	 */
> > +	return 1;
> > +}
> 
> Why is this in xfs_buf.c despite having nothing to do with the buffer
> cache?

Initially I assigned it to the buftarg code on the grounds that this is
how you'd estimate the level of parallelism available through the data
device buftarg.

I don't really care where it goes, though.  xfs_pwork_guess_threads
would be fine too.

> Also I think we need some sort of manual override in case the guess is
> wrong.

Hm, where would we put it?  One of the xfs sysctls?  And would we be
able to resize the background blockgc workqueue size too?

--D

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-14 21:32     ` Darrick J. Wong
@ 2021-01-14 22:38       ` Darrick J. Wong
  2021-01-18 17:36         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-14 22:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Jan 14, 2021 at 01:32:59PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 13, 2021 at 03:49:32PM +0100, Christoph Hellwig wrote:
> > > +/* Estimate the amount of parallelism available for a given device. */
> > > +unsigned int
> > > +xfs_buftarg_guess_threads(
> > > +	struct xfs_buftarg	*btp)
> > > +{
> > > +	int			iomin;
> > > +	int			ioopt;
> > > +
> > > +	/*
> > > +	 * The device tells us that it is non-rotational, and we take that to
> > > +	 * mean there are no moving parts and that the device can handle all
> > > +	 * the CPUs throwing IO requests at it.
> > > +	 */
> > > +	if (blk_queue_nonrot(btp->bt_bdev->bd_disk->queue))
> > > +		return num_online_cpus();
> > > +
> > > +	/*
> > > +	 * The device has a preferred and minimum IO size that suggest a RAID
> > > +	 * setup, so infer the number of disks and assume that the parallelism
> > > +	 * is equal to the disk count.
> > > +	 */
> > > +	iomin = bdev_io_min(btp->bt_bdev);
> > > +	ioopt = bdev_io_opt(btp->bt_bdev);
> > > +	if (iomin > 0 && ioopt > iomin)
> > > +		return ioopt / iomin;
> > > +
> > > +	/*
> > > +	 * The device did not indicate that it has any capabilities beyond that
> > > +	 * of a rotating disk with a single drive head, so we estimate no
> > > +	 * parallelism at all.
> > > +	 */
> > > +	return 1;
> > > +}
> > 
> > Why is this in xfs_buf.c despite having nothing to do with the buffer
> > cache?
> 
> Initially I assigned it to the buftarg code on the grounds that this is
> how you'd estimate the level of parallelism available through the data
> device buftarg.
> 
> I don't really care where it goes, though.  xfs_pwork_guess_threads
> would be fine too.
> 
> > Also I think we need some sort of manual override in case the guess is
> > wrong.

There already /is/ a pwork_threads sysctl knob for controlling
quotacheck parallelism; and for the block gc workqueue I add WQ_SYSFS so
that you can set /sys/bus/workqueue/devices/xfs-*/max_active.

--D
> 
> Hm, where would we put it?  One of the xfs sysctls?  And would we be
> able to resize the background blockgc workqueue size too?

> --D

* Re: [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks
  2021-01-13 14:57   ` Christoph Hellwig
@ 2021-01-14 22:49     ` Darrick J. Wong
  2021-01-18 17:38       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-14 22:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, Jan 13, 2021 at 03:57:15PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 11, 2021 at 03:23:28PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Refactor the part of _free_eofblocks that decides if it's really going
> > to truncate post-EOF blocks into a separate helper function.  The
> > upcoming deferred inode inactivation patch requires us to be able to
> > decide this prior to actual inactivation.  No functionality changes.
> 
> Is there any specific reason why the new xfs_has_eofblocks helper is in
> xfs_inode.c?  That just makes following the logic a little harder.

I ... have no idea.  This patch also isn't technically needed until we
get to the deferred inactivation patchset, but I'll reshuffle the deck
and move this function back to xfs_bmap_util.c.

> > +
> > +/*
> > + * Decide if this inode has post-EOF blocks.  The caller is responsible
> > + * for knowing / caring about the PREALLOC/APPEND flags.
> > + */
> > +int
> > +xfs_has_eofblocks(
> > +	struct xfs_inode	*ip,
> > +	bool			*has)
> > +{
> > +	struct xfs_bmbt_irec	imap;
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	xfs_fileoff_t		end_fsb;
> > +	xfs_fileoff_t		last_fsb;
> > +	xfs_filblks_t		map_len;
> > +	int			nimaps;
> > +	int			error;
> > +
> > +	*has = false;
> > +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > +	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > +	if (last_fsb <= end_fsb)
> > +		return 0;
> 
> Where does this strange magic come from?

It comes straight from xfs_free_eofblocks.

I /think/ the purpose of this is to avoid touching any file that's
larger than the page cache supports (i.e. 16T on 32-bit) because inode
inactivation causes pagecache invalidation, and trying to invalidate
with too high a pgoff causes weird integer truncation problems.

> > +	map_len = last_fsb - end_fsb;
> > +
> > +	nimaps = 1;
> > +	xfs_ilock(ip, XFS_ILOCK_SHARED);
> > +	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
> > +	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > +
> > +	if (error || nimaps == 0)
> > +		return error;
> > +
> > +	*has = imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks;
> > +	return 0;
> 
> I think this logic could be simplified at lot by using xfs_iext_lookup
> directly. Something like:
> 
> 	*has = false;
> 
> 	xfs_ilock(ip, XFS_ILOCK_SHARED);
> 	if (ip->i_delayed_blks) {
> 		*has = true;
> 		goto out_unlock;
> 	}
> 	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);

Is it even worth reading in the bmap btree to clear posteof blocks if we
haven't loaded it already?

--D

> 		if (error)
> 			goto out_unlock;
>         }
> 	if (xfs_iext_lookup_extent(ip, &ip->i_df, end_fsb, &icur, &imap))
> 		*has = true;
> out_unlock:
> 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> 	return error;

--D

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-14 22:38       ` Darrick J. Wong
@ 2021-01-18 17:36         ` Christoph Hellwig
  2021-01-18 19:57           ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-18 17:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Jan 14, 2021 at 02:38:49PM -0800, Darrick J. Wong wrote:
> There already /is/ a pwork_threads sysctl knob for controlling
> quotacheck parallelism; and for the block gc workqueue I add WQ_SYSFS so
> that you can set /sys/bus/workqueue/devices/xfs-*/max_active.

Hmm.  A single knob that is named to describe the expected device
parallelism might be easier for users to understand.

* Re: [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks
  2021-01-14 22:49     ` Darrick J. Wong
@ 2021-01-18 17:38       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-18 17:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Jan 14, 2021 at 02:49:53PM -0800, Darrick J. Wong wrote:
> > > +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> > > +	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > > +	if (last_fsb <= end_fsb)
> > > +		return 0;
> > 
> > Where does this strange magic come from?
> 
> It comes straight from xfs_free_eofblocks.
> 
> I /think/ the purpose of this is to avoid touching any file that's
> larger than the page cache supports (i.e. 16T on 32-bit) because inode
> inactivation causes pagecache invalidation, and trying to invalidate
> with too high a pgoff causes weird integer truncation problems.

Hmm.  At the very least we need to document this, but we really should
not even allow reading in an inode successfully under this condition..

Screams for a nice little test case..

> > 	xfs_ilock(ip, XFS_ILOCK_SHARED);
> > 	if (ip->i_delayed_blks) {
> > 		*has = true;
> > 		goto out_unlock;
> > 	}
> > 	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> > 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> 
> Is it even worth reading in the bmap btree to clear posteof blocks if we
> haven't loaded it already?

True, we can return early in that case.  I'm also not sure what the
i_delayed_blks check is supposed to do, as delalloc extents do show
up in the iext tree just like normal extents.

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-18 17:36         ` Christoph Hellwig
@ 2021-01-18 19:57           ` Darrick J. Wong
  2021-01-19 16:37             ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-18 19:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Jan 18, 2021 at 05:36:28PM +0000, Christoph Hellwig wrote:
> On Thu, Jan 14, 2021 at 02:38:49PM -0800, Darrick J. Wong wrote:
> > There already /is/ a pwork_threads sysctl knob for controlling
> > quotacheck parallelism; and for the block gc workqueue I add WQ_SYSFS so
> > that you can set /sys/bus/workqueue/devices/xfs-*/max_active.
> 
> Hmm.  A single knob that is named to describe the expected device
> parallelism might be easier for users to understand.

Where should I add sysfs attributes for per-fs configuration knobs?  I
don't really want to add "expected parallelism" to /sys/fs/xfs/*/error
because that seems like the wrong place, and /proc/sys/fs/xfs/ is too
global for something that could depend on the device.

--D

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-18 19:57           ` Darrick J. Wong
@ 2021-01-19 16:37             ` Christoph Hellwig
  2021-01-19 19:17               ` Darrick J. Wong
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-01-19 16:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Jan 18, 2021 at 11:57:10AM -0800, Darrick J. Wong wrote:
> Where should I add sysfs attributes for per-fs configuration knobs?  I
> don't really want to add "expected parallelism" to /sys/fs/xfs/*/error
> because that seems like the wrong place, and /proc/sys/fs/xfs/ is too
> global for something that could depend on the device.

Maybe a mount option that can be changed using remount?

* Re: [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients
  2021-01-19 16:37             ` Christoph Hellwig
@ 2021-01-19 19:17               ` Darrick J. Wong
  0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2021-01-19 19:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Jan 19, 2021 at 04:37:11PM +0000, Christoph Hellwig wrote:
> On Mon, Jan 18, 2021 at 11:57:10AM -0800, Darrick J. Wong wrote:
> > Where should I add sysfs attributes for per-fs configuration knobs?  I
> > don't really want to add "expected parallelism" to /sys/fs/xfs/*/error
> > because that seems like the wrong place, and /proc/sys/fs/xfs/ is too
> > global for something that could depend on the device.
> 
> Maybe a mount option that can be changed using remount?

Yeah, I'll tack that on the end of the blockgc series as a
CONFIG_XFS_DEBUG=y mount option.

--D

end of thread, other threads:[~2021-01-19 19:23 UTC | newest]

Thread overview: 25+ messages
2021-01-11 23:23 [PATCHSET v2 0/7] xfs: consolidate posteof and cowblocks cleanup Darrick J. Wong
2021-01-11 23:23 ` [PATCH 1/7] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
2021-01-13 14:49   ` Christoph Hellwig
2021-01-14 21:32     ` Darrick J. Wong
2021-01-14 22:38       ` Darrick J. Wong
2021-01-18 17:36         ` Christoph Hellwig
2021-01-18 19:57           ` Darrick J. Wong
2021-01-19 16:37             ` Christoph Hellwig
2021-01-19 19:17               ` Darrick J. Wong
2021-01-11 23:23 ` [PATCH 2/7] xfs: refactor the predicate part of xfs_free_eofblocks Darrick J. Wong
2021-01-13 14:57   ` Christoph Hellwig
2021-01-14 22:49     ` Darrick J. Wong
2021-01-18 17:38       ` Christoph Hellwig
2021-01-11 23:23 ` [PATCH 3/7] xfs: consolidate incore inode radix tree posteof/cowblocks tags Darrick J. Wong
2021-01-13 14:59   ` Christoph Hellwig
2021-01-11 23:23 ` [PATCH 4/7] xfs: consolidate the eofblocks and cowblocks workers Darrick J. Wong
2021-01-13 15:04   ` Christoph Hellwig
2021-01-13 23:53     ` Darrick J. Wong
2021-01-11 23:23 ` [PATCH 5/7] xfs: only walk the incore inode tree once per blockgc scan Darrick J. Wong
2021-01-13 15:06   ` Christoph Hellwig
2021-01-13 20:41     ` Darrick J. Wong
2021-01-11 23:23 ` [PATCH 6/7] xfs: rename block gc start and stop functions Darrick J. Wong
2021-01-13 15:07   ` Christoph Hellwig
2021-01-11 23:23 ` [PATCH 7/7] xfs: parallelize block preallocation garbage collection Darrick J. Wong
2021-01-13 15:09   ` Christoph Hellwig
