linux-xfs.vger.kernel.org archive mirror
* [PATCHSET v3 0/2] xfs: speed up parallel workqueues
@ 2021-01-28  6:03 Darrick J. Wong
  2021-01-28  6:03 ` [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
  2021-01-28  6:03 ` [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-01-28  6:03 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, linux-xfs, hch, david, bfoster

Hi all,

After some discussion on IRC with Dave, we came to the conclusion that
our background workqueue behavior could use some tweaking.  Kernel
worker threads that scan the filesystem and/or run their own
transactions more closely fit the definition of an unbound workqueue --
the work items can take a long time, they don't have much in common with
the submitter thread, the submitter isn't waiting hotly for a response,
and we'd rather let the process scheduler deal with scheduling them.
Furthermore, we don't want to place artificial limits on workqueue
scaling, because that can leave capacity idle while the submitter blocks
waiting for the work to finish (pwork is currently used for mount time
quotacheck).

Therefore, we switch pwork to use an unbound workqueue and let the
workqueue code figure out the appropriate concurrency.  We expose the
pwork workqueue via sysfs, and document the interesting knobs in the
administrator's guide.
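
To summarize the mechanism (patch 1 below has the full context in
xfs_pwork_init), the pwork workqueue is now allocated unbound and
visible in sysfs, with max_active left at the workqueue default unless
the debug-only pwork_threads knob overrides it; roughly:

	/* excerpt from patch 1: unbound, sysfs-visible pwork workqueue */
	pctl->wq = alloc_workqueue("%s-%d",
			WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, nr_threads, tag,
			current->pid);

The resulting queue shows up under /sys/bus/workqueue/devices/, where
max_active, cpumask, and nice can be tuned; the admin guide hunk in
patch 1 documents those knobs.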

v2: expose all the workqueues via sysfs
v3: only expose those workqueues that exist for kernel threads that
    create their own transactions, and update the admin guide.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=workqueue-speedups-5.12
---
 Documentation/admin-guide/xfs.rst |   38 +++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_iwalk.c                |    5 +----
 fs/xfs/xfs_log.c                  |    5 +++--
 fs/xfs/xfs_mru_cache.c            |    2 +-
 fs/xfs/xfs_pwork.c                |   25 +++++-------------------
 fs/xfs/xfs_pwork.h                |    4 +---
 fs/xfs/xfs_super.c                |   23 ++++++++++++++--------
 fs/xfs/xfs_super.h                |    6 ++++++
 8 files changed, 69 insertions(+), 39 deletions(-)



* [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients
  2021-01-28  6:03 [PATCHSET v3 0/2] xfs: speed up parallel workqueues Darrick J. Wong
@ 2021-01-28  6:03 ` Darrick J. Wong
  2021-01-28  9:27   ` Christoph Hellwig
  2021-01-29 16:30   ` Brian Foster
  2021-01-28  6:03 ` [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode Darrick J. Wong
  1 sibling, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-01-28  6:03 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, hch, david, bfoster

From: Darrick J. Wong <djwong@kernel.org>

Increase the parallelism level for pwork clients to the workqueue
defaults so that we can take advantage of computers with a lot of CPUs
and a lot of hardware.  On fast systems this will speed up quotacheck by
a large factor, and the following posteof/cowblocks cleanup series will
use the functionality presented in this patch to run garbage collection
as quickly as possible.

We do this by switching the pwork workqueue to unbounded, since the
current user (quotacheck) runs lengthy scans for each work item and we
don't care about dispatching the work on a warm cpu cache or anything
like that.  Also set WQ_SYSFS so that we can monitor where the wq is
running.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 Documentation/admin-guide/xfs.rst |   38 +++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_iwalk.c                |    5 +----
 fs/xfs/xfs_pwork.c                |   25 +++++-------------------
 fs/xfs/xfs_pwork.h                |    4 +---
 4 files changed, 45 insertions(+), 27 deletions(-)


diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index 86de8a1ad91c..b00b1eece9de 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -495,3 +495,41 @@ the class and error context. For example, the default values for
 "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
 to "fail immediately" behaviour. This is done because ENODEV is a fatal,
 unrecoverable error no matter how many times the metadata IO is retried.
+
+Workqueue Concurrency
+=====================
+
+XFS uses kernel workqueues to parallelize metadata update processes.  This
+enables it to take advantage of storage hardware that can service many IO
+operations simultaneously.  This interface exposes internal implementation
+details of XFS, and as such is explicitly not part of any userspace API/ABI
+guarantee the kernel may give userspace.  These are undocumented features of
+the generic workqueue implementation XFS uses for concurrency, and they are
+provided here purely for diagnostic and tuning purposes and may change at any
+time in the future.
+
+The control knobs for a filesystem's workqueues are organized by task at hand
+and the short name of the data device.  They all can be found in:
+
+  /sys/bus/workqueue/devices/${task}!${device}
+
+================  ===========
+  Task            Description
+================  ===========
+  xfs_iwalk-$pid  Inode scans of the entire filesystem. Currently limited to
+                  mount time quotacheck.
+================  ===========
+
+For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be
+found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/.
+
+The interesting knobs for XFS workqueues are as follows:
+
+============     ===========
+  Knob           Description
+============     ===========
+  max_active     Maximum number of background threads that can be started to
+                 run the work.
+  cpumask        CPUs upon which the threads are allowed to run.
+  nice           Relative priority of scheduling the threads.  These are the
+                 same nice levels that can be applied to userspace processes.
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index eae3aff9bc97..c4a340f1f1e1 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -618,15 +618,12 @@ xfs_iwalk_threaded(
 {
 	struct xfs_pwork_ctl	pctl;
 	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
-	unsigned int		nr_threads;
 	int			error;
 
 	ASSERT(agno < mp->m_sb.sb_agcount);
 	ASSERT(!(flags & ~XFS_IWALK_FLAGS_ALL));
 
-	nr_threads = xfs_pwork_guess_datadev_parallelism(mp);
-	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk",
-			nr_threads);
+	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk");
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
index b03333f1c84a..c283b801cc5d 100644
--- a/fs/xfs/xfs_pwork.c
+++ b/fs/xfs/xfs_pwork.c
@@ -61,16 +61,18 @@ xfs_pwork_init(
 	struct xfs_mount	*mp,
 	struct xfs_pwork_ctl	*pctl,
 	xfs_pwork_work_fn	work_fn,
-	const char		*tag,
-	unsigned int		nr_threads)
+	const char		*tag)
 {
+	unsigned int		nr_threads = 0;
+
 #ifdef DEBUG
 	if (xfs_globals.pwork_threads >= 0)
 		nr_threads = xfs_globals.pwork_threads;
 #endif
 	trace_xfs_pwork_init(mp, nr_threads, current->pid);
 
-	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
+	pctl->wq = alloc_workqueue("%s-%d",
+			WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, nr_threads, tag,
 			current->pid);
 	if (!pctl->wq)
 		return -ENOMEM;
@@ -117,20 +119,3 @@ xfs_pwork_poll(
 				atomic_read(&pctl->nr_work) == 0, HZ) == 0)
 		touch_softlockup_watchdog();
 }
-
-/*
- * Return the amount of parallelism that the data device can handle, or 0 for
- * no limit.
- */
-unsigned int
-xfs_pwork_guess_datadev_parallelism(
-	struct xfs_mount	*mp)
-{
-	struct xfs_buftarg	*btp = mp->m_ddev_targp;
-
-	/*
-	 * For now we'll go with the most conservative setting possible,
-	 * which is two threads for an SSD and 1 thread everywhere else.
-	 */
-	return blk_queue_nonrot(btp->bt_bdev->bd_disk->queue) ? 2 : 1;
-}
diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
index 8133124cf3bb..c0ef81fc85dd 100644
--- a/fs/xfs/xfs_pwork.h
+++ b/fs/xfs/xfs_pwork.h
@@ -51,11 +51,9 @@ xfs_pwork_want_abort(
 }
 
 int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
-		xfs_pwork_work_fn work_fn, const char *tag,
-		unsigned int nr_threads);
+		xfs_pwork_work_fn work_fn, const char *tag);
 void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
 int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
 void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
-unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
 
 #endif /* __XFS_PWORK_H__ */



* [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode
  2021-01-28  6:03 [PATCHSET v3 0/2] xfs: speed up parallel workqueues Darrick J. Wong
  2021-01-28  6:03 ` [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
@ 2021-01-28  6:03 ` Darrick J. Wong
  2021-01-28  9:27   ` Christoph Hellwig
  2021-01-29 16:31   ` Brian Foster
  1 sibling, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2021-01-28  6:03 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, linux-xfs, hch, david, bfoster

From: Darrick J. Wong <djwong@kernel.org>

When CONFIG_XFS_DEBUG=y, set WQ_SYSFS on all workqueues that we create
so that we (developers) have a means to monitor cpu affinity and whatnot
for background workers.  In the next patchset we'll expose knobs for
some of the workqueues publicly and document it, but not now.
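
In outline, the change wraps the workqueue flags in a macro; the macro
and one converted call from the diff below:

	#ifdef DEBUG
	# define XFS_WQFLAGS(wqflags)	(WQ_SYSFS | (wqflags))
	#else
	# define XFS_WQFLAGS(wqflags)	(wqflags)
	#endif

	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);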

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c       |    5 +++--
 fs/xfs/xfs_mru_cache.c |    2 +-
 fs/xfs/xfs_super.c     |   23 ++++++++++++++---------
 fs/xfs/xfs_super.h     |    6 ++++++
 4 files changed, 24 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d8b814227734..9aa30e7cd314 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1492,8 +1492,9 @@ xlog_alloc_log(
 	log->l_iclog->ic_prev = prev_iclog;	/* re-write 1st prev ptr */
 
 	log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
-			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_HIGHPRI, 0,
-			mp->m_super->s_id);
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM |
+				    WQ_HIGHPRI),
+			0, mp->m_super->s_id);
 	if (!log->l_ioend_workqueue)
 		goto out_free_iclog;
 
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index a06661dac5be..34c3b16f834f 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -294,7 +294,7 @@ int
 xfs_mru_cache_init(void)
 {
 	xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache",
-				WQ_MEM_RECLAIM|WQ_FREEZABLE, 1);
+			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 1);
 	if (!xfs_mru_reap_wq)
 		return -ENOMEM;
 	return 0;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aed74a3fc787..8959561351ca 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -495,33 +495,37 @@ xfs_init_mount_workqueues(
 	struct xfs_mount	*mp)
 {
 	mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 1, mp->m_super->s_id);
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			1, mp->m_super->s_id);
 	if (!mp->m_buf_workqueue)
 		goto out;
 
 	mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			0, mp->m_super->s_id);
 	if (!mp->m_unwritten_workqueue)
 		goto out_destroy_buf;
 
 	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
-			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND,
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
 			0, mp->m_super->s_id);
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;
 
 	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			0, mp->m_super->s_id);
 	if (!mp->m_reclaim_workqueue)
 		goto out_destroy_cil;
 
 	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			0, mp->m_super->s_id);
 	if (!mp->m_eofblocks_workqueue)
 		goto out_destroy_reclaim;
 
-	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
-					       mp->m_super->s_id);
+	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
+			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
 	if (!mp->m_sync_workqueue)
 		goto out_destroy_eofb;
 
@@ -2085,11 +2089,12 @@ xfs_init_workqueues(void)
 	 * max_active value for this workqueue.
 	 */
 	xfs_alloc_wq = alloc_workqueue("xfsalloc",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0);
+			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 0);
 	if (!xfs_alloc_wq)
 		return -ENOMEM;
 
-	xfs_discard_wq = alloc_workqueue("xfsdiscard", WQ_UNBOUND, 0);
+	xfs_discard_wq = alloc_workqueue("xfsdiscard", XFS_WQFLAGS(WQ_UNBOUND),
+			0);
 	if (!xfs_discard_wq)
 		goto out_free_alloc_wq;
 
diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h
index b552cf6d3379..1ca484b8357f 100644
--- a/fs/xfs/xfs_super.h
+++ b/fs/xfs/xfs_super.h
@@ -75,6 +75,12 @@ extern void xfs_qm_exit(void);
 				XFS_ASSERT_FATAL_STRING \
 				XFS_DBG_STRING /* DBG must be last */
 
+#ifdef DEBUG
+# define XFS_WQFLAGS(wqflags)	(WQ_SYSFS | (wqflags))
+#else
+# define XFS_WQFLAGS(wqflags)	(wqflags)
+#endif
+
 struct xfs_inode;
 struct xfs_mount;
 struct xfs_buftarg;



* Re: [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients
  2021-01-28  6:03 ` [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
@ 2021-01-28  9:27   ` Christoph Hellwig
  2021-01-29 16:30   ` Brian Foster
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-01-28  9:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david, bfoster

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode
  2021-01-28  6:03 ` [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode Darrick J. Wong
@ 2021-01-28  9:27   ` Christoph Hellwig
  2021-01-29 16:31   ` Brian Foster
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-01-28  9:27 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, linux-xfs, hch, david, bfoster, Tejun Heo,
	Lai Jiangshan, linux-kernel

On Wed, Jan 27, 2021 at 10:03:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When CONFIG_XFS_DEBUG=y, set WQ_SYSFS on all workqueues that we create
> so that we (developers) have a means to monitor cpu affinity and whatnot
> for background workers.  In the next patchset we'll expose knobs for
> some of the workqueues publicly and document it, but not now.

And I still think this is a horrible idea.

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c       |    5 +++--
>  fs/xfs/xfs_mru_cache.c |    2 +-
>  fs/xfs/xfs_super.c     |   23 ++++++++++++++---------
>  fs/xfs/xfs_super.h     |    6 ++++++
>  4 files changed, 24 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d8b814227734..9aa30e7cd314 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1492,8 +1492,9 @@ xlog_alloc_log(
>  	log->l_iclog->ic_prev = prev_iclog;	/* re-write 1st prev ptr */
>  
>  	log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
> -			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_HIGHPRI, 0,
> -			mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM |
> +				    WQ_HIGHPRI),
> +			0, mp->m_super->s_id);
>  	if (!log->l_ioend_workqueue)
>  		goto out_free_iclog;
>  
> diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
> index a06661dac5be..34c3b16f834f 100644
> --- a/fs/xfs/xfs_mru_cache.c
> +++ b/fs/xfs/xfs_mru_cache.c
> @@ -294,7 +294,7 @@ int
>  xfs_mru_cache_init(void)
>  {
>  	xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache",
> -				WQ_MEM_RECLAIM|WQ_FREEZABLE, 1);
> +			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 1);
>  	if (!xfs_mru_reap_wq)
>  		return -ENOMEM;
>  	return 0;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aed74a3fc787..8959561351ca 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -495,33 +495,37 @@ xfs_init_mount_workqueues(
>  	struct xfs_mount	*mp)
>  {
>  	mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 1, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			1, mp->m_super->s_id);
>  	if (!mp->m_buf_workqueue)
>  		goto out;
>  
>  	mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_unwritten_workqueue)
>  		goto out_destroy_buf;
>  
>  	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
> -			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND,
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
>  			0, mp->m_super->s_id);
>  	if (!mp->m_cil_workqueue)
>  		goto out_destroy_unwritten;
>  
>  	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_reclaim_workqueue)
>  		goto out_destroy_cil;
>  
>  	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_eofblocks_workqueue)
>  		goto out_destroy_reclaim;
>  
> -	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
> -					       mp->m_super->s_id);
> +	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> +			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
>  	if (!mp->m_sync_workqueue)
>  		goto out_destroy_eofb;
>  
> @@ -2085,11 +2089,12 @@ xfs_init_workqueues(void)
>  	 * max_active value for this workqueue.
>  	 */
>  	xfs_alloc_wq = alloc_workqueue("xfsalloc",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0);
> +			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 0);
>  	if (!xfs_alloc_wq)
>  		return -ENOMEM;
>  
> -	xfs_discard_wq = alloc_workqueue("xfsdiscard", WQ_UNBOUND, 0);
> +	xfs_discard_wq = alloc_workqueue("xfsdiscard", XFS_WQFLAGS(WQ_UNBOUND),
> +			0);
>  	if (!xfs_discard_wq)
>  		goto out_free_alloc_wq;
>  
> diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h
> index b552cf6d3379..1ca484b8357f 100644
> --- a/fs/xfs/xfs_super.h
> +++ b/fs/xfs/xfs_super.h
> @@ -75,6 +75,12 @@ extern void xfs_qm_exit(void);
>  				XFS_ASSERT_FATAL_STRING \
>  				XFS_DBG_STRING /* DBG must be last */
>  
> +#ifdef DEBUG
> +# define XFS_WQFLAGS(wqflags)	(WQ_SYSFS | (wqflags))
> +#else
> +# define XFS_WQFLAGS(wqflags)	(wqflags)
> +#endif
> +
>  struct xfs_inode;
>  struct xfs_mount;
>  struct xfs_buftarg;
> 
---end quoted text---


* Re: [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients
  2021-01-28  6:03 ` [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients Darrick J. Wong
  2021-01-28  9:27   ` Christoph Hellwig
@ 2021-01-29 16:30   ` Brian Foster
  1 sibling, 0 replies; 7+ messages in thread
From: Brian Foster @ 2021-01-29 16:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, david

On Wed, Jan 27, 2021 at 10:03:31PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Increase the parallelism level for pwork clients to the workqueue
> defaults so that we can take advantage of computers with a lot of CPUs
> and a lot of hardware.  On fast systems this will speed up quotacheck by
> a large factor, and the following posteof/cowblocks cleanup series will
> use the functionality presented in this patch to run garbage collection
> as quickly as possible.
> 
> We do this by switching the pwork workqueue to unbounded, since the
> current user (quotacheck) runs lengthy scans for each work item and we
> don't care about dispatching the work on a warm cpu cache or anything
> like that.  Also set WQ_SYSFS so that we can monitor where the wq is
> running.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---

I feel like this should be two patches, but otherwise:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  Documentation/admin-guide/xfs.rst |   38 +++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iwalk.c                |    5 +----
>  fs/xfs/xfs_pwork.c                |   25 +++++-------------------
>  fs/xfs/xfs_pwork.h                |    4 +---
>  4 files changed, 45 insertions(+), 27 deletions(-)
> 
> 
> diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
> index 86de8a1ad91c..b00b1eece9de 100644
> --- a/Documentation/admin-guide/xfs.rst
> +++ b/Documentation/admin-guide/xfs.rst
> @@ -495,3 +495,41 @@ the class and error context. For example, the default values for
>  "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
>  to "fail immediately" behaviour. This is done because ENODEV is a fatal,
>  unrecoverable error no matter how many times the metadata IO is retried.
> +
> +Workqueue Concurrency
> +=====================
> +
> +XFS uses kernel workqueues to parallelize metadata update processes.  This
> +enables it to take advantage of storage hardware that can service many IO
> +operations simultaneously.  This interface exposes internal implementation
> +details of XFS, and as such is explicitly not part of any userspace API/ABI
> +guarantee the kernel may give userspace.  These are undocumented features of
> +the generic workqueue implementation XFS uses for concurrency, and they are
> +provided here purely for diagnostic and tuning purposes and may change at any
> +time in the future.
> +
> +The control knobs for a filesystem's workqueues are organized by task at hand
> +and the short name of the data device.  They all can be found in:
> +
> +  /sys/bus/workqueue/devices/${task}!${device}
> +
> +================  ===========
> +  Task            Description
> +================  ===========
> +  xfs_iwalk-$pid  Inode scans of the entire filesystem. Currently limited to
> +                  mount time quotacheck.
> +================  ===========
> +
> +For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be
> +found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/.
> +
> +The interesting knobs for XFS workqueues are as follows:
> +
> +============     ===========
> +  Knob           Description
> +============     ===========
> +  max_active     Maximum number of background threads that can be started to
> +                 run the work.
> +  cpumask        CPUs upon which the threads are allowed to run.
> +  nice           Relative priority of scheduling the threads.  These are the
> +                 same nice levels that can be applied to userspace processes.
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index eae3aff9bc97..c4a340f1f1e1 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -618,15 +618,12 @@ xfs_iwalk_threaded(
>  {
>  	struct xfs_pwork_ctl	pctl;
>  	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> -	unsigned int		nr_threads;
>  	int			error;
>  
>  	ASSERT(agno < mp->m_sb.sb_agcount);
>  	ASSERT(!(flags & ~XFS_IWALK_FLAGS_ALL));
>  
> -	nr_threads = xfs_pwork_guess_datadev_parallelism(mp);
> -	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk",
> -			nr_threads);
> +	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk");
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> index b03333f1c84a..c283b801cc5d 100644
> --- a/fs/xfs/xfs_pwork.c
> +++ b/fs/xfs/xfs_pwork.c
> @@ -61,16 +61,18 @@ xfs_pwork_init(
>  	struct xfs_mount	*mp,
>  	struct xfs_pwork_ctl	*pctl,
>  	xfs_pwork_work_fn	work_fn,
> -	const char		*tag,
> -	unsigned int		nr_threads)
> +	const char		*tag)
>  {
> +	unsigned int		nr_threads = 0;
> +
>  #ifdef DEBUG
>  	if (xfs_globals.pwork_threads >= 0)
>  		nr_threads = xfs_globals.pwork_threads;
>  #endif
>  	trace_xfs_pwork_init(mp, nr_threads, current->pid);
>  
> -	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
> +	pctl->wq = alloc_workqueue("%s-%d",
> +			WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, nr_threads, tag,
>  			current->pid);
>  	if (!pctl->wq)
>  		return -ENOMEM;
> @@ -117,20 +119,3 @@ xfs_pwork_poll(
>  				atomic_read(&pctl->nr_work) == 0, HZ) == 0)
>  		touch_softlockup_watchdog();
>  }
> -
> -/*
> - * Return the amount of parallelism that the data device can handle, or 0 for
> - * no limit.
> - */
> -unsigned int
> -xfs_pwork_guess_datadev_parallelism(
> -	struct xfs_mount	*mp)
> -{
> -	struct xfs_buftarg	*btp = mp->m_ddev_targp;
> -
> -	/*
> -	 * For now we'll go with the most conservative setting possible,
> -	 * which is two threads for an SSD and 1 thread everywhere else.
> -	 */
> -	return blk_queue_nonrot(btp->bt_bdev->bd_disk->queue) ? 2 : 1;
> -}
> diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
> index 8133124cf3bb..c0ef81fc85dd 100644
> --- a/fs/xfs/xfs_pwork.h
> +++ b/fs/xfs/xfs_pwork.h
> @@ -51,11 +51,9 @@ xfs_pwork_want_abort(
>  }
>  
>  int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
> -		xfs_pwork_work_fn work_fn, const char *tag,
> -		unsigned int nr_threads);
> +		xfs_pwork_work_fn work_fn, const char *tag);
>  void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
>  int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
>  void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
> -unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
>  
>  #endif /* __XFS_PWORK_H__ */
> 



* Re: [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode
  2021-01-28  6:03 ` [PATCH 2/2] xfs: set WQ_SYSFS on all workqueues in debug mode Darrick J. Wong
  2021-01-28  9:27   ` Christoph Hellwig
@ 2021-01-29 16:31   ` Brian Foster
  1 sibling, 0 replies; 7+ messages in thread
From: Brian Foster @ 2021-01-29 16:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs, hch, david

On Wed, Jan 27, 2021 at 10:03:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When CONFIG_XFS_DEBUG=y, set WQ_SYSFS on all workqueues that we create
> so that we (developers) have a means to monitor cpu affinity and whatnot
> for background workers.  In the next patchset we'll expose knobs for
> some of the workqueues publicly and document it, but not now.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_log.c       |    5 +++--
>  fs/xfs/xfs_mru_cache.c |    2 +-
>  fs/xfs/xfs_super.c     |   23 ++++++++++++++---------
>  fs/xfs/xfs_super.h     |    6 ++++++
>  4 files changed, 24 insertions(+), 12 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d8b814227734..9aa30e7cd314 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1492,8 +1492,9 @@ xlog_alloc_log(
>  	log->l_iclog->ic_prev = prev_iclog;	/* re-write 1st prev ptr */
>  
>  	log->l_ioend_workqueue = alloc_workqueue("xfs-log/%s",
> -			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_HIGHPRI, 0,
> -			mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM |
> +				    WQ_HIGHPRI),
> +			0, mp->m_super->s_id);
>  	if (!log->l_ioend_workqueue)
>  		goto out_free_iclog;
>  
> diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
> index a06661dac5be..34c3b16f834f 100644
> --- a/fs/xfs/xfs_mru_cache.c
> +++ b/fs/xfs/xfs_mru_cache.c
> @@ -294,7 +294,7 @@ int
>  xfs_mru_cache_init(void)
>  {
>  	xfs_mru_reap_wq = alloc_workqueue("xfs_mru_cache",
> -				WQ_MEM_RECLAIM|WQ_FREEZABLE, 1);
> +			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 1);
>  	if (!xfs_mru_reap_wq)
>  		return -ENOMEM;
>  	return 0;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aed74a3fc787..8959561351ca 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -495,33 +495,37 @@ xfs_init_mount_workqueues(
>  	struct xfs_mount	*mp)
>  {
>  	mp->m_buf_workqueue = alloc_workqueue("xfs-buf/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 1, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			1, mp->m_super->s_id);
>  	if (!mp->m_buf_workqueue)
>  		goto out;
>  
>  	mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_unwritten_workqueue)
>  		goto out_destroy_buf;
>  
>  	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
> -			WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND,
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
>  			0, mp->m_super->s_id);
>  	if (!mp->m_cil_workqueue)
>  		goto out_destroy_unwritten;
>  
>  	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_reclaim_workqueue)
>  		goto out_destroy_cil;
>  
>  	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_super->s_id);
> +			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
> +			0, mp->m_super->s_id);
>  	if (!mp->m_eofblocks_workqueue)
>  		goto out_destroy_reclaim;
>  
> -	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", WQ_FREEZABLE, 0,
> -					       mp->m_super->s_id);
> +	mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
> +			XFS_WQFLAGS(WQ_FREEZABLE), 0, mp->m_super->s_id);
>  	if (!mp->m_sync_workqueue)
>  		goto out_destroy_eofb;
>  
> @@ -2085,11 +2089,12 @@ xfs_init_workqueues(void)
>  	 * max_active value for this workqueue.
>  	 */
>  	xfs_alloc_wq = alloc_workqueue("xfsalloc",
> -			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0);
> +			XFS_WQFLAGS(WQ_MEM_RECLAIM | WQ_FREEZABLE), 0);
>  	if (!xfs_alloc_wq)
>  		return -ENOMEM;
>  
> -	xfs_discard_wq = alloc_workqueue("xfsdiscard", WQ_UNBOUND, 0);
> +	xfs_discard_wq = alloc_workqueue("xfsdiscard", XFS_WQFLAGS(WQ_UNBOUND),
> +			0);
>  	if (!xfs_discard_wq)
>  		goto out_free_alloc_wq;
>  
> diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h
> index b552cf6d3379..1ca484b8357f 100644
> --- a/fs/xfs/xfs_super.h
> +++ b/fs/xfs/xfs_super.h
> @@ -75,6 +75,12 @@ extern void xfs_qm_exit(void);
>  				XFS_ASSERT_FATAL_STRING \
>  				XFS_DBG_STRING /* DBG must be last */
>  
> +#ifdef DEBUG
> +# define XFS_WQFLAGS(wqflags)	(WQ_SYSFS | (wqflags))
> +#else
> +# define XFS_WQFLAGS(wqflags)	(wqflags)
> +#endif
> +
>  struct xfs_inode;
>  struct xfs_mount;
>  struct xfs_buftarg;
> 


