* [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues
@ 2024-02-08 16:10 Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 1/4] workqueue: Link pwq's into wq->pwqs from oldest to newest Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 16:10 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

 v5:
  - [v4] https://lore.kernel.org/lkml/20240207011911.975608-1-longman@redhat.com/
  - Streamline patch 2 by simplifying unplug_oldest_pwq() and calling it
    only in pwq_release_workfn().

 v4:
  - [v3] https://lore.kernel.org/lkml/20240205194602.871505-1-longman@redhat.com/
  - Rebase on the latest for-6.9 branch again & discard the use of
    __WQ_ORDERED_EXPLICIT and resetting of __WQ_ORDERED.
  - Add a new patch 1 to change the ordering of pwq's in wq->pwqs from
    oldest to newest.
  - Change the terminology from freeze/thaw to plug/unplug.
  - Allow more than 2 pwq's in wq->pwqs of ordered workqueue but only the
    oldest one is unplugged. This eliminates the need to wait for
    the draining of extra pwq in workqueue_apply_unbound_cpumask().

 v3:
  - [v2] https://lore.kernel.org/lkml/20240203154334.791910-1-longman@redhat.com/
  - Drop patch 1 as it has been merged into the for-6.9 branch.
  - Use rcu_access_pointer() to access wq->dfl_pwq.
  - Use RCU protection instead of acquiring wq->mutex in
    apply_wqattrs_cleanup().

Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on a cpuset-isolated CPU.

This series enables ordered workqueues to follow changes made to the
global unbound cpumask by temporarily suspending (plugging) the execution
of work items in the newly allocated pool_workqueue until the old pwq
has been properly drained.
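
As a rough illustration of the guarantee being preserved, here is a
hedged sketch using a hypothetical ordered workqueue (work_fn, work_a,
work_b and "example_ordered" are made-up names, not code from this
series):

  #include <linux/module.h>
  #include <linux/workqueue.h>
  #include <linux/errno.h>

  static void work_fn(struct work_struct *work) { /* do something */ }
  static DECLARE_WORK(work_a, work_fn);
  static DECLARE_WORK(work_b, work_fn);
  static struct workqueue_struct *ordered_wq;

  static int __init example_init(void)
  {
          ordered_wq = alloc_ordered_workqueue("example_ordered", 0);
          if (!ordered_wq)
                  return -ENOMEM;

          queue_work(ordered_wq, &work_a);  /* queued to the old pwq */
          /*
           * If the global unbound cpumask changes here, the replacement
           * pwq is created plugged.
           */
          queue_work(ordered_wq, &work_b);  /* lands on the plugged pwq */
          /*
           * work_b cannot start until the old pwq (and with it work_a)
           * has drained, so work_a -> work_b ordering is preserved at
           * the cost of some latency during the transition.
           */
          return 0;
  }
  module_init(example_init);
  MODULE_LICENSE("GPL");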

The cpumask of the rescuer task of each workqueue is also made to follow
changes in workqueue unbound cpumask as well as its sysfs cpumask,
if available.

Juri Lelli (1):
  kernel/workqueue: Let rescuers follow unbound wq cpumask changes

Waiman Long (3):
  workqueue: Link pwq's into wq->pwqs from oldest to newest
  workqueue: Enable unbound cpumask update on ordered workqueues
  workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask

 kernel/workqueue.c | 81 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 12 deletions(-)

-- 
2.39.3



* [PATCH wq/for-6.9 v5 1/4] workqueue: Link pwq's into wq->pwqs from oldest to newest
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
@ 2024-02-08 16:10 ` Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 2/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 16:10 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

Add a new pwq to the tail of wq->pwqs so that pwq iteration goes
from the oldest pwq to the newest. This ordering will facilitate
the inclusion of ordered workqueues in a wq_unbound_cpumask update.

Signed-off-by: Waiman Long <longman@redhat.com>
---
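A minimal sketch of what the oldest-first ordering gives callers
(simplified, assuming wq->mutex is held; it mirrors the lookup that
unplug_oldest_pwq() performs later in this series):

  struct pool_workqueue *oldest;

  /*
   * With list_add_tail_rcu() the list reads oldest -> newest, so the
   * first entry is always the oldest pwq.
   */
  oldest = list_first_entry_or_null(&wq->pwqs, struct pool_workqueue,
                                    pwqs_node);
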
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cf514ba0dfc3..fa7bd3b34f52 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4804,7 +4804,7 @@ static void link_pwq(struct pool_workqueue *pwq)
 	pwq->work_color = wq->work_color;
 
 	/* link in @pwq */
-	list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
+	list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
 }
 
 /* obtain a pool matching @attr and create a pwq associating the pool and @wq */
-- 
2.39.3



* [PATCH wq/for-6.9 v5 2/4] workqueue: Enable unbound cpumask update on ordered workqueues
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 1/4] workqueue: Link pwq's into wq->pwqs from oldest to newest Waiman Long
@ 2024-02-08 16:10 ` Waiman Long
  2024-02-08 17:42   ` Tejun Heo
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 3/4] kernel/workqueue: Let rescuers follow unbound wq cpumask changes Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2024-02-08 16:10 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on an isolated CPU.

This patch enables ordered workqueues to follow changes made to the
global unbound cpumask by temporarily plugging (suspending) the newly
allocated pool_workqueue so that it does not execute newly queued work
items until the old pwq has been properly drained. For ordered
workqueues, only one pwq should be unplugged at any time; the rest
should be plugged.

This enables ordered workqueues to follow the unbound cpumask changes
like other unbound workqueues at the expense of some delay in execution
of work functions during the transition period.

Signed-off-by: Waiman Long <longman@redhat.com>
---
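For readers navigating the diff, the new plugged state is driven from
three places (condensed from the hunks below; not a complete listing):

  /* apply_wqattrs_prepare(): the replacement dfl_pwq starts out plugged */
  if (!list_empty(&wq->pwqs) && (wq->flags & __WQ_ORDERED))
          ctx->dfl_pwq->plugged = true;

  /* pwq_tryinc_nr_active(): a plugged pwq refuses to start work items */
  if (unlikely(pwq->plugged))
          return false;

  /*
   * pwq_release_workfn(): once the old pwq is released, the oldest
   * remaining pwq is unplugged and its pool kicked.
   */
  if (!is_last && (wq->flags & __WQ_ORDERED))
          unplug_oldest_pwq(wq);
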
 kernel/workqueue.c | 72 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fa7bd3b34f52..e261acf258b8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -255,6 +255,7 @@ struct pool_workqueue {
 	int			refcnt;		/* L: reference count */
 	int			nr_in_flight[WORK_NR_COLORS];
 						/* L: nr of in_flight works */
+	bool			plugged;	/* L: execution suspended */
 
 	/*
 	 * nr_active management and WORK_STRUCT_INACTIVE:
@@ -1708,6 +1709,9 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 		goto out;
 	}
 
+	if (unlikely(pwq->plugged))
+		return false;
+
 	/*
 	 * Unbound workqueue uses per-node shared nr_active $nna. If @pwq is
 	 * already waiting on $nna, pwq_dec_nr_active() will maintain the
@@ -1782,6 +1786,46 @@ static bool pwq_activate_first_inactive(struct pool_workqueue *pwq, bool fill)
 	}
 }
 
+/**
+ * unplug_oldest_pwq - restart an oldest plugged pool_workqueue
+ * @wq: workqueue_struct to be restarted
+ *
+ * pwq's are linked into wq->pwqs with the oldest first. For ordered
+ * workqueues, only the oldest pwq is unplugged, the others are plugged to
+ * suspend execution until the oldest one is drained. When this happens, the
+ * next oldest one (first plugged pwq in iteration) will be unplugged to
+ * restart work item execution to ensure proper work item ordering.
+ *
+ *    dfl_pwq --------------+     [P] - plugged
+ *                          |
+ *                          v
+ *    pwqs -> A -> B [P] -> C [P] (newest)
+ *            |    |        |
+ *            1    3        5
+ *            |    |        |
+ *            2    4        6
+ */
+static void unplug_oldest_pwq(struct workqueue_struct *wq)
+{
+	struct pool_workqueue *pwq;
+	unsigned long flags;
+
+	lockdep_assert_held(&wq->mutex);
+
+	pwq = list_first_entry_or_null(&wq->pwqs, struct pool_workqueue,
+				       pwqs_node);
+	if (WARN_ON_ONCE(!pwq))
+		return;
+
+	raw_spin_lock_irqsave(&pwq->pool->lock, flags);
+	if (pwq->plugged) {
+		pwq->plugged = false;
+		if (pwq_activate_first_inactive(pwq, true))
+			kick_pool(pwq->pool);
+	}
+	raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
+}
+
 /**
  * node_activate_pending_pwq - Activate a pending pwq on a wq_node_nr_active
  * @nna: wq_node_nr_active to activate a pending pwq for
@@ -4740,6 +4784,13 @@ static void pwq_release_workfn(struct kthread_work *work)
 		mutex_lock(&wq->mutex);
 		list_del_rcu(&pwq->pwqs_node);
 		is_last = list_empty(&wq->pwqs);
+
+		/*
+		 * For ordered workqueue with a plugged dfl_pwq, restart it now.
+		 */
+		if (!is_last && (wq->flags & __WQ_ORDERED))
+			unplug_oldest_pwq(wq);
+
 		mutex_unlock(&wq->mutex);
 	}
 
@@ -4966,6 +5017,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
+	/*
+	 * For initialized ordered workqueues, there is only one pwq (dfl_pwq).
+	 * Set the plugged flag of ctx->dfl_pwq to suspend execution of newly
+	 * queued work items until execution of older work items in the old
+	 * pwq's have completed.
+	 */
+	if (!list_empty(&wq->pwqs) && (wq->flags & __WQ_ORDERED))
+		ctx->dfl_pwq->plugged = true;
+
 	ctx->wq = wq;
 	return ctx;
 
@@ -5006,10 +5066,6 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 	if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
 		return -EINVAL;
 
-	/* creating multiple pwqs breaks ordering guarantee */
-	if (!list_empty(&wq->pwqs) && WARN_ON(wq->flags & __WQ_ORDERED))
-		return -EINVAL;
-
 	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -6489,9 +6545,6 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 	list_for_each_entry(wq, &workqueues, list) {
 		if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
 			continue;
-		/* creating multiple pwqs breaks ordering guarantee */
-		if (wq->flags & __WQ_ORDERED)
-			continue;
 
 		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (IS_ERR(ctx)) {
@@ -7006,9 +7059,8 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
 	int ret;
 
 	/*
-	 * Adjusting max_active or creating new pwqs by applying
-	 * attributes breaks ordering guarantee.  Disallow exposing ordered
-	 * workqueues.
+	 * Adjusting max_active breaks ordering guarantee.  Disallow exposing
+	 * ordered workqueues.
 	 */
 	if (WARN_ON(wq->flags & __WQ_ORDERED))
 		return -EINVAL;
-- 
2.39.3



* [PATCH wq/for-6.9 v5 3/4] kernel/workqueue: Let rescuers follow unbound wq cpumask changes
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 1/4] workqueue: Link pwq's into wq->pwqs from oldest to newest Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 2/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
@ 2024-02-08 16:10 ` Waiman Long
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 4/4] workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 16:10 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

From: Juri Lelli <juri.lelli@redhat.com>

When workqueue cpumask changes are committed, the affinity of the
associated rescuer (if one exists) is not touched, and this might be a
problem down the line for isolated setups.

Make sure the rescuer's affinity is updated every time a workqueue
cpumask changes, so that rescuers can't break isolation.

 [longman: set_cpus_allowed_ptr() will block until the designated task
  is enqueued on an allowed CPU, no wake_up_process() needed. Also use
  the unbound_effective_cpumask() helper as suggested by Tejun.]

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
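A hedged sketch of the behaviour the bracketed note above relies on
(names taken from the hunk below): set_cpus_allowed_ptr() updates the
allowed mask and, if the rescuer currently sits on a now-disallowed
CPU, migrates it before returning, so no explicit wake_up_process() is
needed:

  /* rescuer needs to respect wq cpumask changes */
  if (ctx->wq->rescuer)
          set_cpus_allowed_ptr(ctx->wq->rescuer->task,
                               unbound_effective_cpumask(ctx->wq));
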
 kernel/workqueue.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e261acf258b8..8df27c496b63 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5054,6 +5054,11 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
 	/* update node_nr_active->max */
 	wq_update_node_max_active(ctx->wq, -1);
 
+	/* rescuer needs to respect wq cpumask changes */
+	if (ctx->wq->rescuer)
+		set_cpus_allowed_ptr(ctx->wq->rescuer->task,
+				     unbound_effective_cpumask(ctx->wq));
+
 	mutex_unlock(&ctx->wq->mutex);
 }
 
-- 
2.39.3



* [PATCH wq/for-6.9 v5 4/4] workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
                   ` (2 preceding siblings ...)
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 3/4] kernel/workqueue: Let rescuers follow unbound wq cpumask changes Waiman Long
@ 2024-02-08 16:10 ` Waiman Long
  2024-02-08 19:12 ` [PATCH wq/for-6.9 v6 2/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
  2024-02-08 19:26 ` [PATCH wq/for-6.9 v5 0/4] " Tejun Heo
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 16:10 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

Commit 85f0ab43f9de ("kernel/workqueue: Bind rescuer to unbound
cpumask for WQ_UNBOUND") modified init_rescuer() to bind the rescuer of
an unbound workqueue to the cpumask in wq->unbound_attrs. However,
the unbound_attrs->cpumask of every workqueue is initialized to
cpu_possible_mask and is only changed if the workqueue has the WQ_SYSFS
flag, which exposes a cpumask sysfs file that users can write to. So
this patch doesn't achieve what it was intended to do.

If an unbound workqueue is created after wq_unbound_cpumask is modified
and there is no further unbound cpumask update after that, the unbound
rescuer will be bound to all CPUs unless the workqueue is created
with the WQ_SYSFS flag and a user explicitly modifies its cpumask
sysfs file.  Fix this problem by binding directly to wq_unbound_cpumask
in init_rescuer().

Fixes: 85f0ab43f9de ("kernel/workqueue: Bind rescuer to unbound cpumask for WQ_UNBOUND")
Signed-off-by: Waiman Long <longman@redhat.com>
---
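A sketch of the scenario being fixed ("example_wq" and the isolated-CPU
setup are hypothetical): wq_unbound_cpumask has already been trimmed to
exclude isolated CPUs, then a driver creates an unbound workqueue that
needs a rescuer:

  struct workqueue_struct *wq;

  wq = alloc_workqueue("example_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

  /*
   * WQ_MEM_RECLAIM makes init_rescuer() create a rescuer kthread.
   * Before this fix it was bound to wq->unbound_attrs->cpumask, which
   * is still cpu_possible_mask here (no WQ_SYSFS, no user-written
   * cpumask), so the rescuer could still run on the isolated CPUs.
   * Binding to wq_unbound_cpumask instead avoids that.
   */
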
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8df27c496b63..ca53e1144f0a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5302,7 +5302,7 @@ static int init_rescuer(struct workqueue_struct *wq)
 
 	wq->rescuer = rescuer;
 	if (wq->flags & WQ_UNBOUND)
-		kthread_bind_mask(rescuer->task, wq->unbound_attrs->cpumask);
+		kthread_bind_mask(rescuer->task, wq_unbound_cpumask);
 	else
 		kthread_bind_mask(rescuer->task, cpu_possible_mask);
 	wake_up_process(rescuer->task);
-- 
2.39.3



* Re: [PATCH wq/for-6.9 v5 2/4] workqueue: Enable unbound cpumask update on ordered workqueues
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 2/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
@ 2024-02-08 17:42   ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2024-02-08 17:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Lai Jiangshan, linux-kernel, Juri Lelli, Cestmir Kalina,
	Alex Gladkov, Phil Auld, Costa Shulyupin

Hello,

Generally looks good to me. Minor nits below:

On Thu, Feb 08, 2024 at 11:10:12AM -0500, Waiman Long wrote:
> +static void unplug_oldest_pwq(struct workqueue_struct *wq)
> +{
> +	struct pool_workqueue *pwq;
> +	unsigned long flags;
> +
> +	lockdep_assert_held(&wq->mutex);
> +
> +	pwq = list_first_entry_or_null(&wq->pwqs, struct pool_workqueue,
> +				       pwqs_node);
> +	if (WARN_ON_ONCE(!pwq))
> +		return;
> +	raw_spin_lock_irqsave(&pwq->pool->lock, flags);

Can we do raw_spin_lock_irq() instead?

> @@ -4740,6 +4784,13 @@ static void pwq_release_workfn(struct kthread_work *work)
>  		mutex_lock(&wq->mutex);
>  		list_del_rcu(&pwq->pwqs_node);
>  		is_last = list_empty(&wq->pwqs);
> +
> +		/*
> +		 * For ordered workqueue with a plugged dfl_pwq, restart it now.
> +		 */
> +		if (!is_last && (wq->flags & __WQ_ORDERED))
> +			unplug_oldest_pwq(wq);

I'm not so sure about is_last test here. unplug_oldest_pwq() is testing for
NULL anyway, so maybe just drop this test here and drop WARN_ON_ONCE()
there?

> @@ -4966,6 +5017,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
>  	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
>  	ctx->attrs = new_attrs;
>  
> +	/*
> +	 * For initialized ordered workqueues, there is only one pwq (dfl_pwq).
> +	 * Set the plugged flag of ctx->dfl_pwq to suspend execution of newly
> +	 * queued work items until execution of older work items in the old
> +	 * pwq's have completed.
> +	 */
> +	if (!list_empty(&wq->pwqs) && (wq->flags & __WQ_ORDERED))
> +		ctx->dfl_pwq->plugged = true;

Can we test __WQ_ORDERED first?

Thanks.

-- 
tejun


* [PATCH wq/for-6.9 v6 2/4] workqueue: Enable unbound cpumask update on ordered workqueues
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
                   ` (3 preceding siblings ...)
  2024-02-08 16:10 ` [PATCH wq/for-6.9 v5 4/4] workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask Waiman Long
@ 2024-02-08 19:12 ` Waiman Long
  2024-02-08 19:26 ` [PATCH wq/for-6.9 v5 0/4] " Tejun Heo
  5 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 19:12 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan
  Cc: linux-kernel, Juri Lelli, Cestmir Kalina, Alex Gladkov,
	Phil Auld, Costa Shulyupin, Waiman Long

Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on an isolated CPU.

This patch enables ordered workqueues to follow changes made to the
global unbound cpumask by temporarily plugging (suspending) the newly
allocated pool_workqueue so that it does not execute newly queued work
items until the old pwq has been properly drained. For ordered
workqueues, only one pwq should be unplugged at any time; the rest
should be plugged.

This enables ordered workqueues to follow the unbound cpumask changes
like other unbound workqueues at the expense of some delay in execution
of work functions during the transition period.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/workqueue.c | 69 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fa7bd3b34f52..da124859a691 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -255,6 +255,7 @@ struct pool_workqueue {
 	int			refcnt;		/* L: reference count */
 	int			nr_in_flight[WORK_NR_COLORS];
 						/* L: nr of in_flight works */
+	bool			plugged;	/* L: execution suspended */
 
 	/*
 	 * nr_active management and WORK_STRUCT_INACTIVE:
@@ -1708,6 +1709,9 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 		goto out;
 	}
 
+	if (unlikely(pwq->plugged))
+		return false;
+
 	/*
 	 * Unbound workqueue uses per-node shared nr_active $nna. If @pwq is
 	 * already waiting on $nna, pwq_dec_nr_active() will maintain the
@@ -1782,6 +1786,43 @@ static bool pwq_activate_first_inactive(struct pool_workqueue *pwq, bool fill)
 	}
 }
 
+/**
+ * unplug_oldest_pwq - restart an oldest plugged pool_workqueue
+ * @wq: workqueue_struct to be restarted
+ *
+ * pwq's are linked into wq->pwqs with the oldest first. For ordered
+ * workqueues, only the oldest pwq is unplugged, the others are plugged to
+ * suspend execution until the oldest one is drained. When this happens, the
+ * next oldest one (first plugged pwq in iteration) will be unplugged to
+ * restart work item execution to ensure proper work item ordering.
+ *
+ *    dfl_pwq --------------+     [P] - plugged
+ *                          |
+ *                          v
+ *    pwqs -> A -> B [P] -> C [P] (newest)
+ *            |    |        |
+ *            1    3        5
+ *            |    |        |
+ *            2    4        6
+ */
+static void unplug_oldest_pwq(struct workqueue_struct *wq)
+{
+	struct pool_workqueue *pwq;
+
+	lockdep_assert_held(&wq->mutex);
+
+	/* Caller should make sure that pwqs isn't empty before calling */
+	pwq = list_first_entry_or_null(&wq->pwqs, struct pool_workqueue,
+				       pwqs_node);
+	raw_spin_lock_irq(&pwq->pool->lock);
+	if (pwq->plugged) {
+		pwq->plugged = false;
+		if (pwq_activate_first_inactive(pwq, true))
+			kick_pool(pwq->pool);
+	}
+	raw_spin_unlock_irq(&pwq->pool->lock);
+}
+
 /**
  * node_activate_pending_pwq - Activate a pending pwq on a wq_node_nr_active
  * @nna: wq_node_nr_active to activate a pending pwq for
@@ -4740,6 +4781,13 @@ static void pwq_release_workfn(struct kthread_work *work)
 		mutex_lock(&wq->mutex);
 		list_del_rcu(&pwq->pwqs_node);
 		is_last = list_empty(&wq->pwqs);
+
+		/*
+		 * For ordered workqueue with a plugged dfl_pwq, restart it now.
+		 */
+		if (!is_last && (wq->flags & __WQ_ORDERED))
+			unplug_oldest_pwq(wq);
+
 		mutex_unlock(&wq->mutex);
 	}
 
@@ -4966,6 +5014,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
+	/*
+	 * For initialized ordered workqueues, there should only be one pwq
+	 * (dfl_pwq). Set the plugged flag of ctx->dfl_pwq to suspend execution
+	 * of newly queued work items until execution of older work items in
+	 * the old pwq's have completed.
+	 */
+	if ((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs))
+		ctx->dfl_pwq->plugged = true;
+
 	ctx->wq = wq;
 	return ctx;
 
@@ -5006,10 +5063,6 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 	if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
 		return -EINVAL;
 
-	/* creating multiple pwqs breaks ordering guarantee */
-	if (!list_empty(&wq->pwqs) && WARN_ON(wq->flags & __WQ_ORDERED))
-		return -EINVAL;
-
 	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -6489,9 +6542,6 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 	list_for_each_entry(wq, &workqueues, list) {
 		if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
 			continue;
-		/* creating multiple pwqs breaks ordering guarantee */
-		if (wq->flags & __WQ_ORDERED)
-			continue;
 
 		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (IS_ERR(ctx)) {
@@ -7006,9 +7056,8 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
 	int ret;
 
 	/*
-	 * Adjusting max_active or creating new pwqs by applying
-	 * attributes breaks ordering guarantee.  Disallow exposing ordered
-	 * workqueues.
+	 * Adjusting max_active breaks ordering guarantee.  Disallow exposing
+	 * ordered workqueues.
 	 */
 	if (WARN_ON(wq->flags & __WQ_ORDERED))
 		return -EINVAL;
-- 
2.39.3



* Re: [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues
  2024-02-08 16:10 [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
                   ` (4 preceding siblings ...)
  2024-02-08 19:12 ` [PATCH wq/for-6.9 v6 2/4] workqueue: Enable unbound cpumask update on ordered workqueues Waiman Long
@ 2024-02-08 19:26 ` Tejun Heo
  2024-02-08 19:33   ` Waiman Long
  5 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2024-02-08 19:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Lai Jiangshan, linux-kernel, Juri Lelli, Cestmir Kalina,
	Alex Gladkov, Phil Auld, Costa Shulyupin

On Thu, Feb 08, 2024 at 11:10:10AM -0500, Waiman Long wrote:
>  v5:
>   - [v4] https://lore.kernel.org/lkml/20240207011911.975608-1-longman@redhat.com/
>   - Streamline patch 2 by simplifying unplug_oldest_pwq() and calling it
>     only in pwq_release_workfn().
> 
>  v4:
>   - [v3] https://lore.kernel.org/lkml/20240205194602.871505-1-longman@redhat.com/
>   - Rebase on the latest for-6.9 branch again & discard the use of
>     __WQ_ORDERED_EXPLICIT and resetting of __WQ_ORDERED.
>   - Add a new patch 1 to change the ordering of pwq's in wq->pwqs from
>     oldest to newest.
>   - Change the terminology from freeze/thaw to plug/unplug.
>   - Allow more than 2 pwq's in wq->pwqs of ordered workqueue but only the
>     oldest one is unplugged. This eliminates the need to wait for
>     the draining of extra pwq in workqueue_apply_unbound_cpumask().
> 
>  v3:
>   - [v2] https://lore.kernel.org/lkml/20240203154334.791910-1-longman@redhat.com/
>   - Drop patch 1 as it has been merged into the for-6.9 branch.
>   - Use rcu_access_pointer() to access wq->dfl_pwq.
>   - Use RCU protection instead of acquiring wq->mutex in
>     apply_wqattrs_cleanup().

Applied the series (w/ the updated second patch) to wq/for-6.9.

Thanks for working on this. It's really great that this got solved finally.

-- 
tejun


* Re: [PATCH wq/for-6.9 v5 0/4] workqueue: Enable unbound cpumask update on ordered workqueues
  2024-02-08 19:26 ` [PATCH wq/for-6.9 v5 0/4] " Tejun Heo
@ 2024-02-08 19:33   ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2024-02-08 19:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, linux-kernel, Juri Lelli, Cestmir Kalina,
	Alex Gladkov, Phil Auld, Costa Shulyupin


On 2/8/24 14:26, Tejun Heo wrote:
> On Thu, Feb 08, 2024 at 11:10:10AM -0500, Waiman Long wrote:
>>   v5:
>>    - [v4] https://lore.kernel.org/lkml/20240207011911.975608-1-longman@redhat.com/
>>    - Streamline patch 2 by simplifying unplug_oldest_pwq() and calling it
>>      only in pwq_release_workfn().
>>
>>   v4:
>>    - [v3] https://lore.kernel.org/lkml/20240205194602.871505-1-longman@redhat.com/
>>    - Rebase on the latest for-6.9 branch again & discard the use of
>>      __WQ_ORDERED_EXPLICIT and resetting of __WQ_ORDERED.
>>    - Add a new patch 1 to change the ordering of pwq's in wq->pwqs from
>>      oldest to newest.
>>    - Change the terminology from freeze/thaw to plug/unplug.
>>    - Allow more than 2 pwq's in wq->pwqs of ordered workqueue but only the
>>      oldest one is unplugged. This eliminates the need to wait for
>>      the draining of extra pwq in workqueue_apply_unbound_cpumask().
>>
>>   v3:
>>    - [v2] https://lore.kernel.org/lkml/20240203154334.791910-1-longman@redhat.com/
>>    - Drop patch 1 as it has been merged into the for-6.9 branch.
>>    - Use rcu_access_pointer() to access wq->dfl_pwq.
>>    - Use RCU protection instead of acquiring wq->mutex in
>>      apply_wqattrs_cleanup().
> Applied the series w/ (w/ the updated second patch) to wq/for-6.9.
>
> Thanks for working on this. It's really great that this got solved finally.

Me too. Thanks for all the good suggestions that you have given me, as
I don't have a full understanding of all the different pieces of the
workqueue code.

Cheers,
Longman


