linux-kernel.vger.kernel.org archive mirror
* [PATCH] workqueue: fix memory leak of struct pool_workqueue in destroy_workqueue()
@ 2021-08-12  8:38 lizhe.67
  2021-08-13 12:03 ` Lai Jiangshan
  0 siblings, 1 reply; 10+ messages in thread
From: lizhe.67 @ 2021-08-12  8:38 UTC (permalink / raw)
  To: tj, jiangshanlai; +Cc: linux-kernel, lizefan.x, Li Zhe

From: Li Zhe <lizhe.67@bytedance.com>

Even after commit def98c84b6cd, we may encounter sanity check failures in
destroy_workqueue() even though flush_work() was called beforehand, which
results in a memory leak of struct pool_workqueue.

The warning logs are listed below.

WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
*****
destroy_workqueue: test_workqueue9 has the following busy pwq
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
      in-flight: 5658:wq_barrier_func
Showing busy workqueues and worker pools:
*****

A possible sequence of events that results in the failure is listed below.

thread A:
destroy_workqueue()
----raw_spin_lock_irq(&pwq->pool->lock)
----pwq_busy()

thread B:
----process_one_work()
----raw_spin_unlock_irq(&pool->lock)
----worker->current_func(work)
----cond_resched()
----raw_spin_lock_irq(&pool->lock)
----pwq_dec_nr_in_flight(pwq, work_color)

Thread A may acquire pool->lock before thread B does, while pwq->refcnt
is still 2, which results in the memory leak and the sanity check failure.
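
For reference, a minimal usage pattern that can hit this race looks
roughly like the following sketch (the names are illustrative, not taken
from the original reproducer):

	static void my_work_fn(struct work_struct *work)
	{
		/* any short-lived payload */
	}
	static DECLARE_WORK(my_work, my_work_fn);

	void trigger(void)
	{
		struct workqueue_struct *wq = alloc_workqueue("test_wq", 0, 0);

		queue_work(wq, &my_work);
		flush_work(&my_work);	/* queues a barrier work item */
		destroy_workqueue(wq);	/* may race with the barrier's
					 * pwq_dec_nr_in_flight() */
	}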

Notice that wq_barrier_func() only calls complete(), and expanding
struct work_struct is not suitable considering the memory cost.  This
patch moves the complete() call to after pool->lock is re-acquired in
process_one_work(), eliminating the race by identifying a barrier work
item as one whose work->func is NULL.

Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
 kernel/workqueue.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f148eacda55a..02f77f35522c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -280,6 +280,12 @@ struct workqueue_struct {
 	struct pool_workqueue __rcu *numa_pwq_tbl[]; /* PWR: unbound pwqs indexed by node */
 };
 
+struct wq_barrier {
+	struct work_struct	work;
+	struct completion	done;
+	struct task_struct	*task;	/* purely informational */
+};
+
 static struct kmem_cache *pwq_cache;
 
 static cpumask_var_t *wq_numa_possible_cpumask;
@@ -2152,6 +2158,11 @@ static bool manage_workers(struct worker *worker)
 	return true;
 }
 
+static inline bool is_barrier_func(work_func_t func)
+{
+	return func == NULL;
+}
+
 /**
  * process_one_work - process single work
  * @worker: self
@@ -2273,7 +2284,8 @@ __acquires(&pool->lock)
 	 */
 	lockdep_invariant_state(true);
 	trace_workqueue_execute_start(work);
-	worker->current_func(work);
+	if (likely(!is_barrier_func(worker->current_func)))
+		worker->current_func(work);
 	/*
 	 * While we must be careful to not use "work" after this, the trace
 	 * point will only record its address.
@@ -2303,6 +2315,11 @@ __acquires(&pool->lock)
 
 	raw_spin_lock_irq(&pool->lock);
 
+	if (unlikely(is_barrier_func(worker->current_func))) {
+		struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
+		complete(&barr->done);
+	}
+
 	/* clear cpu intensive status */
 	if (unlikely(cpu_intensive))
 		worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
@@ -2618,18 +2635,6 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
 		  target_wq->name, target_func);
 }
 
-struct wq_barrier {
-	struct work_struct	work;
-	struct completion	done;
-	struct task_struct	*task;	/* purely informational */
-};
-
-static void wq_barrier_func(struct work_struct *work)
-{
-	struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
-	complete(&barr->done);
-}
-
 /**
  * insert_wq_barrier - insert a barrier work
  * @pwq: pwq to insert barrier into
@@ -2667,7 +2672,11 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 	 * checks and call back into the fixup functions where we
 	 * might deadlock.
 	 */
-	INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
+	/* No need to init func because complete() has been moved to
+	 * process_one_work(), which means that we use a NULL func to
+	 * identify a barrier work item.
+	 */
+	INIT_WORK_ONSTACK(&barr->work, NULL);
 	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
 
 	init_completion_map(&barr->done, &target->lockdep_map);
@@ -4682,7 +4691,7 @@ static void pr_cont_pool_info(struct worker_pool *pool)
 
 static void pr_cont_work(bool comma, struct work_struct *work)
 {
-	if (work->func == wq_barrier_func) {
+	if (is_barrier_func(work->func)) {
 		struct wq_barrier *barr;
 
 		barr = container_of(work, struct wq_barrier, work);
-- 
2.11.0



* Re: [PATCH] workqueue: fix memory leak of struct pool_workqueue in destroy_workqueue()
  2021-08-12  8:38 [PATCH] workqueue: fix memory leak of struct pool_workqueue in destroy_workqueue() lizhe.67
@ 2021-08-13 12:03 ` Lai Jiangshan
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
  0 siblings, 1 reply; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-13 12:03 UTC (permalink / raw)
  To: lizhe.67; +Cc: Tejun Heo, LKML, lizefan.x

On Thu, Aug 12, 2021 at 4:38 PM <lizhe.67@bytedance.com> wrote:
>
> From: Li Zhe <lizhe.67@bytedance.com>
>
> Even after commit def98c84b6cd, we may encounter sanity check failures in
> destroy_workqueue() even though flush_work() was called beforehand, which
> results in a memory leak of struct pool_workqueue.
>
> The warning logs are listed below.
>
> WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
> *****
> destroy_workqueue: test_workqueue9 has the following busy pwq
>   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
>       in-flight: 5658:wq_barrier_func
> Showing busy workqueues and worker pools:
> *****
>
> A possible sequence of events that results in the failure is listed below.
>
> thread A:
> destroy_workqueue()
> ----raw_spin_lock_irq(&pwq->pool->lock)
> ----pwq_busy()
>
> thread B:
> ----process_one_work()
> ----raw_spin_unlock_irq(&pool->lock)
> ----worker->current_func(work)
> ----cond_resched()
> ----raw_spin_lock_irq(&pool->lock)
> ----pwq_dec_nr_in_flight(pwq, work_color)

Hello, Li

Thanks for your report.
As this list of events shows, the problem does exist.

But complicating process_one_work() and adding branches to its hot path
is not ideal.  I'm trying to figure out another way to fix it.

Thanks
Lai



* [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work()
  2021-08-13 12:03 ` Lai Jiangshan
@ 2021-08-17  1:32   ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 1/6] workqueue: Rename "delayed" (delayed by active management) to "inactive" Lai Jiangshan
                       ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Li reported a problem[1] where the sanity check in destroy_workqueue() catches
a WARNING because drain_workqueue() doesn't wait for flush_work() to finish.

The warning logs are listed below:

WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
*****
destroy_workqueue: test_workqueue9 has the following busy pwq
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
      in-flight: 5658:wq_barrier_func
Showing busy workqueues and worker pools:
*****

It shows that even after drain_workqueue() returns, the barrier work item
is still in flight and the pwq (and a worker) is still busy on it.

The problem is caused by drain_workqueue() not watching flush_work():

Thread A				Worker
					/* normal work item with linked */
					process_scheduled_works()
destroy_workqueue()			  process_one_work()
  drain_workqueue()			    /* run normal work item */
				 /--	    pwq_dec_nr_in_flight()
    flush_workqueue()	    <---/
		/* the last normal work item is done */
  sanity_check				  process_one_work()
				       /--  raw_spin_unlock_irq(&pool->lock)
    raw_spin_lock_irq(&pool->lock)  <-/     /* maybe preempt */
    *WARNING*				    wq_barrier_func()
					    /* maybe preempt by cond_resched() */

So the solution is to make drain_workqueue() watch for flush_work(), which
means making flush_workqueue() watch for flush_work().
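
For context, drain_workqueue() leans entirely on flush_workqueue(); it is
roughly the following retry loop (a simplified sketch of the existing
code, using the post-rename field names from patch 1):

	void drain_workqueue(struct workqueue_struct *wq)
	{
		struct pool_workqueue *pwq;
	reflush:
		flush_workqueue(wq);
		for_each_pwq(pwq, wq) {
			bool drained;

			raw_spin_lock_irq(&pwq->pool->lock);
			drained = !pwq->nr_active &&
				  list_empty(&pwq->inactive_works);
			raw_spin_unlock_irq(&pwq->pool->lock);

			if (!drained)
				goto reflush;
		}
	}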

Due to historical convenience, we used WORK_NO_COLOR for barrier work items
queued by flush_work().  The color has two purposes:
	Not participating in flushing
	Not participating in nr_active

Only the second purpose is obligatory.  So the plan is to mark barrier
work items inactive without using WORK_NO_COLOR in patch4 so that we can
assign a flushing color to them in patch5.

Patches 1-3 are preparation, and patch 6 is a cleanup.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/

Lai Jiangshan (6):
  workqueue: Rename "delayed" (delayed by active management) to
    "inactive"
  workqueue: Change argument of pwq_dec_nr_in_flight()
  workqueue: Change the code of calculating work_flags in
    insert_wq_barrier()
  workqueue: Mark barrier work with WORK_STRUCT_INACTIVE
  workqueue: Assign a color to barrier work items
  workqueue: Remove unused WORK_NO_COLOR

 include/linux/workqueue.h   |  13 ++--
 kernel/workqueue.c          | 133 ++++++++++++++++++++++--------------
 kernel/workqueue_internal.h |   3 +-
 3 files changed, 88 insertions(+), 61 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH 1/6] workqueue: Rename "delayed" (delayed by active management) to "inactive"
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 2/6] workqueue: Change argument of pwq_dec_nr_in_flight() Lai Jiangshan
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

There are two kinds of "delayed" work items in the workqueue subsystem.

One is for timer-delayed work items, which are visible to workqueue users.
The other kind is for work items delayed by active management, which are
not directly visible to workqueue users.  We used the word "delayed" for
both kinds, which caused some ambiguity.

This patch renames the latter kind (delayed by active management) to
"inactive", because it is used for workqueue active management and
most of its related symbols are named with "active" or "activate".
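
For contrast, the user-visible, timer-delayed kind keeps its name.  A
trivial illustration (wq and dwork are assumed to exist):

	/* Timer-delayed: visible to users, still called "delayed". */
	queue_delayed_work(wq, &dwork, HZ);

	/*
	 * Delayed by active management: now called "inactive".  Such a
	 * work item sits on pwq->inactive_works until nr_active drops
	 * below max_active, invisible to workqueue users.
	 */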

All "delayed" and "DELAYED" are carefully checked and renamed one by
one to avoid accidentally changing the name of the other kind for
timer-delayed.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/workqueue.h |  4 +--
 kernel/workqueue.c        | 58 +++++++++++++++++++--------------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index d15a7730ee18..38e479bcdbfe 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -29,7 +29,7 @@ void delayed_work_timer_fn(struct timer_list *t);
 
 enum {
 	WORK_STRUCT_PENDING_BIT	= 0,	/* work item is pending execution */
-	WORK_STRUCT_DELAYED_BIT	= 1,	/* work item is delayed */
+	WORK_STRUCT_INACTIVE_BIT= 1,	/* work item is inactive */
 	WORK_STRUCT_PWQ_BIT	= 2,	/* data points to pwq */
 	WORK_STRUCT_LINKED_BIT	= 3,	/* next work is linked to this one */
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -42,7 +42,7 @@ enum {
 	WORK_STRUCT_COLOR_BITS	= 4,
 
 	WORK_STRUCT_PENDING	= 1 << WORK_STRUCT_PENDING_BIT,
-	WORK_STRUCT_DELAYED	= 1 << WORK_STRUCT_DELAYED_BIT,
+	WORK_STRUCT_INACTIVE	= 1 << WORK_STRUCT_INACTIVE_BIT,
 	WORK_STRUCT_PWQ		= 1 << WORK_STRUCT_PWQ_BIT,
 	WORK_STRUCT_LINKED	= 1 << WORK_STRUCT_LINKED_BIT,
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 50142fc08902..b2712807c403 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -207,7 +207,7 @@ struct pool_workqueue {
 						/* L: nr of in_flight works */
 	int			nr_active;	/* L: nr of active works */
 	int			max_active;	/* L: max active works */
-	struct list_head	delayed_works;	/* L: delayed works */
+	struct list_head	inactive_works;	/* L: inactive works */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
 
@@ -1136,7 +1136,7 @@ static void put_pwq_unlocked(struct pool_workqueue *pwq)
 	}
 }
 
-static void pwq_activate_delayed_work(struct work_struct *work)
+static void pwq_activate_inactive_work(struct work_struct *work)
 {
 	struct pool_workqueue *pwq = get_work_pwq(work);
 
@@ -1144,16 +1144,16 @@ static void pwq_activate_delayed_work(struct work_struct *work)
 	if (list_empty(&pwq->pool->worklist))
 		pwq->pool->watchdog_ts = jiffies;
 	move_linked_works(work, &pwq->pool->worklist, NULL);
-	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
+	__clear_bit(WORK_STRUCT_INACTIVE_BIT, work_data_bits(work));
 	pwq->nr_active++;
 }
 
-static void pwq_activate_first_delayed(struct pool_workqueue *pwq)
+static void pwq_activate_first_inactive(struct pool_workqueue *pwq)
 {
-	struct work_struct *work = list_first_entry(&pwq->delayed_works,
+	struct work_struct *work = list_first_entry(&pwq->inactive_works,
 						    struct work_struct, entry);
 
-	pwq_activate_delayed_work(work);
+	pwq_activate_inactive_work(work);
 }
 
 /**
@@ -1176,10 +1176,10 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
 	pwq->nr_in_flight[color]--;
 
 	pwq->nr_active--;
-	if (!list_empty(&pwq->delayed_works)) {
-		/* one down, submit a delayed one */
+	if (!list_empty(&pwq->inactive_works)) {
+		/* one down, submit an inactive one */
 		if (pwq->nr_active < pwq->max_active)
-			pwq_activate_first_delayed(pwq);
+			pwq_activate_first_inactive(pwq);
 	}
 
 	/* is flush in progress and are we at the flushing tip? */
@@ -1281,14 +1281,14 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		debug_work_deactivate(work);
 
 		/*
-		 * A delayed work item cannot be grabbed directly because
+		 * An inactive work item cannot be grabbed directly because
 		 * it might have linked NO_COLOR work items which, if left
-		 * on the delayed_list, will confuse pwq->nr_active
+		 * on the inactive_works list, will confuse pwq->nr_active
 		 * management later on and cause stall.  Make sure the work
 		 * item is activated before grabbing.
 		 */
-		if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
-			pwq_activate_delayed_work(work);
+		if (*work_data_bits(work) & WORK_STRUCT_INACTIVE)
+			pwq_activate_inactive_work(work);
 
 		list_del_init(&work->entry);
 		pwq_dec_nr_in_flight(pwq, get_work_color(work));
@@ -1490,8 +1490,8 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 		if (list_empty(worklist))
 			pwq->pool->watchdog_ts = jiffies;
 	} else {
-		work_flags |= WORK_STRUCT_DELAYED;
-		worklist = &pwq->delayed_works;
+		work_flags |= WORK_STRUCT_INACTIVE;
+		worklist = &pwq->inactive_works;
 	}
 
 	debug_work_activate(work);
@@ -2531,7 +2531,7 @@ static int rescuer_thread(void *__rescuer)
 			/*
 			 * The above execution of rescued work items could
 			 * have created more to rescue through
-			 * pwq_activate_first_delayed() or chained
+			 * pwq_activate_first_inactive() or chained
 			 * queueing.  Let's put @pwq back on mayday list so
 			 * that such back-to-back work items, which may be
 			 * being used to relieve memory pressure, don't
@@ -2957,7 +2957,7 @@ void drain_workqueue(struct workqueue_struct *wq)
 		bool drained;
 
 		raw_spin_lock_irq(&pwq->pool->lock);
-		drained = !pwq->nr_active && list_empty(&pwq->delayed_works);
+		drained = !pwq->nr_active && list_empty(&pwq->inactive_works);
 		raw_spin_unlock_irq(&pwq->pool->lock);
 
 		if (drained)
@@ -3707,7 +3707,7 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
  * @pwq: target pool_workqueue
  *
  * If @pwq isn't freezing, set @pwq->max_active to the associated
- * workqueue's saved_max_active and activate delayed work items
+ * workqueue's saved_max_active and activate inactive work items
  * accordingly.  If @pwq is freezing, clear @pwq->max_active to zero.
  */
 static void pwq_adjust_max_active(struct pool_workqueue *pwq)
@@ -3736,9 +3736,9 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
 
 		pwq->max_active = wq->saved_max_active;
 
-		while (!list_empty(&pwq->delayed_works) &&
+		while (!list_empty(&pwq->inactive_works) &&
 		       pwq->nr_active < pwq->max_active) {
-			pwq_activate_first_delayed(pwq);
+			pwq_activate_first_inactive(pwq);
 			kick = true;
 		}
 
@@ -3769,7 +3769,7 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
 	pwq->wq = wq;
 	pwq->flush_color = -1;
 	pwq->refcnt = 1;
-	INIT_LIST_HEAD(&pwq->delayed_works);
+	INIT_LIST_HEAD(&pwq->inactive_works);
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);
@@ -4356,7 +4356,7 @@ static bool pwq_busy(struct pool_workqueue *pwq)
 
 	if ((pwq != pwq->wq->dfl_pwq) && (pwq->refcnt > 1))
 		return true;
-	if (pwq->nr_active || !list_empty(&pwq->delayed_works))
+	if (pwq->nr_active || !list_empty(&pwq->inactive_works))
 		return true;
 
 	return false;
@@ -4552,7 +4552,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 	else
 		pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
 
-	ret = !list_empty(&pwq->delayed_works);
+	ret = !list_empty(&pwq->inactive_works);
 	preempt_enable();
 	rcu_read_unlock();
 
@@ -4748,11 +4748,11 @@ static void show_pwq(struct pool_workqueue *pwq)
 		pr_cont("\n");
 	}
 
-	if (!list_empty(&pwq->delayed_works)) {
+	if (!list_empty(&pwq->inactive_works)) {
 		bool comma = false;
 
-		pr_info("    delayed:");
-		list_for_each_entry(work, &pwq->delayed_works, entry) {
+		pr_info("    inactive:");
+		list_for_each_entry(work, &pwq->inactive_works, entry) {
 			pr_cont_work(comma, work);
 			comma = !(*work_data_bits(work) & WORK_STRUCT_LINKED);
 		}
@@ -4782,7 +4782,7 @@ void show_workqueue_state(void)
 		bool idle = true;
 
 		for_each_pwq(pwq, wq) {
-			if (pwq->nr_active || !list_empty(&pwq->delayed_works)) {
+			if (pwq->nr_active || !list_empty(&pwq->inactive_works)) {
 				idle = false;
 				break;
 			}
@@ -4794,7 +4794,7 @@ void show_workqueue_state(void)
 
 		for_each_pwq(pwq, wq) {
 			raw_spin_lock_irqsave(&pwq->pool->lock, flags);
-			if (pwq->nr_active || !list_empty(&pwq->delayed_works))
+			if (pwq->nr_active || !list_empty(&pwq->inactive_works))
 				show_pwq(pwq);
 			raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
 			/*
@@ -5177,7 +5177,7 @@ EXPORT_SYMBOL_GPL(work_on_cpu_safe);
  * freeze_workqueues_begin - begin freezing workqueues
  *
  * Start freezing workqueues.  After this function returns, all freezable
- * workqueues will queue new works to their delayed_works list instead of
+ * workqueues will queue new works to their inactive_works list instead of
  * pool->worklist.
  *
  * CONTEXT:
-- 
2.19.1.6.gb485710b



* [PATCH 2/6] workqueue: Change argument of pwq_dec_nr_in_flight()
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 1/6] workqueue: Rename "delayed" (delayed by active management) to "inactive" Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 3/6] workqueue: Change the code of calculating work_flags in insert_wq_barrier() Lai Jiangshan
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Make pwq_dec_nr_in_flight() use work_data rather than just work_color.

Prepare for a later patch to get the WORK_STRUCT_INACTIVE bit from
work_data in pwq_dec_nr_in_flight().
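
As a sketch of the intent: the flushing color and (after patch 4) the
WORK_STRUCT_INACTIVE bit can both be read from the same snapshot of the
work data word:

	unsigned long work_data = *work_data_bits(work);
	int color = get_work_color(work_data);
	/* after patch 4: */
	bool inactive = work_data & WORK_STRUCT_INACTIVE;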

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b2712807c403..0a4fae7cb918 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -579,9 +579,9 @@ static unsigned int work_color_to_flags(int color)
 	return color << WORK_STRUCT_COLOR_SHIFT;
 }
 
-static int get_work_color(struct work_struct *work)
+static int get_work_color(unsigned long work_data)
 {
-	return (*work_data_bits(work) >> WORK_STRUCT_COLOR_SHIFT) &
+	return (work_data >> WORK_STRUCT_COLOR_SHIFT) &
 		((1 << WORK_STRUCT_COLOR_BITS) - 1);
 }
 
@@ -1159,7 +1159,7 @@ static void pwq_activate_first_inactive(struct pool_workqueue *pwq)
 /**
  * pwq_dec_nr_in_flight - decrement pwq's nr_in_flight
  * @pwq: pwq of interest
- * @color: color of work which left the queue
+ * @work_data: work_data of work which left the queue
  *
  * A work either has completed or is removed from pending queue,
  * decrement nr_in_flight of its pwq and handle workqueue flushing.
@@ -1167,8 +1167,10 @@ static void pwq_activate_first_inactive(struct pool_workqueue *pwq)
  * CONTEXT:
  * raw_spin_lock_irq(pool->lock).
  */
-static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
+static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, unsigned long work_data)
 {
+	int color = get_work_color(work_data);
+
 	/* uncolored work items don't participate in flushing or nr_active */
 	if (color == WORK_NO_COLOR)
 		goto out_put;
@@ -1291,7 +1293,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 			pwq_activate_inactive_work(work);
 
 		list_del_init(&work->entry);
-		pwq_dec_nr_in_flight(pwq, get_work_color(work));
+		pwq_dec_nr_in_flight(pwq, *work_data_bits(work));
 
 		/* work->data points to pwq iff queued, point to pool */
 		set_work_pool_and_keep_pending(work, pool->id);
@@ -2173,7 +2175,7 @@ __acquires(&pool->lock)
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
 	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
-	int work_color;
+	unsigned long work_data;
 	struct worker *collision;
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -2209,7 +2211,7 @@ __acquires(&pool->lock)
 	worker->current_work = work;
 	worker->current_func = work->func;
 	worker->current_pwq = pwq;
-	work_color = get_work_color(work);
+	work_data = *work_data_bits(work);
 
 	/*
 	 * Record wq name for cmdline and debug reporting, may get
@@ -2315,7 +2317,7 @@ __acquires(&pool->lock)
 	worker->current_work = NULL;
 	worker->current_func = NULL;
 	worker->current_pwq = NULL;
-	pwq_dec_nr_in_flight(pwq, work_color);
+	pwq_dec_nr_in_flight(pwq, work_data);
 }
 
 /**
-- 
2.19.1.6.gb485710b



* [PATCH 3/6] workqueue: Change the code of calculating work_flags in insert_wq_barrier()
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 1/6] workqueue: Rename "delayed" (delayed by active management) to "inactive" Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 2/6] workqueue: Change argument of pwq_dec_nr_in_flight() Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 4/6] workqueue: Mark barrier work with WORK_STRUCT_INACTIVE Lai Jiangshan
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Add a local variable @work_flags to build up work_flags step by step, so
that we don't need to squeeze several flags into the last line of code.

Prepare for the next patch to add a bit to the barrier work item's flags.
Not squashing this into the next patch makes it clear what that patch
changes.

No functional change intended.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0a4fae7cb918..55fc2d1688d9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2660,8 +2660,8 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 			      struct wq_barrier *barr,
 			      struct work_struct *target, struct worker *worker)
 {
+	unsigned int work_flags = work_color_to_flags(WORK_NO_COLOR);
 	struct list_head *head;
-	unsigned int linked = 0;
 
 	/*
 	 * debugobject calls are safe here even with pool->lock locked
@@ -2687,13 +2687,12 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 
 		head = target->entry.next;
 		/* there can already be other linked works, inherit and set */
-		linked = *bits & WORK_STRUCT_LINKED;
+		work_flags |= *bits & WORK_STRUCT_LINKED;
 		__set_bit(WORK_STRUCT_LINKED_BIT, bits);
 	}
 
 	debug_work_activate(&barr->work);
-	insert_work(pwq, &barr->work, head,
-		    work_color_to_flags(WORK_NO_COLOR) | linked);
+	insert_work(pwq, &barr->work, head, work_flags);
 }
 
 /**
-- 
2.19.1.6.gb485710b



* [PATCH 4/6] workqueue: Mark barrier work with WORK_STRUCT_INACTIVE
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
                       ` (2 preceding siblings ...)
  2021-08-17  1:32     ` [PATCH 3/6] workqueue: Change the code of calculating work_flags in insert_wq_barrier() Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 5/6] workqueue: Assign a color to barrier work items Lai Jiangshan
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Currently, WORK_NO_COLOR has two meanings:
	Not participating in flushing
	Not participating in nr_active

And only non-barrier work items are marked with WORK_STRUCT_INACTIVE
when they are on the inactive_works list.  Barrier work items are not
marked INACTIVE even when linked into the inactive_works list, since
these tail items are always moved together with the head work item.

These definitions are simple, clean and practical (except for a small
blemish: only the first meaning of WORK_NO_COLOR is documented in
include/linux/workqueue.h, while both meanings are relied upon in
workqueue.c).

But the dual-purpose WORK_NO_COLOR used for barrier work items has proven
to be problematic[1].  Only the second purpose is obligatory.  So we plan
to make barrier work items participate in flushing while still not
participating in nr_active.

So the plan is to mark barrier work items inactive without using
WORK_NO_COLOR in this patch so that we can assign a flushing color to
them in the next patch.

The reasonable way is to add or reuse a bit in work data of the work
item.  But adding a bit will double the size of pool_workqueue.

Currently, WORK_STRUCT_INACTIVE is only used in try_to_grab_pending()
for user-queued work items, and try_to_grab_pending() doesn't operate on
barrier work items.  So we extend WORK_STRUCT_INACTIVE to also mark
barrier work items no matter which list they are in, because we don't
need to determine which list a barrier work item is in.

So the meaning of WORK_STRUCT_INACTIVE becomes just "the work item doesn't
participate in nr_active" (no matter whether it is a barrier work item or
a user-queued work item).  And for user-queued work items,
WORK_STRUCT_INACTIVE means they are on the inactive_works list.
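
Stated as a predicate (a sketch only; counts_in_nr_active() is a
hypothetical helper for illustration, not part of this series):

	/* A work item participates in nr_active iff INACTIVE is clear. */
	static inline bool counts_in_nr_active(unsigned long work_data)
	{
		return !(work_data & WORK_STRUCT_INACTIVE);
	}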

This patch does it by setting WORK_STRUCT_INACTIVE for barrier work items
in insert_wq_barrier() and checking WORK_STRUCT_INACTIVE first in
pwq_dec_nr_in_flight().  And the meaning of WORK_NO_COLOR is reduced to
only "not participating in flushing".

No functional change is intended in this patch, because
WORK_NO_COLOR+WORK_STRUCT_INACTIVE carries the same meaning as the
previous WORK_NO_COLOR, and try_to_grab_pending() is not used for barrier
work items and so avoids being confused by the extended
WORK_STRUCT_INACTIVE.

A bunch of comments about nr_active & WORK_STRUCT_INACTIVE are also added
to document how WORK_STRUCT_INACTIVE works in nr_active management.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 55fc2d1688d9..1b2792b397f0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -205,6 +205,23 @@ struct pool_workqueue {
 	int			refcnt;		/* L: reference count */
 	int			nr_in_flight[WORK_NR_COLORS];
 						/* L: nr of in_flight works */
+
+	/*
+	 * nr_active management and WORK_STRUCT_INACTIVE:
+	 *
+	 * When pwq->nr_active >= max_active, new work item is queued to
+	 * pwq->inactive_works instead of pool->worklist and marked with
+	 * WORK_STRUCT_INACTIVE.
+	 *
+	 * All work items marked with WORK_STRUCT_INACTIVE do not participate
+	 * in pwq->nr_active and all work items in pwq->inactive_works are
+	 * marked with WORK_STRUCT_INACTIVE.  But not all WORK_STRUCT_INACTIVE
+	 * work items are in pwq->inactive_works.  Some of them are ready to
+	 * run in pool->worklist or worker->scheduled.  Those work items are
+	 * only struct wq_barrier which is used for flush_work() and should
+	 * not participate in pwq->nr_active.  For a non-barrier work item, it
+	 * is marked with WORK_STRUCT_INACTIVE iff it is in pwq->inactive_works.
+	 */
 	int			nr_active;	/* L: nr of active works */
 	int			max_active;	/* L: max active works */
 	struct list_head	inactive_works;	/* L: inactive works */
@@ -1171,19 +1188,21 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, unsigned long work_
 {
 	int color = get_work_color(work_data);
 
-	/* uncolored work items don't participate in flushing or nr_active */
+	if (!(work_data & WORK_STRUCT_INACTIVE)) {
+		pwq->nr_active--;
+		if (!list_empty(&pwq->inactive_works)) {
+			/* one down, submit an inactive one */
+			if (pwq->nr_active < pwq->max_active)
+				pwq_activate_first_inactive(pwq);
+		}
+	}
+
+	/* uncolored work items don't participate in flushing */
 	if (color == WORK_NO_COLOR)
 		goto out_put;
 
 	pwq->nr_in_flight[color]--;
 
-	pwq->nr_active--;
-	if (!list_empty(&pwq->inactive_works)) {
-		/* one down, submit an inactive one */
-		if (pwq->nr_active < pwq->max_active)
-			pwq_activate_first_inactive(pwq);
-	}
-
 	/* is flush in progress and are we at the flushing tip? */
 	if (likely(pwq->flush_color != color))
 		goto out_put;
@@ -1283,6 +1302,10 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		debug_work_deactivate(work);
 
 		/*
+		 * A cancelable inactive work item must be in the
+		 * pwq->inactive_works since a queued barrier can't be
+		 * canceled (see the comments in insert_wq_barrier()).
+		 *
 		 * An inactive work item cannot be grabbed directly because
 		 * it might have linked NO_COLOR work items which, if left
 		 * on the inactive_works list, will confuse pwq->nr_active
@@ -2676,6 +2699,9 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 
 	barr->task = current;
 
+	/* The barrier work item does not participate in pwq->nr_active. */
+	work_flags |= WORK_STRUCT_INACTIVE;
+
 	/*
 	 * If @target is currently being executed, schedule the
 	 * barrier to the worker; otherwise, put it after @target.
-- 
2.19.1.6.gb485710b



* [PATCH 5/6] workqueue: Assign a color to barrier work items
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
                       ` (3 preceding siblings ...)
  2021-08-17  1:32     ` [PATCH 4/6] workqueue: Mark barrier work with WORK_STRUCT_INACTIVE Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17  1:32     ` [PATCH 6/6] workqueue: Remove unused WORK_NO_COLOR Lai Jiangshan
  2021-08-17 17:48     ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Tejun Heo
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

There was no strong reason either to flush or not to flush barrier work
items in flush_workqueue().  But we have to make barrier work items not
participate in nr_active, so we had been using WORK_NO_COLOR for them,
which also makes them unable to be flushed by flush_workqueue().

And the users of flush_workqueue() often do not intend to wait for barrier
work items issued by flush_work().  That made the choice sound perfect.

But barrier work items hold a reference to an internal structure
(pool_workqueue), and the worker thread(s) is/are still busy on behalf of
the workqueue user while the barrier work items are not done.  So it is
reasonable to make flush_workqueue() also watch for flush_work() to make
it more robust.
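
In user-visible terms, the intended effect of the series is roughly the
following (an illustrative sketch, not code from this patch):

	queue_work(wq, &my_work);
	flush_work(&my_work);	/* inserts a colored barrier work item */
	flush_workqueue(wq);	/* now also waits for that barrier */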

And a problem[1] reported by Li Zhe shows that we need such robustness.
The warning logs are listed below:

WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
*****
destroy_workqueue: test_workqueue9 has the following busy pwq
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
      in-flight: 5658:wq_barrier_func
Showing busy workqueues and worker pools:
*****

It shows that even after drain_workqueue() returns, the barrier work item
is still in flight and the pwq (and a worker) is still busy on it.

The problem is caused by flush_workqueue() not watching flush_work():

Thread A				Worker
					/* normal work item with linked */
					process_scheduled_works()
destroy_workqueue()			  process_one_work()
  drain_workqueue()			    /* run normal work item */
				 /--	    pwq_dec_nr_in_flight()
    flush_workqueue()	    <---/
		/* the last normal work item is done */
  sanity_check				  process_one_work()
				       /--  raw_spin_unlock_irq(&pool->lock)
    raw_spin_lock_irq(&pool->lock)  <-/     /* maybe preempt */
    *WARNING*				    wq_barrier_func()
					    /* maybe preempt by cond_resched() */

Thread A can take the pool lock after the Worker unlocks the pool lock
but before wq_barrier_func() runs.  And if any preemption happens around
wq_barrier_func(), destroy_workqueue()'s sanity check is more likely to
get the lock first and catch the problem.  (Note: preemption is not
necessary to cause the bug; the unlocking alone is enough to possibly
trigger the WARNING.)

A simple solution might be to just execute all linked barrier work items
in one go without releasing the pool lock after the head work item's
pwq_dec_nr_in_flight().  But this solution has two problems:

  1) the head work item might itself be a barrier work item when the
     user-queued work item is cancelled. For example:
	thread 1:		thread 2:
	queue_work(wq, &my_work)
	flush_work(&my_work)
				cancel_work_sync(&my_work);
	/* Neither my_work nor the barrier work is scheduled. */
				destroy_workqueue(wq);
	/* This is an easier way to catch the WARNING. */

  2) there might be too many linked barrier work items, and running them
     all in one go without releasing the pool lock just causes trouble.

The only solution is to make flush_workqueue() also watch barrier work
items.  So we have to assign a color to these barrier work items, namely
the color of the head (user-queued) work item.

Assigning a color doesn't cause any problem in active management, because
the previous patch made barrier work items not participate in nr_active
via WORK_STRUCT_INACTIVE rather than relying on the (old) WORK_NO_COLOR.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Reported-by: Li Zhe <lizhe.67@bytedance.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c          | 20 ++++++++++++--------
 kernel/workqueue_internal.h |  3 ++-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1b2792b397f0..803b0868433b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1197,10 +1197,6 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, unsigned long work_
 		}
 	}
 
-	/* uncolored work items don't participate in flushing */
-	if (color == WORK_NO_COLOR)
-		goto out_put;
-
 	pwq->nr_in_flight[color]--;
 
 	/* is flush in progress and are we at the flushing tip? */
@@ -1307,7 +1303,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		 * canceled (see the comments in insert_wq_barrier()).
 		 *
 		 * An inactive work item cannot be grabbed directly because
-		 * it might have linked NO_COLOR work items which, if left
+		 * it might have linked barrier work items which, if left
 		 * on the inactive_works list, will confuse pwq->nr_active
 		 * management later on and cause stall.  Make sure the work
 		 * item is activated before grabbing.
@@ -2235,6 +2231,7 @@ __acquires(&pool->lock)
 	worker->current_func = work->func;
 	worker->current_pwq = pwq;
 	work_data = *work_data_bits(work);
+	worker->current_color = get_work_color(work_data);
 
 	/*
 	 * Record wq name for cmdline and debug reporting, may get
@@ -2340,6 +2337,7 @@ __acquires(&pool->lock)
 	worker->current_work = NULL;
 	worker->current_func = NULL;
 	worker->current_pwq = NULL;
+	worker->current_color = INT_MAX;
 	pwq_dec_nr_in_flight(pwq, work_data);
 }
 
@@ -2683,7 +2681,8 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 			      struct wq_barrier *barr,
 			      struct work_struct *target, struct worker *worker)
 {
-	unsigned int work_flags = work_color_to_flags(WORK_NO_COLOR);
+	unsigned int work_flags = 0;
+	unsigned int work_color;
 	struct list_head *head;
 
 	/*
@@ -2706,17 +2705,22 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 	 * If @target is currently being executed, schedule the
 	 * barrier to the worker; otherwise, put it after @target.
 	 */
-	if (worker)
+	if (worker) {
 		head = worker->scheduled.next;
-	else {
+		work_color = worker->current_color;
+	} else {
 		unsigned long *bits = work_data_bits(target);
 
 		head = target->entry.next;
 		/* there can already be other linked works, inherit and set */
 		work_flags |= *bits & WORK_STRUCT_LINKED;
+		work_color = get_work_color(*bits);
 		__set_bit(WORK_STRUCT_LINKED_BIT, bits);
 	}
 
+	pwq->nr_in_flight[work_color]++;
+	work_flags |= work_color_to_flags(work_color);
+
 	debug_work_activate(&barr->work);
 	insert_work(pwq, &barr->work, head, work_flags);
 }
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 498de0e909a4..e00b1204a8e9 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -30,7 +30,8 @@ struct worker {
 
 	struct work_struct	*current_work;	/* L: work being processed */
 	work_func_t		current_func;	/* L: current_work's fn */
-	struct pool_workqueue	*current_pwq; /* L: current_work's pwq */
+	struct pool_workqueue	*current_pwq;	/* L: current_work's pwq */
+	unsigned int		current_color;	/* L: current_work's color */
 	struct list_head	scheduled;	/* L: scheduled works */
 
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
-- 
2.19.1.6.gb485710b



* [PATCH 6/6] workqueue: Remove unused WORK_NO_COLOR
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
                       ` (4 preceding siblings ...)
  2021-08-17  1:32     ` [PATCH 5/6] workqueue: Assign a color to barrier work items Lai Jiangshan
@ 2021-08-17  1:32     ` Lai Jiangshan
  2021-08-17 17:48     ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Tejun Heo
  6 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2021-08-17  1:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: lizefan.x, lizhe.67, Tejun Heo, Lai Jiangshan, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

WORK_NO_COLOR has no users now; just remove it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 include/linux/workqueue.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 38e479bcdbfe..ba914e67c5ab 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -51,19 +51,14 @@ enum {
 	WORK_STRUCT_STATIC	= 0,
 #endif
 
-	/*
-	 * The last color is no color used for works which don't
-	 * participate in workqueue flushing.
-	 */
-	WORK_NR_COLORS		= (1 << WORK_STRUCT_COLOR_BITS) - 1,
-	WORK_NO_COLOR		= WORK_NR_COLORS,
+	WORK_NR_COLORS		= (1 << WORK_STRUCT_COLOR_BITS),
 
 	/* not bound to any CPU, prefer the local CPU */
 	WORK_CPU_UNBOUND	= NR_CPUS,
 
 	/*
 	 * Reserve 8 bits off of pwq pointer w/ debugobjects turned off.
-	 * This makes pwqs aligned to 256 bytes and allows 15 workqueue
+	 * This makes pwqs aligned to 256 bytes and allows 16 workqueue
 	 * flush colors.
 	 */
 	WORK_STRUCT_FLAG_BITS	= WORK_STRUCT_COLOR_SHIFT +
-- 
2.19.1.6.gb485710b



* Re: [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work()
  2021-08-17  1:32   ` [PATCH 0/6] workqueue: Make flush_workqueue() also watch flush_work() Lai Jiangshan
                       ` (5 preceding siblings ...)
  2021-08-17  1:32     ` [PATCH 6/6] workqueue: Remove unused WORK_NO_COLOR Lai Jiangshan
@ 2021-08-17 17:48     ` Tejun Heo
  6 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2021-08-17 17:48 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, lizefan.x, lizhe.67, Lai Jiangshan

On Tue, Aug 17, 2021 at 09:32:33AM +0800, Lai Jiangshan wrote:
> Due to historical convenience, we used WORK_NO_COLOR for barrier work items
> queued by flush_work().  The color has two purposes:
> 	Not participating in flushing
> 	Not participating in nr_active
> 
> Only the second purpose is obligatory.  So the plan is to mark barrier
> work items inactive without using WORK_NO_COLOR in patch4 so that we can
> assign a flushing color to them in patch5.

This is great work. Applied to wq/for-5.15.

Thanks.

-- 
tejun

