linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe
@ 2020-05-27 19:46 Sebastian Andrzej Siewior
  2020-05-27 19:46 ` [PATCH v2 1/2] workqueue: Use rcuwait for wq_manager_wait Sebastian Andrzej Siewior
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-27 19:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds

The series converts `wq_manager_wait' from a waitqueue to rcuwait and
the workqueue code's internal locking (pool::lock and wq_mayday_lock)
to raw spinlocks so that workqueues can be used on PREEMPT_RT from
truly atomic contexts.

v1…v2: Use rcuwait instead of swait.

v1 can be found at
   https://lore.kernel.org/lkml/20200513162732.977489-1-bigeasy@linutronix.de/

Sebastian




* [PATCH v2 1/2] workqueue: Use rcuwait for wq_manager_wait
  2020-05-27 19:46 [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe Sebastian Andrzej Siewior
@ 2020-05-27 19:46 ` Sebastian Andrzej Siewior
  2020-05-28  3:06   ` [PATCH 1/2] workqueue: pin the pool while it is managing Lai Jiangshan
  2020-05-27 19:46 ` [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t Sebastian Andrzej Siewior
  2020-05-27 22:57 ` [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe Linus Torvalds
  2 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-27 19:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Sebastian Andrzej Siewior

The workqueue code has its internal spinlock (pool::lock) and also
implicit spinlock usage in the wq_manager waitqueue. These spinlocks
are converted to 'sleeping' spinlocks on an RT kernel.

Workqueue functions can be invoked from contexts which are truly atomic
even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
contexts is forbidden.

pool::lock can be converted to a raw spinlock as the lock held times
are short. But the workqueue manager waitqueue is handled inside of
pool::lock held regions which again violates the lock nesting rules
of raw and regular spinlocks.

The manager waitqueue has no special requirements like custom wakeup
callbacks or mass wakeups. While it does not use exclusive wait mode
explicitly, there is no strict requirement to queue the waiters in a
particular order, as there is only one waiter at a time.

This allows replacing the waitqueue with rcuwait, which solves the
locking problem because rcuwait relies on existing locking.
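
As a side note for readers unfamiliar with rcuwait, here is a minimal
sketch of the wait/wake pattern this patch adopts. The demo_* names and
the demo lock are hypothetical; only rcuwait_wait_event(),
rcuwait_wake_up() and __RCUWAIT_INITIALIZER() are the actual kernel API
used by the patch below.

#include <linux/rcuwait.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

static struct rcuwait demo_wait = __RCUWAIT_INITIALIZER(demo_wait);
static DEFINE_SPINLOCK(demo_lock);      /* hypothetical lock guarding @demo_busy */
static bool demo_busy = true;

/* Condition helper: returns with demo_lock held on success. */
static bool demo_idle(void)
{
        spin_lock_irq(&demo_lock);
        if (demo_busy) {
                spin_unlock_irq(&demo_lock);
                return false;
        }
        return true;
}

static void demo_waiter(void)
{
        /* Parks the (single) waiter until demo_idle() returns true. */
        rcuwait_wait_event(&demo_wait, demo_idle(), TASK_UNINTERRUPTIBLE);
        /* demo_lock is held here, like pool->lock after the wait below. */
        spin_unlock_irq(&demo_lock);
}

static void demo_waker(void)
{
        spin_lock_irq(&demo_lock);
        demo_busy = false;
        spin_unlock_irq(&demo_lock);
        rcuwait_wake_up(&demo_wait);    /* no separate waitqueue lock involved */
}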

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 891ccad5f2716..e1d9af713df4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -301,7 +301,8 @@ static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
 static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
-static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */
+/* wait for manager to go away */
+static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
 
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
@@ -2140,7 +2141,7 @@ static bool manage_workers(struct worker *worker)
 
 	pool->manager = NULL;
 	pool->flags &= ~POOL_MANAGER_ACTIVE;
-	wake_up(&wq_manager_wait);
+	rcuwait_wake_up(&manager_wait);
 	return true;
 }
 
@@ -3504,6 +3505,18 @@ static void rcu_free_pool(struct rcu_head *rcu)
 	kfree(pool);
 }
 
+/* This returns with the lock held on success (pool manager is inactive). */
+static bool wq_manager_inactive(struct worker_pool *pool)
+{
+	spin_lock_irq(&pool->lock);
+
+	if (pool->flags & POOL_MANAGER_ACTIVE) {
+		spin_unlock_irq(&pool->lock);
+		return false;
+	}
+	return true;
+}
+
 /**
  * put_unbound_pool - put a worker_pool
  * @pool: worker_pool to put
@@ -3539,10 +3552,11 @@ static void put_unbound_pool(struct worker_pool *pool)
 	 * Become the manager and destroy all workers.  This prevents
 	 * @pool's workers from blocking on attach_mutex.  We're the last
 	 * manager and @pool gets freed with the flag set.
+	 * Because of how wq_manager_inactive() works, we will hold the
+	 * spinlock after a successful wait.
 	 */
-	spin_lock_irq(&pool->lock);
-	wait_event_lock_irq(wq_manager_wait,
-			    !(pool->flags & POOL_MANAGER_ACTIVE), pool->lock);
+	rcuwait_wait_event(&manager_wait, wq_manager_inactive(pool),
+			   TASK_UNINTERRUPTIBLE);
 	pool->flags |= POOL_MANAGER_ACTIVE;
 
 	while ((worker = first_idle_worker(pool)))
-- 
2.27.0.rc0



* [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t
  2020-05-27 19:46 [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe Sebastian Andrzej Siewior
  2020-05-27 19:46 ` [PATCH v2 1/2] workqueue: Use rcuwait for wq_manager_wait Sebastian Andrzej Siewior
@ 2020-05-27 19:46 ` Sebastian Andrzej Siewior
  2020-05-28  0:33   ` Lai Jiangshan
  2020-05-29 14:04   ` Tejun Heo
  2020-05-27 22:57 ` [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe Linus Torvalds
  2 siblings, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-27 19:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Sebastian Andrzej Siewior

The workqueue code has its internal spinlocks (pool::lock), which
are acquired on most workqueue operations. These spinlocks are
converted to 'sleeping' spinlocks on an RT kernel.

Workqueue functions can be invoked from contexts which are truly atomic
even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
contexts is forbidden.

The pool::lock hold times are bounded and the code sections are
relatively short, which allows converting pool::lock, and as a
consequence wq_mayday_lock, to raw spinlocks which are truly spinning
locks even on a PREEMPT_RT kernel.

With the previous conversion of the manager waitqueue to rcuwait,
workqueues are now fully RT compliant.
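
For illustration only (the demo_* names are hypothetical, not part of
the patch), this is the kind of usage that stays valid with a
raw_spinlock_t on PREEMPT_RT: the lock keeps spinning instead of
sleeping, so it may be taken from truly atomic context as long as the
critical section is short and never sleeps or takes a sleeping lock.

#include <linux/list.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(demo_lock);          /* stays a spinning lock on RT */
static LIST_HEAD(demo_list);

/* Safe to call even from hard interrupt / truly atomic context. */
static void demo_add(struct list_head *node)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&demo_lock, flags);
        /* Only short, bounded, non-sleeping work under a raw spinlock. */
        list_add_tail(node, &demo_list);
        raw_spin_unlock_irqrestore(&demo_lock, flags);
}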

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 176 ++++++++++++++++++++++-----------------------
 1 file changed, 88 insertions(+), 88 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e1d9af713df4a..f8f80e2f79387 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -145,7 +145,7 @@ enum {
 /* struct worker is defined in workqueue_internal.h */
 
 struct worker_pool {
-	spinlock_t		lock;		/* the pool lock */
+	raw_spinlock_t		lock;		/* the pool lock */
 	int			cpu;		/* I: the associated cpu */
 	int			node;		/* I: the associated node ID */
 	int			id;		/* I: pool ID */
@@ -300,7 +300,7 @@ static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
 static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
-static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
+static DEFINE_RAW_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
 /* wait for manager to go away */
 static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
 
@@ -827,7 +827,7 @@ static struct worker *first_idle_worker(struct worker_pool *pool)
  * Wake up the first idle worker of @pool.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void wake_up_worker(struct worker_pool *pool)
 {
@@ -882,7 +882,7 @@ void wq_worker_sleeping(struct task_struct *task)
 		return;
 
 	worker->sleeping = 1;
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	/*
 	 * The counterpart of the following dec_and_test, implied mb,
@@ -901,7 +901,7 @@ void wq_worker_sleeping(struct task_struct *task)
 		if (next)
 			wake_up_process(next->task);
 	}
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 }
 
 /**
@@ -912,7 +912,7 @@ void wq_worker_sleeping(struct task_struct *task)
  * the scheduler to get a worker's last known identity.
  *
  * CONTEXT:
- * spin_lock_irq(rq->lock)
+ * raw_spin_lock_irq(rq->lock)
  *
  * This function is called during schedule() when a kworker is going
  * to sleep. It's used by psi to identify aggregation workers during
@@ -943,7 +943,7 @@ work_func_t wq_worker_last_func(struct task_struct *task)
  * Set @flags in @worker->flags and adjust nr_running accordingly.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock)
+ * raw_spin_lock_irq(pool->lock)
  */
 static inline void worker_set_flags(struct worker *worker, unsigned int flags)
 {
@@ -968,7 +968,7 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags)
  * Clear @flags in @worker->flags and adjust nr_running accordingly.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock)
+ * raw_spin_lock_irq(pool->lock)
  */
 static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 {
@@ -1016,7 +1016,7 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
  * actually occurs, it should be easy to locate the culprit work function.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  *
  * Return:
  * Pointer to worker which is executing @work if found, %NULL
@@ -1051,7 +1051,7 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
  * nested inside outer list_for_each_entry_safe().
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void move_linked_works(struct work_struct *work, struct list_head *head,
 			      struct work_struct **nextp)
@@ -1129,9 +1129,9 @@ static void put_pwq_unlocked(struct pool_workqueue *pwq)
 		 * As both pwqs and pools are RCU protected, the
 		 * following lock operations are safe.
 		 */
-		spin_lock_irq(&pwq->pool->lock);
+		raw_spin_lock_irq(&pwq->pool->lock);
 		put_pwq(pwq);
-		spin_unlock_irq(&pwq->pool->lock);
+		raw_spin_unlock_irq(&pwq->pool->lock);
 	}
 }
 
@@ -1164,7 +1164,7 @@ static void pwq_activate_first_delayed(struct pool_workqueue *pwq)
  * decrement nr_in_flight of its pwq and handle workqueue flushing.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
 {
@@ -1263,7 +1263,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	if (!pool)
 		goto fail;
 
-	spin_lock(&pool->lock);
+	raw_spin_lock(&pool->lock);
 	/*
 	 * work->data is guaranteed to point to pwq only while the work
 	 * item is queued on pwq->wq, and both updating work->data to point
@@ -1292,11 +1292,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		/* work->data points to pwq iff queued, point to pool */
 		set_work_pool_and_keep_pending(work, pool->id);
 
-		spin_unlock(&pool->lock);
+		raw_spin_unlock(&pool->lock);
 		rcu_read_unlock();
 		return 1;
 	}
-	spin_unlock(&pool->lock);
+	raw_spin_unlock(&pool->lock);
 fail:
 	rcu_read_unlock();
 	local_irq_restore(*flags);
@@ -1317,7 +1317,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
  * work_struct flags.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
 			struct list_head *head, unsigned int extra_flags)
@@ -1434,7 +1434,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	if (last_pool && last_pool != pwq->pool) {
 		struct worker *worker;
 
-		spin_lock(&last_pool->lock);
+		raw_spin_lock(&last_pool->lock);
 
 		worker = find_worker_executing_work(last_pool, work);
 
@@ -1442,11 +1442,11 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 			pwq = worker->current_pwq;
 		} else {
 			/* meh... not running there, queue here */
-			spin_unlock(&last_pool->lock);
-			spin_lock(&pwq->pool->lock);
+			raw_spin_unlock(&last_pool->lock);
+			raw_spin_lock(&pwq->pool->lock);
 		}
 	} else {
-		spin_lock(&pwq->pool->lock);
+		raw_spin_lock(&pwq->pool->lock);
 	}
 
 	/*
@@ -1459,7 +1459,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	 */
 	if (unlikely(!pwq->refcnt)) {
 		if (wq->flags & WQ_UNBOUND) {
-			spin_unlock(&pwq->pool->lock);
+			raw_spin_unlock(&pwq->pool->lock);
 			cpu_relax();
 			goto retry;
 		}
@@ -1491,7 +1491,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	insert_work(pwq, work, worklist, work_flags);
 
 out:
-	spin_unlock(&pwq->pool->lock);
+	raw_spin_unlock(&pwq->pool->lock);
 	rcu_read_unlock();
 }
 
@@ -1760,7 +1760,7 @@ EXPORT_SYMBOL(queue_rcu_work);
  * necessary.
  *
  * LOCKING:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void worker_enter_idle(struct worker *worker)
 {
@@ -1800,7 +1800,7 @@ static void worker_enter_idle(struct worker *worker)
  * @worker is leaving idle state.  Update stats.
  *
  * LOCKING:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void worker_leave_idle(struct worker *worker)
 {
@@ -1938,11 +1938,11 @@ static struct worker *create_worker(struct worker_pool *pool)
 	worker_attach_to_pool(worker, pool);
 
 	/* start the newly created worker */
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 	worker->pool->nr_workers++;
 	worker_enter_idle(worker);
 	wake_up_process(worker->task);
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	return worker;
 
@@ -1961,7 +1961,7 @@ static struct worker *create_worker(struct worker_pool *pool)
  * be idle.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void destroy_worker(struct worker *worker)
 {
@@ -1987,7 +1987,7 @@ static void idle_worker_timeout(struct timer_list *t)
 {
 	struct worker_pool *pool = from_timer(pool, t, idle_timer);
 
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	while (too_many_workers(pool)) {
 		struct worker *worker;
@@ -2005,7 +2005,7 @@ static void idle_worker_timeout(struct timer_list *t)
 		destroy_worker(worker);
 	}
 
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 }
 
 static void send_mayday(struct work_struct *work)
@@ -2036,8 +2036,8 @@ static void pool_mayday_timeout(struct timer_list *t)
 	struct worker_pool *pool = from_timer(pool, t, mayday_timer);
 	struct work_struct *work;
 
-	spin_lock_irq(&pool->lock);
-	spin_lock(&wq_mayday_lock);		/* for wq->maydays */
+	raw_spin_lock_irq(&pool->lock);
+	raw_spin_lock(&wq_mayday_lock);		/* for wq->maydays */
 
 	if (need_to_create_worker(pool)) {
 		/*
@@ -2050,8 +2050,8 @@ static void pool_mayday_timeout(struct timer_list *t)
 			send_mayday(work);
 	}
 
-	spin_unlock(&wq_mayday_lock);
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock(&wq_mayday_lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
 }
@@ -2070,7 +2070,7 @@ static void pool_mayday_timeout(struct timer_list *t)
  * may_start_working() %true.
  *
  * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
+ * raw_spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.  Called only from
  * manager.
  */
@@ -2079,7 +2079,7 @@ __releases(&pool->lock)
 __acquires(&pool->lock)
 {
 restart:
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	/* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
@@ -2095,7 +2095,7 @@ __acquires(&pool->lock)
 	}
 
 	del_timer_sync(&pool->mayday_timer);
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 	/*
 	 * This is necessary even after a new worker was just successfully
 	 * created as @pool->lock was dropped and the new worker might have
@@ -2118,7 +2118,7 @@ __acquires(&pool->lock)
  * and may_start_working() is true.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
+ * raw_spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.
  *
  * Return:
@@ -2157,7 +2157,7 @@ static bool manage_workers(struct worker *worker)
  * call this function to process a work.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * raw_spin_lock_irq(pool->lock) which is released and regrabbed.
  */
 static void process_one_work(struct worker *worker, struct work_struct *work)
 __releases(&pool->lock)
@@ -2239,7 +2239,7 @@ __acquires(&pool->lock)
 	 */
 	set_work_pool_and_clear_pending(work, pool->id);
 
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	lock_map_acquire(&pwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
@@ -2294,7 +2294,7 @@ __acquires(&pool->lock)
 	 */
 	cond_resched();
 
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	/* clear cpu intensive status */
 	if (unlikely(cpu_intensive))
@@ -2320,7 +2320,7 @@ __acquires(&pool->lock)
  * fetches a work from the top and executes it.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
+ * raw_spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.
  */
 static void process_scheduled_works(struct worker *worker)
@@ -2362,11 +2362,11 @@ static int worker_thread(void *__worker)
 	/* tell the scheduler that this is a workqueue worker */
 	set_pf_worker(true);
 woke_up:
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	/* am I supposed to die? */
 	if (unlikely(worker->flags & WORKER_DIE)) {
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 		WARN_ON_ONCE(!list_empty(&worker->entry));
 		set_pf_worker(false);
 
@@ -2432,7 +2432,7 @@ static int worker_thread(void *__worker)
 	 */
 	worker_enter_idle(worker);
 	__set_current_state(TASK_IDLE);
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 	schedule();
 	goto woke_up;
 }
@@ -2486,7 +2486,7 @@ static int rescuer_thread(void *__rescuer)
 	should_stop = kthread_should_stop();
 
 	/* see whether any pwq is asking for help */
-	spin_lock_irq(&wq_mayday_lock);
+	raw_spin_lock_irq(&wq_mayday_lock);
 
 	while (!list_empty(&wq->maydays)) {
 		struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
@@ -2498,11 +2498,11 @@ static int rescuer_thread(void *__rescuer)
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
 
-		spin_unlock_irq(&wq_mayday_lock);
+		raw_spin_unlock_irq(&wq_mayday_lock);
 
 		worker_attach_to_pool(rescuer, pool);
 
-		spin_lock_irq(&pool->lock);
+		raw_spin_lock_irq(&pool->lock);
 
 		/*
 		 * Slurp in all works issued via this workqueue and
@@ -2531,7 +2531,7 @@ static int rescuer_thread(void *__rescuer)
 			 * incur MAYDAY_INTERVAL delay inbetween.
 			 */
 			if (need_to_create_worker(pool)) {
-				spin_lock(&wq_mayday_lock);
+				raw_spin_lock(&wq_mayday_lock);
 				/*
 				 * Queue iff we aren't racing destruction
 				 * and somebody else hasn't queued it already.
@@ -2540,7 +2540,7 @@ static int rescuer_thread(void *__rescuer)
 					get_pwq(pwq);
 					list_add_tail(&pwq->mayday_node, &wq->maydays);
 				}
-				spin_unlock(&wq_mayday_lock);
+				raw_spin_unlock(&wq_mayday_lock);
 			}
 		}
 
@@ -2558,14 +2558,14 @@ static int rescuer_thread(void *__rescuer)
 		if (need_more_worker(pool))
 			wake_up_worker(pool);
 
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 
 		worker_detach_from_pool(rescuer);
 
-		spin_lock_irq(&wq_mayday_lock);
+		raw_spin_lock_irq(&wq_mayday_lock);
 	}
 
-	spin_unlock_irq(&wq_mayday_lock);
+	raw_spin_unlock_irq(&wq_mayday_lock);
 
 	if (should_stop) {
 		__set_current_state(TASK_RUNNING);
@@ -2645,7 +2645,7 @@ static void wq_barrier_func(struct work_struct *work)
  * underneath us, so we can't reliably determine pwq from @target.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock).
+ * raw_spin_lock_irq(pool->lock).
  */
 static void insert_wq_barrier(struct pool_workqueue *pwq,
 			      struct wq_barrier *barr,
@@ -2732,7 +2732,7 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 	for_each_pwq(pwq, wq) {
 		struct worker_pool *pool = pwq->pool;
 
-		spin_lock_irq(&pool->lock);
+		raw_spin_lock_irq(&pool->lock);
 
 		if (flush_color >= 0) {
 			WARN_ON_ONCE(pwq->flush_color != -1);
@@ -2749,7 +2749,7 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
 			pwq->work_color = work_color;
 		}
 
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 	}
 
 	if (flush_color >= 0 && atomic_dec_and_test(&wq->nr_pwqs_to_flush))
@@ -2949,9 +2949,9 @@ void drain_workqueue(struct workqueue_struct *wq)
 	for_each_pwq(pwq, wq) {
 		bool drained;
 
-		spin_lock_irq(&pwq->pool->lock);
+		raw_spin_lock_irq(&pwq->pool->lock);
 		drained = !pwq->nr_active && list_empty(&pwq->delayed_works);
-		spin_unlock_irq(&pwq->pool->lock);
+		raw_spin_unlock_irq(&pwq->pool->lock);
 
 		if (drained)
 			continue;
@@ -2987,7 +2987,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 		return false;
 	}
 
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 	/* see the comment in try_to_grab_pending() with the same code */
 	pwq = get_work_pwq(work);
 	if (pwq) {
@@ -3003,7 +3003,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	check_flush_dependency(pwq->wq, work);
 
 	insert_wq_barrier(pwq, barr, work, worker);
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	/*
 	 * Force a lock recursion deadlock when using flush_work() inside a
@@ -3022,7 +3022,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 	rcu_read_unlock();
 	return true;
 already_gone:
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 	rcu_read_unlock();
 	return false;
 }
@@ -3415,7 +3415,7 @@ static bool wqattrs_equal(const struct workqueue_attrs *a,
  */
 static int init_worker_pool(struct worker_pool *pool)
 {
-	spin_lock_init(&pool->lock);
+	raw_spin_lock_init(&pool->lock);
 	pool->id = -1;
 	pool->cpu = -1;
 	pool->node = NUMA_NO_NODE;
@@ -3508,10 +3508,10 @@ static void rcu_free_pool(struct rcu_head *rcu)
 /* This returns with the lock held on success (pool manager is inactive). */
 static bool wq_manager_inactive(struct worker_pool *pool)
 {
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	if (pool->flags & POOL_MANAGER_ACTIVE) {
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 		return false;
 	}
 	return true;
@@ -3562,7 +3562,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	while ((worker = first_idle_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 
 	mutex_lock(&wq_pool_attach_mutex);
 	if (!list_empty(&pool->workers))
@@ -3718,7 +3718,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
 		return;
 
 	/* this function can be called during early boot w/ irq disabled */
-	spin_lock_irqsave(&pwq->pool->lock, flags);
+	raw_spin_lock_irqsave(&pwq->pool->lock, flags);
 
 	/*
 	 * During [un]freezing, the caller is responsible for ensuring that
@@ -3741,7 +3741,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
 		pwq->max_active = 0;
 	}
 
-	spin_unlock_irqrestore(&pwq->pool->lock, flags);
+	raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
 }
 
 /* initialize newly alloced @pwq which is associated with @wq and @pool */
@@ -4143,9 +4143,9 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
 
 use_dfl_pwq:
 	mutex_lock(&wq->mutex);
-	spin_lock_irq(&wq->dfl_pwq->pool->lock);
+	raw_spin_lock_irq(&wq->dfl_pwq->pool->lock);
 	get_pwq(wq->dfl_pwq);
-	spin_unlock_irq(&wq->dfl_pwq->pool->lock);
+	raw_spin_unlock_irq(&wq->dfl_pwq->pool->lock);
 	old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
 out_unlock:
 	mutex_unlock(&wq->mutex);
@@ -4374,9 +4374,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
 		struct worker *rescuer = wq->rescuer;
 
 		/* this prevents new queueing */
-		spin_lock_irq(&wq_mayday_lock);
+		raw_spin_lock_irq(&wq_mayday_lock);
 		wq->rescuer = NULL;
-		spin_unlock_irq(&wq_mayday_lock);
+		raw_spin_unlock_irq(&wq_mayday_lock);
 
 		/* rescuer will empty maydays list before exiting */
 		kthread_stop(rescuer->task);
@@ -4390,18 +4390,18 @@ void destroy_workqueue(struct workqueue_struct *wq)
 	mutex_lock(&wq_pool_mutex);
 	mutex_lock(&wq->mutex);
 	for_each_pwq(pwq, wq) {
-		spin_lock_irq(&pwq->pool->lock);
+		raw_spin_lock_irq(&pwq->pool->lock);
 		if (WARN_ON(pwq_busy(pwq))) {
 			pr_warn("%s: %s has the following busy pwq\n",
 				__func__, wq->name);
 			show_pwq(pwq);
-			spin_unlock_irq(&pwq->pool->lock);
+			raw_spin_unlock_irq(&pwq->pool->lock);
 			mutex_unlock(&wq->mutex);
 			mutex_unlock(&wq_pool_mutex);
 			show_workqueue_state();
 			return;
 		}
-		spin_unlock_irq(&pwq->pool->lock);
+		raw_spin_unlock_irq(&pwq->pool->lock);
 	}
 	mutex_unlock(&wq->mutex);
 	mutex_unlock(&wq_pool_mutex);
@@ -4572,10 +4572,10 @@ unsigned int work_busy(struct work_struct *work)
 	rcu_read_lock();
 	pool = get_work_pool(work);
 	if (pool) {
-		spin_lock_irqsave(&pool->lock, flags);
+		raw_spin_lock_irqsave(&pool->lock, flags);
 		if (find_worker_executing_work(pool, work))
 			ret |= WORK_BUSY_RUNNING;
-		spin_unlock_irqrestore(&pool->lock, flags);
+		raw_spin_unlock_irqrestore(&pool->lock, flags);
 	}
 	rcu_read_unlock();
 
@@ -4782,10 +4782,10 @@ void show_workqueue_state(void)
 		pr_info("workqueue %s: flags=0x%x\n", wq->name, wq->flags);
 
 		for_each_pwq(pwq, wq) {
-			spin_lock_irqsave(&pwq->pool->lock, flags);
+			raw_spin_lock_irqsave(&pwq->pool->lock, flags);
 			if (pwq->nr_active || !list_empty(&pwq->delayed_works))
 				show_pwq(pwq);
-			spin_unlock_irqrestore(&pwq->pool->lock, flags);
+			raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
 			/*
 			 * We could be printing a lot from atomic context, e.g.
 			 * sysrq-t -> show_workqueue_state(). Avoid triggering
@@ -4799,7 +4799,7 @@ void show_workqueue_state(void)
 		struct worker *worker;
 		bool first = true;
 
-		spin_lock_irqsave(&pool->lock, flags);
+		raw_spin_lock_irqsave(&pool->lock, flags);
 		if (pool->nr_workers == pool->nr_idle)
 			goto next_pool;
 
@@ -4818,7 +4818,7 @@ void show_workqueue_state(void)
 		}
 		pr_cont("\n");
 	next_pool:
-		spin_unlock_irqrestore(&pool->lock, flags);
+		raw_spin_unlock_irqrestore(&pool->lock, flags);
 		/*
 		 * We could be printing a lot from atomic context, e.g.
 		 * sysrq-t -> show_workqueue_state(). Avoid triggering
@@ -4848,7 +4848,7 @@ void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
 		struct worker_pool *pool = worker->pool;
 
 		if (pool) {
-			spin_lock_irq(&pool->lock);
+			raw_spin_lock_irq(&pool->lock);
 			/*
 			 * ->desc tracks information (wq name or
 			 * set_worker_desc()) for the latest execution.  If
@@ -4862,7 +4862,7 @@ void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
 					scnprintf(buf + off, size - off, "-%s",
 						  worker->desc);
 			}
-			spin_unlock_irq(&pool->lock);
+			raw_spin_unlock_irq(&pool->lock);
 		}
 	}
 
@@ -4893,7 +4893,7 @@ static void unbind_workers(int cpu)
 
 	for_each_cpu_worker_pool(pool, cpu) {
 		mutex_lock(&wq_pool_attach_mutex);
-		spin_lock_irq(&pool->lock);
+		raw_spin_lock_irq(&pool->lock);
 
 		/*
 		 * We've blocked all attach/detach operations. Make all workers
@@ -4907,7 +4907,7 @@ static void unbind_workers(int cpu)
 
 		pool->flags |= POOL_DISASSOCIATED;
 
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 		mutex_unlock(&wq_pool_attach_mutex);
 
 		/*
@@ -4933,9 +4933,9 @@ static void unbind_workers(int cpu)
 		 * worker blocking could lead to lengthy stalls.  Kick off
 		 * unbound chain execution of currently pending work items.
 		 */
-		spin_lock_irq(&pool->lock);
+		raw_spin_lock_irq(&pool->lock);
 		wake_up_worker(pool);
-		spin_unlock_irq(&pool->lock);
+		raw_spin_unlock_irq(&pool->lock);
 	}
 }
 
@@ -4962,7 +4962,7 @@ static void rebind_workers(struct worker_pool *pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 
-	spin_lock_irq(&pool->lock);
+	raw_spin_lock_irq(&pool->lock);
 
 	pool->flags &= ~POOL_DISASSOCIATED;
 
@@ -5001,7 +5001,7 @@ static void rebind_workers(struct worker_pool *pool)
 		WRITE_ONCE(worker->flags, worker_flags);
 	}
 
-	spin_unlock_irq(&pool->lock);
+	raw_spin_unlock_irq(&pool->lock);
 }
 
 /**
-- 
2.27.0.rc0



* Re: [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe
  2020-05-27 19:46 [PATCH v2 0/2] workqueue: Make the workqueue code PREEMPT_RT safe Sebastian Andrzej Siewior
  2020-05-27 19:46 ` [PATCH v2 1/2] workqueue: Use rcuwait for wq_manager_wait Sebastian Andrzej Siewior
  2020-05-27 19:46 ` [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t Sebastian Andrzej Siewior
@ 2020-05-27 22:57 ` Linus Torvalds
  2 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2020-05-27 22:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Linux Kernel Mailing List, Tejun Heo, Lai Jiangshan,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar

On Wed, May 27, 2020 at 12:47 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> v1…v2: Use rcuwait instead of swait.

Thanks, both series look sane to me.

I only scanned the other one, but I didn't see anything that made me go "Hmm".

                Linus


* Re: [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t
  2020-05-27 19:46 ` [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t Sebastian Andrzej Siewior
@ 2020-05-28  0:33   ` Lai Jiangshan
  2020-05-29 14:04   ` Tejun Heo
  1 sibling, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2020-05-28  0:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, Tejun Heo, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds

On Thu, May 28, 2020 at 3:47 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The workqueue code has its internal spinlocks (pool::lock), which
> are acquired on most workqueue operations. These spinlocks are
> converted to 'sleeping' spinlocks on an RT kernel.
>
> Workqueue functions can be invoked from contexts which are truly atomic
> even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
> contexts is forbidden.
>
> The pool::lock hold times are bounded and the code sections are
> relatively short, which allows converting pool::lock, and as a
> consequence wq_mayday_lock, to raw spinlocks which are truly spinning
> locks even on a PREEMPT_RT kernel.
>
> With the previous conversion of the manager waitqueue to rcuwait,
> workqueues are now fully RT compliant.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/workqueue.c | 176 ++++++++++++++++++++++-----------------------
>  1 file changed, 88 insertions(+), 88 deletions(-)
>

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>


* [PATCH 1/2] workqueue: pin the pool while it is managing
  2020-05-27 19:46 ` [PATCH v2 1/2] workqueue: Use rcuwait for wq_manager_wait Sebastian Andrzej Siewior
@ 2020-05-28  3:06   ` Lai Jiangshan
  2020-05-28  3:06     ` [PATCH 2/2] workqueue: remove useless POOL_MANAGER_ACTIVE Lai Jiangshan
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Lai Jiangshan @ 2020-05-28  3:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds, Sebastian Andrzej Siewior, Tejun Heo,
	Lai Jiangshan

So that put_unbound_pool() can be sure that all workers are idle and
there is no unfinished manager. It then doesn't need to wait for any
manager and can go ahead and delete all the idle workers straight away.

Also removes the manager waitqueue, because it is unneeded, and as
Sebastian Andrzej Siewior said:

  The workqueue code has its internal spinlock (pool::lock) and also
  implicit spinlock usage in the wq_manager waitqueue. These spinlocks
  are converted to 'sleeping' spinlocks on an RT kernel.

  Workqueue functions can be invoked from contexts which are truly atomic
  even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
  contexts is forbidden.

  pool::lock can be converted to a raw spinlock as the lock held times
  are short. But the workqueue manager waitqueue is handled inside of
  pool::lock held regions which again violates the lock nesting rules
  of raw and regular spinlocks.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/lkml/20200527194633.1660952-2-bigeasy@linutronix.de
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 891ccad5f271..fde10a5dba82 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -301,7 +301,6 @@ static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
 static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
-static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */
 
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
@@ -2129,10 +2128,21 @@ __acquires(&pool->lock)
 static bool manage_workers(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
+	struct work_struct *work = list_first_entry(&pool->worklist,
+					struct work_struct, entry);
+	struct pool_workqueue *pwq = get_work_pwq(work);
 
 	if (pool->flags & POOL_MANAGER_ACTIVE)
 		return false;
 
+	/*
+	 * If @pwq is for an unbound wq, its base ref and the @pool's
+	 * ref may be put at any time due to an attribute change.
+	 * Pin @pwq to also pin the @pool until the manager is done
+	 * with it.
+	 */
+	get_pwq(pwq);
+
 	pool->flags |= POOL_MANAGER_ACTIVE;
 	pool->manager = worker;
 
@@ -2140,7 +2150,7 @@ static bool manage_workers(struct worker *worker)
 
 	pool->manager = NULL;
 	pool->flags &= ~POOL_MANAGER_ACTIVE;
-	wake_up(&wq_manager_wait);
+	put_pwq(pwq);
 	return true;
 }
 
@@ -3535,16 +3545,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 		idr_remove(&worker_pool_idr, pool->id);
 	hash_del(&pool->hash_node);
 
-	/*
-	 * Become the manager and destroy all workers.  This prevents
-	 * @pool's workers from blocking on attach_mutex.  We're the last
-	 * manager and @pool gets freed with the flag set.
-	 */
 	spin_lock_irq(&pool->lock);
-	wait_event_lock_irq(wq_manager_wait,
-			    !(pool->flags & POOL_MANAGER_ACTIVE), pool->lock);
-	pool->flags |= POOL_MANAGER_ACTIVE;
-
 	while ((worker = first_idle_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-- 
2.20.1



* [PATCH 2/2] workqueue: remove useless POOL_MANAGER_ACTIVE
  2020-05-28  3:06   ` [PATCH 1/2] workqueue: pin the pool while it is managing Lai Jiangshan
@ 2020-05-28  3:06     ` Lai Jiangshan
  2020-05-28  8:08     ` [PATCH 1/2] workqueue: pin the pool while it is managing Sebastian Andrzej Siewior
  2020-05-28 14:35     ` Tejun Heo
  2 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2020-05-28  3:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lai Jiangshan, Tejun Heo, Lai Jiangshan

It now has the same meaning as pool->manager.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fde10a5dba82..8bc11075763b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -70,7 +70,6 @@ enum {
 	 * wq_pool_attach_mutex to avoid changing binding state while
 	 * worker_attach_to_pool() is in progress.
 	 */
-	POOL_MANAGER_ACTIVE	= 1 << 0,	/* being managed */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 
 	/* worker flags */
@@ -798,7 +797,7 @@ static bool need_to_create_worker(struct worker_pool *pool)
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = pool->flags & POOL_MANAGER_ACTIVE;
+	bool managing = !!pool->manager;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -2132,7 +2131,7 @@ static bool manage_workers(struct worker *worker)
 					struct work_struct, entry);
 	struct pool_workqueue *pwq = get_work_pwq(work);
 
-	if (pool->flags & POOL_MANAGER_ACTIVE)
+	if (pool->manager)
 		return false;
 
 	/*
@@ -2143,13 +2142,11 @@ static bool manage_workers(struct worker *worker)
 	 */
 	get_pwq(pwq);
 
-	pool->flags |= POOL_MANAGER_ACTIVE;
 	pool->manager = worker;
 
 	maybe_create_worker(pool);
 
 	pool->manager = NULL;
-	pool->flags &= ~POOL_MANAGER_ACTIVE;
 	put_pwq(pwq);
 	return true;
 }
-- 
2.20.1



* Re: [PATCH 1/2] workqueue: pin the pool while it is managing
  2020-05-28  3:06   ` [PATCH 1/2] workqueue: pin the pool while it is managing Lai Jiangshan
  2020-05-28  3:06     ` [PATCH 2/2] workqueue: remove useless POOL_MANAGER_ACTIVE Lai Jiangshan
@ 2020-05-28  8:08     ` Sebastian Andrzej Siewior
  2020-05-28  8:35       ` Lai Jiangshan
  2020-05-28 14:35     ` Tejun Heo
  2 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-05-28  8:08 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds, Tejun Heo, Lai Jiangshan

On 2020-05-28 03:06:55 [+0000], Lai Jiangshan wrote:
> So that put_unbound_pool() can be sure that all workers are idle and
> there is no unfinished manager. It then doesn't need to wait for any
> manager and can go ahead and delete all the idle workers straight away.
> 
> Also removes the manager waitqueue, because it is unneeded, and as
> Sebastian Andrzej Siewior said:
> 
>   The workqueue code has its internal spinlock (pool::lock) and also
>   implicit spinlock usage in the wq_manager waitqueue. These spinlocks
>   are converted to 'sleeping' spinlocks on an RT kernel.
> 
>   Workqueue functions can be invoked from contexts which are truly atomic
>   even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
>   contexts is forbidden.
> 
>   pool::lock can be converted to a raw spinlock as the lock held times
>   are short. But the workqueue manager waitqueue is handled inside of
>   pool::lock held regions which again violates the lock nesting rules
>   of raw and regular spinlocks.

This seems to work for the test case I happened to have at hand. And
lockdep didn't complain so…

If you prefer this over my 1/2 what do we do about 2/2? Do you want me
to repost it?

Sebastian


* Re: [PATCH 1/2] workqueue: pin the pool while it is managing
  2020-05-28  8:08     ` [PATCH 1/2] workqueue: pin the pool while it is managing Sebastian Andrzej Siewior
@ 2020-05-28  8:35       ` Lai Jiangshan
  0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2020-05-28  8:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Tejun Heo

On Thu, May 28, 2020 at 4:08 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-05-28 03:06:55 [+0000], Lai Jiangshan wrote:
> > So that put_unbound_pool() can be sure that all workers are idle and
> > there is no unfinished manager. It then doesn't need to wait for any
> > manager and can go ahead and delete all the idle workers straight away.
> >
> > Also removes the manager waitqueue, because it is unneeded, and as
> > Sebastian Andrzej Siewior said:
> >
> >   The workqueue code has its internal spinlock (pool::lock) and also
> >   implicit spinlock usage in the wq_manager waitqueue. These spinlocks
> >   are converted to 'sleeping' spinlocks on an RT kernel.
> >
> >   Workqueue functions can be invoked from contexts which are truly atomic
> >   even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
> >   contexts is forbidden.
> >
> >   pool::lock can be converted to a raw spinlock as the lock held times
> >   are short. But the workqueue manager waitqueue is handled inside of
> >   pool::lock held regions which again violates the lock nesting rules
> >   of raw and regular spinlocks.
>
> This seems to work for the test case I happened to have at hand. And
> lockdep didn't complain so…
>
> If you prefer this over my 1/2 what do we do about 2/2? Do you want me
> to repost it?

I think we can just wait until Tejun reviews them.

If there is something wrong that I missed in my patch, your patches
are the best choice.

If I need to update my patch, I will repost the 3 patches
(2 of mine, plus the 2/2 of yours). At the least, I forgot to add
"Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>"
in the patch.

If Tejun queues my patches right away, you can rebase the 2/2
of yours and repost it.

Lai

>
> Sebastian


* Re: [PATCH 1/2] workqueue: pin the pool while it is managing
  2020-05-28  3:06   ` [PATCH 1/2] workqueue: pin the pool while it is managing Lai Jiangshan
  2020-05-28  3:06     ` [PATCH 2/2] workqueue: remove useless POOL_MANAGER_ACTIVE Lai Jiangshan
  2020-05-28  8:08     ` [PATCH 1/2] workqueue: pin the pool while it is managing Sebastian Andrzej Siewior
@ 2020-05-28 14:35     ` Tejun Heo
  2020-05-29  5:33       ` Lai Jiangshan
  2 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2020-05-28 14:35 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds, Sebastian Andrzej Siewior, Lai Jiangshan

Hello,

On Thu, May 28, 2020 at 03:06:55AM +0000, Lai Jiangshan wrote:
> @@ -2129,10 +2128,21 @@ __acquires(&pool->lock)
>  static bool manage_workers(struct worker *worker)
>  {
>  	struct worker_pool *pool = worker->pool;
> +	struct work_struct *work = list_first_entry(&pool->worklist,
> +					struct work_struct, entry);

I'm not sure about this. It depends on an external condition (an active
work item) which isn't obvious, and when that condition breaks, the resulting
bug will be one which is difficult to reproduce. Adding to that, pwq isn't
even the object this code path is interested in, which is the cause of the
previous problem too.

> @@ -2140,7 +2150,7 @@ static bool manage_workers(struct worker *worker)
>  
>  	pool->manager = NULL;
>  	pool->flags &= ~POOL_MANAGER_ACTIVE;
> -	wake_up(&wq_manager_wait);
> +	put_pwq(pwq);

So, this works only because pwq release bounces through another work item,
so even if a worker of the pool which is currently being destroyed initiates
the release of the containing pool, it still works out, because by the time
the async release path kicks in and grabs the pool lock, everything should
be idle.

I get that this can work but it's sitting on top of a bunch of subtleties.
The current code is more verbose but also significantly more explicit and
straight-forward. I'd rather keep the current behavior unless we can get rid
of the subtleties.

Thanks.

-- 
tejun


* Re: [PATCH 1/2] workqueue: pin the pool while it is managing
  2020-05-28 14:35     ` Tejun Heo
@ 2020-05-29  5:33       ` Lai Jiangshan
  0 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2020-05-29  5:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Sebastian Andrzej Siewior

On Thu, May 28, 2020 at 10:35 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, May 28, 2020 at 03:06:55AM +0000, Lai Jiangshan wrote:
> > @@ -2129,10 +2128,21 @@ __acquires(&pool->lock)
> >  static bool manage_workers(struct worker *worker)
> >  {
> >       struct worker_pool *pool = worker->pool;
> > +     struct work_struct *work = list_first_entry(&pool->worklist,
> > +                                     struct work_struct, entry);
>
> I'm not sure about this. It depends on an external condition (an active
> work item) which isn't obvious, and when that condition breaks, the resulting
> bug will be one which is difficult to reproduce. Adding to that, pwq isn't
> even the object this code path is interested in, which is the cause of the
> previous problem too.

Ok, I agree with you.

Thanks
Lai


* Re: [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t
  2020-05-27 19:46 ` [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t Sebastian Andrzej Siewior
  2020-05-28  0:33   ` Lai Jiangshan
@ 2020-05-29 14:04   ` Tejun Heo
  1 sibling, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2020-05-29 14:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds

On Wed, May 27, 2020 at 09:46:33PM +0200, Sebastian Andrzej Siewior wrote:
> The workqueue code has its internal spinlocks (pool::lock), which
> are acquired on most workqueue operations. These spinlocks are
> converted to 'sleeping' spinlocks on an RT kernel.
> 
> Workqueue functions can be invoked from contexts which are truly atomic
> even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
> contexts is forbidden.
> 
> The pool::lock hold times are bounded and the code sections are
> relatively short, which allows converting pool::lock, and as a
> consequence wq_mayday_lock, to raw spinlocks which are truly spinning
> locks even on a PREEMPT_RT kernel.
> 
> With the previous conversion of the manager waitqueue to rcuwait,
> workqueues are now fully RT compliant.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Applied 1-2 to wq/for-5.8.

Thanks.

-- 
tejun


