linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] workqueue: use normal RCU and don't depend on the rq lock
@ 2019-03-13 16:55 Sebastian Andrzej Siewior
  2019-03-13 16:55 ` [PATCH 1/2] workqueue: Use normal rcu Sebastian Andrzej Siewior
  2019-03-13 16:55 ` [PATCH 2/2] sched: Distangle worker accounting from rq lock Sebastian Andrzej Siewior
  0 siblings, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-13 16:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Ingo Molnar, Peter Zijlstra, Thomas Gleixner

The second patch was originally posted around v3.0-rc. While digging
through the archive it seems that there is nothing wrong with the patch
except for the wording of its description, which I reworded.
I'm not sure whether the first patch ever made it to lkml.
Both have been in -RT for ages (and there is no known fallout).

Sebastian



* [PATCH 1/2] workqueue: Use normal rcu
  2019-03-13 16:55 [PATCH 0/2] workqueue: use normal RCU and don't depend on the rq lock Sebastian Andrzej Siewior
@ 2019-03-13 16:55 ` Sebastian Andrzej Siewior
  2019-03-21 20:59   ` Sebastian Andrzej Siewior
  2019-04-08 19:38   ` Tejun Heo
  2019-03-13 16:55 ` [PATCH 2/2] sched: Distangle worker accounting from rq lock Sebastian Andrzej Siewior
  1 sibling, 2 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-13 16:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

There is no need for sched_rcu. The undocumented reason why sched_rcu
is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
relying on the fact that sched_rcu read-side critical sections are also
protected by preempt- or irq-disabled regions.

Replace rcu_read_lock_sched() with rcu_read_lock() and acquire the RCU
lock where it is not yet explicitly acquired. Replace local_irq_disable()
with rcu_read_lock(). Update the asserts.
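
To illustrate the pattern (a minimal sketch of the conversion, not a hunk
taken from the diff below; the function names and dereferenced pointers
are placeholders):

  #include <linux/rcupdate.h>

  /* before: the read side piggybacks on sched-RCU, i.e. on preemption
   * or interrupts being disabled around the dereference
   */
  static void reader_before(void)
  {
          rcu_read_lock_sched();          /* or implicitly local_irq_disable() */
          /* ... dereference RCU-protected pool/pwq pointers ... */
          rcu_read_unlock_sched();
  }

  /* after: an explicit, preemptible RCU read-side critical section */
  static void reader_after(void)
  {
          rcu_read_lock();
          /* ... dereference RCU-protected pool/pwq pointers ... */
          rcu_read_unlock();
  }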

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: mangle changelog a little]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 93 +++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fc5d23d752a57..9d22030ae7f44 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -127,16 +127,16 @@ enum {
  *
  * PL: wq_pool_mutex protected.
  *
- * PR: wq_pool_mutex protected for writes.  Sched-RCU protected for reads.
+ * PR: wq_pool_mutex protected for writes.  RCU protected for reads.
  *
  * PW: wq_pool_mutex and wq->mutex protected for writes.  Either for reads.
  *
  * PWR: wq_pool_mutex and wq->mutex protected for writes.  Either or
- *      sched-RCU for reads.
+ *      RCU for reads.
  *
  * WQ: wq->mutex protected.
  *
- * WR: wq->mutex protected for writes.  Sched-RCU protected for reads.
+ * WR: wq->mutex protected for writes.  RCU protected for reads.
  *
  * MD: wq_mayday_lock protected.
  */
@@ -183,7 +183,7 @@ struct worker_pool {
 	atomic_t		nr_running ____cacheline_aligned_in_smp;
 
 	/*
-	 * Destruction of pool is sched-RCU protected to allow dereferences
+	 * Destruction of pool is RCU protected to allow dereferences
 	 * from get_work_pool().
 	 */
 	struct rcu_head		rcu;
@@ -212,7 +212,7 @@ struct pool_workqueue {
 	/*
 	 * Release of unbound pwq is punted to system_wq.  See put_pwq()
 	 * and pwq_unbound_release_workfn() for details.  pool_workqueue
-	 * itself is also sched-RCU protected so that the first pwq can be
+	 * itself is also RCU protected so that the first pwq can be
 	 * determined without grabbing wq->mutex.
 	 */
 	struct work_struct	unbound_release_work;
@@ -264,8 +264,8 @@ struct workqueue_struct {
 	char			name[WQ_NAME_LEN]; /* I: workqueue name */
 
 	/*
-	 * Destruction of workqueue_struct is sched-RCU protected to allow
-	 * walking the workqueues list without grabbing wq_pool_mutex.
+	 * Destruction of workqueue_struct is RCU protected to allow walking
+	 * the workqueues list without grabbing wq_pool_mutex.
 	 * This is used to dump all workqueues from sysrq.
 	 */
 	struct rcu_head		rcu;
@@ -357,20 +357,20 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
 #include <trace/events/workqueue.h>
 
 #define assert_rcu_or_pool_mutex()					\
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
 			 !lockdep_is_held(&wq_pool_mutex),		\
-			 "sched RCU or wq_pool_mutex should be held")
+			 "RCU or wq_pool_mutex should be held")
 
 #define assert_rcu_or_wq_mutex(wq)					\
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
 			 !lockdep_is_held(&wq->mutex),			\
-			 "sched RCU or wq->mutex should be held")
+			 "RCU or wq->mutex should be held")
 
 #define assert_rcu_or_wq_mutex_or_pool_mutex(wq)			\
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
 			 !lockdep_is_held(&wq->mutex) &&		\
 			 !lockdep_is_held(&wq_pool_mutex),		\
-			 "sched RCU, wq->mutex or wq_pool_mutex should be held")
+			 "RCU, wq->mutex or wq_pool_mutex should be held")
 
 #define for_each_cpu_worker_pool(pool, cpu)				\
 	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
@@ -382,7 +382,7 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
  * @pool: iteration cursor
  * @pi: integer used for iteration
  *
- * This must be called either with wq_pool_mutex held or sched RCU read
+ * This must be called either with wq_pool_mutex held or RCU read
  * locked.  If the pool needs to be used beyond the locking in effect, the
  * caller is responsible for guaranteeing that the pool stays online.
  *
@@ -414,7 +414,7 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
  * @pwq: iteration cursor
  * @wq: the target workqueue
  *
- * This must be called either with wq->mutex held or sched RCU read locked.
+ * This must be called either with wq->mutex held or RCU read locked.
  * If the pwq needs to be used beyond the locking in effect, the caller is
  * responsible for guaranteeing that the pwq stays online.
  *
@@ -550,7 +550,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
  * @wq: the target workqueue
  * @node: the node ID
  *
- * This must be called with any of wq_pool_mutex, wq->mutex or sched RCU
+ * This must be called with any of wq_pool_mutex, wq->mutex or RCU
  * read locked.
  * If the pwq needs to be used beyond the locking in effect, the caller is
  * responsible for guaranteeing that the pwq stays online.
@@ -694,8 +694,8 @@ static struct pool_workqueue *get_work_pwq(struct work_struct *work)
  * @work: the work item of interest
  *
  * Pools are created and destroyed under wq_pool_mutex, and allows read
- * access under sched-RCU read lock.  As such, this function should be
- * called under wq_pool_mutex or with preemption disabled.
+ * access under RCU read lock.  As such, this function should be
+ * called under wq_pool_mutex or inside of a rcu_read_lock() region.
  *
  * All fields of the returned pool are accessible as long as the above
  * mentioned locking is in effect.  If the returned pool needs to be used
@@ -1120,7 +1120,7 @@ static void put_pwq_unlocked(struct pool_workqueue *pwq)
 {
 	if (pwq) {
 		/*
-		 * As both pwqs and pools are sched-RCU protected, the
+		 * As both pwqs and pools are RCU protected, the
 		 * following lock operations are safe.
 		 */
 		spin_lock_irq(&pwq->pool->lock);
@@ -1248,6 +1248,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
 		return 0;
 
+	rcu_read_lock();
 	/*
 	 * The queueing is in progress, or it is already queued. Try to
 	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
@@ -1286,10 +1287,12 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		set_work_pool_and_keep_pending(work, pool->id);
 
 		spin_unlock(&pool->lock);
+		rcu_read_unlock();
 		return 1;
 	}
 	spin_unlock(&pool->lock);
 fail:
+	rcu_read_unlock();
 	local_irq_restore(*flags);
 	if (work_is_canceling(work))
 		return -ENOENT;
@@ -1403,6 +1406,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	if (unlikely(wq->flags & __WQ_DRAINING) &&
 	    WARN_ON_ONCE(!is_chained_work(wq)))
 		return;
+	rcu_read_lock();
 retry:
 	if (req_cpu == WORK_CPU_UNBOUND)
 		cpu = wq_select_unbound_cpu(raw_smp_processor_id());
@@ -1459,10 +1463,8 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	/* pwq determined, queue */
 	trace_workqueue_queue_work(req_cpu, pwq, work);
 
-	if (WARN_ON(!list_empty(&work->entry))) {
-		spin_unlock(&pwq->pool->lock);
-		return;
-	}
+	if (WARN_ON(!list_empty(&work->entry)))
+		goto out;
 
 	pwq->nr_in_flight[pwq->work_color]++;
 	work_flags = work_color_to_flags(pwq->work_color);
@@ -1480,7 +1482,9 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 	insert_work(pwq, work, worklist, work_flags);
 
+out:
 	spin_unlock(&pwq->pool->lock);
+	rcu_read_unlock();
 }
 
 /**
@@ -2878,14 +2882,14 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 
 	might_sleep();
 
-	local_irq_disable();
+	rcu_read_lock();
 	pool = get_work_pool(work);
 	if (!pool) {
-		local_irq_enable();
+		rcu_read_unlock();
 		return false;
 	}
 
-	spin_lock(&pool->lock);
+	spin_lock_irq(&pool->lock);
 	/* see the comment in try_to_grab_pending() with the same code */
 	pwq = get_work_pwq(work);
 	if (pwq) {
@@ -2917,10 +2921,11 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 		lock_map_acquire(&pwq->wq->lockdep_map);
 		lock_map_release(&pwq->wq->lockdep_map);
 	}
-
+	rcu_read_unlock();
 	return true;
 already_gone:
 	spin_unlock_irq(&pool->lock);
+	rcu_read_unlock();
 	return false;
 }
 
@@ -3364,7 +3369,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
  * put_unbound_pool - put a worker_pool
  * @pool: worker_pool to put
  *
- * Put @pool.  If its refcnt reaches zero, it gets destroyed in sched-RCU
+ * Put @pool.  If its refcnt reaches zero, it gets destroyed in RCU
  * safe manner.  get_unbound_pool() calls this function on its failure path
  * and this function should be able to release pools which went through,
  * successfully or not, init_worker_pool().
@@ -3418,7 +3423,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	del_timer_sync(&pool->idle_timer);
 	del_timer_sync(&pool->mayday_timer);
 
-	/* sched-RCU protected to allow dereferences from get_work_pool() */
+	/* RCU protected to allow dereferences from get_work_pool() */
 	call_rcu(&pool->rcu, rcu_free_pool);
 }
 
@@ -4328,7 +4333,8 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 	struct pool_workqueue *pwq;
 	bool ret;
 
-	rcu_read_lock_sched();
+	rcu_read_lock();
+	preempt_disable();
 
 	if (cpu == WORK_CPU_UNBOUND)
 		cpu = smp_processor_id();
@@ -4339,7 +4345,8 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 		pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
 
 	ret = !list_empty(&pwq->delayed_works);
-	rcu_read_unlock_sched();
+	preempt_enable();
+	rcu_read_unlock();
 
 	return ret;
 }
@@ -4365,15 +4372,15 @@ unsigned int work_busy(struct work_struct *work)
 	if (work_pending(work))
 		ret |= WORK_BUSY_PENDING;
 
-	local_irq_save(flags);
+	rcu_read_lock();
 	pool = get_work_pool(work);
 	if (pool) {
-		spin_lock(&pool->lock);
+		spin_lock_irqsave(&pool->lock, flags);
 		if (find_worker_executing_work(pool, work))
 			ret |= WORK_BUSY_RUNNING;
-		spin_unlock(&pool->lock);
+		spin_unlock_irqrestore(&pool->lock, flags);
 	}
-	local_irq_restore(flags);
+	rcu_read_unlock();
 
 	return ret;
 }
@@ -4557,7 +4564,7 @@ void show_workqueue_state(void)
 	unsigned long flags;
 	int pi;
 
-	rcu_read_lock_sched();
+	rcu_read_lock();
 
 	pr_info("Showing busy workqueues and worker pools:\n");
 
@@ -4622,7 +4629,7 @@ void show_workqueue_state(void)
 		touch_nmi_watchdog();
 	}
 
-	rcu_read_unlock_sched();
+	rcu_read_unlock();
 }
 
 /* used to show worker information through /proc/PID/{comm,stat,status} */
@@ -5009,16 +5016,16 @@ bool freeze_workqueues_busy(void)
 		 * nr_active is monotonically decreasing.  It's safe
 		 * to peek without lock.
 		 */
-		rcu_read_lock_sched();
+		rcu_read_lock();
 		for_each_pwq(pwq, wq) {
 			WARN_ON_ONCE(pwq->nr_active < 0);
 			if (pwq->nr_active) {
 				busy = true;
-				rcu_read_unlock_sched();
+				rcu_read_unlock();
 				goto out_unlock;
 			}
 		}
-		rcu_read_unlock_sched();
+		rcu_read_unlock();
 	}
 out_unlock:
 	mutex_unlock(&wq_pool_mutex);
@@ -5213,7 +5220,8 @@ static ssize_t wq_pool_ids_show(struct device *dev,
 	const char *delim = "";
 	int node, written = 0;
 
-	rcu_read_lock_sched();
+	get_online_cpus();
+	rcu_read_lock();
 	for_each_node(node) {
 		written += scnprintf(buf + written, PAGE_SIZE - written,
 				     "%s%d:%d", delim, node,
@@ -5221,7 +5229,8 @@ static ssize_t wq_pool_ids_show(struct device *dev,
 		delim = " ";
 	}
 	written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
-	rcu_read_unlock_sched();
+	rcu_read_unlock();
+	put_online_cpus();
 
 	return written;
 }
-- 
2.20.1



* [PATCH 2/2] sched: Distangle worker accounting from rq lock
  2019-03-13 16:55 [PATCH 0/2] workqueue: use normal RCU and don't depend on the rq lock Sebastian Andrzej Siewior
  2019-03-13 16:55 ` [PATCH 1/2] workqueue: Use normal rcu Sebastian Andrzej Siewior
@ 2019-03-13 16:55 ` Sebastian Andrzej Siewior
  2019-04-08 19:45   ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-13 16:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Daniel Bristot de Oliveira,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The worker accounting for CPU bound workers is plugged into the core
scheduler code and the wakeup code. This is not a hard requirement and
can be avoided by keeping track of the state in the workqueue code
itself.

Keep track of the sleeping state in the worker itself and call the
notifier before entering the core scheduler. There might be false
positives when the task is woken between that call and actually
scheduling, but that's not really different from scheduling and being
woken immediately after switching away. Updating nr_running when the
task returns from schedule() happens later than when it is done from
ttwu().
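
Roughly, the resulting flow looks like this (a condensed call-flow sketch
of the hunks below, not the exact code):

  schedule()
  -> sched_submit_work(tsk)
     -> wq_worker_sleeping(tsk)   /* if PF_WQ_WORKER: mark worker->sleeping,
                                     drop pool->nr_running and, if that was
                                     the last runner and work is pending,
                                     wake an idle worker */
  -> __schedule(false)
  -> sched_update_worker(tsk)
     -> wq_worker_running(tsk)    /* if PF_WQ_WORKER: restore pool->nr_running
                                     and clear worker->sleeping */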

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de
          Oliveira]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c         | 88 +++++++++----------------------------
 kernel/workqueue.c          | 54 ++++++++++-------------
 kernel/workqueue_internal.h |  5 ++-
 3 files changed, 48 insertions(+), 99 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8d76a65cfdd5..82fb09672902d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1655,10 +1655,6 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl
 {
 	activate_task(rq, p, en_flags);
 	p->on_rq = TASK_ON_RQ_QUEUED;
-
-	/* If a worker is waking up, notify the workqueue: */
-	if (p->flags & PF_WQ_WORKER)
-		wq_worker_waking_up(p, cpu_of(rq));
 }
 
 /*
@@ -2076,56 +2072,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	return success;
 }
 
-/**
- * try_to_wake_up_local - try to wake up a local task with rq lock held
- * @p: the thread to be awakened
- * @rf: request-queue flags for pinning
- *
- * Put @p on the run-queue if it's not already there. The caller must
- * ensure that this_rq() is locked, @p is bound to this_rq() and not
- * the current task.
- */
-static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
-{
-	struct rq *rq = task_rq(p);
-
-	if (WARN_ON_ONCE(rq != this_rq()) ||
-	    WARN_ON_ONCE(p == current))
-		return;
-
-	lockdep_assert_held(&rq->lock);
-
-	if (!raw_spin_trylock(&p->pi_lock)) {
-		/*
-		 * This is OK, because current is on_cpu, which avoids it being
-		 * picked for load-balance and preemption/IRQs are still
-		 * disabled avoiding further scheduler activity on it and we've
-		 * not yet picked a replacement task.
-		 */
-		rq_unlock(rq, rf);
-		raw_spin_lock(&p->pi_lock);
-		rq_relock(rq, rf);
-	}
-
-	if (!(p->state & TASK_NORMAL))
-		goto out;
-
-	trace_sched_waking(p);
-
-	if (!task_on_rq_queued(p)) {
-		if (p->in_iowait) {
-			delayacct_blkio_end(p);
-			atomic_dec(&rq->nr_iowait);
-		}
-		ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK);
-	}
-
-	ttwu_do_wakeup(rq, p, 0, rf);
-	ttwu_stat(p, smp_processor_id(), 0);
-out:
-	raw_spin_unlock(&p->pi_lock);
-}
-
 /**
  * wake_up_process - Wake up a specific process
  * @p: The process to be woken up.
@@ -3439,19 +3385,6 @@ static void __sched notrace __schedule(bool preempt)
 				atomic_inc(&rq->nr_iowait);
 				delayacct_blkio_start();
 			}
-
-			/*
-			 * If a worker went to sleep, notify and ask workqueue
-			 * whether it wants to wake up a task to maintain
-			 * concurrency.
-			 */
-			if (prev->flags & PF_WQ_WORKER) {
-				struct task_struct *to_wakeup;
-
-				to_wakeup = wq_worker_sleeping(prev);
-				if (to_wakeup)
-					try_to_wake_up_local(to_wakeup, &rf);
-			}
 		}
 		switch_count = &prev->nvcsw;
 	}
@@ -3511,6 +3444,20 @@ static inline void sched_submit_work(struct task_struct *tsk)
 {
 	if (!tsk->state || tsk_is_pi_blocked(tsk))
 		return;
+
+	/*
+	 * If a worker went to sleep, notify and ask workqueue whether
+	 * it wants to wake up a task to maintain concurrency.
+	 * As this function is called inside the schedule() context,
+	 * we disable preemption to avoid it calling schedule() again
+	 * in the possible wakeup of a kworker.
+	 */
+	if (tsk->flags & PF_WQ_WORKER) {
+		preempt_disable();
+		wq_worker_sleeping(tsk);
+		preempt_enable_no_resched();
+	}
+
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
 	 * make sure to submit it to avoid deadlocks.
@@ -3519,6 +3466,12 @@ static inline void sched_submit_work(struct task_struct *tsk)
 		blk_schedule_flush_plug(tsk);
 }
 
+static void sched_update_worker(struct task_struct *tsk)
+{
+	if (tsk->flags & PF_WQ_WORKER)
+		wq_worker_running(tsk);
+}
+
 asmlinkage __visible void __sched schedule(void)
 {
 	struct task_struct *tsk = current;
@@ -3529,6 +3482,7 @@ asmlinkage __visible void __sched schedule(void)
 		__schedule(false);
 		sched_preempt_enable_no_resched();
 	} while (need_resched());
+	sched_update_worker(tsk);
 }
 EXPORT_SYMBOL(schedule);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9d22030ae7f44..e785fdd7dc537 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -839,43 +839,32 @@ static void wake_up_worker(struct worker_pool *pool)
 }
 
 /**
- * wq_worker_waking_up - a worker is waking up
+ * wq_worker_running - a worker is running again
  * @task: task waking up
- * @cpu: CPU @task is waking up to
  *
- * This function is called during try_to_wake_up() when a worker is
- * being awoken.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
+ * This function is called when a worker returns from schedule()
  */
-void wq_worker_waking_up(struct task_struct *task, int cpu)
+void wq_worker_running(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task);
 
-	if (!(worker->flags & WORKER_NOT_RUNNING)) {
-		WARN_ON_ONCE(worker->pool->cpu != cpu);
+	if (!worker->sleeping)
+		return;
+	if (!(worker->flags & WORKER_NOT_RUNNING))
 		atomic_inc(&worker->pool->nr_running);
-	}
+	worker->sleeping = 0;
 }
 
 /**
  * wq_worker_sleeping - a worker is going to sleep
  * @task: task going to sleep
  *
- * This function is called during schedule() when a busy worker is
- * going to sleep.  Worker on the same cpu can be woken up by
- * returning pointer to its task.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
- *
- * Return:
- * Worker task on @cpu to wake up, %NULL if none.
+ * This function is called from schedule() when a busy worker is
+ * going to sleep.
  */
-struct task_struct *wq_worker_sleeping(struct task_struct *task)
+void wq_worker_sleeping(struct task_struct *task)
 {
-	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
+	struct worker *next, *worker = kthread_data(task);
 	struct worker_pool *pool;
 
 	/*
@@ -884,13 +873,15 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
 	 * checking NOT_RUNNING.
 	 */
 	if (worker->flags & WORKER_NOT_RUNNING)
-		return NULL;
+		return;
 
 	pool = worker->pool;
 
-	/* this can only happen on the local cpu */
-	if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
-		return NULL;
+	if (WARN_ON_ONCE(worker->sleeping))
+		return;
+
+	worker->sleeping = 1;
+	spin_lock_irq(&pool->lock);
 
 	/*
 	 * The counterpart of the following dec_and_test, implied mb,
@@ -904,9 +895,12 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
 	 * lock is safe.
 	 */
 	if (atomic_dec_and_test(&pool->nr_running) &&
-	    !list_empty(&pool->worklist))
-		to_wakeup = first_idle_worker(pool);
-	return to_wakeup ? to_wakeup->task : NULL;
+	    !list_empty(&pool->worklist)) {
+		next = first_idle_worker(pool);
+		if (next)
+			wake_up_process(next->task);
+	}
+	spin_unlock_irq(&pool->lock);
 }
 
 /**
@@ -4793,7 +4787,7 @@ static void rebind_workers(struct worker_pool *pool)
 		 *
 		 * WRITE_ONCE() is necessary because @worker->flags may be
 		 * tested without holding any lock in
-		 * wq_worker_waking_up().  Without it, NOT_RUNNING test may
+		 * wq_worker_running().  Without it, NOT_RUNNING test may
 		 * fail incorrectly leading to premature concurrency
 		 * management operations.
 		 */
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index cb68b03ca89aa..498de0e909a43 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -44,6 +44,7 @@ struct worker {
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
 	int			id;		/* I: worker id */
+	int			sleeping;	/* None */
 
 	/*
 	 * Opaque string set with work_set_desc().  Printed out with task
@@ -72,8 +73,8 @@ static inline struct worker *current_wq_worker(void)
  * Scheduler hooks for concurrency managed workqueue.  Only to be used from
  * sched/ and workqueue.c.
  */
-void wq_worker_waking_up(struct task_struct *task, int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task);
+void wq_worker_running(struct task_struct *task);
+void wq_worker_sleeping(struct task_struct *task);
 work_func_t wq_worker_last_func(struct task_struct *task);
 
 #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
-- 
2.20.1



* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-03-13 16:55 ` [PATCH 1/2] workqueue: Use normal rcu Sebastian Andrzej Siewior
@ 2019-03-21 20:59   ` Sebastian Andrzej Siewior
  2019-03-22 17:43     ` Tejun Heo
  2019-04-08 19:38   ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-21 20:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Ingo Molnar, Peter Zijlstra, Thomas Gleixner

On 2019-03-13 17:55:47 [+0100], To linux-kernel@vger.kernel.org wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> There is no need for sched_rcu. The undocumented reason why sched_rcu
> is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
> relying on the fact that sched_rcu read-side critical sections are also
> protected by preempt- or irq-disabled regions.
> 
> Replace rcu_read_lock_sched() with rcu_read_lock() and acquire the RCU
> lock where it is not yet explicitly acquired. Replace local_irq_disable()
> with rcu_read_lock(). Update the asserts.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [bigeasy: mangle changelog a little]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

A gentle ping.

Sebastian


* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-03-21 20:59   ` Sebastian Andrzej Siewior
@ 2019-03-22 17:43     ` Tejun Heo
  2019-03-22 17:59       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2019-03-22 17:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

Hello,

On Thu, Mar 21, 2019 at 09:59:35PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-03-13 17:55:47 [+0100], To linux-kernel@vger.kernel.org wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > There is no need for sched_rcu. The undocumented reason why sched_rcu
> > is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
> > relying on the fact that sched_rcu read-side critical sections are also
> > protected by preempt- or irq-disabled regions.
> > 
> > Replace rcu_read_lock_sched() with rcu_read_lock() and acquire the RCU
> > lock where it is not yet explicitly acquired. Replace local_irq_disable()
> > with rcu_read_lock(). Update the asserts.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > [bigeasy: mangle changelog a little]
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> A gentle ping.

We can switch but it doesn't really say why we'd want to.  Can you
please explain why this is better?

Thanks.

-- 
tejun


* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-03-22 17:43     ` Tejun Heo
@ 2019-03-22 17:59       ` Sebastian Andrzej Siewior
  2019-04-05 14:42         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-03-22 17:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

On 2019-03-22 10:43:34 [-0700], Tejun Heo wrote:
> Hello,
Hi,

> We can switch but it doesn't really say why we'd want to.  Can you
> please explain why this is better?

there is this undocumented part. Avoiding sched-RCU also means we are
more preemptible, which is good :) Especially on -RT, where we can't
disable preemption across the whole critical section.
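
Roughly what the primitives boil down to (a simplified sketch, not the
exact rcupdate.h code):

  static inline void rcu_read_lock_sched(void)
  {
          preempt_disable();      /* the whole section runs non-preemptible */
  }

  /* with CONFIG_PREEMPT_RCU (as on -RT) */
  static inline void rcu_read_lock(void)
  {
          __rcu_read_lock();      /* only records the read-side nesting,
                                     the section stays preemptible */
  }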

Is this good enough?

> Thanks.

Sebastian


* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-03-22 17:59       ` Sebastian Andrzej Siewior
@ 2019-04-05 14:42         ` Sebastian Andrzej Siewior
  2019-04-08 15:10           ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-05 14:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

On 2019-03-22 18:59:23 [+0100], To Tejun Heo wrote:
> On 2019-03-22 10:43:34 [-0700], Tejun Heo wrote:
> > Hello,
Hi,

> > We can switch but it doesn't really say why we'd want to.  Can you
> > please explain why this is better?
> 
> there is this undocumented part. Avoiding sched-RCU also means we are
> more preemptible, which is good :) Especially on -RT, where we can't
> disable preemption across the whole critical section.
> 
> Is this good enough?

a gentle ping.

> > Thanks.
> 
Sebastian


* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-04-05 14:42         ` Sebastian Andrzej Siewior
@ 2019-04-08 15:10           ` Tejun Heo
  0 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2019-04-08 15:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

Hello, Sebastian.

On Fri, Apr 05, 2019 at 04:42:18PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-03-22 18:59:23 [+0100], To Tejun Heo wrote:
> > On 2019-03-22 10:43:34 [-0700], Tejun Heo wrote:
> > > Hello,
> Hi,
> 
> > > We can switch but it doesn't really say why we'd want to.  Can you
> > > please explain why this is better?
> > 
> > there is this undocumented part. Avoiding sched-RCU also means we are
> > more preemptible, which is good :) Especially on -RT, where we can't
> > disable preemption across the whole critical section.
> > 
> > Is this good enough?
> 
> a gentle ping.

Heh, yeah, sorry about dragging my feet on it.  The patchset itself
doesn't actually improve the upstream kernel (except for the obvious
bug fix where it's using the wrong call_rcu right now), so I've been
constantly on the fence.  At the same time, it doesn't really worsen
anything either.  Imma look through the patches once more later today.

Thanks.

-- 
tejun


* Re: [PATCH 1/2] workqueue: Use normal rcu
  2019-03-13 16:55 ` [PATCH 1/2] workqueue: Use normal rcu Sebastian Andrzej Siewior
  2019-03-21 20:59   ` Sebastian Andrzej Siewior
@ 2019-04-08 19:38   ` Tejun Heo
  1 sibling, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2019-04-08 19:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

On Wed, Mar 13, 2019 at 05:55:47PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> There is no need for sched_rcu. The undocumented reason why sched_rcu
> is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
> relying on the fact that sched_rcu read-side critical sections are also
> protected by preempt- or irq-disabled regions.
> 
> Replace rcu_read_lock_sched() with rcu_read_lock() and acquire the RCU
> lock where it is not yet explicitly acquired. Replace local_irq_disable()
> with rcu_read_lock(). Update the asserts.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [bigeasy: mangle changelog a little]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Applied to wq/for-5.2.

Thanks.

-- 
tejun


* Re: [PATCH 2/2] sched: Distangle worker accounting from rq lock
  2019-03-13 16:55 ` [PATCH 2/2] sched: Distangle worker accounting from rq lock Sebastian Andrzej Siewior
@ 2019-04-08 19:45   ` Tejun Heo
  2019-04-09  7:33     ` Sebastian Andrzej Siewior
  2019-04-09  8:22     ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Tejun Heo @ 2019-04-08 19:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Daniel Bristot de Oliveira

Hello,

On Wed, Mar 13, 2019 at 05:55:48PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The worker accounting for CPU bound workers is plugged into the core
> scheduler code and the wakeup code. This is not a hard requirement and
> can be avoided by keeping track of the state in the workqueue code
> itself.
> 
> Keep track of the sleeping state in the worker itself and call the
> notifier before entering the core scheduler. There might be false
> positives when the task is woken between that call and actually
> scheduling, but that's not really different from scheduling and being
> woken immediately after switching away. Updating nr_running when the
> task returns from schedule() happens later than when it is done from
> ttwu().
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
> Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> [bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de
>           Oliveira]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This looks good from wq side.  Peter, are you okay with routing this
through the wq tree?  If you wanna take it through the sched tree,
please feel free to add

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH 2/2] sched: Distangle worker accounting from rq lock
  2019-04-08 19:45   ` Tejun Heo
@ 2019-04-09  7:33     ` Sebastian Andrzej Siewior
  2019-04-09  8:22     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-09  7:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, Lai Jiangshan, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Daniel Bristot de Oliveira

On 2019-04-08 12:45:05 [-0700], Tejun Heo wrote:
> Hello,
Hi,
…
> This looks good from wq side.  Peter, are you okay with routing this
> through the wq tree?  If you wanna take it through the sched tree,
> please feel free to add

Thank you.

> Acked-by: Tejun Heo <tj@kernel.org>
> 
> Thanks.

Sebastian


* Re: [PATCH 2/2] sched: Distangle worker accounting from rq lock
  2019-04-08 19:45   ` Tejun Heo
  2019-04-09  7:33     ` Sebastian Andrzej Siewior
@ 2019-04-09  8:22     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2019-04-09  8:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sebastian Andrzej Siewior, linux-kernel, Lai Jiangshan,
	Ingo Molnar, Thomas Gleixner, Daniel Bristot de Oliveira

On Mon, Apr 08, 2019 at 12:45:05PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Wed, Mar 13, 2019 at 05:55:48PM +0100, Sebastian Andrzej Siewior wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > The worker accounting for CPU bound workers is plugged into the core
> > scheduler code and the wakeup code. This is not a hard requirement and
> > can be avoided by keeping track of the state in the workqueue code
> > itself.
> > 
> > Keep track of the sleeping state in the worker itself and call the
> > notifier before entering the core scheduler. There might be false
> > positives when the task is woken between that call and actually
> > scheduling, but that's not really different from scheduling and being
> > woken immediately after switching away. Updating nr_running when the
> > task returns from schedule() happens later than when it is done from
> > ttwu().
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de
> > Link: http://lkml.kernel.org/r/ad2b29b5715f970bffc1a7026cabd6ff0b24076a.1532952814.git.bristot@redhat.com
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > [bigeasy: preempt_disable() around wq_worker_sleeping() by Daniel Bristot de
> >           Oliveira]
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> This looks good from wq side.  Peter, are you okay with routing this
> through the wq tree?  If you wanna take it through the sched tree,
> please feel free to add
> 
> Acked-by: Tejun Heo <tj@kernel.org>

If you don't mind I'll take it through the sched tree, because while
looking at it I did a few cleanups on top.

Thanks!


Thread overview: 12+ messages
2019-03-13 16:55 [PATCH 0/2] workqueue: use normal RCU and don't depend on the rq lock Sebastian Andrzej Siewior
2019-03-13 16:55 ` [PATCH 1/2] workqueue: Use normal rcu Sebastian Andrzej Siewior
2019-03-21 20:59   ` Sebastian Andrzej Siewior
2019-03-22 17:43     ` Tejun Heo
2019-03-22 17:59       ` Sebastian Andrzej Siewior
2019-04-05 14:42         ` Sebastian Andrzej Siewior
2019-04-08 15:10           ` Tejun Heo
2019-04-08 19:38   ` Tejun Heo
2019-03-13 16:55 ` [PATCH 2/2] sched: Distangle worker accounting from rq lock Sebastian Andrzej Siewior
2019-04-08 19:45   ` Tejun Heo
2019-04-09  7:33     ` Sebastian Andrzej Siewior
2019-04-09  8:22     ` Peter Zijlstra
