All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues
@ 2013-03-18 15:23 Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

In order to save power, it would be useful to schedule light weight work on cpus
that aren't IDLE instead of waking up an IDLE one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out few more users of this feature, like workqueues.

This patchset converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
idle, local cpu is returned back. If local cpu is idle, then we must look for
another CPU which have all the flags passed as argument as set and isn't idle.

This patchset in first two patches creates generic API sched_select_cpu(). In
the third patch we create a new set of APIs for workqueues to queue work on any
cpu. All other patches migrate some of the users of workqueues which showed up
significantly on my setup. Others can be migrated later.

Earlier discussions over v1 and v2 can be found here:
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html

Earlier discussions over this concept were done at last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/

Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel

This patchset has been tested on a big LITTLE system (heterogeneous) but is
useful for all other homogeneous systems as well. During these tests audio was
played in background using aplay.

Results:
-------

Cluster A15 Energy      Cluster A7 Energy       Total
------------------      -----------------       -----

Without this patchset (Energy in Joules):
---------------------

0.151162                2.183545                2.334707
0.223730                2.687067                2.910797
0.289687                2.732702                3.022389
0.454198                2.745908                3.200106
0.495552                2.746465                3.242017

Average:
0.322866                2.619137                2.942003
	                

With this patchset (Energy in Joules):
---------------------

0.133361                2.267822                2.401183
0.260626                2.833389                3.094015
0.142365                2.277929                2.420294
0.246793                2.582550                2.829343
0.130462                2.257033                2.387495

Average:
0.182721                2.443745                2.626466


Above tests are repeated multiple times and events are tracked using trace-cmd
and analysed using kernelshark. And it was easily noticeable that idle time for
many cpus has increased considerably, which eventually saved some power.

These patches are applied here for others to test:
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

Viresh Kumar (7):
  sched: Create sched_select_cpu() to give preferred CPU for power
    saving
  timer: hrtimer: Don't check idle_cpu() before calling
    get_nohz_timer_target()
  workqueue: Add helpers to schedule work on any cpu
  PHYLIB: queue work on any cpu
  mmc: queue work on any cpu
  block: queue work on any cpu
  fbcon: queue work on any cpu

 block/blk-core.c              |   6 +-
 block/blk-ioc.c               |   2 +-
 block/genhd.c                 |   9 ++-
 drivers/mmc/core/core.c       |   2 +-
 drivers/net/phy/phy.c         |   9 +--
 drivers/video/console/fbcon.c |   2 +-
 include/linux/sched.h         |  21 +++++-
 include/linux/workqueue.h     |   5 ++
 kernel/hrtimer.c              |   2 +-
 kernel/sched/core.c           |  63 +++++++++-------
 kernel/timer.c                |   2 +-
 kernel/workqueue.c            | 163 +++++++++++++++++++++++++++++-------------
 12 files changed, 192 insertions(+), 94 deletions(-)

-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-18 15:39   ` Frederic Weisbecker
  2013-03-19 12:30   ` Peter Zijlstra
  2013-03-18 15:23 ` [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

In order to save power, it would be useful to schedule light weight work on cpus
that aren't IDLE instead of waking up an IDLE one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out few more users of this feature, like workqueues.

This patch converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
idle, local cpu is returned back. If local cpu is idle, then we must look for
another CPU which have all the flags passed as argument as set and isn't idle.

This patch reuses the code from get_nohz_timer_target() routine, which had
similar implementation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h | 21 +++++++++++++++--
 kernel/sched/core.c   | 63 +++++++++++++++++++++++++++++----------------------
 2 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e20580d..216fa0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -230,14 +230,31 @@ extern void init_idle_bootup_task(struct task_struct *idle);
 
 extern int runqueue_is_locked(int cpu);
 
-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
+#ifdef CONFIG_SMP
+extern int sched_select_cpu(unsigned int sd_flags);
+
+#ifdef CONFIG_NO_HZ
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
-extern int get_nohz_timer_target(void);
+/*
+ * In the semi idle case, use the nearest busy cpu for migrating timers
+ * from an idle cpu.  This is good for power-savings.
+ *
+ * We don't do similar optimization for completely idle system, as
+ * selecting an idle cpu will add more delays to the timers than intended
+ * (as that cpu's timer base may not be uptodate wrt jiffies etc).
+ */
+#define get_nohz_timer_target() sched_select_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
+
+static inline int sched_select_cpu(unsigned int sd_flags)
+{
+	return raw_smp_processor_id();
+}
 #endif
+#endif /* CONFIG_SMP */
 
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b36635e..ccd76d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -551,33 +551,6 @@ void resched_cpu(int cpu)
 
 #ifdef CONFIG_NO_HZ
 /*
- * In the semi idle case, use the nearest busy cpu for migrating timers
- * from an idle cpu.  This is good for power-savings.
- *
- * We don't do similar optimization for completely idle system, as
- * selecting an idle cpu will add more delays to the timers than intended
- * (as that cpu's timer base may not be uptodate wrt jiffies etc).
- */
-int get_nohz_timer_target(void)
-{
-	int cpu = smp_processor_id();
-	int i;
-	struct sched_domain *sd;
-
-	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		for_each_cpu(i, sched_domain_span(sd)) {
-			if (!idle_cpu(i)) {
-				cpu = i;
-				goto unlock;
-			}
-		}
-	}
-unlock:
-	rcu_read_unlock();
-	return cpu;
-}
-/*
  * When add_timer_on() enqueues a timer into the timer wheel of an
  * idle CPU then this timer might expire before the next timer event
  * which is scheduled to wake up that CPU. In case of a completely
@@ -648,6 +621,42 @@ void sched_avg_update(struct rq *rq)
 	}
 }
 
+/*
+ * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
+ * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
+ * returned back. If it is idle, then we must look for another CPU which have
+ * all the flags passed as argument as set.
+ */
+int sched_select_cpu(unsigned int sd_flags)
+{
+	struct sched_domain *sd;
+	int cpu = smp_processor_id();
+	int i;
+
+	/* If Current cpu isn't idle, don't migrate anything */
+	if (!idle_cpu(cpu))
+		return cpu;
+
+	rcu_read_lock();
+	for_each_domain(cpu, sd) {
+		 /* If sd doesnt' have sd_flags set skip sd. */
+		if ((sd->flags & sd_flags) != sd_flags)
+			continue;
+
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (i == cpu)
+				continue;
+			if (!idle_cpu(i)) {
+				cpu = i;
+				goto unlock;
+			}
+		}
+	}
+unlock:
+	rcu_read_unlock();
+	return cpu;
+}
+
 #else /* !CONFIG_SMP */
 void resched_task(struct task_struct *p)
 {
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target()
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu Viresh Kumar
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

Check for current cpu's idleness is already done in implementation of
sched_select_cpu() which is called by get_nohz_timer_target(). So, no need to
call idle_cpu() twice, once from sched_select_cpu() and once from timer and
hrtimer before calling get_nohz_timer_target().

This patch removes calls to idle_cpu() from timer and hrtimer.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 2 +-
 kernel/timer.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cc47812..c56668d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -161,7 +161,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 static int hrtimer_get_target(int this_cpu, int pinned)
 {
 #ifdef CONFIG_NO_HZ
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
+	if (!pinned && get_sysctl_timer_migration())
 		return get_nohz_timer_target();
 #endif
 	return this_cpu;
diff --git a/kernel/timer.c b/kernel/timer.c
index dbf7a78..83a0d3c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -739,7 +739,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 	cpu = smp_processor_id();
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
+	if (!pinned && get_sysctl_timer_migration())
 		cpu = get_nohz_timer_target();
 #endif
 	new_base = per_cpu(tvec_bases, cpu);
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-19  5:15   ` Viresh Kumar
  2013-03-21  0:12   ` Tejun Heo
  2013-03-18 15:23 ` [PATCH V3 4/7] PHYLIB: queue " Viresh Kumar
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

queue_work() queues work on current cpu. This may wake up an idle CPU, which is
actually not required.

Some of these works can be processed by any CPU and so we must select a non-idle
CPU here. The initial idea was to modify implementation of queue_work(), but
that may end up breaking lots of kernel code that would be a nightmare to debug.

So, we finalized to adding new workqueue interfaces, for works that don't depend
on a cpu to execute them.

This patch adds following new routines:
- queue_work_on_any_cpu
- queue_delayed_work_on_any_cpu

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/workqueue.h |   5 ++
 kernel/workqueue.c        | 163 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 118 insertions(+), 50 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index df30763..f0f7068 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -114,6 +114,7 @@ struct delayed_work {
 	/* target workqueue and CPU ->timer uses to queue ->work */
 	struct workqueue_struct *wq;
 	int cpu;
+	bool on_any_cpu;
 };
 
 /*
@@ -418,10 +419,14 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
 extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+extern bool queue_work_on_any_cpu(struct workqueue_struct *wq,
+			struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
 extern bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_any_cpu(struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
 extern bool mod_delayed_work(struct workqueue_struct *wq,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0e4fa1d..cf9c570 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1215,7 +1215,7 @@ static bool is_chained_work(struct workqueue_struct *wq)
 }
 
 static void __queue_work(int cpu, struct workqueue_struct *wq,
-			 struct work_struct *work)
+			 struct work_struct *work, bool on_any_cpu)
 {
 	struct pool_workqueue *pwq;
 	struct worker_pool *last_pool;
@@ -1240,8 +1240,13 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 retry:
 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND)) {
-		if (cpu == WORK_CPU_UNBOUND)
-			cpu = raw_smp_processor_id();
+		if (cpu == WORK_CPU_UNBOUND) {
+			if (on_any_cpu)
+				cpu = sched_select_cpu(0);
+			else
+				cpu = raw_smp_processor_id();
+		}
+
 		pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
 	} else {
 		pwq = first_pwq(wq);
@@ -1315,6 +1320,22 @@ retry:
 	spin_unlock(&pwq->pool->lock);
 }
 
+static bool __queue_work_on(int cpu, struct workqueue_struct *wq,
+		   struct work_struct *work, bool on_any_cpu)
+{
+	bool ret = false;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+		__queue_work(cpu, wq, work, on_any_cpu);
+		ret = true;
+	}
+
+	local_irq_restore(flags);
+	return ret;
+}
 /**
  * queue_work_on - queue work on specific cpu
  * @cpu: CPU number to execute work on
@@ -1329,18 +1350,7 @@ retry:
 bool queue_work_on(int cpu, struct workqueue_struct *wq,
 		   struct work_struct *work)
 {
-	bool ret = false;
-	unsigned long flags;
-
-	local_irq_save(flags);
-
-	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
-		__queue_work(cpu, wq, work);
-		ret = true;
-	}
-
-	local_irq_restore(flags);
-	return ret;
+	return __queue_work_on(cpu, wq, work, false);
 }
 EXPORT_SYMBOL_GPL(queue_work_on);
 
@@ -1356,21 +1366,38 @@ EXPORT_SYMBOL_GPL(queue_work_on);
  */
 bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
-	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
+	return __queue_work_on(WORK_CPU_UNBOUND, wq, work, false);
 }
 EXPORT_SYMBOL_GPL(queue_work);
 
+/**
+ * queue_work_on_any_cpu - queue work on any cpu on a workqueue
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * Returns %false if @work was already on a queue, %true otherwise.
+ *
+ * We queue the work to any non-idle (from schedulers perspective) cpu.
+ */
+bool queue_work_on_any_cpu(struct workqueue_struct *wq,
+		struct work_struct *work)
+{
+	return __queue_work_on(WORK_CPU_UNBOUND, wq, work, true);
+}
+EXPORT_SYMBOL_GPL(queue_work_on_any_cpu);
+
 void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 
 	/* should have been called from irqsafe timer with irq already off */
-	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
+	__queue_work(dwork->cpu, dwork->wq, &dwork->work, dwork->on_any_cpu);
 }
 EXPORT_SYMBOL(delayed_work_timer_fn);
 
 static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
-				struct delayed_work *dwork, unsigned long delay)
+				struct delayed_work *dwork, unsigned long delay,
+				bool on_any_cpu)
 {
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
@@ -1387,7 +1414,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	 * on that there's no such delay when @delay is 0.
 	 */
 	if (!delay) {
-		__queue_work(cpu, wq, &dwork->work);
+		__queue_work(cpu, wq, &dwork->work, on_any_cpu);
 		return;
 	}
 
@@ -1395,6 +1422,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 
 	dwork->wq = wq;
 	dwork->cpu = cpu;
+	dwork->on_any_cpu = on_any_cpu;
 	timer->expires = jiffies + delay;
 
 	if (unlikely(cpu != WORK_CPU_UNBOUND))
@@ -1403,19 +1431,9 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 		add_timer(timer);
 }
 
-/**
- * queue_delayed_work_on - queue work on specific CPU after delay
- * @cpu: CPU number to execute work on
- * @wq: workqueue to use
- * @dwork: work to queue
- * @delay: number of jiffies to wait before queueing
- *
- * Returns %false if @work was already on a queue, %true otherwise.  If
- * @delay is zero and @dwork is idle, it will be scheduled for immediate
- * execution.
- */
-bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
-			   struct delayed_work *dwork, unsigned long delay)
+static bool __queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay,
+			   bool on_any_cpu)
 {
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
@@ -1425,13 +1443,30 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	local_irq_save(flags);
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		__queue_delayed_work(cpu, wq, dwork, delay, on_any_cpu);
 		ret = true;
 	}
 
 	local_irq_restore(flags);
 	return ret;
 }
+
+/**
+ * queue_delayed_work_on - queue work on specific CPU after delay
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Returns %false if @work was already on a queue, %true otherwise.  If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
+ */
+bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay)
+{
+	return __queue_delayed_work_on(cpu, wq, dwork, delay, false);
+}
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 
 /**
@@ -1445,11 +1480,50 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 bool queue_delayed_work(struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay)
 {
-	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+	return __queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+			false);
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work);
 
 /**
+ * queue_delayed_work_on_any_cpu - queue work on any non-idle cpu on a workqueue
+ * after delay
+ * @wq: workqueue to use
+ * @dwork: delayable work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Equivalent to queue_delayed_work() but tries to use any non-idle (from
+ * schedulers perspective) CPU.
+ */
+bool queue_delayed_work_on_any_cpu(struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay)
+{
+	return __queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+			true);
+}
+EXPORT_SYMBOL_GPL(queue_delayed_work_on_any_cpu);
+
+static bool __mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			 struct delayed_work *dwork, unsigned long delay,
+			 bool on_any_cpu)
+{
+	unsigned long flags;
+	int ret;
+
+	do {
+		ret = try_to_grab_pending(&dwork->work, true, &flags);
+	} while (unlikely(ret == -EAGAIN));
+
+	if (likely(ret >= 0)) {
+		__queue_delayed_work(cpu, wq, dwork, delay, on_any_cpu);
+		local_irq_restore(flags);
+	}
+
+	/* -ENOENT from try_to_grab_pending() becomes %true */
+	return ret;
+}
+
+/**
  * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
  * @cpu: CPU number to execute work on
  * @wq: workqueue to use
@@ -1470,20 +1544,7 @@ EXPORT_SYMBOL_GPL(queue_delayed_work);
 bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			 struct delayed_work *dwork, unsigned long delay)
 {
-	unsigned long flags;
-	int ret;
-
-	do {
-		ret = try_to_grab_pending(&dwork->work, true, &flags);
-	} while (unlikely(ret == -EAGAIN));
-
-	if (likely(ret >= 0)) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
-		local_irq_restore(flags);
-	}
-
-	/* -ENOENT from try_to_grab_pending() becomes %true */
-	return ret;
+	return __mod_delayed_work_on(cpu, wq, dwork, delay, false);
 }
 EXPORT_SYMBOL_GPL(mod_delayed_work_on);
 
@@ -1498,7 +1559,8 @@ EXPORT_SYMBOL_GPL(mod_delayed_work_on);
 bool mod_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork,
 		      unsigned long delay)
 {
-	return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+	return __mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+			dwork->on_any_cpu);
 }
 EXPORT_SYMBOL_GPL(mod_delayed_work);
 
@@ -2952,7 +3014,8 @@ bool flush_delayed_work(struct delayed_work *dwork)
 {
 	local_irq_disable();
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(dwork->cpu, dwork->wq, &dwork->work);
+		__queue_work(dwork->cpu, dwork->wq, &dwork->work,
+				dwork->on_any_cpu);
 	local_irq_enable();
 	return flush_work(&dwork->work);
 }
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 4/7] PHYLIB: queue work on any cpu
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
                   ` (2 preceding siblings ...)
  2013-03-18 15:23 ` [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-18 17:33   ` David Miller
  2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar, David S. Miller, netdev

Phylib uses workqueues for multiple purposes. There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the schedule_work() and schedule_delayed_work() routines
with their queue_[delayed_]work_on_any_cpu() siblings with system_wq as
parameter.

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/net/phy/phy.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 298b4c2..a517706 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
 	phydev->adjust_state = handler;
 
-	schedule_delayed_work(&phydev->state_queue, HZ);
+	queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, HZ);
 }
 
 /**
@@ -527,7 +527,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	disable_irq_nosync(irq);
 	atomic_inc(&phydev->irq_disable);
 
-	schedule_work(&phydev->phy_queue);
+	queue_work_on_any_cpu(system_wq, &phydev->phy_queue);
 
 	return IRQ_HANDLED;
 }
@@ -682,7 +682,7 @@ static void phy_change(struct work_struct *work)
 
 	/* reschedule state queue work to run as soon as possible */
 	cancel_delayed_work_sync(&phydev->state_queue);
-	schedule_delayed_work(&phydev->state_queue, 0);
+	queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, 0);
 
 	return;
 
@@ -966,7 +966,8 @@ void phy_state_machine(struct work_struct *work)
 	if (err < 0)
 		phy_error(phydev);
 
-	schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+	queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue,
+			PHY_STATE_TIME * HZ);
 }
 
 static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
                   ` (3 preceding siblings ...)
  2013-03-18 15:23 ` [PATCH V3 4/7] PHYLIB: queue " Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-19  7:58   ` Ulf Hansson
                     ` (2 more replies)
  2013-03-18 15:23 ` [PATCH V3 6/7] block: " Viresh Kumar
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar, Chris Ball, linux-mmc

mmc uses workqueues for running mmc_rescan(). There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the queue_delayed_work() with
queue_delayed_work_on_any_cpu() siblings.

This routine would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Chris Ball <cjb@laptop.org>
Cc: linux-mmc@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/mmc/core/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9290bb5..adf331a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -85,7 +85,7 @@ MODULE_PARM_DESC(
 static int mmc_schedule_delayed_work(struct delayed_work *work,
 				     unsigned long delay)
 {
-	return queue_delayed_work(workqueue, work, delay);
+	return queue_delayed_work_on_any_cpu(workqueue, work, delay);
 }
 
 /*
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 6/7] block: queue work on any cpu
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
                   ` (4 preceding siblings ...)
  2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
@ 2013-03-18 15:23 ` Viresh Kumar
  2013-03-22 15:05   ` Jens Axboe
  2013-03-18 15:35   ` Viresh Kumar
  2013-03-19  5:00 ` [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
  7 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar, Jens Axboe

block layer uses workqueues for multiple purposes. There is no real dependency
of scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces schedule_work() and queue_[delayed_]work() with
queue_[delayed_]work_on_any_cpu() siblings.

These routines would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 block/blk-core.c | 6 +++---
 block/blk-ioc.c  | 2 +-
 block/genhd.c    | 9 ++++++---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 186603b..14ae74f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -225,7 +225,7 @@ static void blk_delay_work(struct work_struct *work)
 void blk_delay_queue(struct request_queue *q, unsigned long msecs)
 {
 	if (likely(!blk_queue_dead(q)))
-		queue_delayed_work(kblockd_workqueue, &q->delay_work,
+		queue_delayed_work_on_any_cpu(kblockd_workqueue, &q->delay_work,
 				   msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_delay_queue);
@@ -2852,14 +2852,14 @@ EXPORT_SYMBOL_GPL(blk_rq_prep_clone);
 
 int kblockd_schedule_work(struct request_queue *q, struct work_struct *work)
 {
-	return queue_work(kblockd_workqueue, work);
+	return queue_work_on_any_cpu(kblockd_workqueue, work);
 }
 EXPORT_SYMBOL(kblockd_schedule_work);
 
 int kblockd_schedule_delayed_work(struct request_queue *q,
 			struct delayed_work *dwork, unsigned long delay)
 {
-	return queue_delayed_work(kblockd_workqueue, dwork, delay);
+	return queue_delayed_work_on_any_cpu(kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_schedule_delayed_work);
 
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..2eefeb1 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -144,7 +144,7 @@ void put_io_context(struct io_context *ioc)
 	if (atomic_long_dec_and_test(&ioc->refcount)) {
 		spin_lock_irqsave(&ioc->lock, flags);
 		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
+			queue_work_on_any_cpu(system_wq, &ioc->release_work);
 		else
 			free_ioc = true;
 		spin_unlock_irqrestore(&ioc->lock, flags);
diff --git a/block/genhd.c b/block/genhd.c
index a1ed52a..4bdb735 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1488,9 +1488,11 @@ static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 	intv = disk_events_poll_jiffies(disk);
 	set_timer_slack(&ev->dwork.timer, intv / 4);
 	if (check_now)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, 0);
+		queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+				0);
 	else if (intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+				intv);
 out_unlock:
 	spin_unlock_irqrestore(&ev->lock, flags);
 }
@@ -1626,7 +1628,8 @@ static void disk_check_events(struct disk_events *ev,
 
 	intv = disk_events_poll_jiffies(disk);
 	if (!ev->block && intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+				intv);
 
 	spin_unlock_irq(&ev->lock);
 
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 7/7] fbcon: queue work on any cpu
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
@ 2013-03-18 15:35   ` Viresh Kumar
  2013-03-18 15:23 ` [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar, Dave Airlie, linux-fbdev

fbcon uses workqueues and it has no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up few times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces schedule_work() routine with queue_work_on_any_cpu()
sibling with system_wq as workqueue.

This routine would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/video/console/fbcon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 3cd6759..a900997 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -404,7 +404,7 @@ static void cursor_timer_handler(unsigned long dev_addr)
 	struct fb_info *info = (struct fb_info *) dev_addr;
 	struct fbcon_ops *ops = info->fbcon_par;
 
-	schedule_work(&info->queue);
+	queue_work_on_any_cpu(system_wq, &info->queue);
 	mod_timer(&ops->cursor_timer, jiffies + HZ/5);
 }
 
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH V3 7/7] fbcon: queue work on any cpu
@ 2013-03-18 15:35   ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:35 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar, Dave Airlie, linux-fbdev

fbcon uses workqueues and it has no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up few times just to
service this work. It would be better if we can schedule it on a cpu which isn't
idle to save on power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running = 0
- wake_list is empty

This patch replaces schedule_work() routine with queue_work_on_any_cpu()
sibling with system_wq as workqueue.

This routine would look for the closest (via scheduling domains) non-idle cpu
(non-idle from schedulers perspective). If the current cpu is not idle or all
cpus are idle, work will be scheduled on local cpu.

Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/video/console/fbcon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 3cd6759..a900997 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -404,7 +404,7 @@ static void cursor_timer_handler(unsigned long dev_addr)
 	struct fb_info *info = (struct fb_info *) dev_addr;
 	struct fbcon_ops *ops = info->fbcon_par;
 
-	schedule_work(&info->queue);
+	queue_work_on_any_cpu(system_wq, &info->queue);
 	mod_timer(&ops->cursor_timer, jiffies + HZ/5);
 }
 
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
@ 2013-03-18 15:39   ` Frederic Weisbecker
  2013-03-18 15:44     ` Viresh Kumar
  2013-03-19 12:30   ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Frederic Weisbecker @ 2013-03-18 15:39 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:
> In order to save power, it would be useful to schedule light weight work on cpus
> that aren't IDLE instead of waking up an IDLE one.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This is already implemented for timers as get_nohz_timer_target(). We can figure
> out few more users of this feature, like workqueues.
>
> This patch converts get_nohz_timer_target() into a generic API
> sched_select_cpu() so that other frameworks (like workqueue) can also use it.
>
> This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
> flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
> idle, local cpu is returned back. If local cpu is idle, then we must look for
> another CPU which have all the flags passed as argument as set and isn't idle.
>
> This patch reuses the code from get_nohz_timer_target() routine, which had
> similar implementation.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  include/linux/sched.h | 21 +++++++++++++++--
>  kernel/sched/core.c   | 63 +++++++++++++++++++++++++++++----------------------
>  2 files changed, 55 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e20580d..216fa0d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -230,14 +230,31 @@ extern void init_idle_bootup_task(struct task_struct *idle);
>
>  extern int runqueue_is_locked(int cpu);
>
> -#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
> +#ifdef CONFIG_SMP
> +extern int sched_select_cpu(unsigned int sd_flags);
> +
> +#ifdef CONFIG_NO_HZ
>  extern void nohz_balance_enter_idle(int cpu);
>  extern void set_cpu_sd_state_idle(void);
> -extern int get_nohz_timer_target(void);
> +/*
> + * In the semi idle case, use the nearest busy cpu for migrating timers
> + * from an idle cpu.  This is good for power-savings.
> + *
> + * We don't do similar optimization for completely idle system, as
> + * selecting an idle cpu will add more delays to the timers than intended
> + * (as that cpu's timer base may not be uptodate wrt jiffies etc).
> + */
> +#define get_nohz_timer_target() sched_select_cpu(0)
>  #else
>  static inline void nohz_balance_enter_idle(int cpu) { }
>  static inline void set_cpu_sd_state_idle(void) { }
> +
> +static inline int sched_select_cpu(unsigned int sd_flags)
> +{
> +       return raw_smp_processor_id();

I feel this should be symetric with the requirement of having
preemption disabled as in the CONFIG_NO_HZ version. This should be
smp_processor_id().

[...]
> @@ -648,6 +621,42 @@ void sched_avg_update(struct rq *rq)
>         }
>  }
>
> +/*
> + * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
> + * returned back. If it is idle, then we must look for another CPU which have
> + * all the flags passed as argument as set.
> + */
> +int sched_select_cpu(unsigned int sd_flags)

It would be nice to have some more precise naming. sched_select_busy_cpu() ?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:39   ` Frederic Weisbecker
@ 2013-03-18 15:44     ` Viresh Kumar
  2013-03-18 15:57       ` Frederic Weisbecker
  0 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-18 15:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On Mon, Mar 18, 2013 at 9:09 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:

>> +static inline int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +       return raw_smp_processor_id();
>
> I feel this should be symetric with the requirement of having
> preemption disabled as in the CONFIG_NO_HZ version. This should be
> smp_processor_id().

Yes, my fault.

>> +int sched_select_cpu(unsigned int sd_flags)
>
> It would be nice to have some more precise naming. sched_select_busy_cpu() ?

You are correct that name can be improved but busy may give it a different
meaning. Maybe sched_select_non_idle_cpu()?

Thanks for your instantaneous review :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:44     ` Viresh Kumar
@ 2013-03-18 15:57       ` Frederic Weisbecker
  2013-03-19  5:15           ` Viresh Kumar
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Weisbecker @ 2013-03-18 15:57 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:
> On Mon, Mar 18, 2013 at 9:09 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> 2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:
>
>>> +static inline int sched_select_cpu(unsigned int sd_flags)
>>> +{
>>> +       return raw_smp_processor_id();
>>
>> I feel this should be symetric with the requirement of having
>> preemption disabled as in the CONFIG_NO_HZ version. This should be
>> smp_processor_id().
>
> Yes, my fault.
>
>>> +int sched_select_cpu(unsigned int sd_flags)
>>
>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>
> You are correct that name can be improved but busy may give it a different
> meaning. Maybe sched_select_non_idle_cpu()?

That works too yeah.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 4/7] PHYLIB: queue work on any cpu
  2013-03-18 15:23 ` [PATCH V3 4/7] PHYLIB: queue " Viresh Kumar
@ 2013-03-18 17:33   ` David Miller
  0 siblings, 0 replies; 37+ messages in thread
From: David Miller @ 2013-03-18 17:33 UTC (permalink / raw)
  To: viresh.kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, netdev

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon, 18 Mar 2013 20:53:26 +0530

> Phylib uses workqueues for multiple purposes. There is no real dependency of
> scheduling these on the cpu which scheduled them.
> 
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
> 
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
> 
> This patch replaces the schedule_work() and schedule_delayed_work() routines
> with their queue_[delayed_]work_on_any_cpu() siblings with system_wq as
> parameter.
> 
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

This will need to be applied to whatever tree adds these new interfaces,
and for that:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues
  2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
                   ` (6 preceding siblings ...)
  2013-03-18 15:35   ` Viresh Kumar
@ 2013-03-19  5:00 ` Viresh Kumar
  7 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19  5:00 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

On 18 March 2013 20:53, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> In order to save power, it would be useful to schedule light weight work on cpus
> that aren't IDLE instead of waking up an IDLE one.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty

Oops!! I forgot the change log:

V2->V3:
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
  APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:57       ` Frederic Weisbecker
@ 2013-03-19  5:15           ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19  5:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 18 March 2013 21:27, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:
>>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>>
>> You are correct that name can be improved but busy may give it a different
>> meaning. Maybe sched_select_non_idle_cpu()?
>
> That works too yeah.

Here is the fixup that i applied (it fixes another stupid mistake in sched.h).
I have pushed this series here again:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

commit e769ff0d78fd2fb504ae056a44b70bba7c259126
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Fri Sep 14 15:13:05 2012 +0530

    fixup! sched: Create sched_select_cpu() to give preferred CPU for
power saving

    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h | 19 ++++++++++---------
 kernel/sched/core.c   |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 216fa0d..db6655a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,11 +231,18 @@ extern void init_idle_bootup_task(struct
task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #ifdef CONFIG_SMP
-extern int sched_select_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+#else
+static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+{
+       return smp_processor_id();
+}
+#endif

-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
+
 /*
  * In the semi idle case, use the nearest busy cpu for migrating timers
  * from an idle cpu.  This is good for power-savings.
@@ -244,17 +251,11 @@ extern void set_cpu_sd_state_idle(void);
  * selecting an idle cpu will add more delays to the timers than intended
  * (as that cpu's timer base may not be uptodate wrt jiffies etc).
  */
-#define get_nohz_timer_target() sched_select_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
-
-static inline int sched_select_cpu(unsigned int sd_flags)
-{
-       return raw_smp_processor_id();
-}
 #endif
-#endif /* CONFIG_SMP */

 /*
  * Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ccd76d7..0265a5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
  * returned back. If it is idle, then we must look for another CPU which have
  * all the flags passed as argument as set.
  */
-int sched_select_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(unsigned int sd_flags)
 {
        struct sched_domain *sd;
        int cpu = smp_processor_id();

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
@ 2013-03-19  5:15           ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19  5:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 18 March 2013 21:27, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 2013/3/18 Viresh Kumar <viresh.kumar@linaro.org>:
>>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>>
>> You are correct that name can be improved but busy may give it a different
>> meaning. Maybe sched_select_non_idle_cpu()?
>
> That works too yeah.

Here is the fixup that i applied (it fixes another stupid mistake in sched.h).
I have pushed this series here again:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

commit e769ff0d78fd2fb504ae056a44b70bba7c259126
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Fri Sep 14 15:13:05 2012 +0530

    fixup! sched: Create sched_select_cpu() to give preferred CPU for
power saving

    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h | 19 ++++++++++---------
 kernel/sched/core.c   |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 216fa0d..db6655a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,11 +231,18 @@ extern void init_idle_bootup_task(struct
task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #ifdef CONFIG_SMP
-extern int sched_select_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+#else
+static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+{
+       return smp_processor_id();
+}
+#endif

-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
+
 /*
  * In the semi idle case, use the nearest busy cpu for migrating timers
  * from an idle cpu.  This is good for power-savings.
@@ -244,17 +251,11 @@ extern void set_cpu_sd_state_idle(void);
  * selecting an idle cpu will add more delays to the timers than intended
  * (as that cpu's timer base may not be uptodate wrt jiffies etc).
  */
-#define get_nohz_timer_target() sched_select_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-18 15:23 ` [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu Viresh Kumar
@ 2013-03-19  5:15   ` Viresh Kumar
  2013-03-19 13:23     ` Viresh Kumar
  2013-03-21  0:12   ` Tejun Heo
  1 sibling, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19  5:15 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

On 18 March 2013 20:53, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
> actually not required.
>
> Some of these works can be processed by any CPU and so we must select a non-idle
> CPU here. The initial idea was to modify implementation of queue_work(), but
> that may end up breaking lots of kernel code that would be a nightmare to debug.
>
> So, we finalized to adding new workqueue interfaces, for works that don't depend
> on a cpu to execute them.

Fixup:

commit 6e5b6fac3353f5e4e5490931b3eb6d3fd7864ca0
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Mar 19 10:34:20 2013 +0530

    fixup! workqueue: Add helpers to schedule work on any cpu
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cf9c570..68daf50 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
        if (!(wq->flags & WQ_UNBOUND)) {
                if (cpu == WORK_CPU_UNBOUND) {
                        if (on_any_cpu)
-                               cpu = sched_select_cpu(0);
+                               cpu = sched_select_non_idle_cpu(0);
                        else
                                cpu = raw_smp_processor_id();
                }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
@ 2013-03-19  7:58   ` Ulf Hansson
  2013-03-22 17:09   ` Chris Ball
  2013-03-22 17:26   ` Chris Ball
  2 siblings, 0 replies; 37+ messages in thread
From: Ulf Hansson @ 2013-03-19  7:58 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, Chris Ball, linux-mmc

On 18 March 2013 16:23, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <cjb@laptop.org>
> Cc: linux-mmc@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/mmc/core/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
>  static int mmc_schedule_delayed_work(struct delayed_work *work,
>                                      unsigned long delay)
>  {
> -       return queue_delayed_work(workqueue, work, delay);
> +       return queue_delayed_work_on_any_cpu(workqueue, work, delay);
>  }
>
>  /*
> --
> 1.7.12.rc2.18.g61b472e
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Acked-by: Ulf Hansson <ulf.hansson@linaro.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
  2013-03-18 15:39   ` Frederic Weisbecker
@ 2013-03-19 12:30   ` Peter Zijlstra
  2013-03-19 12:52     ` Viresh Kumar
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2013-03-19 12:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote:
> +/*
> + * This routine returns the nearest non-idle cpu. It accepts a
> bitwise OR of
> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle,
> it is
> + * returned back. If it is idle, then we must look for another CPU
> which have
> + * all the flags passed as argument as set.
> + */
> +int sched_select_cpu(unsigned int sd_flags)

So the only problem I have is that you expose the SD_flags to !sched
code. The only proposed user is in the workqueue code and passes 0.

Why not leave out the sd_flags argument and introduce it once you start
using it; at which point we can argue on the interface.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-19 12:30   ` Peter Zijlstra
@ 2013-03-19 12:52     ` Viresh Kumar
  2013-03-19 13:22       ` Viresh Kumar
  0 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19 12:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 19 March 2013 18:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote:
>> +/*
>> + * This routine returns the nearest non-idle cpu. It accepts a
>> bitwise OR of
>> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle,
>> it is
>> + * returned back. If it is idle, then we must look for another CPU
>> which have
>> + * all the flags passed as argument as set.
>> + */
>> +int sched_select_cpu(unsigned int sd_flags)
>
> So the only problem I have is that you expose the SD_flags to !sched
> code. The only proposed user is in the workqueue code and passes 0.
>
> Why not leave out the sd_flags argument and introduce it once you start
> using it; at which point we can argue on the interface.

Yes, that can be done. Will fix it up.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2013-03-19 12:52     ` Viresh Kumar
@ 2013-03-19 13:22       ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 19 March 2013 18:22, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 19 March 2013 18:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> Why not leave out the sd_flags argument and introduce it once you start
>> using it; at which point we can argue on the interface.
>
> Yes, that can be done. Will fix it up.

Fixup:

commit 77927939224520cbb0ac47270d3458bedffe42c4
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Mar 19 18:50:37 2013 +0530

    fixup! sched: Create sched_select_non_idle_cpu() to give preferred
CPU for power saving
---
 include/linux/sched.h | 6 +++---
 kernel/sched/core.c   | 6 +-----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index db6655a..37eb1dd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,9 +231,9 @@ extern void init_idle_bootup_task(struct task_struct *idle);
 extern int runqueue_is_locked(int cpu);

 #ifdef CONFIG_SMP
-extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(void);
 #else
-static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+static inline int sched_select_non_idle_cpu(void)
 {
        return smp_processor_id();
 }
@@ -251,7 +251,7 @@ extern void set_cpu_sd_state_idle(void);
  * selecting an idle cpu will add more delays to the timers than intended
  * (as that cpu's timer base may not be uptodate wrt jiffies etc).
  */
-#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu()
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0265a5e..f597d2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
  * returned back. If it is idle, then we must look for another CPU which have
  * all the flags passed as argument as set.
  */
-int sched_select_non_idle_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(void)
 {
        struct sched_domain *sd;
        int cpu = smp_processor_id();
@@ -639,10 +639,6 @@ int sched_select_non_idle_cpu(unsigned int sd_flags)

        rcu_read_lock();
        for_each_domain(cpu, sd) {
-                /* If sd doesnt' have sd_flags set skip sd. */
-               if ((sd->flags & sd_flags) != sd_flags)
-                       continue;
-
                for_each_cpu(i, sched_domain_span(sd)) {
                        if (i == cpu)
                                continue;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-19  5:15   ` Viresh Kumar
@ 2013-03-19 13:23     ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-19 13:23 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, mingo, peterz, rostedt
  Cc: linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel, Viresh Kumar

On 19 March 2013 10:45, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 18 March 2013 20:53, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
>> actually not required.
>>
>> Some of these works can be processed by any CPU and so we must select a non-idle
>> CPU here. The initial idea was to modify implementation of queue_work(), but
>> that may end up breaking lots of kernel code that would be a nightmare to debug.
>>
>> So, we finalized to adding new workqueue interfaces, for works that don't depend
>> on a cpu to execute them.

Another fixup:

commit 8753c6d936faa6e3233cbf44a55913d05de05683
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Mar 19 18:50:59 2013 +0530

    fixup! workqueue: Add helpers to schedule work on any cpu
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 68daf50..4e023ab 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
        if (!(wq->flags & WQ_UNBOUND)) {
                if (cpu == WORK_CPU_UNBOUND) {
                        if (on_any_cpu)
-                               cpu = sched_select_non_idle_cpu(0);
+                               cpu = sched_select_non_idle_cpu();
                        else
                                cpu = raw_smp_processor_id();
                }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-18 15:23 ` [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu Viresh Kumar
  2013-03-19  5:15   ` Viresh Kumar
@ 2013-03-21  0:12   ` Tejun Heo
  2013-03-21 10:57     ` Viresh Kumar
  1 sibling, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2013-03-21  0:12 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

Hello, Viresh.

On Mon, Mar 18, 2013 at 08:53:25PM +0530, Viresh Kumar wrote:
> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
> actually not required.
> 
> Some of these works can be processed by any CPU and so we must select a non-idle
> CPU here. The initial idea was to modify implementation of queue_work(), but
> that may end up breaking lots of kernel code that would be a nightmare to debug.
> 
> So, we finalized to adding new workqueue interfaces, for works that don't depend
> on a cpu to execute them.
> 
> This patch adds following new routines:
> - queue_work_on_any_cpu
> - queue_delayed_work_on_any_cpu
> 
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.

For some reason, I've been always unsure about this patchset.  I've
been thinking about it past few days and I think I now know why.

I can see that strategy like this being useful for timer, and
impressive power saving numbers BTW - we definitely want that.  While
workqueue shares a lot of attributes with timers, there's one
important difference - scheduler is involved in work item executions.

Work item scheduling for per-cpu work items artificially removes the
choice of CPU from the scheduler with the assumption that such
restrictions would lead to lower overall overhead.  There is another
variant of workqueue - WQ_UNBOUND - when such assumption wouldn't hold
too well and it would be best to give the scheduler full latitude
regarding scheduling work item execution including choice of CPU.

Now, this patchset, kinda sorta adds back CPU selection by scheduler
in an ad-hoc way, right?  If we're gonna do that, why not just make it
unbound workqueues?  Then, the scheduler would have full discretion
over them and we don't need to implement this separate ad-hoc thing on
the side.  It's the roundaboutness that bothers me.

I'm not sure about other workqueues but the block one can be very
performance sensitive on machines with high IOPS and it would
definitely be advisable to keep it CPU-affine unless power consumption
is the overriding concern, so how about introducing another workqueue
flag, say WQ_UNBOUND_FOR_POWER (it's terrible, please come up with
something better), which is WQ_UNBOUND on configurations where this
matters and becomes noop if not?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-21  0:12   ` Tejun Heo
@ 2013-03-21 10:57     ` Viresh Kumar
  2013-03-21 18:29       ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-21 10:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 21 March 2013 05:42, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Mar 18, 2013 at 08:53:25PM +0530, Viresh Kumar wrote:

> For some reason, I've been always unsure about this patchset.  I've
> been thinking about it past few days and I think I now know why.

I knew this since the beginning and thought power numbers i showed might
change your view. My hard luck :)

> I can see that strategy like this being useful for timer, and
> impressive power saving numbers BTW - we definitely want that.  While
> workqueue shares a lot of attributes with timers, there's one
> important difference - scheduler is involved in work item executions.

Yes, scheduler is involved for queuing works queued on a UNBOUND wq.

> Work item scheduling for per-cpu work items artificially removes the
> choice of CPU from the scheduler with the assumption that such
> restrictions would lead to lower overall overhead.

Yes.. We aren't touching them here and we can't :)

> There is another
> variant of workqueue - WQ_UNBOUND - when such assumption wouldn't hold
> too well and it would be best to give the scheduler full latitude
> regarding scheduling work item execution including choice of CPU.

I liked this point of yours and what you said is correct too. But there is a
difference here with our approach.

I here see two parallel solutions:

1. What we proposed:
- Add another wq api and used sched_select_non_idle_cpu() to choose
the right cpu
- Fix user drivers we care about

2. What you proposed:
- Fix user drivers we care about by using UNBOUNDED workqueues rather than
system_wq or other private wq's. And let the scheduler do everything. Right?


Now there are two things i am worried about.

A.
About the difference in behavior of scheduler and sched_select_non_idle_cpu().
Scheduler will treat this work as any normal task and can schedule it anywhere
based on what it feels is the right place and may still wake up an idle one for
load balancing.

In our approach, We aren't switching CPU for a work if that cpu (or system) is
busy enough, like in case of high IOPS for block layer. We will find that cpu as
busy, most of the time on such situations and so we will stay where we were.

In case system is fairly idle, then also we stay on the same cpu.
The only time we migrate is when current cpu is idle but the whole system
isn't.

B.
It is sort of difficult to teach users about BOUND and UNBOUND workqueues
and people have used them without thinking too much about them since some
time. We need a straightforward API to make this more transparent. And
queue_work_on_any_cpu() was a good candidate here. I am not talking about
its implementation but about the interface.


Though in both suggestions we need to fix user drivers one by one to use
queue_work_on_any_cpu() or use WQ_UNBOUND..

> Now, this patchset, kinda sorta adds back CPU selection by scheduler
> in an ad-hoc way, right?

Yes.

> If we're gonna do that, why not just make it
> unbound workqueues?  Then, the scheduler would have full discretion
> over them and we don't need to implement this separate ad-hoc thing on
> the side.  It's the roundaboutness that bothers me.

I had my own reasons as explained earlier.

> I'm not sure about other workqueues but the block one can be very
> performance sensitive on machines with high IOPS and it would
> definitely be advisable to keep it CPU-affine unless power consumption
> is the overriding concern, so how about introducing another workqueue
> flag, say WQ_UNBOUND_FOR_POWER (it's terrible, please come up with
> something better), which is WQ_UNBOUND on configurations where this
> matters and becomes noop if not?

Maybe people wouldn't suffer much because of the reasons i told you earlier.
On a busy system we will never switch cpus.

This static configuration might not work well with multiplatform images.

--
viresh

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-21 10:57     ` Viresh Kumar
@ 2013-03-21 18:29       ` Tejun Heo
  2013-03-28 18:13         ` Tejun Heo
  0 siblings, 1 reply; 37+ messages in thread
From: Tejun Heo @ 2013-03-21 18:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

Hey, Viresh.

On Thu, Mar 21, 2013 at 04:27:23PM +0530, Viresh Kumar wrote:
> > For some reason, I've been always unsure about this patchset.  I've
> > been thinking about it past few days and I think I now know why.
> 
> I knew this since the beginning and thought power numbers i showed might
> change your view. My hard luck :)

Heh, sorry about the trouble but, yeah, your numbers are impressive
and no matter how this discussion turns out, we're gonna have it in
some form, so please don't think that your numbers didn't change
anything.  Actually, to me, it seems that identifying which workqueues
are good candidates for conversion and determining how much powersave
can be gained are 90% of the work.

> Now there are two things i am worried about.
> 
> A.
> About the difference in behavior of scheduler and sched_select_non_idle_cpu().
> Scheduler will treat this work as any normal task and can schedule it anywhere
> based on what it feels is the right place and may still wake up an idle one for
> load balancing.
> 
> In our approach, We aren't switching CPU for a work if that cpu (or system) is
> busy enough, like in case of high IOPS for block layer. We will find that cpu as
> busy, most of the time on such situations and so we will stay where we were.
>
> In case system is fairly idle, then also we stay on the same cpu.
> The only time we migrate is when current cpu is idle but the whole system
> isn't.

Yes, I actually like that part a lot although I do wish the idle check
was inlined.

What I'm wondering is whether the kinda out-of-band decision via
sched_select_cpu() is justified given that it can and is likely to go
through full scheduling decision anyway.  For timer, we don't have
that, so it makes sense.  For work items, it's a bit different.

To rephrase, *if* the scheduler can't already make proper decisions
regarding power consumption on an idlish system, maybe it can be
improved to do so?  It could as well be that this CPU selection is
special enough that it's just better to keep it separate as this
patchset proposes.  This is something up to the scheduler people.
Peter, Ingo, what do you guys think?

> B.
> It is sort of difficult to teach users about BOUND and UNBOUND workqueues
> and people have used them without thinking too much about them since some
> time. We need a straightforward API to make this more transparent. And
> queue_work_on_any_cpu() was a good candidate here. I am not talking about
> its implementation but about the interface.
> 
> Though in both suggestions we need to fix user drivers one by one to use
> queue_work_on_any_cpu() or use WQ_UNBOUND..

I don't know.  Workqueues are attribute domains and I kinda dislike
having a completely separate set of interface for this.  Regardless of
how the backend gets implemented, I'd really much prefer it being
implemented as a workqueue attribute rather than another set of
queueing interface functions.

> Maybe people wouldn't suffer much because of the reasons i told you earlier.
> On a busy system we will never switch cpus.
> 
> This static configuration might not work well with multiplatform images.

Well, it doesn't have to be a compile time thing.  It can be
dynamically switched with, say, static_branch(), but I think that's a
minor point at this point.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 6/7] block: queue work on any cpu
  2013-03-18 15:23 ` [PATCH V3 6/7] block: " Viresh Kumar
@ 2013-03-22 15:05   ` Jens Axboe
  2013-03-23  6:44     ` Viresh Kumar
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2013-03-22 15:05 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On Mon, Mar 18 2013, Viresh Kumar wrote:
> block layer uses workqueues for multiple purposes. There is no real dependency
> of scheduling these on the cpu which scheduled them.
> 
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
> 
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
> 
> This patch replaces schedule_work() and queue_[delayed_]work() with
> queue_[delayed_]work_on_any_cpu() siblings.
> 
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.

What are the performance implications of this? From that perspective,
scheduling on a local core is the better choice. Hopefully it's already
running on the local socket of the device, keeping it on that node would
be preferable.

For the delayed functions, the timer is typically added on the current
CPU. It's hard to know what the state of idleness of CPUs would be when
the delay has expired. How are you handling this?

I can see the win from a power consumption point of view, but it quickly
gets a little more complicated than that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
  2013-03-19  7:58   ` Ulf Hansson
@ 2013-03-22 17:09   ` Chris Ball
  2013-03-22 17:26   ` Chris Ball
  2 siblings, 0 replies; 37+ messages in thread
From: Chris Ball @ 2013-03-22 17:09 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, linux-mmc

Hi,

On Mon, Mar 18 2013, Viresh Kumar wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <cjb@laptop.org>
> Cc: linux-mmc@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/mmc/core/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
>  static int mmc_schedule_delayed_work(struct delayed_work *work,
>  				     unsigned long delay)
>  {
> -	return queue_delayed_work(workqueue, work, delay);
> +	return queue_delayed_work_on_any_cpu(workqueue, work, delay);
>  }
>  
>  /*

Thanks, pushed to mmc-next for 3.10.

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
  2013-03-19  7:58   ` Ulf Hansson
  2013-03-22 17:09   ` Chris Ball
@ 2013-03-22 17:26   ` Chris Ball
  2013-03-22 17:27     ` Viresh Kumar
  2 siblings, 1 reply; 37+ messages in thread
From: Chris Ball @ 2013-03-22 17:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, linux-mmc

Hi,

On Mon, Mar 18 2013, Viresh Kumar wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from schedulers perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <cjb@laptop.org>
> Cc: linux-mmc@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/mmc/core/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
>  static int mmc_schedule_delayed_work(struct delayed_work *work,
>  				     unsigned long delay)
>  {
> -	return queue_delayed_work(workqueue, work, delay);
> +	return queue_delayed_work_on_any_cpu(workqueue, work, delay);
>  }
>  
>  /*

/home/cjb/git/mmc/drivers/mmc/core/core.c: In function ‘mmc_schedule_delayed_work’:
/home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit declaration of function ‘queue_delayed_work_on_any_cpu’ [-Werror=implicit-function-declaration]

I've dropped this patch for now.  This function doesn't seem to be
defined in linux-next either.

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-22 17:26   ` Chris Ball
@ 2013-03-22 17:27     ` Viresh Kumar
  2013-03-22 17:30       ` Chris Ball
  0 siblings, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-22 17:27 UTC (permalink / raw)
  To: Chris Ball
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, linux-mmc

On 22 March 2013 22:56, Chris Ball <cjb@laptop.org> wrote:
> On Mon, Mar 18 2013, Viresh Kumar wrote:
>
> /home/cjb/git/mmc/drivers/mmc/core/core.c: In function ‘mmc_schedule_delayed_work’:
> /home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit declaration of function ‘queue_delayed_work_on_any_cpu’ [-Werror=implicit-function-declaration]
>
> I've dropped this patch for now.  This function doesn't seem to be
> defined in linux-next either.

Hi chris,

This patch was part of a bigger patchset which also adds this API. I
don't want you to
apply this one but just Ack here. Probably Tejun or some scheduler
maintainer will
apply it later (if they like all patches).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 5/7] mmc: queue work on any cpu
  2013-03-22 17:27     ` Viresh Kumar
@ 2013-03-22 17:30       ` Chris Ball
  0 siblings, 0 replies; 37+ messages in thread
From: Chris Ball @ 2013-03-22 17:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel, linux-mmc

Hi,

On Fri, Mar 22 2013, Viresh Kumar wrote:
> On 22 March 2013 22:56, Chris Ball <cjb@laptop.org> wrote:
>> On Mon, Mar 18 2013, Viresh Kumar wrote:
>>
>> /home/cjb/git/mmc/drivers/mmc/core/core.c: In function
>> ‘mmc_schedule_delayed_work’:
>> /home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit
>> declaration of function ‘queue_delayed_work_on_any_cpu’
>> [-Werror=implicit-function-declaration]
>>
>> I've dropped this patch for now.  This function doesn't seem to be
>> defined in linux-next either.
>
> Hi chris,
>
> This patch was part of a bigger patchset which also adds this API. I
> don't want you to
> apply this one but just Ack here. Probably Tejun or some scheduler
> maintainer will
> apply it later (if they like all patches).

Thanks, makes sense.  For [5/7]:

Acked-by: Chris Ball <cjb@laptop.org>

- Chris.
-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 6/7] block: queue work on any cpu
  2013-03-22 15:05   ` Jens Axboe
@ 2013-03-23  6:44     ` Viresh Kumar
  0 siblings, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-23  6:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: pjt, paul.mckenney, tglx, tj, Ingo Molnar, peterz, rostedt,
	linaro-kernel, robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel

On 22 March 2013 20:35, Jens Axboe <axboe@kernel.dk> wrote:

Hi Jens,

First of all, please have a look at:

1. https://lkml.org/lkml/2013/3/18/364

This is cover-letter of my mail where i have shown power savings achieved
with this patchset.

2. https://lkml.org/lkml/2013/3/21/172

This is the link for discussion i had with Tejun which answers some of these
questions.

> What are the performance implications of this? From that perspective,
> scheduling on a local core is the better choice. Hopefully it's already
> running on the local socket of the device, keeping it on that node would
> be preferable.

If the local cpu is busy or all cpus are idle then we don't migrate, and so
for performance shouldn't be affected with it.

> For the delayed functions, the timer is typically added on the current
> CPU.

This is the code that executes here:

	if (unlikely(cpu != WORK_CPU_UNBOUND))
		add_timer_on(timer, cpu);
	else
		add_timer(timer);

Clearly for queue_delayed_work() call we don't use local cpu but we use
the same sched_select_non_idle_cpu() routine to give us the right cpu.

> It's hard to know what the state of idleness of CPUs would be when
> the delay has expired. How are you handling this?

Correct, and we are not deciding that right now, but when the timer expires.

Thanks for your feedback.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-21 18:29       ` Tejun Heo
@ 2013-03-28 18:13         ` Tejun Heo
  2013-03-29  2:39           ` Viresh Kumar
  2013-03-29  7:27           ` Viresh Kumar
  0 siblings, 2 replies; 37+ messages in thread
From: Tejun Heo @ 2013-03-28 18:13 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On Thu, Mar 21, 2013 at 11:29:37AM -0700, Tejun Heo wrote:
> Yes, I actually like that part a lot although I do wish the idle check
> was inlined.
> 
> What I'm wondering is whether the kinda out-of-band decision via
> sched_select_cpu() is justified given that it can and is likely to go
> through full scheduling decision anyway.  For timer, we don't have
> that, so it makes sense.  For work items, it's a bit different.
> 
> To rephrase, *if* the scheduler can't already make proper decisions
> regarding power consumption on an idlish system, maybe it can be
> improved to do so?  It could as well be that this CPU selection is
> special enough that it's just better to keep it separate as this
> patchset proposes.  This is something up to the scheduler people.
> Peter, Ingo, what do you guys think?

Ping.  Peter, Ingo?

Viresh, would it be difficult to make another measurement of the same
workload with the said workqueues converted to unbound?  I think that
would at least provide a nice reference point.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-28 18:13         ` Tejun Heo
@ 2013-03-29  2:39           ` Viresh Kumar
  2013-03-29  7:27           ` Viresh Kumar
  1 sibling, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-29  2:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, linaro-kernel, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, Arvind.Chauhan,
	linux-rt-users, linux-kernel

On 28 March 2013 23:43, Tejun Heo <tj@kernel.org> wrote:
> Ping.  Peter, Ingo?

Thanks for your ping, even i was thinking of it since sometime :)

> Viresh, would it be difficult to make another measurement of the same
> workload with the said workqueues converted to unbound?  I think that
> would at least provide a nice reference point.

I will try.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-28 18:13         ` Tejun Heo
  2013-03-29  2:39           ` Viresh Kumar
@ 2013-03-29  7:27           ` Viresh Kumar
  2013-03-29 17:40             ` Tejun Heo
  1 sibling, 1 reply; 37+ messages in thread
From: Viresh Kumar @ 2013-03-29  7:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: pjt, paul.mckenney, tglx, mingo, peterz, rostedt, linaro-kernel,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel

On 28 March 2013 23:43, Tejun Heo <tj@kernel.org> wrote:
> Viresh, would it be difficult to make another measurement of the same
> workload with the said workqueues converted to unbound?  I think that
> would at least provide a nice reference point.

My heart is shattered after getting the results :)


Cluster A15 Energy      Cluster A7 Energy       Total
-------------------------      -----------------------       ------

Without this patchset (Energy in Joules):
---------------------------------------------------

0.151162                2.183545                2.334707
0.223730                2.687067                2.910797
0.289687                2.732702                3.022389
0.454198                2.745908                3.200106
0.495552                2.746465                3.242017

Average:
0.322866                2.619137                2.942003


With this patchset (Energy in Joules):
-----------------------------------------------

0.133361                2.267822                2.401183
0.260626                2.833389                3.094015
0.142365                2.277929                2.420294
0.246793                2.582550                2.829343
0.130462                2.257033                2.387495

Average:
0.182721                2.443745                2.626466


With WQ Unbound (Energy in Joules):
-----------------------------------------------

0.226421                2.283658                2.510079
0.151361                2.236656                2.388017
0.197726                2.249849                2.447575
0.221915                2.229446                2.451361
0.347098                2.257707                2.604805

Average:
0.2289042              2.2514632              2.4803674

PS: I have to add another generic workqueue for these tests:
system_unbound_freezable_wq as block layer was using
system_freezable_wq.

--
viresh

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-29  7:27           ` Viresh Kumar
@ 2013-03-29 17:40             ` Tejun Heo
  2013-03-29 17:56               ` Tejun Heo
  2013-03-30  3:30               ` Viresh Kumar
  0 siblings, 2 replies; 37+ messages in thread
From: Tejun Heo @ 2013-03-29 17:40 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, mingo, peterz, rostedt, linaro-kernel,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel

Hello, Viresh.

On Fri, Mar 29, 2013 at 12:57:23PM +0530, Viresh Kumar wrote:
> On 28 March 2013 23:43, Tejun Heo <tj@kernel.org> wrote:
> > Viresh, would it be difficult to make another measurement of the same
> > workload with the said workqueues converted to unbound?  I think that
> > would at least provide a nice reference point.
> 
> My heart is shattered after getting the results :)
> 
> 
> Cluster A15 Energy      Cluster A7 Energy       Total
> -------------------------      -----------------------       ------
> 
> Without this patchset (Energy in Joules):
> ---------------------------------------------------
> 0.322866                2.619137                2.942003
> 
> 
> With this patchset (Energy in Joules):
> -----------------------------------------------
> 0.182721                2.443745                2.626466
> 
> 
> With WQ Unbound (Energy in Joules):
> -----------------------------------------------
> 0.2289042              2.2514632              2.4803674


So the scheduler does know what it's doing after all, which is a nice
news.  Given the result, the best thing to do would be somehow marking
these workqueues as unbound for powersaving?

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-29 17:40             ` Tejun Heo
@ 2013-03-29 17:56               ` Tejun Heo
  2013-03-30  3:30               ` Viresh Kumar
  1 sibling, 0 replies; 37+ messages in thread
From: Tejun Heo @ 2013-03-29 17:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, mingo, peterz, rostedt, linaro-kernel,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel

On Fri, Mar 29, 2013 at 10:40:03AM -0700, Tejun Heo wrote:
> So the scheduler does know what it's doing after all, which is a nice
> news.  Given the result, the best thing to do would be somehow marking
> these workqueues as unbound for powersaving?

BTW, recent changes to workqueue means that it shouldn't be too
difficult to create a combined workqueue which distributes work items
to both per-cpu and unbound worker pools depending on cpu_idle(), but,
at least for now, I think we can make do with much coarser approach.
We can refine it afterwards.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu
  2013-03-29 17:40             ` Tejun Heo
  2013-03-29 17:56               ` Tejun Heo
@ 2013-03-30  3:30               ` Viresh Kumar
  1 sibling, 0 replies; 37+ messages in thread
From: Viresh Kumar @ 2013-03-30  3:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: pjt, paul.mckenney, tglx, mingo, peterz, rostedt, linaro-kernel,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	charles.garcia-tobin, Arvind.Chauhan, linux-rt-users,
	linux-kernel

On 29 March 2013 23:10, Tejun Heo <tj@kernel.org> wrote:
> So the scheduler does know what it's doing after all, which is a nice
> news.  Given the result, the best thing to do would be somehow marking
> these workqueues as unbound for powersaving?

Yes. I am going to do it soon.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2013-03-30  3:30 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-18 15:23 [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar
2013-03-18 15:23 ` [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
2013-03-18 15:39   ` Frederic Weisbecker
2013-03-18 15:44     ` Viresh Kumar
2013-03-18 15:57       ` Frederic Weisbecker
2013-03-19  5:15         ` Viresh Kumar
2013-03-19  5:15           ` Viresh Kumar
2013-03-19 12:30   ` Peter Zijlstra
2013-03-19 12:52     ` Viresh Kumar
2013-03-19 13:22       ` Viresh Kumar
2013-03-18 15:23 ` [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
2013-03-18 15:23 ` [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu Viresh Kumar
2013-03-19  5:15   ` Viresh Kumar
2013-03-19 13:23     ` Viresh Kumar
2013-03-21  0:12   ` Tejun Heo
2013-03-21 10:57     ` Viresh Kumar
2013-03-21 18:29       ` Tejun Heo
2013-03-28 18:13         ` Tejun Heo
2013-03-29  2:39           ` Viresh Kumar
2013-03-29  7:27           ` Viresh Kumar
2013-03-29 17:40             ` Tejun Heo
2013-03-29 17:56               ` Tejun Heo
2013-03-30  3:30               ` Viresh Kumar
2013-03-18 15:23 ` [PATCH V3 4/7] PHYLIB: queue " Viresh Kumar
2013-03-18 17:33   ` David Miller
2013-03-18 15:23 ` [PATCH V3 5/7] mmc: " Viresh Kumar
2013-03-19  7:58   ` Ulf Hansson
2013-03-22 17:09   ` Chris Ball
2013-03-22 17:26   ` Chris Ball
2013-03-22 17:27     ` Viresh Kumar
2013-03-22 17:30       ` Chris Ball
2013-03-18 15:23 ` [PATCH V3 6/7] block: " Viresh Kumar
2013-03-22 15:05   ` Jens Axboe
2013-03-23  6:44     ` Viresh Kumar
2013-03-18 15:23 ` [PATCH V3 7/7] fbcon: " Viresh Kumar
2013-03-18 15:35   ` Viresh Kumar
2013-03-19  5:00 ` [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.