* [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
@ 2012-11-06 10:34 Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 1/4] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-06 10:34 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

Hi All,

This is V2 Resend of my sched_select_cpu() work. Resending because V2 didn't
get much attention. Including more people in cc now :)

In order to save power, it would be useful to schedule work onto non-IDLE cpus
instead of waking up an IDLE one.

To achieve this, we need the scheduler to guide kernel frameworks (like
timers & workqueues) towards the most preferred CPU to be used for such
tasks.

This patchset is about implementing this concept.

- The first patch adds the sched_select_cpu() routine, which returns a
  preferred non-idle cpu.
- The second patch removes idle_cpu() calls from timer & hrtimer.
- The third patch adapts this change in the workqueue framework.
- The fourth patch adds migration capability for a running timer.

Earlier discussions over v1 can be found here:
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html

Earlier discussions on this concept took place at the last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/

The module created for testing this behavior is present here:
http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary

Following are the steps followed in the test module (a simplified sketch of
the loop follows the list):
1. Run a single work on each cpu.
2. This work starts a timer after x (tested with 10) jiffies of delay.
3. The timer routine queues a work (it may be called from an idle or non-idle
   cpu) and re-arms the same timer. Step 3 is repeated n times (i.e. n works
   are queued, one after another).
4. All works call a single routine, which counts the following per cpu:
 - Total works processed by a CPU
 - Total works processed by a CPU which were queued from it
 - Total works processed by a CPU which weren't queued from it
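
For reference, below is a minimal sketch of this loop. This is hypothetical
code (the real module lives in the git tree above); all names are made up,
and the per-cpu INIT_WORK()/setup_timer() bring-up of step 1 is omitted:

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/timer.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/jiffies.h>

#define TEST_ITERATIONS	1000	/* n */
#define TEST_DELAY	10	/* x, in jiffies */

static DEFINE_PER_CPU(unsigned long, works_total);
static DEFINE_PER_CPU(unsigned long, works_own);
static DEFINE_PER_CPU(unsigned long, works_migrated);

struct test_ctx {
	struct work_struct work;
	struct timer_list timer;
	int queuing_cpu;	/* cpu the work was queued from */
	int count;
};

/* Step 3: the timer queues a work; the queuing cpu may well be idle here */
static void test_timer_fn(unsigned long data)
{
	struct test_ctx *ctx = (struct test_ctx *)data;

	ctx->queuing_cpu = smp_processor_id();
	queue_work(system_wq, &ctx->work);
}

/* Step 4: every work ends up here and does the per-cpu accounting */
static void test_work_fn(struct work_struct *work)
{
	struct test_ctx *ctx = container_of(work, struct test_ctx, work);

	this_cpu_inc(works_total);
	if (raw_smp_processor_id() == ctx->queuing_cpu)
		this_cpu_inc(works_own);
	else
		this_cpu_inc(works_migrated);

	/* Steps 2/3: re-arm the timer until n works have been queued */
	if (++ctx->count < TEST_ITERATIONS)
		mod_timer(&ctx->timer, jiffies + TEST_DELAY);
}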

Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-nano

Results:
-------
Without the workqueue modification, i.e. PATCH 3/4:
[ 2493.022335] Workqueue Analyser: works processed by CPU0, Total: 1000, Own: 0, migrated: 0
[ 2493.047789] Workqueue Analyser: works processed by CPU1, Total: 1000, Own: 0, migrated: 0
[ 2493.072918] Workqueue Analyser: works processed by CPU2, Total: 1000, Own: 0, migrated: 0
[ 2493.098576] Workqueue Analyser: works processed by CPU3, Total: 1000, Own: 0, migrated: 0
[ 2493.123702] Workqueue Analyser: works processed by CPU4, Total: 1000, Own: 0, migrated: 0

With the workqueue modification, i.e. PATCH 3/4:
[ 2493.022335] Workqueue Analyser: works processed by CPU0, Total: 1002, Own: 999, migrated: 3
[ 2493.047789] Workqueue Analyser: works processed by CPU1, Total: 998,  Own: 997, migrated: 1
[ 2493.072918] Workqueue Analyser: works processed by CPU2, Total: 1013, Own: 996, migrated: 17
[ 2493.098576] Workqueue Analyser: works processed by CPU3, Total: 998,  Own: 993, migrated: 5
[ 2493.123702] Workqueue Analyser: works processed by CPU4, Total: 989,  Own: 987, migrated: 2

V2->V2-Resend
-------------
- Included timer migration patch in the same thread.

V1->V2
-----
- New SD_* macros removed; the existing ones are used instead.
- sched_select_cpu() rewritten; it now includes the check on the current
  cpu's idleness.
- idle_cpu() calls from timer and hrtimer removed now.
- Patch 2/3 from V1 removed, as it doesn't apply to the latest workqueue
  branch from tejun.
- CONFIG_MIGRATE_WQ removed and so is wq_select_cpu().
- sched_select_cpu() called only from __queue_work().
- Merged the tejun/for-3.7 branch into my tree before making the workqueue
  changes.

Viresh Kumar (4):
  sched: Create sched_select_cpu() to give preferred CPU for power
    saving
  timer: hrtimer: Don't check idle_cpu() before calling
    get_nohz_timer_target()
  workqueue: Schedule work on non-idle cpu instead of current one
  timer: Migrate running timer

 include/linux/sched.h | 16 ++++++++++--
 include/linux/timer.h |  2 ++
 kernel/hrtimer.c      |  2 +-
 kernel/sched/core.c   | 69 +++++++++++++++++++++++++++++++--------------------
 kernel/timer.c        | 50 ++++++++++++++++++++++---------------
 kernel/workqueue.c    |  4 +--
 6 files changed, 91 insertions(+), 52 deletions(-)

-- 
1.7.12.rc2.18.g61b472e




* [PATCH V2 Resend 1/4] sched: Create sched_select_cpu() to give preferred CPU for power saving
  2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
@ 2012-11-06 10:38 ` Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 2/4] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-06 10:38 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

In order to save power, it would be useful to schedule work onto non-IDLE cpus
instead of waking up an IDLE one.

To achieve this, we need the scheduler to guide kernel frameworks (like
timers & workqueues) towards the most preferred CPU to be used for such
tasks.

This routine returns a preferred non-idle cpu. It accepts a bitwise OR of
SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
returned. If it is idle, then we look for another CPU which has all the flags
passed as argument set. Also, as this activity is part of load balancing
only, SD_LOAD_BALANCE must also be set for the selected domain.

This patch reuses the code from the get_nohz_timer_target() routine, which
had a similar implementation. get_nohz_timer_target() is now modified to use
sched_select_cpu().
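
As a usage illustration (hypothetical caller code, not part of the patch; the
flag choice is only an example), a framework wanting a non-idle target CPU
could do:

	/* Search any domain with SD_LOAD_BALANCE set: */
	int cpu = sched_select_cpu(0);

	/* Or restrict the search, e.g. to domains sharing package
	 * resources (SD_SHARE_PKG_RESOURCES is illustrative): */
	int near = sched_select_cpu(SD_SHARE_PKG_RESOURCES);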

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h | 16 ++++++++++--
 kernel/sched/core.c   | 69 +++++++++++++++++++++++++++++++--------------------
 2 files changed, 56 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..24f546d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -232,14 +232,26 @@ extern void init_idle_bootup_task(struct task_struct *idle);
 
 extern int runqueue_is_locked(int cpu);
 
-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
+#ifdef CONFIG_SMP
+extern int sched_select_cpu(unsigned int sd_flags);
+
+#ifdef CONFIG_NO_HZ
 extern void nohz_balance_enter_idle(int cpu);
 extern void set_cpu_sd_state_idle(void);
-extern int get_nohz_timer_target(void);
+/*
+ * In the semi idle case, use the nearest busy cpu for migrating timers
+ * from an idle cpu.  This is good for power-savings.
+ *
+ * We don't do similar optimization for completely idle system, as
+ * selecting an idle cpu will add more delays to the timers than intended
+ * (as that cpu's timer base may not be uptodate wrt jiffies etc).
+ */
+#define get_nohz_timer_target() sched_select_cpu(0)
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 static inline void set_cpu_sd_state_idle(void) { }
 #endif
+#endif /* CONFIG_SMP */
 
 /*
  * Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..cf1a420 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -542,33 +542,6 @@ void resched_cpu(int cpu)
 
 #ifdef CONFIG_NO_HZ
 /*
- * In the semi idle case, use the nearest busy cpu for migrating timers
- * from an idle cpu.  This is good for power-savings.
- *
- * We don't do similar optimization for completely idle system, as
- * selecting an idle cpu will add more delays to the timers than intended
- * (as that cpu's timer base may not be uptodate wrt jiffies etc).
- */
-int get_nohz_timer_target(void)
-{
-	int cpu = smp_processor_id();
-	int i;
-	struct sched_domain *sd;
-
-	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		for_each_cpu(i, sched_domain_span(sd)) {
-			if (!idle_cpu(i)) {
-				cpu = i;
-				goto unlock;
-			}
-		}
-	}
-unlock:
-	rcu_read_unlock();
-	return cpu;
-}
-/*
  * When add_timer_on() enqueues a timer into the timer wheel of an
  * idle CPU then this timer might expire before the next timer event
  * which is scheduled to wake up that CPU. In case of a completely
@@ -639,6 +612,48 @@ void sched_avg_update(struct rq *rq)
 	}
 }
 
+/*
+ * This routine returns the preferred cpu which is non-idle. It accepts a
+ * bitwise OR of SD_* flags present in linux/sched.h. If the local CPU isn't
+ * idle, it is returned back. If it is idle, then we look for another CPU
+ * which has all the flags passed as argument set. Also, as this activity is
+ * part of load balancing only, SD_LOAD_BALANCE must also be set for the
+ * selected domain.
+ */
+int sched_select_cpu(unsigned int sd_flags)
+{
+	struct sched_domain *sd;
+	int cpu = smp_processor_id();
+	int i;
+
+	/* If the current cpu isn't idle, don't migrate anything */
+	if (!idle_cpu(cpu))
+		return cpu;
+
+	/* Add SD_LOAD_BALANCE to flags */
+	sd_flags |= SD_LOAD_BALANCE;
+
+	rcu_read_lock();
+	for_each_domain(cpu, sd) {
+		/*
+		 * If sd doesn't have both sd_flags and SD_LOAD_BALANCE set,
+		 * skip sd.
+		 */
+		if ((sd->flags & sd_flags) != sd_flags)
+			continue;
+
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (!idle_cpu(i)) {
+				cpu = i;
+				goto unlock;
+			}
+		}
+	}
+unlock:
+	rcu_read_unlock();
+	return cpu;
+}
+
 #else /* !CONFIG_SMP */
 void resched_task(struct task_struct *p)
 {
-- 
1.7.12.rc2.18.g61b472e




* [PATCH V2 Resend 2/4] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target()
  2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 1/4] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
@ 2012-11-06 10:38 ` Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one Viresh Kumar
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-06 10:38 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

The check for the current cpu's idleness is already done in the
implementation of sched_select_cpu(), which is called by
get_nohz_timer_target(). So there is no need to call idle_cpu() twice: once
from sched_select_cpu() and once from timer and hrtimer before calling
get_nohz_timer_target().

This patch removes the calls to idle_cpu() from timer and hrtimer.

This also saves an extra call to smp_processor_id() when
get_nohz_timer_target() is called.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 2 +-
 kernel/timer.c   | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..74bdaf6 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -159,7 +159,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 static int hrtimer_get_target(int this_cpu, int pinned)
 {
 #ifdef CONFIG_NO_HZ
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
+	if (!pinned && get_sysctl_timer_migration())
 		return get_nohz_timer_target();
 #endif
 	return this_cpu;
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..1170ece 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -735,11 +735,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	debug_activate(timer, expires);
 
-	cpu = smp_processor_id();
-
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
+	if (!pinned && get_sysctl_timer_migration())
 		cpu = get_nohz_timer_target();
+	else
+		cpu = smp_processor_id();
+#else
+	cpu = smp_processor_id();
 #endif
 	new_base = per_cpu(tvec_bases, cpu);
 
-- 
1.7.12.rc2.18.g61b472e




* [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 1/4] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
  2012-11-06 10:38 ` [PATCH V2 Resend 2/4] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
@ 2012-11-06 10:38 ` Viresh Kumar
  2012-11-26 17:15   ` Tejun Heo
  2012-11-27 13:26   ` Steven Rostedt
  2012-11-06 10:38 ` [PATCH V2 Resend 4/4] timer: Migrate running timer Viresh Kumar
  2012-11-26 15:00 ` [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
  4 siblings, 2 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-06 10:38 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

Workqueues queue work on the current cpu if the caller hasn't passed a
preferred cpu. This may wake up an idle CPU, which is actually not required.

Such work can be processed by any CPU, and so we should select a non-idle CPU
here. This patch adds support in the workqueue framework to get the preferred
CPU from the scheduler, instead of using the current CPU.

Most of the time when a work is queued, the current cpu isn't idle and so we
will choose it. There are cases when a cpu is idle when it queues some work.
For example, consider the following scenario:
- A cpu has programmed a timer and is IDLE now.
- The CPU gets into the interrupt handler due to the timer and queues a work.
  As the CPU is currently IDLE, we can queue this work to some other CPU.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/workqueue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 042d221..d32efa2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1238,7 +1238,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		struct global_cwq *last_gcwq;
 
 		if (cpu == WORK_CPU_UNBOUND)
-			cpu = raw_smp_processor_id();
+			cpu = sched_select_cpu(0);
 
 		/*
 		 * It's multi cpu.  If @work was previously on a different
@@ -1383,7 +1383,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 		if (gcwq)
 			lcpu = gcwq->cpu;
 		if (lcpu == WORK_CPU_UNBOUND)
-			lcpu = raw_smp_processor_id();
+			lcpu = sched_select_cpu(0);
 	} else {
 		lcpu = WORK_CPU_UNBOUND;
 	}
-- 
1.7.12.rc2.18.g61b472e




* [PATCH V2 Resend 4/4] timer: Migrate running timer
  2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
                   ` (2 preceding siblings ...)
  2012-11-06 10:38 ` [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one Viresh Kumar
@ 2012-11-06 10:38 ` Viresh Kumar
  2012-11-27 13:47   ` Steven Rostedt
  2012-11-26 15:00 ` [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
  4 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2012-11-06 10:38 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

Till now, we weren't migrating a running timer because, with migration,
del_timer_sync() can't detect that the timer's handler has not yet finished.

Now, when can we actually reach the code (inside __mod_timer()) where

base->running_timer == timer ? i.e. where we are trying to migrate the
current timer?

I can see only the following case:
- The timer re-armed itself, i.e. we are currently running the interrupt
  handler of a timer and it re-armed itself from there. At this time the
  user might or might not have called del_timer_sync(). If not, then there
  is no harm in re-arming the timer?

Now, when somebody tries to delete a timer, he obviously doesn't want it to
run any more for now. So, why should we ever re-arm a timer which is
scheduled for deletion?

This patch tries to fix "migration of a running timer", assuming the above
theory is correct :)

So, now when we get a call to del_timer_sync(), we mark the timer as
scheduled for deletion in an additional field. This is checked whenever we
try to modify/arm the timer again. If it is currently scheduled for
deletion, we must not modify/arm it.

And so, whenever we reach the situation where:
base->running_timer == timer

we are sure nobody is waiting in del_timer_sync().

We clear this flag as soon as the timer is deleted, so that it can be
started again after deleting it successfully.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/timer.h |  2 ++
 kernel/timer.c        | 42 +++++++++++++++++++++++++-----------------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..6aa720f 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -22,6 +22,7 @@ struct timer_list {
 	unsigned long data;
 
 	int slack;
+	int sched_del;
 
 #ifdef CONFIG_TIMER_STATS
 	int start_pid;
@@ -77,6 +78,7 @@ extern struct tvec_base boot_tvec_bases;
 		.data = (_data),				\
 		.base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \
 		.slack = -1,					\
+		.sched_del = 0,					\
 		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
 			__FILE__ ":" __stringify(__LINE__))	\
 	}
diff --git a/kernel/timer.c b/kernel/timer.c
index 1170ece..14e1f76 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -622,6 +622,7 @@ static void do_init_timer(struct timer_list *timer, unsigned int flags,
 	timer->entry.next = NULL;
 	timer->base = (void *)((unsigned long)base | flags);
 	timer->slack = -1;
+	timer->sched_del = 0;
 #ifdef CONFIG_TIMER_STATS
 	timer->start_site = NULL;
 	timer->start_pid = -1;
@@ -729,6 +730,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	base = lock_timer_base(timer, &flags);
 
+	if (timer->sched_del) {
+		/* Don't schedule it again, as it is getting deleted */
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	ret = detach_if_pending(timer, base, false);
 	if (!ret && pending_only)
 		goto out_unlock;
@@ -746,21 +753,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 	new_base = per_cpu(tvec_bases, cpu);
 
 	if (base != new_base) {
-		/*
-		 * We are trying to schedule the timer on the local CPU.
-		 * However we can't change timer's base while it is running,
-		 * otherwise del_timer_sync() can't detect that the timer's
-		 * handler yet has not finished. This also guarantees that
-		 * the timer is serialized wrt itself.
-		 */
-		if (likely(base->running_timer != timer)) {
-			/* See the comment in lock_timer_base() */
-			timer_set_base(timer, NULL);
-			spin_unlock(&base->lock);
-			base = new_base;
-			spin_lock(&base->lock);
-			timer_set_base(timer, base);
-		}
+		/* See the comment in lock_timer_base() */
+		timer_set_base(timer, NULL);
+		spin_unlock(&base->lock);
+		base = new_base;
+		spin_lock(&base->lock);
+		timer_set_base(timer, base);
 	}
 
 	timer->expires = expires;
@@ -1039,9 +1037,11 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  */
 int del_timer_sync(struct timer_list *timer)
 {
-#ifdef CONFIG_LOCKDEP
+	struct tvec_base *base;
 	unsigned long flags;
 
+#ifdef CONFIG_LOCKDEP
+
 	/*
 	 * If lockdep gives a backtrace here, please reference
 	 * the synchronization rules above.
@@ -1051,6 +1051,12 @@ int del_timer_sync(struct timer_list *timer)
 	lock_map_release(&timer->lockdep_map);
 	local_irq_restore(flags);
 #endif
+
+	/* Timer is scheduled for deletion, don't let it re-arm itself */
+	base = lock_timer_base(timer, &flags);
+	timer->sched_del = 1;
+	spin_unlock_irqrestore(&base->lock, flags);
+
 	/*
 	 * don't use it in hardirq context, because it
 	 * could lead to deadlock.
@@ -1058,8 +1064,10 @@ int del_timer_sync(struct timer_list *timer)
 	WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base));
 	for (;;) {
 		int ret = try_to_del_timer_sync(timer);
-		if (ret >= 0)
+		if (ret >= 0) {
+			timer->sched_del = 0;
 			return ret;
+		}
 		cpu_relax();
 	}
 }
-- 
1.7.12.rc2.18.g61b472e




* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
                   ` (3 preceding siblings ...)
  2012-11-06 10:38 ` [PATCH V2 Resend 4/4] timer: Migrate running timer Viresh Kumar
@ 2012-11-26 15:00 ` Viresh Kumar
  2012-11-26 16:40   ` Steven Rostedt
  4 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2012-11-26 15:00 UTC (permalink / raw)
  To: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, rostedt
  Cc: Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, Viresh Kumar

On 6 November 2012 16:08, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> This is V2 Resend of my sched_select_cpu() work. Resending because V2 didn't
> get much attention. Including more people in cc now :)
>
> In order to save power, it would be useful to schedule work onto non-IDLE cpus
> instead of waking up an IDLE one.
>
> To achieve this, we need the scheduler to guide kernel frameworks (like
> timers & workqueues) towards the most preferred CPU to be used for such
> tasks.
>
> This patchset is about implementing this concept.
>
> - The first patch adds the sched_select_cpu() routine, which returns a
>   preferred non-idle cpu.
> - The second patch removes idle_cpu() calls from timer & hrtimer.
> - The third patch adapts this change in the workqueue framework.
> - The fourth patch adds migration capability for a running timer.
>
> Earlier discussions over v1 can be found here:
> http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
>
> Earlier discussions on this concept took place at the last LPC:
> http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/
>
> The module created for testing this behavior is present here:
> http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary

Ping!!

--
viresh


* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 15:00 ` [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
@ 2012-11-26 16:40   ` Steven Rostedt
  2012-11-26 17:03     ` Paul E. McKenney
  2012-11-27  6:25     ` Viresh Kumar
  0 siblings, 2 replies; 48+ messages in thread
From: Steven Rostedt @ 2012-11-26 16:40 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

On Mon, 2012-11-26 at 20:30 +0530, Viresh Kumar wrote:
> On 6 November 2012 16:08, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > This is V2 Resend of my sched_select_cpu() work. Resending because V2 didn't
> > get much attention. Including more people in cc now :)
> >
> > In order to save power, it would be useful to schedule work onto non-IDLE cpus
> > instead of waking up an IDLE one.
> >
> > To achieve this, we need the scheduler to guide kernel frameworks (like
> > timers & workqueues) towards the most preferred CPU to be used for such
> > tasks.
> >
> > This patchset is about implementing this concept.
> >
> > - The first patch adds the sched_select_cpu() routine, which returns a
> >   preferred non-idle cpu.
> > - The second patch removes idle_cpu() calls from timer & hrtimer.
> > - The third patch adapts this change in the workqueue framework.
> > - The fourth patch adds migration capability for a running timer.
> >
> > Earlier discussions over v1 can be found here:
> > http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
> >
> > Earlier discussions on this concept took place at the last LPC:
> > http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/
> >
> > The module created for testing this behavior is present here:
> > http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary
> 
> Ping!!

This is a really bad time of year to post new patches :-/
A lot of people are trying to get their own work done by year end and
then there's holidays and such that are also distractions. Not to
mention that a new merge window will be opening soon.

That said...

As workqueues are set off by the CPU that queued it, what real benefit
does this give? A CPU was active when it queued the work and the work
should be done before it gets back to sleep.

OK, an interrupt happens on an idle CPU and queues some work. That work
should execute before the CPU gets back to sleep, right? I fail to see
the benefit of trying to move that work elsewhere. The CPU had to wake
up to execute the interrupt. It's no longer in a deep sleep (or any
sleep for that matter).

To me it seems best to avoid waking up an idle CPU in the first place.

I'm still working off a turkey overdose, so maybe I'm missing something
obvious.

-- Steve




* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 16:40   ` Steven Rostedt
@ 2012-11-26 17:03     ` Paul E. McKenney
  2012-11-26 17:35       ` Steven Rostedt
  2012-11-27  6:25     ` Viresh Kumar
  1 sibling, 1 reply; 48+ messages in thread
From: Paul E. McKenney @ 2012-11-26 17:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On Mon, Nov 26, 2012 at 11:40:27AM -0500, Steven Rostedt wrote:
> On Mon, 2012-11-26 at 20:30 +0530, Viresh Kumar wrote:
> > On 6 November 2012 16:08, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > This is V2 Resend of my sched_select_cpu() work. Resending because V2 didn't
> > > get much attention. Including more people in cc now :)
> > >
> > > In order to save power, it would be useful to schedule work onto non-IDLE cpus
> > > instead of waking up an IDLE one.
> > >
> > > To achieve this, we need the scheduler to guide kernel frameworks (like
> > > timers & workqueues) towards the most preferred CPU to be used for such
> > > tasks.
> > >
> > > This patchset is about implementing this concept.
> > >
> > > - The first patch adds the sched_select_cpu() routine, which returns a
> > >   preferred non-idle cpu.
> > > - The second patch removes idle_cpu() calls from timer & hrtimer.
> > > - The third patch adapts this change in the workqueue framework.
> > > - The fourth patch adds migration capability for a running timer.
> > >
> > > Earlier discussions over v1 can be found here:
> > > http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
> > >
> > > Earlier discussions on this concept took place at the last LPC:
> > > http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/
> > >
> > > The module created for testing this behavior is present here:
> > > http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary
> > 
> > Ping!!
> 
> This is a really bad time of year to post new patches :-/
> A lot of people are trying to get their own work done by year end and
> then there's holidays and such that are also distractions. Not to
> mention that a new merge window will be opening soon.
> 
> That said...
> 
> As workqueues are set off by the CPU that queued it, what real benefit
> does this give? A CPU was active when it queued the work and the work
> should be done before it gets back to sleep.
> 
> OK, an interrupt happens on an idle CPU and queues some work. That work
> should execute before the CPU gets back to sleep, right? I fail to see
> the benefit of trying to move that work elsewhere. The CPU had to wake
> up to execute the interrupt. It's no longer in a deep sleep (or any
> sleep for that matter).
> 
> To me it seems best to avoid waking up an idle CPU in the first place.
> 
> I'm still working off a turkey overdose, so maybe I'm missing something
> obvious.

If I understand correctly (though also suffering turkey OD), the idea is
to offload work to more energy-efficient CPUs.

							Thanx, Paul



* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-06 10:38 ` [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one Viresh Kumar
@ 2012-11-26 17:15   ` Tejun Heo
  2012-11-27  5:19     ` Viresh Kumar
  2012-11-27 13:26   ` Steven Rostedt
  1 sibling, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2012-11-26 17:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

Hello, Viresh.

On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
> Workqueues queue work on the current cpu if the caller hasn't passed a
> preferred cpu. This may wake up an idle CPU, which is actually not required.
> 
> Such work can be processed by any CPU, and so we should select a non-idle CPU
> here. This patch adds support in the workqueue framework to get the preferred
> CPU from the scheduler, instead of using the current CPU.
> 
> Most of the time when a work is queued, the current cpu isn't idle and so we
> will choose it. There are cases when a cpu is idle when it queues some work.
> For example, consider the following scenario:
> - A cpu has programmed a timer and is IDLE now.
> - The CPU gets into the interrupt handler due to the timer and queues a work.
>   As the CPU is currently IDLE, we can queue this work to some other CPU.

I'm pretty skeptical about this.  queue_work() w/o explicit CPU
assignment has always guaranteed that the work item will be executed
on the same CPU.  I don't think there are too many users depending on
that but am not sure at all that there are none.  I asked you last
time that you would at the very least need to audit most users but it
doesn't seem like there has been any effort there.  Given the
seemingly low rate of migration and subtlety of such bugs, things
could get nasty to debug.

That said, if the obtained benefit is big enough, sure, we can
definitely change the behavior, which isn't all that great to begin
with, and try to shake out the bugs quickly by e.g. forcing all work
items to execute on different CPUs, but you're presenting a few
percent of work items being migrated to a different CPU from an
already active CPU, which doesn't seem like such a big benefit to me
even if the migration target CPU is somewhat more efficient.  How much
powersaving are we talking about?

Thanks.

-- 
tejun


* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 17:03     ` Paul E. McKenney
@ 2012-11-26 17:35       ` Steven Rostedt
  2012-11-26 19:03         ` Paul E. McKenney
  0 siblings, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2012-11-26 17:35 UTC (permalink / raw)
  To: paulmck
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On Mon, 2012-11-26 at 09:03 -0800, Paul E. McKenney wrote:


> If I understand correctly (though also suffering turkey OD), the idea is
> to offload work to more energy-efficient CPUs.

This is determined by a CPU that isn't running the idle task? Is it
because a CPU that just woke up may be running at a lower freq, and thus
not as efficient? But pushing off to another CPU may cause cache misses
as well. Wouldn't that also be a factor in efficiencies, if a CPU is
stalled waiting for memory to be loaded?

I should also ask the obvious. Have these patches shown real-world
efficiencies, or is this just theory? Do these patches actually improve
battery life when applied?

Just asking.

-- Steve




* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 17:35       ` Steven Rostedt
@ 2012-11-26 19:03         ` Paul E. McKenney
  2012-11-26 19:17           ` Steven Rostedt
  0 siblings, 1 reply; 48+ messages in thread
From: Paul E. McKenney @ 2012-11-26 19:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On Mon, Nov 26, 2012 at 12:35:52PM -0500, Steven Rostedt wrote:
> On Mon, 2012-11-26 at 09:03 -0800, Paul E. McKenney wrote:
> 
> 
> > If I understand correctly (though also suffering turkey OD), the idea is
> > to offload work to more energy-efficient CPUs.
> 
> This is determined by a CPU that isn't running the idle task? Is it
> because a CPU that just woke up may be running at a lower freq, and thus
> not as efficient? But pushing off to another CPU may cause cache misses
> as well. Wouldn't that also be a factor in efficiencies, if a CPU is
> stalled waiting for memory to be loaded?

Two different microarchitectures -- same instruction set (at user level,
anyway), but different power/performance characteristics.  One set is
optimized for performance, the other for energy efficiency.  For example,
ARM's big.LITTLE architecture.

> I should also ask the obvious. Have these patches shown real-world
> efficiencies, or is this just theory? Do these patches actually improve
> battery life when applied?

I must defer to Viresh on this one.

							Thanx, Paul

> Just asking.
> 
> -- Steve
> 
> 



* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 19:03         ` Paul E. McKenney
@ 2012-11-26 19:17           ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2012-11-26 19:17 UTC (permalink / raw)
  To: paulmck
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On Mon, 2012-11-26 at 11:03 -0800, Paul E. McKenney wrote:
> On Mon, Nov 26, 2012 at 12:35:52PM -0500, Steven Rostedt wrote:
> > On Mon, 2012-11-26 at 09:03 -0800, Paul E. McKenney wrote:
> > 
> > 
> > > If I understand correctly (though also suffering turkey OD), the idea is
> > > to offload work to more energy-efficient CPUs.
> > 
> > This is determined by a CPU that isn't running the idle task? Is it
> > because a CPU that just woke up may be running at a lower freq, and thus
> > not as efficient? But pushing off to another CPU may cause cache misses
> > as well. Wouldn't that also be a factor in efficiencies, if a CPU is
> > stalled waiting for memory to be loaded?
> 
> Two different microarchitectures -- same instruction set (at user level,
> anyway), but different power/performance characteristics.  One set is
> optimized for performance, the other for energy efficiency.  For example,
> ARM's big.LITTLE architecture.
> 

But I don't see anything in the patch set that guarantees that we will
be moving something off to one of the "big" cores. The only heuristic
that is used is if the CPU is idle or not. In fact, if the big core was
idle we may be pushing work off to a "LITTLE" CPU.

Again, work is only run on the CPU it was queued on (which this patch
set is trying to change). Thus, only work that would be queued on a
LITTLE CPU is by something that ran on that CPU.

I'm still having a bit of trouble understanding where the benefit comes
from.

-- Steve




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-26 17:15   ` Tejun Heo
@ 2012-11-27  5:19     ` Viresh Kumar
  2012-11-27 12:54       ` Vincent Guittot
  2013-01-04 11:11       ` Viresh Kumar
  0 siblings, 2 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-27  5:19 UTC (permalink / raw)
  To: Tejun Heo, Vincent Guittot
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

Hi Tejun,

On 26 November 2012 22:45, Tejun Heo <tj@kernel.org> wrote:
> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:

> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
> assignment has always guaranteed that the work item will be executed
> on the same CPU.  I don't think there are too many users depending on
> that but am not sure at all that there are none.  I asked you last
> time that you would at the very least need to audit most users but it
> doesn't seem like there has been any effort there.

My bad. I completely missed/forgot that comment from your earlier mails.
Will do it.

> That said, if the obtained benefit is big enough, sure, we can
> definitely change the behavior, which isn't all that great to begin
> with, and try to shake out the bugs quickly by e.g. forcing all work
> items to execute on different CPUs, but you're presenting a few
> percent of work items being migrated to a different CPU from an
> already active CPU, which doesn't seem like such a big benefit to me
> even if the migration target CPU is somewhat more efficient.  How much
> powersaving are we talking about?

Hmm.. I actually implemented what is discussed here:
(I know you have seen this earlier :) )

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf

Specifically slides: 12 & 19.

I haven't done many power measurements with it; I have tested it more from a
functionality point of view.

@Vincent: Can you add some comments here?

--
viresh


* Re: [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers
  2012-11-26 16:40   ` Steven Rostedt
  2012-11-26 17:03     ` Paul E. McKenney
@ 2012-11-27  6:25     ` Viresh Kumar
  1 sibling, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2012-11-27  6:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

Hi Steven,

Thanks for sharing your opinion. :)

As this turned out to be a long discussion thread (thanks Paul), I will try
to answer everything here.

On 26 November 2012 22:10, Steven Rostedt <rostedt@goodmis.org> wrote:
> This is a really bad time of year to post new patches :-/
> A lot of people are trying to get their own work done by year end and
> then there's holidays and such that are also distractions. Not to
> mention that a new merge window will be opening soon.

The patches have been around since the end of September; this was just a
ping (admittedly with bad timing - the merge window).

> As workqueues are set off by the CPU that queued it, what real benefit
> does this give? A CPU was active when it queued the work and the work
> should be done before it gets back to sleep.
>
> OK, an interrupt happens on an idle CPU and queues some work. That work
> should execute before the CPU gets back to sleep, right? I fail to see
> the benefit of trying to move that work elsewhere. The CPU had to wake
> up to execute the interrupt. It's no longer in a deep sleep (or any
> sleep for that matter).
>
> To me it seems best to avoid waking up an idle CPU in the first place.
>
> I'm still working off a turkey overdose, so maybe I'm missing something
> obvious.

Ok, here is the story behind these patches. The idea was first discussed
by Vincent at LPC this year:

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf

Specifically slides: 12 & 19.

cpu idleness here means idleness from the scheduler's perspective, i.e. for
the scheduler a CPU is idle if all of the below are true (roughly sketched
after the list):
- the current task is the idle task
- nr_running == 0
- the wake_list is empty
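
That is essentially what idle_cpu() checks; something like the below
(simplified sketch, from memory, of the logic in kernel/sched/core.c):

int idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/* current task must be the idle task */
	if (rq->curr != rq->idle)
		return 0;

	/* no runnable tasks */
	if (rq->nr_running)
		return 0;

#ifdef CONFIG_SMP
	/* no pending remote wakeups */
	if (!llist_empty(&rq->wake_list))
		return 0;
#endif

	return 1;
}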


There are two use cases for workqueue patch 3/4 (the pattern is sketched
below):

- queue-work from a timer interrupt which re-arms itself (slide 12):
  Whether the timer is deferrable or not, it will queue the work on the
  current cpu once the cpu wakes up. The cpu could have immediately gone back
  to the idle state again if the work were not queued on it. So, because this
  cpu is idle (from the scheduler's point of view), we can move this work to
  some other cpu.

- delayed-work which re-arms itself (slide 19):
  Again the same thing: we could have kept the cpu in the idle state for some
  more time.
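
The pattern in question looks roughly like this (hypothetical driver code,
all names made up):

#include <linux/workqueue.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static void my_work_fn(struct work_struct *work)
{
	/* periodic housekeeping; could run on any cpu */
}
static DECLARE_WORK(my_work, my_work_fn);

static void my_timer_fn(unsigned long data);
static DEFINE_TIMER(my_timer, my_timer_fn, 0, 0);

static void my_timer_fn(unsigned long data)
{
	/*
	 * Runs in timer interrupt context. If this cpu was idle, queueing
	 * the work here (current behaviour) keeps it awake just to run the
	 * work; with patch 3/4 the work goes to a non-idle cpu instead.
	 */
	queue_work(system_wq, &my_work);
	mod_timer(&my_timer, jiffies + HZ);	/* re-arm */
}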

There might not be many users with this behavior, but a single user can actually
have significant impact.

For now, it doesn't take care of the big LITTLE stuff that Paul pointed out,
but yes, that is planned. Some work is going on in that direction too:

http://linux.kernel.narkive.com/mCyvFVUX/rfc-0-6-sched-packing-small-tasks

The very first reason for having this patchset was to have a single
preferred_cpu() routine which can be used by all frameworks, timers being
the first user (already) and workqueues trying to be the second.

I tested it more from a functionality point of view rather than with power
figures :( And I understand that power measurements are very much required.

Having said that, I believe all the questions raised are about PATCH 3/4
(workqueue), and the other 3 patches should be fine. Can you share your
opinion on those patches? I will then split this patchset and send the
workqueue stuff later, after doing some power measurements.

--
viresh


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27  5:19     ` Viresh Kumar
@ 2012-11-27 12:54       ` Vincent Guittot
  2013-01-04 11:11       ` Viresh Kumar
  1 sibling, 0 replies; 48+ messages in thread
From: Vincent Guittot @ 2012-11-27 12:54 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Tejun Heo, pjt, paul.mckenney, tglx, suresh.b.siddha, venki,
	mingo, peterz, rostedt, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On 27 November 2012 06:19, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Hi Tejun,
>
> On 26 November 2012 22:45, Tejun Heo <tj@kernel.org> wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>
>> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU.  I don't think there are too many users depending on
>> that but am not sure at all that there are none.  I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.
>
>> That said, if the obtained benefit is big enough, sure, we can
>> definitely change the behavior, which isn't all that great to begin
>> with, and try to shake out the bugs quickly by e.g. forcing all work
>> items to execute on different CPUs, but you're presenting a few
>> percent of work items being migrated to a different CPU from an
>> already active CPU, which doesn't seem like such a big benefit to me
>> even if the migration target CPU is somewhat more efficient.  How much
>> powersaving are we talking about?
>
> Hmm.. I actually implemented what is discussed here:
> (I know you have seen this earlier :) )
>
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf
>
> Specifically slides: 12 & 19.
>
> I haven't done many power measurements with it; I have tested it more from a
> functionality point of view.
>
> @Vincent: Can you add some comments here?

Sorry for this late reply.

We have faced some situations on TC2 (as an example) where the tasks are
running in the LITTLE cluster whereas some periodic works stay on the big
cluster, so we can have one cluster that wakes up for tasks and another one
that wakes up for works. We would like to consolidate the behaviour of works
with the behaviour of tasks.

Sorry, I don't have relevant figures as the patches are used with
other ones which also impact the power consumption.

This series introduces the possibility to run a work on another CPU, which
is necessary if we want a better correlation of task and work scheduling on
the system. Most of the time queue_work() is used when a driver doesn't mind
which CPU the work runs on, whereas it looks like it should be used only
when you want to run locally. We would like to solve this point with the new
interface proposed by Viresh.

Vincent

>
> --
> viresh


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-06 10:38 ` [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one Viresh Kumar
  2012-11-26 17:15   ` Tejun Heo
@ 2012-11-27 13:26   ` Steven Rostedt
  2012-11-27 13:48     ` Viresh Kumar
  1 sibling, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2012-11-27 13:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

On Tue, 2012-11-06 at 16:08 +0530, Viresh Kumar wrote:
> Workqueues queue work on the current cpu if the caller hasn't passed a
> preferred cpu. This may wake up an idle CPU, which is actually not required.
> 
> Such work can be processed by any CPU, and so we should select a non-idle CPU
> here. This patch adds support in the workqueue framework to get the preferred
> CPU from the scheduler, instead of using the current CPU.
> 
> Most of the time when a work is queued, the current cpu isn't idle and so we
> will choose it. There are cases when a cpu is idle when it queues some work.
> For example, consider the following scenario:
> - A cpu has programmed a timer and is IDLE now.
> - The CPU gets into the interrupt handler due to the timer and queues a work.
>   As the CPU is currently IDLE, we can queue this work to some other CPU.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/workqueue.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 042d221..d32efa2 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1238,7 +1238,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>  		struct global_cwq *last_gcwq;
>  
>  		if (cpu == WORK_CPU_UNBOUND)
> -			cpu = raw_smp_processor_id();
> +			cpu = sched_select_cpu(0);

A couple of things. The sched_select_cpu() is not cheap. It has a double
loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
and we are CPU 1023 and all other CPUs happen to be idle, we could be
searching 1023 CPUs before we come up with our own.

__queue_work() should be fast as there is a reason that it is delaying
the work and not running it itself.

Also, I really don't like this as a default behavior. It seems that this
solution is for a very special case, and this can become very intrusive
for the normal case.

To be honest, I'm uncomfortable with this approach. It seems to be fighting
a symptom and not the disease. I'd rather find a way to keep work from being
queued on the wrong CPU. If it is a timer, find a way to move the timer. If
it is something else, let's work to fix that. Doing searches of possibly all
CPUs (unlikely, but it is there) just seems wrong to me.

-- Steve


>  
>  		/*
>  		 * It's multi cpu.  If @work was previously on a different
> @@ -1383,7 +1383,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
>  		if (gcwq)
>  			lcpu = gcwq->cpu;
>  		if (lcpu == WORK_CPU_UNBOUND)
> -			lcpu = raw_smp_processor_id();
> +			lcpu = sched_select_cpu(0);
>  	} else {
>  		lcpu = WORK_CPU_UNBOUND;
>  	}




* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2012-11-06 10:38 ` [PATCH V2 Resend 4/4] timer: Migrate running timer Viresh Kumar
@ 2012-11-27 13:47   ` Steven Rostedt
  2013-03-20 15:13     ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2012-11-27 13:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

[ Added John Stultz ]

On Tue, 2012-11-06 at 16:08 +0530, Viresh Kumar wrote:
> Till now, we weren't migrating a running timer because, with migration,
> del_timer_sync() can't detect that the timer's handler has not yet finished.
> 
> Now, when can we actually reach the code (inside __mod_timer()) where
> 
> base->running_timer == timer ? i.e. where we are trying to migrate the
> current timer?
> 
> I can see only the following case:
> - The timer re-armed itself, i.e. we are currently running the interrupt
>   handler of a timer and it re-armed itself from there. At this time the
>   user might or might not have called del_timer_sync(). If not, then there
>   is no harm in re-arming the timer?
> 
> Now, when somebody tries to delete a timer, he obviously doesn't want it to
> run any more for now. So, why should we ever re-arm a timer which is
> scheduled for deletion?
> 
> This patch tries to fix "migration of a running timer", assuming the above
> theory is correct :)
> 

That's a question for Thomas or John (hello! Thomas or John :-)

> So, now when we get a call to del_timer_sync(), we mark the timer as
> scheduled for deletion in an additional field. This is checked whenever we
> try to modify/arm the timer again. If it is currently scheduled for
> deletion, we must not modify/arm it.
> 
> And so, whenever we reach the situation where:
> base->running_timer == timer
> 
> we are sure nobody is waiting in del_timer_sync().
> 
> We clear this flag as soon as the timer is deleted, so that it can be
> started again after deleting it successfully.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  include/linux/timer.h |  2 ++
>  kernel/timer.c        | 42 +++++++++++++++++++++++++-----------------
>  2 files changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..6aa720f 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -22,6 +22,7 @@ struct timer_list {
>  	unsigned long data;
>  
>  	int slack;
> +	int sched_del;

Make that a bool, as it's just a flag. Maybe gcc can optimize or
something.

>  
>  #ifdef CONFIG_TIMER_STATS
>  	int start_pid;
> @@ -77,6 +78,7 @@ extern struct tvec_base boot_tvec_bases;
>  		.data = (_data),				\
>  		.base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \
>  		.slack = -1,					\
> +		.sched_del = 0,					\
>  		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
>  			__FILE__ ":" __stringify(__LINE__))	\
>  	}
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 1170ece..14e1f76 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -622,6 +622,7 @@ static void do_init_timer(struct timer_list *timer, unsigned int flags,
>  	timer->entry.next = NULL;
>  	timer->base = (void *)((unsigned long)base | flags);
>  	timer->slack = -1;
> +	timer->sched_del = 0;
>  #ifdef CONFIG_TIMER_STATS
>  	timer->start_site = NULL;
>  	timer->start_pid = -1;
> @@ -729,6 +730,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  
>  	base = lock_timer_base(timer, &flags);
>  
> +	if (timer->sched_del) {
> +		/* Don't schedule it again, as it is getting deleted */
> +		ret = -EBUSY;
> +		goto out_unlock;
> +	}
> +
>  	ret = detach_if_pending(timer, base, false);
>  	if (!ret && pending_only)
>  		goto out_unlock;
> @@ -746,21 +753,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  	new_base = per_cpu(tvec_bases, cpu);
>  
>  	if (base != new_base) {
> -		/*
> -		 * We are trying to schedule the timer on the local CPU.
> -		 * However we can't change timer's base while it is running,
> -		 * otherwise del_timer_sync() can't detect that the timer's
> -		 * handler yet has not finished. This also guarantees that
> -		 * the timer is serialized wrt itself.
> -		 */
> -		if (likely(base->running_timer != timer)) {
> -			/* See the comment in lock_timer_base() */
> -			timer_set_base(timer, NULL);
> -			spin_unlock(&base->lock);
> -			base = new_base;
> -			spin_lock(&base->lock);
> -			timer_set_base(timer, base);
> -		}
> +		/* See the comment in lock_timer_base() */
> +		timer_set_base(timer, NULL);
> +		spin_unlock(&base->lock);
> +		base = new_base;
> +		spin_lock(&base->lock);
> +		timer_set_base(timer, base);
>  	}
>  
>  	timer->expires = expires;
> @@ -1039,9 +1037,11 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
>   */
>  int del_timer_sync(struct timer_list *timer)
>  {
> -#ifdef CONFIG_LOCKDEP
> +	struct tvec_base *base;
>  	unsigned long flags;
>  
> +#ifdef CONFIG_LOCKDEP
> +
>  	/*
>  	 * If lockdep gives a backtrace here, please reference
>  	 * the synchronization rules above.
> @@ -1051,6 +1051,12 @@ int del_timer_sync(struct timer_list *timer)
>  	lock_map_release(&timer->lockdep_map);
>  	local_irq_restore(flags);
>  #endif
> +
> +	/* Timer is scheduled for deletion, don't let it re-arm itself */
> +	base = lock_timer_base(timer, &flags);
> +	timer->sched_del = 1;
> +	spin_unlock_irqrestore(&base->lock, flags);

I don't think this is good enough. For one thing, it doesn't handle
try_to_del_timer_sync(), or even del_timer_sync() for that matter, as either
may return success while the timer happens to be running on another CPU.

We have this:

	CPU0			CPU1
	----			----
   timerA (running)
   mod_timer(timerA)
   [ migrate to CPU2 ]
   release timer base lock
 			   del_timer_sync(timerA)
			   timer->sched_del = true
			   try_to_del_timer_sync(timerA)
				base(CPU2)->timer != timerA
				[TRUE!]
  timerA (finishes)

Fail!

-- Steve

			   

> +
>  	/*
>  	 * don't use it in hardirq context, because it
>  	 * could lead to deadlock.
> @@ -1058,8 +1064,10 @@ int del_timer_sync(struct timer_list *timer)
>  	WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base));
>  	for (;;) {
>  		int ret = try_to_del_timer_sync(timer);
> -		if (ret >= 0)
> +		if (ret >= 0) {
> +			timer->sched_del = 0;
>  			return ret;
> +		}
>  		cpu_relax();
>  	}
>  }




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27 13:26   ` Steven Rostedt
@ 2012-11-27 13:48     ` Viresh Kumar
  2012-11-27 13:59       ` Steven Rostedt
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2012-11-27 13:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

On 27 November 2012 18:56, Steven Rostedt <rostedt@goodmis.org> wrote:
> A couple of things. The sched_select_cpu() is not cheap. It has a double
> loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> and we are CPU 1023 and all other CPUs happen to be idle, we could be
> searching 1023 CPUs before we come up with our own.

Not sure if you missed the first check in sched_select_cpu():

+int sched_select_cpu(unsigned int sd_flags)
+{
+       /* If the current cpu isn't idle, don't migrate anything */
+       if (!idle_cpu(cpu))
+               return cpu;

We aren't going to search if we aren't idle.

> Also, I really don't like this as a default behavior. It seems that this
> solution is for a very special case, and this can become very intrusive
> for the normal case.

We tried a KCONFIG option for it, which Tejun rejected.

> To be honest, I'm uncomfortable with this approach. It seems to be
> fighting a symptom and not the disease. I'd rather find a way to keep
> work from being queued on wrong CPU. If it is a timer, find a way to
> move the timer. If it is something else, lets work to fix that. Doing
> searches of possibly all CPUs (unlikely, but it is there), just seems
> wrong to me.

As Vincent pointed out, on big.LITTLE systems we just don't want to
serve works on the big cores. That would waste too much power,
especially if we have to wake up the big cores for it.

It would be difficult to make every source driver (i.e. whatever queues
the work) target the little cores. We thought that anybody who wants to
queue work on the current cpu must use queue_work_on() instead.

--
viresh


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27 13:48     ` Viresh Kumar
@ 2012-11-27 13:59       ` Steven Rostedt
  2012-11-27 14:55         ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2012-11-27 13:59 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: pjt, paul.mckenney, tglx, tj, suresh.b.siddha, venki, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> On 27 November 2012 18:56, Steven Rostedt <rostedt@goodmis.org> wrote:
> > A couple of things. The sched_select_cpu() is not cheap. It has a double
> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> > searching 1023 CPUs before we come up with our own.
> 
> Not sure if you missed the first check sched_select_cpu()
> 
> +int sched_select_cpu(unsigned int sd_flags)
> +{
> +       /* If Current cpu isn't idle, don't migrate anything */
> +       if (!idle_cpu(cpu))
> +               return cpu;
> 
> We aren't going to search if we aren't idle.

OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
heh, we are idle so we can spin. But then why go through this in the
first place ;-)


> 
> > Also, I really don't like this as a default behavior. It seems that this
> > solution is for a very special case, and this can become very intrusive
> > for the normal case.
> 
> We tried with an KCONFIG option for it, which Tejun rejected.

Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
get something working that doesn't add any regressions. If you can get
this to work without making *any* regressions in the normal case then
I'm totally fine with that. But if this adds any issues with the normal
case, then it's a show stopper.

> 
> > To be honest, I'm uncomfortable with this approach. It seems to be
> > fighting a symptom and not the disease. I'd rather find a way to keep
> > work from being queued on wrong CPU. If it is a timer, find a way to
> > move the timer. If it is something else, lets work to fix that. Doing
> > searches of possibly all CPUs (unlikely, but it is there), just seems
> > wrong to me.
> 
> As Vincent pointed out, on big LITTLE systems we just don't want to
> serve works on big cores. That would be wasting too much of power.
> Specially if we are going to wake up big cores.
> 
> It would be difficult to control the source driver (which queues work) to
> little cores. We thought, if somebody wanted to queue work on current
> cpu then they must use queue_work_on().

As Tejun has mentioned earlier, are there any assumptions anywhere that
expect an unbounded work queue not to migrate - places where per-cpu
variables might be used? Tejun had a good idea of forcing this to
migrate the work *every* time, i.e. to never let a work item run on the
same CPU that it was queued on. If it can survive that, then it is
probably OK. Maybe add a config option that forces this? That way,
anyone can test that this isn't an issue.

-- Steve




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27 13:59       ` Steven Rostedt
@ 2012-11-27 14:55         ` Vincent Guittot
  2012-11-27 15:04           ` Steven Rostedt
  0 siblings, 1 reply; 48+ messages in thread
From: Vincent Guittot @ 2012-11-27 14:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On 27 November 2012 14:59, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> On 27 November 2012 18:56, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> > searching 1023 CPUs before we come up with our own.
>>
>> Not sure if you missed the first check sched_select_cpu()
>>
>> +int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +       /* If Current cpu isn't idle, don't migrate anything */
>> +       if (!idle_cpu(cpu))
>> +               return cpu;
>>
>> We aren't going to search if we aren't idle.
>
> OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> heh we are idle we can spin. But then why go through this in the first
> place ;-)

By migrating it now, the work will create its activity there and wake
up on the right CPU next time.

If migrating to just any CPU seems a bit risky, we could restrict the
migration to a CPU in the same node. We can pass such constraints to
sched_select_cpu().

>
>
>>
>> > Also, I really don't like this as a default behavior. It seems that this
>> > solution is for a very special case, and this can become very intrusive
>> > for the normal case.
>>
>> We tried with an KCONFIG option for it, which Tejun rejected.
>
> Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
> get something working that doesn't add any regressions. If you can get
> this to work without making *any* regressions in the normal case than
> I'm totally fine with that. But if this adds any issues with the normal
> case, then it's a show stopper.
>
>>
>> > To be honest, I'm uncomfortable with this approach. It seems to be
>> > fighting a symptom and not the disease. I'd rather find a way to keep
>> > work from being queued on wrong CPU. If it is a timer, find a way to
>> > move the timer. If it is something else, lets work to fix that. Doing
>> > searches of possibly all CPUs (unlikely, but it is there), just seems
>> > wrong to me.
>>
>> As Vincent pointed out, on big LITTLE systems we just don't want to
>> serve works on big cores. That would be wasting too much of power.
>> Specially if we are going to wake up big cores.
>>
>> It would be difficult to control the source driver (which queues work) to
>> little cores. We thought, if somebody wanted to queue work on current
>> cpu then they must use queue_work_on().
>
> As Tejun has mentioned earlier, is there any assumptions anywhere that
> expects an unbounded work queue to not migrate? Where per cpu variables
> might be used. Tejun had a good idea of forcing this to migrate the work
> *every* time. To not let a work queue run on the same CPU that it was
> queued on. If it can survive that, then it is probably OK. Maybe add a
> config option that forces this? That way, anyone can test that this
> isn't an issue.
>
> -- Steve
>
>


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27 14:55         ` Vincent Guittot
@ 2012-11-27 15:04           ` Steven Rostedt
  2012-11-27 15:35             ` Vincent Guittot
  0 siblings, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2012-11-27 15:04 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
> On 27 November 2012 14:59, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
> >> On 27 November 2012 18:56, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
> >> > searching 1023 CPUs before we come up with our own.
> >>
> >> Not sure if you missed the first check sched_select_cpu()
> >>
> >> +int sched_select_cpu(unsigned int sd_flags)
> >> +{
> >> +       /* If Current cpu isn't idle, don't migrate anything */
> >> +       if (!idle_cpu(cpu))
> >> +               return cpu;
> >>
> >> We aren't going to search if we aren't idle.
> >
> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> > heh we are idle we can spin. But then why go through this in the first
> > place ;-)
> 
> By migrating it now, it will create its activity and wake up on the
> right CPU next time.
> 
> If migrating on any CPUs seems a bit risky, we could restrict the
> migration on a CPU on the same node. We can pass such contraints on
> sched_select_cpu
> 

That's assuming that the CPUs stay idle. Now if we move the work to
another CPU and that CPU goes idle, then the work may be moved yet
again. It could end up being a ping-pong approach.

I don't think idle is a strong enough heuristic for the general case. If
interrupts are constantly going off on a CPU that happens to be idle
most of the time, it will constantly be moving work onto CPUs that are
currently doing real work, and by doing so, it will be slowing those
CPUs down.

-- Steve




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27 15:04           ` Steven Rostedt
@ 2012-11-27 15:35             ` Vincent Guittot
  0 siblings, 0 replies; 48+ messages in thread
From: Vincent Guittot @ 2012-11-27 15:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Viresh Kumar, pjt, paul.mckenney, tglx, tj, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

On 27 November 2012 16:04, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
>> On 27 November 2012 14:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> >> On 27 November 2012 18:56, Steven Rostedt <rostedt@goodmis.org> wrote:
>> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> >> > searching 1023 CPUs before we come up with our own.
>> >>
>> >> Not sure if you missed the first check sched_select_cpu()
>> >>
>> >> +int sched_select_cpu(unsigned int sd_flags)
>> >> +{
>> >> +       /* If Current cpu isn't idle, don't migrate anything */
>> >> +       if (!idle_cpu(cpu))
>> >> +               return cpu;
>> >>
>> >> We aren't going to search if we aren't idle.
>> >
>> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
>> > heh we are idle we can spin. But then why go through this in the first
>> > place ;-)
>>
>> By migrating it now, it will create its activity and wake up on the
>> right CPU next time.
>>
>> If migrating on any CPUs seems a bit risky, we could restrict the
>> migration on a CPU on the same node. We can pass such contraints on
>> sched_select_cpu
>>
>
> That's assuming that the CPUs stay idle. Now if we move the work to
> another CPU and it goes idle, then it may move that again. It could end
> up being a ping pong approach.
>
> I don't think idle is a strong enough heuristic for the general case. If
> interrupts are constantly going off on a CPU that happens to be idle
> most of the time, it will constantly be moving work onto CPUs that are
> currently doing real work, and by doing so, it will be slowing those
> CPUs down.
>

I agree that idle is probably not enough, but it's the heuristic that
is currently used for selecting a CPU for a timer, and the timer also
uses sched_select_cpu in this series. So, in order to go step by step,
a common interface has been introduced for selecting a CPU, and this
function uses the same algorithm as the timers already do.
Once we agree on an interface, the heuristic can be updated.


> -- Steve
>
>


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2012-11-27  5:19     ` Viresh Kumar
  2012-11-27 12:54       ` Vincent Guittot
@ 2013-01-04 11:11       ` Viresh Kumar
  2013-01-04 15:09         ` Tejun Heo
  1 sibling, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-01-04 11:11 UTC (permalink / raw)
  To: Tejun Heo, Vincent Guittot
  Cc: pjt, paul.mckenney, tglx, suresh.b.siddha, venki, mingo, peterz,
	rostedt, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

Hi Tejun,

On 27 November 2012 10:49, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26 November 2012 22:45, Tejun Heo <tj@kernel.org> wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>
>> I'm pretty skeptical about this.  queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU.  I don't think there are too many users depending on
>> that but am not sure at all that there are none.  I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.

And this is the first thing I did this year :)

So there are almost 1200 files which use queue_work, queue_delayed_work,
schedule_work, schedule_delayed_work or schedule_on_each_cpu.

Obviously I can't try to understand all of them :), so I searched them for
two strings: "cpu" and "processor_id". This should catch every one of
these 1200-odd files which uses some variable/comment/code containing cpu,
smp_processor_id, raw_smp_processor_id, etc.

I got around 650 files with these two searches. Then I went through those
files to see if there is anything noticeable which may break due to this
patch.

This is the list of files where the cpu/processor_id strings are found and
which may break with this patch (still can't guarantee, as I don't have
knowledge of these drivers):

- arch/powerpc/mm/numa.c
- arch/s390/appldata/appldata_base.c
- arch/s390/kernel/time.c
- arch/s390/kernel/topology.c
- arch/s390/oprofile/hwsampler.c
- arch/x86/kernel/cpu/mcheck/mce.c
- arch/x86/kernel/tsc.c
- arch/x86/kvm/x86.c
- block/blk-core.c
- block/blk-throttle.c
- block/genhd.c
- crypto/cryptd.c
- drivers/base/power/domain.c
- drivers/cpufreq/cpufreq.c
- drivers/hv/vmbus_drv.c
- drivers/infiniband/hw/qib/qib_iba7322.c and
  drivers/infiniband/hw/qib/qib_init.c
- drivers/md/dm-crypt.c
- drivers/md/md.c
- drivers/net/ethernet/sfc/efx.c
- drivers/net/ethernet/tile/tilepro.c
- drivers/net/team/team_mode_loadbalance.c
- drivers/oprofile/cpu_buffer.c
- drivers/s390/kvm/kvm_virtio.c
- drivers/scsi/fcoe/fcoe.c
- drivers/tty/sysrq.c
- drivers/xen/pcpu.c
- fs/file.c, fs/file_table.c, fs/fscache/object.c
- include/linux/stop_machine.h, kernel/stop_machine.c
- mm/percpu.c
- mm/slab.c
- mm/vmstat.c
- net/core/flow.c
- net/iucv/iucv.c

I am not sure what to do now :) Can you assist?


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-04 11:11       ` Viresh Kumar
@ 2013-01-04 15:09         ` Tejun Heo
  2013-01-07  9:58           ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-01-04 15:09 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Vincent Guittot, pjt, paul.mckenney, tglx, suresh.b.siddha,
	venki, mingo, peterz, rostedt, Arvind.Chauhan, linaro-dev,
	patches, pdsw-power-team, linux-kernel, linux-rt-users

Hello, Viresh.

On Fri, Jan 04, 2013 at 04:41:47PM +0530, Viresh Kumar wrote:
> I got a list of files where cpu/processor_id strings are found, which
> may break with
> this patch (still can't guarantee as i don't have knowledge of these drivers)...
...
> I am not sure what to do now :) , can you assist ?

I don't know either.  Changing behavior subtly like this is hard.  I
usually try to spot some problem cases and try to identify patterns
there.  Once you identify a few of them, understanding and detecting
other problem cases get a lot easier.  In this case, maybe there are
too many places to audit and the problems are too subtle, and, if we
*have* to do it, the only thing we can do is to implement a debug
option to make such problems more visible - say, always schedule to a
different cpu on queue_work().
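
[For illustration, a minimal sketch of such a debug option - the CONFIG
name and the helper are hypothetical, not taken from any posted patch:

#ifdef CONFIG_DEBUG_WQ_FORCE_REMOTE_CPU
/*
 * Debug aid: never queue on the local cpu; always pick the next online
 * cpu, so callers silently relying on cpu-local execution get flushed
 * out during testing.
 */
static int wq_debug_pick_cpu(int cpu)
{
	int next = cpumask_next(cpu, cpu_online_mask);

	if (next >= nr_cpu_ids)
		next = cpumask_first(cpu_online_mask);
	return next;
}
#endif
]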

That said, at this point, the patchset doesn't seem all that
convincing to me, and if I'm comprehending responses from others
correctly, that seems to be the consensus.  It might be a better
approach to identify the specific offending workqueue users and make
them handle the situation more intelligently, rather than trying to
impose the behavior on all workqueue users.  At any rate, we need way
more data showing that this actually helps, and if so, why.

Thanks.

-- 
tejun


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-04 15:09         ` Tejun Heo
@ 2013-01-07  9:58           ` Viresh Kumar
  2013-01-07 13:28             ` Steven Rostedt
  2013-01-07 15:04             ` Tejun Heo
  0 siblings, 2 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-01-07  9:58 UTC (permalink / raw)
  To: rostedt, Tejun Heo
  Cc: Vincent Guittot, pjt, paul.mckenney, tglx, suresh.b.siddha,
	venki, mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users

Hi Tejun,

On 4 January 2013 20:39, Tejun Heo <tj@kernel.org> wrote:
> I don't know either.  Changing behavior subtly like this is hard.  I
> usually try to spot some problem cases and try to identify patterns
> there.  Once you identify a few of them, understanding and detecting
> other problem cases get a lot easier.  In this case, maybe there are
> too many places to audit and the problems are too subtle, and, if we
> *have* to do it, the only thing we can do is implementing a debug
> option to make such problems more visible - say, always schedule to a
> different cpu on queue_work().
>
> That said, at this point, the patchset doesn't seem all that
> convincing to me and if I'm comprehending responses from others
> correctly that seems to be the consensus.  It might be a better
> approach to identify the specific offending workqueue users and make
> them handle the situation more intelligently than trying to impose the
> behavior on all workqueue users.  At any rate, we need way more data
> showing this actually helps and if so why.

I understand your concerns and, believe me, I feel the same :)
I had another idea that I wanted to share.

First, the root cause of this patchset.

Some others in Linaro and I are working on future ARM cores:
big.LITTLE systems. Here we have a few very powerful, high power
consuming cores (big, currently A15's) and a few very efficient ones
(LITTLE, currently A7's).

The ultimate goal is to save as much power as possible without
compromising much on performance. For that, we want most of the stuff
to run on LITTLE cores and only the performance-demanding stuff on big
cores. Multiple things are going on in this direction. We thought the
A15's, or big cores, shouldn't be used for running small tasks like
timers/workqueues, and hence this patch is an attempt towards reaching
that goal.

On top of that, we can do some load balancing of works within the
multiple alive cpus, so that the work gets done quickly. Also, we
shouldn't start using an idle cpu just for processing work :)

I have another idea that we can try:

queue_work_on_any_cpu().

With this we would not break any existing code and can try to migrate
old users to this new infrastructure (at least the ones which rearm
works from their work handlers). What do you say?

To take care of the cache locality issue, we can pass an argument to
this routine that can provide either:
- the mask of cpus to schedule this work on,
  OR
- the sched level (SD_LEVEL) of cpus to run it on.
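
[For illustration, a minimal sketch of the first variant - the name
follows this mail, but the body, and the routing through
sched_select_cpu() from patch 1/4, are assumptions rather than posted
code:

/* Queue @work on whichever cpu the scheduler suggests is best. */
static inline bool queue_work_on_any_cpu(struct workqueue_struct *wq,
					 struct work_struct *work)
{
	/* 0: no special sched-domain flag constraint */
	return queue_work_on(sched_select_cpu(0), wq, work);
}

A caller that today does queue_work(wq, &my_work) and doesn't care
about cpu locality would then simply switch to
queue_work_on_any_cpu(wq, &my_work).]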

Waiting for your view on it :)

--
viresh


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07  9:58           ` Viresh Kumar
@ 2013-01-07 13:28             ` Steven Rostedt
  2013-01-07 17:59               ` Viresh Kumar
  2013-01-07 15:04             ` Tejun Heo
  1 sibling, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2013-01-07 13:28 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Tejun Heo, Vincent Guittot, pjt, paul.mckenney, tglx,
	suresh.b.siddha, venki, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users

On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
> Hi Tejun,
> 
> On 4 January 2013 20:39, Tejun Heo <tj@kernel.org> wrote:
> > I don't know either.  Changing behavior subtly like this is hard.  I
> > usually try to spot some problem cases and try to identify patterns
> > there.  Once you identify a few of them, understanding and detecting
> > other problem cases get a lot easier.  In this case, maybe there are
> > too many places to audit and the problems are too subtle, and, if we
> > *have* to do it, the only thing we can do is implementing a debug
> > option to make such problems more visible - say, always schedule to a
> > different cpu on queue_work().
> >
> > That said, at this point, the patchset doesn't seem all that
> > convincing to me and if I'm comprehending responses from others
> > correctly that seems to be the consensus.  It might be a better
> > approach to identify the specific offending workqueue users and make
> > them handle the situation more intelligently than trying to impose the
> > behavior on all workqueue users.  At any rate, we need way more data
> > showing this actually helps and if so why.
> 
> I understand your concerns and believe me, even i feel the same :)
> I had another idea, that i wanted to share.
> 
> Firstly the root cause of this patchset.
> 
> Myself and some others in Linaro are working on ARM future cores:
> big.LITTLE systems.
> Here we have few very powerful, high power consuming cores (big,
> currently A15's) and
> few very efficient ones (LITTLE, currently A7's).
> 
> The ultimate goal is to save as much power as possible without compromising
> much with performance. For, that we wanted most of the stuff to run on LITTLE
> cores and some performance-demanding stuff on big Cores. There are
> multiple things
> going around in this direction. Now, we thought A15's or big cores
> shouldn't be used
> for running small tasks like timers/workqueues and hence this patch is
> an attempt
> towards reaching that goal.
> 
> Over that we can do some load balancing of works within multiple alive
> cpus, so that
> it can get done quickly. Also, we shouldn't start using an idle cpu
> just for processing
> work :)
> 
> I have another idea that we can try:
> 
> queue_work_on_any_cpu().

I think this is a good idea.

> 
> With this we would not break any existing code and can try to migrate
> old users to
> this new infrastructure (atleast the ones which are rearming works from their
> work_handlers). What do you say?
> 
> To take care of the cache locality issue, we can pass an argument to
> this routine,
> that can provide
> - the mask of cpus to schedule this work on
>   OR
> - Sched Level (SD_LEVEL) of cpus to run it.

I wouldn't give a mask. If one is needed, we could have a
queue_work_on_mask_cpus(), or something. I think the "any" in the name
should be good enough to let developers know that this will not
necessarily run on the CPU it was called from. By default, I would
suggest that, for cache locality, we try to keep it on the same CPU.
But if there's a better CPU to run on, it runs there. Also, we could
still add a debug option that makes it always run on other CPUs to
slap developers that don't read.

-- Steve

> 
> Waiting for your view on it :)
> 
> --
> viresh




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07  9:58           ` Viresh Kumar
  2013-01-07 13:28             ` Steven Rostedt
@ 2013-01-07 15:04             ` Tejun Heo
  2013-01-07 15:40               ` Amit Kucheria
  2013-01-07 18:07               ` Viresh Kumar
  1 sibling, 2 replies; 48+ messages in thread
From: Tejun Heo @ 2013-01-07 15:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: rostedt, Vincent Guittot, pjt, paul.mckenney, tglx,
	suresh.b.siddha, venki, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users

Hello, Viresh.

On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
> Firstly the root cause of this patchset.
> 
> Myself and some others in Linaro are working on ARM future cores:
> big.LITTLE systems.
> Here we have few very powerful, high power consuming cores (big,
> currently A15's) and
> few very efficient ones (LITTLE, currently A7's).
> 
> The ultimate goal is to save as much power as possible without compromising
> much with performance. For, that we wanted most of the stuff to run on LITTLE
> cores and some performance-demanding stuff on big Cores. There are
> multiple things
> going around in this direction. Now, we thought A15's or big cores
> shouldn't be used
> for running small tasks like timers/workqueues and hence this patch is
> an attempt
> towards reaching that goal.

I see.  My understanding of big.little is very superficial, so there
is a very good chance that I'm utterly wrong, but to me it seems like
more of a stop-gap solution than something which will have staying
power, in the sense that the complexity doesn't seem to have any
inherent reason other than the failure to make the big ones power
efficient enough while idle or under minor load.

Maybe this isn't specific to big.little, and heterogeneous cores will
be the way of the future, with big.little being just the forefront of
the trend.  And even if that's not the case, it definitely doesn't mean
that we can't accommodate big.little, but, at this point, it does make
me a bit more reluctant to make wholesale changes specifically for
big.little.

> Over that we can do some load balancing of works within multiple alive
> cpus, so that
> it can get done quickly. Also, we shouldn't start using an idle cpu
> just for processing
> work :)

The latter part "not using idle cpu just for processing work" does
apply to homogeneous systems too but as I wrote earlier work items
don't spontaneously happen on an idle core.  Something, including
timer, needs to kick it.  So, by definition, a CPU already isn't idle
when a work item starts execution on it.  What am I missing here?

> I have another idea that we can try:
> 
> queue_work_on_any_cpu().
>
> With this we would not break any existing code and can try to migrate
> old users to
> this new infrastructure (atleast the ones which are rearming works from their
> work_handlers). What do you say?

Yeah, this could be a better solution, I think.  Plus, it's not like
finding the optimal cpu is free.

> To take care of the cache locality issue, we can pass an argument to
> this routine,
> that can provide
> - the mask of cpus to schedule this work on
>   OR
> - Sched Level (SD_LEVEL) of cpus to run it.

Let's start simple for now.  If we really need it, we can always add
more later.

Thanks.

-- 
tejun


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 15:04             ` Tejun Heo
@ 2013-01-07 15:40               ` Amit Kucheria
  2013-01-07 18:07               ` Viresh Kumar
  1 sibling, 0 replies; 48+ messages in thread
From: Amit Kucheria @ 2013-01-07 15:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Viresh Kumar, venki, Suresh Siddha, Patch Tracking,
	Peter Zijlstra, linux-kernel, Steven Rostedt, Paul McKenney,
	mingo, PDSW-power-team, Lists linaro-dev, Thomas Gleixner, pjt,
	linux-rt-users

On Mon, Jan 7, 2013 at 8:34 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Viresh.
>
> On Mon, Jan 07, 2013 at 03:28:33PM +0530, Viresh Kumar wrote:
>> Firstly the root cause of this patchset.
>>
>> Myself and some others in Linaro are working on ARM future cores:
>> big.LITTLE systems.
>> Here we have few very powerful, high power consuming cores (big,
>> currently A15's) and
>> few very efficient ones (LITTLE, currently A7's).
>>
>> The ultimate goal is to save as much power as possible without compromising
>> much with performance. For, that we wanted most of the stuff to run on LITTLE
>> cores and some performance-demanding stuff on big Cores. There are
>> multiple things
>> going around in this direction. Now, we thought A15's or big cores
>> shouldn't be used
>> for running small tasks like timers/workqueues and hence this patch is
>> an attempt
>> towards reaching that goal.
>
> I see.  My understanding of big.little is very superficial so there
> are very good chances that I'm utterly wrong, but to me it seems like
> more of a stop-gap solution than something which will have staying
> power in the sense that the complexity doesn't seem to have any
> inherent reason other than the failure to make the big ones power
> efficient enough while idle or under minor load.

The two processors use different manufacturing processes - one
optimised for performance, the other for really low power. So the
reason is physics at this point. Other architectures are known to be
playing with similar schemes. ARM's big.LITTLE is just the first one
to the market.

> Maybe this isn't specific to big.little and heterogeneous cores will
> be the way of future with big.little being just a forefront of the
> trend.  And even if that's not the case, it definitely doesn't mean
> that we can't accomodate big.little, but, at this point, it does make
> me a bit more reluctant to make wholesale changes specifically for
> big.little.

The patches aren't targeted to benefit only b.L systems though it was
definitely the trigger for our investigations. Our hope is that after
presenting more analysis results from our side we'll be able to
convince the community of the general usefulness of this feature.

Here are a few short introductions to big.LITTLE in case you're
interested.[1][2]

[1] http://www.arm.com/files/downloads/big.LITTLE_Final.pdf
[2] http://lwn.net/Articles/481055/


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 13:28             ` Steven Rostedt
@ 2013-01-07 17:59               ` Viresh Kumar
  2013-01-07 22:29                 ` Steven Rostedt
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-01-07 17:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Vincent Guittot, pjt, paul.mckenney, tglx,
	suresh.b.siddha, venki, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users

On 7 January 2013 18:58, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2013-01-07 at 15:28 +0530, Viresh Kumar wrote:
>> I have another idea that we can try:
>>
>> queue_work_on_any_cpu().
>
> I think this is a good idea.

:) :)

>> - the mask of cpus to schedule this work on
>>   OR
>> - Sched Level (SD_LEVEL) of cpus to run it.
>
> I wouldn't give a mask. If one is needed, we could have a
> queue_work_on_mask_cpus(), or something. I think the "any" in the name
> should be good enough to let developers know that this will not be on
> the CPU that is called.

Fair Enough.

> By default, I would suggest for cache locality,
> that we try to keep it on the same CPU. But if there's a better CPU to
> run on, it runs there.

That would defeat the intention behind this routine. We should run
it on a cpu which we think is the best one for it, power/performance-wise.

So, I would like to call the sched_select_cpu() routine from this
routine to get the suggested cpu.

This routine would, however, see more changes later to optimize it
further.

> Also, we could still add a debug option that
> makes it always run on other CPUs to slap developers that don't read.

I don't think we need it :(
It would be a new API, with zero existing users. And so, whoever uses
it should know exactly what he is doing :)

Thanks for your quick feedback.


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 15:04             ` Tejun Heo
  2013-01-07 15:40               ` Amit Kucheria
@ 2013-01-07 18:07               ` Viresh Kumar
  2013-01-09 18:49                 ` Tejun Heo
  1 sibling, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-01-07 18:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rostedt, Vincent Guittot, pjt, paul.mckenney, tglx, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

[Removed Suresh and Venki from the discussion; they have probably
switched companies]

On 7 January 2013 20:34, Tejun Heo <tj@kernel.org> wrote:
> The latter part "not using idle cpu just for processing work" does
> apply to homogeneous systems too but as I wrote earlier work items
> don't spontaneously happen on an idle core.  Something, including
> timer, needs to kick it.  So, by definition, a CPU already isn't idle
> when a work item starts execution on it.  What am I missing here?

We are talking about a core being idle from the scheduler's perspective :)

>> I have another idea that we can try:
>>
>> queue_work_on_any_cpu().
>>
>> With this we would not break any existing code and can try to migrate
>> old users to
>> this new infrastructure (atleast the ones which are rearming works from their
>> work_handlers). What do you say?
>
> Yeah, this could be a better solution, I think.  Plus, it's not like
> finding the optimal cpu is free.

Thanks for the first part (when I shared this idea with Vincent and Amit, I
wasn't sure at all about the feedback I would get from you and others, but I
am very happy now :) ).

I couldn't understand the second part. We still need to search for a free cpu
in this new routine, and the implementation would be almost the same as the
implementation of queue_work() in my initial patch.

>> To take care of the cache locality issue, we can pass an argument to
>> this routine,
>> that can provide
>> - the mask of cpus to schedule this work on
>>   OR
>> - Sched Level (SD_LEVEL) of cpus to run it.
>
> Let's start simple for now.  If we really need it, we can always add
> more later.

:)
Agreed. But I liked the idea from Steven: we can have two routines,
queue_work_on_any_cpu() and queue_work_on_cpus().


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 17:59               ` Viresh Kumar
@ 2013-01-07 22:29                 ` Steven Rostedt
  2013-01-08  4:03                   ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Steven Rostedt @ 2013-01-07 22:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Tejun Heo, Vincent Guittot, pjt, paul.mckenney, tglx,
	suresh.b.siddha, venki, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users

On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:

> > By default, I would suggest for cache locality,
> > that we try to keep it on the same CPU. But if there's a better CPU to
> > run on, it runs there.
> 
> That would break our intention behind this routine. We should run
> it on a cpu which we think is the best one for it power/performance wise.

If you are running on a big.LITTLE box, sure. But for normal (read x86 ;)
systems, it should probably stay on the current CPU.

> 
> So, i would like to call the sched_select_cpu() routine from this routine to
> get the suggested cpu.

Does the caller need to know? Or if you have a big.LITTLE setup, it
should just work automagically?

> 
> This routine would however would see more changes later to optimize it
> more.
> 
> > Also, we could still add a debug option that
> > makes it always run on other CPUs to slap developers that don't read.
> 
> I don't think we need it :(
> It would be a new API, with zero existing users. And so, whomsoever uses
> it, should know what exactly he is doing :)

Heh, you don't know kernel developers well, do you? ;-)

Once a new API is added to the kernel tree, it quickly becomes
(mis)used.

-- Steve




* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 22:29                 ` Steven Rostedt
@ 2013-01-08  4:03                   ` Viresh Kumar
  0 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-01-08  4:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Vincent Guittot, pjt, paul.mckenney, tglx,
	suresh.b.siddha, venki, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users

Hi Steven,

On 8 January 2013 03:59, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:
>
>> > By default, I would suggest for cache locality,
>> > that we try to keep it on the same CPU. But if there's a better CPU to
>> > run on, it runs there.
>>
>> That would break our intention behind this routine. We should run
>> it on a cpu which we think is the best one for it power/performance wise.
>
> If you are running on a big.Little box sure. But for normal (read x86 ;)
> systems, it should probably stay on the current CPU.

But that's not the purpose of this new call. If the caller wants it to
run on the local cpu, he must not use this call. It is up to the core
routine (sched_select_cpu() in our case) to decide where to launch it.
If we need something special for x86, we can hack that routine.

Even for non-big.LITTLE systems, we may want to run a work on a
non-idle cpu, and so we can't guarantee the local cpu here.

>> So, i would like to call the sched_select_cpu() routine from this routine to
>> get the suggested cpu.
>
> Does the caller need to know? Or if you have a big.LITTLE setup, it
> should just work automagically?

The caller wouldn't know; he just needs to trust sched_select_cpu() to
give the optimal cpu.

Again, it is not only for big.LITTLE, but for any SMP system where we
don't want an idle cpu to do this work.

>> I don't think we need it :(
>> It would be a new API, with zero existing users. And so, whomsoever uses
>> it, should know what exactly he is doing :)
>
> Heh, you don't know kernel developers well do you? ;-)

I agree with you, but we don't need to care about foolish new code here.

> Once a new API is added to the kernel tree, it quickly becomes
> (mis)used.

That's true of all pieces of code in the kernel, and we really don't need
to take care of such users here :)


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-07 18:07               ` Viresh Kumar
@ 2013-01-09 18:49                 ` Tejun Heo
  2013-01-10  5:04                   ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-01-09 18:49 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: rostedt, Vincent Guittot, pjt, paul.mckenney, tglx, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

Hello,

On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote:
> On 7 January 2013 20:34, Tejun Heo <tj@kernel.org> wrote:
> > The latter part "not using idle cpu just for processing work" does
> > apply to homogeneous systems too but as I wrote earlier work items
> > don't spontaneously happen on an idle core.  Something, including
> > timer, needs to kick it.  So, by definition, a CPU already isn't idle
> > when a work item starts execution on it.  What am I missing here?
> 
> We are talking about a core being idle from schedulers perspective :)

But it's not like a cpu doesn't consume power if the scheduler considers it
idle, right?  Can you please explain in detail how this contributes to
saving power?  Is it primarily about routing work items to lower power
CPUs?  And please don't point to presentation slides.  They don't seem
to explain it much better, and the patches and the code should be able to
describe themselves.  Here, more so, as the desired behavior change
and the resulting powersave are rather subtle.

> > Yeah, this could be a better solution, I think.  Plus, it's not like
> > finding the optimal cpu is free.
> 
> Thanks for the first part (When i shared this idea with Vincent and Amit, i
> wasn't sure at all about the feedback i will get from you and others, but i
> am very happy now :) ).
> 
> I couldn't understand the second part. We still need to search for a free cpu
> for this new routine. And the implementation would almost be same as the
> implementation of queue_work() in my initial patch

I meant that enforcing a lookup for the "optimal" cpu on queue_work() by
default would add overhead to the operation, which in many cases (on most
homogeneous configs) would lead to overhead without any realized
benefit.

> > Let's start simple for now.  If we really need it, we can always add
> > more later.
> 
> :)
> Agreed. But i liked the idea from steven, we can have two routines:
> queue_work_on_any_cpu() and queue_work_on_cpus()

We can talk about specifics later but please try to *justify* each new
interface.  Please note that "putting work items to cpus which the
scheduler considers idle" can't really be justification in itself.

Thanks.

-- 
tejun


* Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
  2013-01-09 18:49                 ` Tejun Heo
@ 2013-01-10  5:04                   ` Viresh Kumar
  0 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-01-10  5:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rostedt, Vincent Guittot, pjt, paul.mckenney, tglx, mingo,
	peterz, Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users

On 10 January 2013 00:19, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Jan 07, 2013 at 11:37:22PM +0530, Viresh Kumar wrote:
>> We are talking about a core being idle from schedulers perspective :)
>
> But it's not like cpu doesn't consume power if scheduler considers it
> idle, right?  Can you please explain in detail how this contributes to
> saving power?  Is it primarily about routing work items to lower power
> CPUs?  And please don't point to presentation slides.  They don't seem
> to explain much better and patches and the code should be able to
> describe themselves.  Here, more so, as the desired behavior change
> and the resulting powersave are rather subtle.

I got your concerns. Firstly, even when a cpu is idle from the scheduler's
perspective, it can consume a lot of power.

queue_work_on_any_cpu() would queue the work on some other cpu only when
the current cpu is idle from the scheduler's perspective, and this can only
happen when the cpu was actually idle (in a low power state), woke up due
to some interrupt/timer, and is asked to queue a work.

The idea is to choose another, non-idle cpu at this point, so that the
current cpu can immediately go back into a deeper idle state. With this,
cpus can stay longer in deeper idle states, rather than running works.

And in cases where works are rearmed from their handlers, this can cause
significant power loss, which could easily be saved by pushing such work
to non-idle cpus.

The same approach is already taken for deferred timers (the non-pinned
ones); they already use such a routine.
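
[For reference, a minimal sketch of that existing timer-side logic - it
mirrors the pre-series __mod_timer() behavior, slightly simplified and
with the CONFIG_NO_HZ guards dropped:

	/* inside __mod_timer(), when picking the base for the timer */
	cpu = smp_processor_id();

	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
		cpu = get_nohz_timer_target();	/* a non-idle cpu, per the scheduler */

	new_base = per_cpu(tvec_bases, cpu);
]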


* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2012-11-27 13:47   ` Steven Rostedt
@ 2013-03-20 15:13     ` Viresh Kumar
  2013-04-09 14:52       ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-03-20 15:13 UTC (permalink / raw)
  To: tglx, Steven Rostedt
  Cc: pjt, paul.mckenney, tj, suresh.b.siddha, venki, mingo, peterz,
	Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

On 27 November 2012 19:17, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-11-06 at 16:08 +0530, Viresh Kumar wrote:

>> diff --git a/kernel/timer.c b/kernel/timer.c
>> @@ -729,6 +730,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>>
>>       base = lock_timer_base(timer, &flags);
>>
>> +     if (timer->sched_del) {
>> +             /* Don't schedule it again, as it is getting deleted */
>> +             ret = -EBUSY;
>> +             goto out_unlock;
>> +     }
>> +
>>       ret = detach_if_pending(timer, base, false);
>>       if (!ret && pending_only)
>>               goto out_unlock;
>> @@ -746,21 +753,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>>       new_base = per_cpu(tvec_bases, cpu);
>>
>>       if (base != new_base) {
>> -             /*
>> -              * We are trying to schedule the timer on the local CPU.
>> -              * However we can't change timer's base while it is running,
>> -              * otherwise del_timer_sync() can't detect that the timer's
>> -              * handler yet has not finished. This also guarantees that
>> -              * the timer is serialized wrt itself.
>> -              */
>> -             if (likely(base->running_timer != timer)) {
>> -                     /* See the comment in lock_timer_base() */
>> -                     timer_set_base(timer, NULL);
>> -                     spin_unlock(&base->lock);
>> -                     base = new_base;
>> -                     spin_lock(&base->lock);
>> -                     timer_set_base(timer, base);
>> -             }
>> +             /* See the comment in lock_timer_base() */
>> +             timer_set_base(timer, NULL);
>> +             spin_unlock(&base->lock);
>> +             base = new_base;
>> +             spin_lock(&base->lock);
>> +             timer_set_base(timer, base);
>>       }

> I don't think this is good enough. For one thing, it doesn't handle
> try_to_del_timer_sync() or even del_timer_sync() for that matter. As
> that may return success when the timer happens to be running on another
> CPU.
>
> We have this:
>
>         CPU0                    CPU1
>         ----                    ----
>    timerA (running)
>    mod_timer(timerA)
>    [ migrate to CPU2 ]
>    release timer base lock
>                            del_timer_sync(timerA)
>                            timer->sched_del = true
>                            try_to_del_timer_sync(timerA)
>                                 base(CPU2)->timer != timerA
>                                 [TRUE!]
>   timerA (finishes)
>
> Fail!

Hi Steven/Thomas,

I came back to this patch after completing some other stuff and posting the
wq part of this patchset separately.

I got your point and understand how this would fail.

@Thomas: I need your opinion first. Do you like this concept of migrating a
running timer or not? Or do you see some basic problem with this concept?

If not (i.e. if I can go ahead with another version), then I have a solution
to fix the earlier problems reported by Steven:

The problem lies with del_timer_sync(), which just checks
base->running_timer != timer to see if the timer is currently running or not.

What if we add another variable to struct timer_list that stores whether we
are running the timer callback or not? Then, before we call the callback in
the timer core, we set this variable, and we reset it after the callback
finishes.

del_timer_sync() will have something like:

if (base->running_timer != timer)
    remove timer and return;
else if (timer->running_callback)
    go back to its loop...

So, with my existing patch + this change, del_timer_sync() will not return
unless the callback has completed on CPU0.

But what can happen now is that base->running_timer == timer can be true
for two cpus simultaneously: cpu0 (running the callback) and cpu2 (running
the hardware timer). Will that cause any issues?
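
[For illustration, a minimal sketch of the set/reset part described
above - running_callback is the hypothetical new timer_list field
proposed in this mail, wrapped around the existing callback invocation:

	/* in __run_timers(), around the callback invocation */
	base->running_timer = timer;
	timer->running_callback = 1;
	spin_unlock_irq(&base->lock);
	call_timer_fn(timer, fn, data);
	spin_lock_irq(&base->lock);
	timer->running_callback = 0;
]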

--
viresh


* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-03-20 15:13     ` Viresh Kumar
@ 2013-04-09 14:52       ` Viresh Kumar
  2013-04-24 11:22         ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-04-09 14:52 UTC (permalink / raw)
  To: tglx, Steven Rostedt
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, john stultz

[Steven replied to a personal Ping!!, including everybody again]

On 9 April 2013 19:25, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2013-04-09 at 14:05 +0530, Viresh Kumar wrote:
>> Ping!!
>>
>
> Remind me again. What problem are you trying to solve?

I was trying to migrate a running timer which arms itself, so that we don't
keep a cpu busy just for servicing this timer.

>> On 20 March 2013 20:43, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> >
>> > Hi Steven/Thomas,
>> >
>> > I came back to this patch after completing some other stuff and posting
>> > wq part of this patchset separately.
>> >
>> > I got your point and understand how this would fail.
>> >
>> > @Thomas: I need your opinion first. Do you like this concept of migrating
>> > running timer or not? Or you see some basic problem with this concept?
>
> I'll let Thomas answer this, but to me, this sounds really racy.

Sure.

>> > If no (i.e. i can go ahead with another version), then i have some solution to
>> > fix earlier problems reported by Steven:
>> >
>> > The problem lies with del_timer_sync() which just checks
>> > base->running_timer != timer to check if timer is currently running or not.
>> >
>> > What if we add another variable in struct timer_list, that will store if we are
>> > running timer callback or not. And so, before we call callback in timer core,
>> > we will set this variable and will reset it after finishing callback.
>> >
>> > del_timer_sync() will have something like:
>> >
>> > if (base->running_timer != timer)
>> >     remove timer and return;
>
> For example, this didn't fix the issue. You removed the timer when it
> was still running, because base->running_timer did not equal timer.

You are correct and i was stupid. I wanted to write this instead:

del_timer_sync() will have something like:

if (base->running_timer != timer)
    if (timer->running_callback)
        go back to its loop...
    else
        remove timer and return;

i.e. if the timer isn't the one running on our base cpu, also check whether
our callback is executing somewhere else due to migration.
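
[For illustration, a minimal sketch of how that could look in
del_timer_sync()'s retry loop - again, running_callback is the
hypothetical new field, and this just spells out the idea from this
mail, it is not a posted patch:

	for (;;) {
		int ret = try_to_del_timer_sync(timer);

		if (ret >= 0) {
			/*
			 * The timer is detached (or was never pending) on
			 * its current base, but its handler may still be
			 * running on the cpu it was migrated away from.
			 */
			if (!timer->running_callback)
				return ret;
		}
		cpu_relax();
	}
]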


* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-04-09 14:52       ` Viresh Kumar
@ 2013-04-24 11:22         ` Viresh Kumar
  2013-05-13  9:19           ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-04-24 11:22 UTC (permalink / raw)
  To: tglx, Steven Rostedt
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, john stultz

On 9 April 2013 20:22, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> [Steven replied to a personal Ping!!, including everybody again]
>
> On 9 April 2013 19:25, Steven Rostedt <rostedt@goodmis.org> wrote:
>> On Tue, 2013-04-09 at 14:05 +0530, Viresh Kumar wrote:
>>> Ping!!
>>>
>>
>> Remind me again. What problem are you trying to solve?
>
> I was trying to migrate a running timer which arms itself, so that we don't
> keep a cpu busy just for servicing this timer.
>
>>> On 20 March 2013 20:43, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>> >
>>> > Hi Steven/Thomas,
>>> >
>>> > I came back to this patch after completing some other stuff and posting
>>> > wq part of this patchset separately.
>>> >
>>> > I got your point and understand how this would fail.
>>> >
>>> > @Thomas: I need your opinion first. Do you like this concept of migrating
>>> > running timer or not? Or you see some basic problem with this concept?
>>
>> I'll let Thomas answer this, but to me, this sounds really racy.
>
> Sure.
>
>>> > If no (i.e. i can go ahead with another version), then i have some solution to
>>> > fix earlier problems reported by Steven:
>>> >
>>> > The problem lies with del_timer_sync() which just checks
>>> > base->running_timer != timer to check if timer is currently running or not.
>>> >
>>> > What if we add another variable in struct timer_list, that will store if we are
>>> > running timer callback or not. And so, before we call callback in timer core,
>>> > we will set this variable and will reset it after finishing callback.
>>> >
>>> > del_timer_sync() will have something like:
>>> >
>>> > if (base->running_timer != timer)
>>> >     remove timer and return;
>>
>> For example, this didn't fix the issue. You removed the timer when it
>> was still running, because base->running_timer did not equal timer.
>
> You are correct and i was stupid. I wanted to write this instead:
>
> del_timer_sync() will have something like:
>
> if (base->running_timer != timer)
>     if (timer->running_callback)
>         go back to its loop...
>     else
>         remove timer and return;
>
> i.e. if we aren't running on our base cpu, just check if our callback is
> executing somewhere else due to migration.

Ping!!


* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-04-24 11:22         ` Viresh Kumar
@ 2013-05-13  9:19           ` Viresh Kumar
  2013-05-13 10:35             ` Thomas Gleixner
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-05-13  9:19 UTC (permalink / raw)
  To: tglx, Steven Rostedt
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, john stultz

On 24 April 2013 16:52, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 9 April 2013 20:22, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> [Steven replied to a personal Ping!!, including everybody again]
>>
>> On 9 April 2013 19:25, Steven Rostedt <rostedt@goodmis.org> wrote:
>>> On Tue, 2013-04-09 at 14:05 +0530, Viresh Kumar wrote:
>>>> Ping!!
>>>>
>>>
>>> Remind me again. What problem are you trying to solve?
>>
>> I was trying to migrate a running timer which arms itself, so that we don't
>> keep a cpu busy just for servicing this timer.
>>
>>>> On 20 March 2013 20:43, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>> >
>>>> > Hi Steven/Thomas,
>>>> >
>>>> > I came back to this patch after completing some other stuff and posting
>>>> > wq part of this patchset separately.
>>>> >
>>>> > I got your point and understand how this would fail.
>>>> >
>>>> > @Thomas: I need your opinion first. Do you like this concept of migrating
>>>> > running timer or not? Or you see some basic problem with this concept?
>>>
>>> I'll let Thomas answer this, but to me, this sounds really racy.
>>
>> Sure.
>>
>>>> > If no (i.e. i can go ahead with another version), then i have some solution to
>>>> > fix earlier problems reported by Steven:
>>>> >
>>>> > The problem lies with del_timer_sync() which just checks
>>>> > base->running_timer != timer to check if timer is currently running or not.
>>>> >
>>>> > What if we add another variable in struct timer_list that stores whether the
>>>> > timer's callback is running? Then, before calling the callback in the timer
>>>> > core, we set this variable, and we clear it once the callback finishes.
>>>> >
>>>> > del_timer_sync() will have something like:
>>>> >
>>>> > if (base->running_timer != timer)
>>>> >     remove timer and return;
>>>
>>> For example, this didn't fix the issue. You removed the timer when it
>>> was still running, because base->running_timer did not equal timer.
>>
>> You are correct and I was stupid. I wanted to write this instead:
>>
>> del_timer_sync() will have something like:
>>
>> if (base->running_timer != timer)
>>     if (timer->running_callback)
>>         go back to its loop...
>>     else
>>         remove timer and return;
>>
>> i.e. if the timer isn't running on its base cpu, just check whether its
>> callback is executing somewhere else due to migration.
>
> Ping!!

Ping!!
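
For reference, the pattern under discussion is a timer whose handler re-queues
itself. A rough sketch with the old timer API (names are illustrative):

	static struct timer_list my_timer;

	static void my_timer_fn(unsigned long data)
	{
		/* ... periodic work ... */

		/*
		 * Re-arm from inside the handler; without migration this
		 * timer keeps firing on the same, possibly idle, CPU.
		 */
		mod_timer(&my_timer, jiffies + HZ);
	}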

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-13  9:19           ` Viresh Kumar
@ 2013-05-13 10:35             ` Thomas Gleixner
  2013-05-22  8:34               ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Thomas Gleixner @ 2013-05-13 10:35 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Steven Rostedt, pjt, paul.mckenney, tj, mingo, peterz,
	Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

On Mon, 13 May 2013, Viresh Kumar wrote:
> On 24 April 2013 16:52, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 9 April 2013 20:22, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >> [Steven replied to a personal Ping!!, including everybody again]
> >>
> >> On 9 April 2013 19:25, Steven Rostedt <rostedt@goodmis.org> wrote:
> >>> On Tue, 2013-04-09 at 14:05 +0530, Viresh Kumar wrote:
> >>>> Ping!!
> >>>>
> >>>
> >>> Remind me again. What problem are you trying to solve?
> >>
> >> I was trying to migrate a running timer which re-arms itself, so that we
> >> don't keep a cpu busy just for servicing this timer.

Which mechanism is migrating the timer away?

> >>>> On 20 March 2013 20:43, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>> >
> >>>> > Hi Steven/Thomas,
> >>>> >
> >>>> > I came back to this patch after completing some other stuff and posting
> >>>> > the wq part of this patchset separately.
> >>>> >
> >>>> > I got your point and understand how this would fail.
> >>>> >
> >>>> > @Thomas: I need your opinion first. Do you like this concept of migrating
> >>>> > a running timer or not? Or do you see some basic problem with this concept?

I have no objections to the functionality per se, but the proposed
solution is not going to fly.

Aside from bloating the data structure, you're changing the semantics of
__mod_timer(). No __mod_timer() caller can deal with -EBUSY, so you'd
break the world and some more.

Here is a list of questions:

      - Which mechanism migrates timers?

      - How is that mechanism triggered?

      - How does that deal with CPU bound timers?

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-13 10:35             ` Thomas Gleixner
@ 2013-05-22  8:34               ` Viresh Kumar
  2013-05-22  9:06                 ` Peter Zijlstra
  2013-05-31 10:49                 ` Viresh Kumar
  0 siblings, 2 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-05-22  8:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, pjt, paul.mckenney, tj, mingo, peterz,
	Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

Sorry for being late in replying to your queries.

On 13 May 2013 16:05, Thomas Gleixner <tglx@linutronix.de> wrote:
> Which mechanism is migrating the timer away?

It will be the same: get_nohz_timer_target(), which will decide the target
cpu for migration.

> I have no objections to the functionality per se, but the proposed
> solution is not going to fly.
>
> Aside from bloating the data structure, you're changing the semantics of
> __mod_timer(). No __mod_timer() caller can deal with -EBUSY, so you'd
> break the world and some more.

Ahh.. That idea was dropped already.

> Here is a list of questions:
>
>       - Which mechanism migrates timers?
>
>       - How is that mechanism triggered?

The mechanism remains the same as for non-rearmed timers,
i.e. get_nohz_timer_target().

We are just trying to give an approach with which we can migrate
running timers, i.e. those which re-arm themselves from their
handlers.

>       - How does that deal with CPU bound timers?

We will still check the 'pinned' flag for this timer, as is done for any
other normal timer, so we don't migrate them.
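
For reference, the cpu selection in __mod_timer() at that time looked roughly
like this (a simplified sketch of the era's kernel/timer.c, config guards
elided):

	cpu = smp_processor_id();

	/*
	 * Prefer a non-idle CPU when timer migration is enabled and this
	 * CPU is idle; 'pinned' timers never leave their CPU.
	 */
	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
		cpu = get_nohz_timer_target();

	new_base = per_cpu(tvec_bases, cpu);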

So, this is a clean draft of the idea I had (naming is poor for
now):

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..ad00ebe 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -20,6 +20,7 @@ struct timer_list {

        void (*function)(unsigned long);
        unsigned long data;
+       int wait_for_migration_to_complete;

        int slack;

diff --git a/kernel/timer.c b/kernel/timer.c
index a860bba..7791f28 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -746,21 +746,15 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
        new_base = per_cpu(tvec_bases, cpu);

        if (base != new_base) {
-               /*
-                * We are trying to schedule the timer on the local CPU.
-                * However we can't change timer's base while it is running,
-                * otherwise del_timer_sync() can't detect that the timer's
-                * handler yet has not finished. This also guarantees that
-                * the timer is serialized wrt itself.
-                */
-               if (likely(base->running_timer != timer)) {
-                       /* See the comment in lock_timer_base() */
-                       timer_set_base(timer, NULL);
-                       spin_unlock(&base->lock);
-                       base = new_base;
-                       spin_lock(&base->lock);
-                       timer_set_base(timer, base);
-               }
+               if (base->running_timer == timer)
+                       timer->wait_for_migration_to_complete = 1;
+
+               /* See the comment in lock_timer_base() */
+               timer_set_base(timer, NULL);
+               spin_unlock(&base->lock);
+               base = new_base;
+               spin_lock(&base->lock);
+               timer_set_base(timer, base);
        }

        timer->expires = expires;
@@ -990,7 +984,8 @@ int try_to_del_timer_sync(struct timer_list *timer)

        base = lock_timer_base(timer, &flags);

-       if (base->running_timer != timer) {
+       if ((base->running_timer != timer) &&
+                       !timer->wait_for_migration_to_complete) {
                timer_stats_timer_clear_start_info(timer);
                ret = detach_if_pending(timer, base, true);
        }
@@ -1183,6 +1178,8 @@ static inline void __run_timers(struct tvec_base *base)
                                call_timer_fn(timer, fn, data);
                                spin_lock_irq(&base->lock);
                        }
+                       if (timer->wait_for_migration_to_complete)
+                               timer->wait_for_migration_to_complete = 0;
                }
        }
        base->running_timer = NULL;


Please see if it is a junk idea or has some light of hope :)
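
For context, del_timer_sync() consumes that return value in a retry loop,
roughly like this (lockdep/debug bits elided):

	int del_timer_sync(struct timer_list *timer)
	{
		for (;;) {
			int ret = try_to_del_timer_sync(timer);

			if (ret >= 0)
				return ret;
			cpu_relax();
		}
	}

So with the extra wait_for_migration_to_complete check above keeping ret at
-1, del_timer_sync() keeps spinning until the migrated callback has finished.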

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-22  8:34               ` Viresh Kumar
@ 2013-05-22  9:06                 ` Peter Zijlstra
  2013-05-22  9:23                   ` Viresh Kumar
  2013-05-31 10:49                 ` Viresh Kumar
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2013-05-22  9:06 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Thomas Gleixner, Steven Rostedt, pjt, paul.mckenney, tj, mingo,
	Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

On Wed, May 22, 2013 at 02:04:16PM +0530, Viresh Kumar wrote:

> So, this is a clean draft of the idea I had (naming is poor for
> now):
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..ad00ebe 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -20,6 +20,7 @@ struct timer_list {
> 
>         void (*function)(unsigned long);
>         unsigned long data;
> +       int wait_for_migration_to_complete;

If you play games with the alignment constraints of struct tvec_base you
can get extra low bits to play with for TIMER_FLAG_MASK. See struct
pool_workqueue for similar games.

That would avoid the struct bloat and I suppose make tglx happy :-)
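
A sketch of that trick (TIMER_MIGRATING is a made-up name; the deferrable and
irqsafe flags already live in the low bits of timer->base this way, and taking
one more bit means raising tvec_base's alignment to 8):

	#define TIMER_MIGRATING		0x4LU	/* hypothetical new flag */
	#define TIMER_FLAG_MASK		0x7LU	/* grown from 0x3LU */

	/*
	 * The low bits of timer->base are zero thanks to the alignment,
	 * so they can carry per-timer flags without growing the struct.
	 */
	static inline struct tvec_base *timer_get_base(struct timer_list *timer)
	{
		return (struct tvec_base *)
			((unsigned long)timer->base & ~TIMER_FLAG_MASK);
	}

	static inline bool timer_migrating(struct timer_list *timer)
	{
		return (unsigned long)timer->base & TIMER_MIGRATING;
	}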

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-22  9:06                 ` Peter Zijlstra
@ 2013-05-22  9:23                   ` Viresh Kumar
  0 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-05-22  9:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Steven Rostedt, pjt, paul.mckenney, tj, mingo,
	Arvind.Chauhan, linaro-dev, patches, pdsw-power-team,
	linux-kernel, linux-rt-users, john stultz

On 22 May 2013 14:36, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, May 22, 2013 at 02:04:16PM +0530, Viresh Kumar wrote:
>
>> So, this is a clean draft of the idea I had (naming is poor for
>> now):
>>
>> diff --git a/include/linux/timer.h b/include/linux/timer.h
>> index 8c5a197..ad00ebe 100644
>> --- a/include/linux/timer.h
>> +++ b/include/linux/timer.h
>> @@ -20,6 +20,7 @@ struct timer_list {
>>
>>         void (*function)(unsigned long);
>>         unsigned long data;
>> +       int wait_for_migration_to_complete;
>
> If you play games with the alignment constraints of struct tvec_base you
> can get extra low bits to play with for TIMER_FLAG_MASK. See struct
> pool_workqueue for similar games.
>
> That would avoid the struct bloat and I suppose make tglx happy :-)

Sure, once the initial draft is approved we can optimize it to the
best of our ability. :)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-22  8:34               ` Viresh Kumar
  2013-05-22  9:06                 ` Peter Zijlstra
@ 2013-05-31 10:49                 ` Viresh Kumar
  2013-06-18  4:51                   ` Viresh Kumar
  1 sibling, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-05-31 10:49 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, john stultz

On 22 May 2013 14:04, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Sorry for being late in replying to your queries.
>
> On 13 May 2013 16:05, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Which mechanism is migrating the timer away?
>
> It will be the same: get_nohz_timer_target(), which will decide the target
> cpu for migration.
>
>> I have no objections to the functionality per se, but the proposed
>> solution is not going to fly.
>>
>> Aside from bloating the data structure, you're changing the semantics of
>> __mod_timer(). No __mod_timer() caller can deal with -EBUSY, so you'd
>> break the world and some more.
>
> Ahh.. That idea was dropped already.
>
>> Here is a list of questions:
>>
>>       - Which mechanism migrates timers?
>>
>>       - How is that mechanism triggered?
>
> The mechanism remains the same as for non-rearmed timers,
> i.e. get_nohz_timer_target().
>
> We are just trying to give an approach with which we can migrate
> running timers, i.e. those which re-arm themselves from their
> handlers.
>
>>       - How does that deal with CPU bound timers?
>
> We will still check the 'pinned' flag for this timer, as is done for any
> other normal timer, so we don't migrate them.
>
> So, this is a clean draft of the idea I had (naming is poor for
> now):
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..ad00ebe 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -20,6 +20,7 @@ struct timer_list {
>
>         void (*function)(unsigned long);
>         unsigned long data;
> +       int wait_for_migration_to_complete;
>
>         int slack;
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a860bba..7791f28 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -746,21 +746,15 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>         new_base = per_cpu(tvec_bases, cpu);
>
>         if (base != new_base) {
> -               /*
> -                * We are trying to schedule the timer on the local CPU.
> -                * However we can't change timer's base while it is running,
> -                * otherwise del_timer_sync() can't detect that the timer's
> -                * handler yet has not finished. This also guarantees that
> -                * the timer is serialized wrt itself.
> -                */
> -               if (likely(base->running_timer != timer)) {
> -                       /* See the comment in lock_timer_base() */
> -                       timer_set_base(timer, NULL);
> -                       spin_unlock(&base->lock);
> -                       base = new_base;
> -                       spin_lock(&base->lock);
> -                       timer_set_base(timer, base);
> -               }
> +               if (base->running_timer == timer)
> +                       timer->wait_for_migration_to_complete = 1;
> +
> +               /* See the comment in lock_timer_base() */
> +               timer_set_base(timer, NULL);
> +               spin_unlock(&base->lock);
> +               base = new_base;
> +               spin_lock(&base->lock);
> +               timer_set_base(timer, base);
>         }
>
>         timer->expires = expires;
> @@ -990,7 +984,8 @@ int try_to_del_timer_sync(struct timer_list *timer)
>
>         base = lock_timer_base(timer, &flags);
>
> -       if (base->running_timer != timer) {
> +       if ((base->running_timer != timer) &&
> +                       !timer->wait_for_migration_to_complete) {
>                 timer_stats_timer_clear_start_info(timer);
>                 ret = detach_if_pending(timer, base, true);
>         }
> @@ -1183,6 +1178,8 @@ static inline void __run_timers(struct tvec_base *base)
>                                 call_timer_fn(timer, fn, data);
>                                 spin_lock_irq(&base->lock);
>                         }
> +                       if (timer->wait_for_migration_to_complete)
> +                               timer->wait_for_migration_to_complete = 0;
>                 }
>         }
>         base->running_timer = NULL;
>
>
> Please see if it is a junk idea or has some light of hope :)

Ping!!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-05-31 10:49                 ` Viresh Kumar
@ 2013-06-18  4:51                   ` Viresh Kumar
  2013-07-24  9:17                     ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-06-18  4:51 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, John Stultz

On 31 May 2013 16:19, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 22 May 2013 14:04, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> Sorry for being late in replying to your queries.
>>
>> On 13 May 2013 16:05, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> Which mechanism is migrating the timer away?
>>
>> It will be the same: get_nohz_timer_target(), which will decide the target
>> cpu for migration.
>>
>>> I have no objections to the functionality per se, but the proposed
>>> solution is not going to fly.
>>>
>>> Aside from bloating the data structure, you're changing the semantics of
>>> __mod_timer(). No __mod_timer() caller can deal with -EBUSY, so you'd
>>> break the world and some more.
>>
>> Ahh.. That idea was dropped already.
>>
>>> Here is a list of questions:
>>>
>>>       - Which mechanism migrates timers?
>>>
>>>       - How is that mechanism triggered?
>>
>> The mechanism remains the same as for non-rearmed timers,
>> i.e. get_nohz_timer_target().
>>
>> We are just trying to give an approach with which we can migrate
>> running timers, i.e. those which re-arm themselves from their
>> handlers.
>>
>>>       - How does that deal with CPU bound timers?
>>
>> We will still check the 'pinned' flag for this timer, as is done for any
>> other normal timer, so we don't migrate them.
>>
>> So, this is a clean draft of the idea I had (naming is poor for
>> now):
>>
>> diff --git a/include/linux/timer.h b/include/linux/timer.h
>> index 8c5a197..ad00ebe 100644
>> --- a/include/linux/timer.h
>> +++ b/include/linux/timer.h
>> @@ -20,6 +20,7 @@ struct timer_list {
>>
>>         void (*function)(unsigned long);
>>         unsigned long data;
>> +       int wait_for_migration_to_complete;
>>
>>         int slack;
>>
>> diff --git a/kernel/timer.c b/kernel/timer.c
>> index a860bba..7791f28 100644
>> --- a/kernel/timer.c
>> +++ b/kernel/timer.c
>> @@ -746,21 +746,15 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>>         new_base = per_cpu(tvec_bases, cpu);
>>
>>         if (base != new_base) {
>> -               /*
>> -                * We are trying to schedule the timer on the local CPU.
>> -                * However we can't change timer's base while it is running,
>> -                * otherwise del_timer_sync() can't detect that the timer's
>> -                * handler yet has not finished. This also guarantees that
>> -                * the timer is serialized wrt itself.
>> -                */
>> -               if (likely(base->running_timer != timer)) {
>> -                       /* See the comment in lock_timer_base() */
>> -                       timer_set_base(timer, NULL);
>> -                       spin_unlock(&base->lock);
>> -                       base = new_base;
>> -                       spin_lock(&base->lock);
>> -                       timer_set_base(timer, base);
>> -               }
>> +               if (base->running_timer == timer)
>> +                       timer->wait_for_migration_to_complete = 1;
>> +
>> +               /* See the comment in lock_timer_base() */
>> +               timer_set_base(timer, NULL);
>> +               spin_unlock(&base->lock);
>> +               base = new_base;
>> +               spin_lock(&base->lock);
>> +               timer_set_base(timer, base);
>>         }
>>
>>         timer->expires = expires;
>> @@ -990,7 +984,8 @@ int try_to_del_timer_sync(struct timer_list *timer)
>>
>>         base = lock_timer_base(timer, &flags);
>>
>> -       if (base->running_timer != timer) {
>> +       if ((base->running_timer != timer) &&
>> +                       !timer->wait_for_migration_to_complete) {
>>                 timer_stats_timer_clear_start_info(timer);
>>                 ret = detach_if_pending(timer, base, true);
>>         }
>> @@ -1183,6 +1178,8 @@ static inline void __run_timers(struct tvec_base *base)
>>                                 call_timer_fn(timer, fn, data);
>>                                 spin_lock_irq(&base->lock);
>>                         }
>> +                       if (timer->wait_for_migration_to_complete)
>> +                               timer->wait_for_migration_to_complete = 0;
>>                 }
>>         }
>>         base->running_timer = NULL;
>>
>>
>> Please see if it is a junk idea or has some light of hope :)
>
> Ping!!

Ping!!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-06-18  4:51                   ` Viresh Kumar
@ 2013-07-24  9:17                     ` Viresh Kumar
  2013-08-07  9:55                       ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-07-24  9:17 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, John Stultz

On 18 June 2013 10:21, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 31 May 2013 16:19, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 22 May 2013 14:04, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>> Sorry for being late in replying to your queries.
>>>
>>> On 13 May 2013 16:05, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> Which mechanism is migrating the timer away?
>>>
>>> It will be the same: get_nohz_timer_target(), which will decide the target
>>> cpu for migration.
>>>
>>>> I have no objections to the functionality per se, but the proposed
>>>> solution is not going to fly.
>>>>
>>>> Aside from bloating the data structure, you're changing the semantics of
>>>> __mod_timer(). No __mod_timer() caller can deal with -EBUSY, so you'd
>>>> break the world and some more.
>>>
>>> Ahh.. That idea was dropped already.
>>>
>>>> Here is a list of questions:
>>>>
>>>>       - Which mechanism migrates timers?
>>>>
>>>>       - How is that mechanism triggered?
>>>
>>> The mechanism remains the same as for non-rearmed timers,
>>> i.e. get_nohz_timer_target().
>>>
>>> We are just trying to give an approach with which we can migrate
>>> running timers, i.e. those which re-arm themselves from their
>>> handlers.
>>>
>>>>       - How does that deal with CPU bound timers?
>>>
>>> We will still check the 'pinned' flag for this timer, as is done for any
>>> other normal timer, so we don't migrate them.
>>>
>>> So, this is a clean draft of the idea I had (naming is poor for
>>> now):
>>>
>>> diff --git a/include/linux/timer.h b/include/linux/timer.h
>>> index 8c5a197..ad00ebe 100644
>>> --- a/include/linux/timer.h
>>> +++ b/include/linux/timer.h
>>> @@ -20,6 +20,7 @@ struct timer_list {
>>>
>>>         void (*function)(unsigned long);
>>>         unsigned long data;
>>> +       int wait_for_migration_to_complete;
>>>
>>>         int slack;
>>>
>>> diff --git a/kernel/timer.c b/kernel/timer.c
>>> index a860bba..7791f28 100644
>>> --- a/kernel/timer.c
>>> +++ b/kernel/timer.c
>>> @@ -746,21 +746,15 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>>>         new_base = per_cpu(tvec_bases, cpu);
>>>
>>>         if (base != new_base) {
>>> -               /*
>>> -                * We are trying to schedule the timer on the local CPU.
>>> -                * However we can't change timer's base while it is running,
>>> -                * otherwise del_timer_sync() can't detect that the timer's
>>> -                * handler yet has not finished. This also guarantees that
>>> -                * the timer is serialized wrt itself.
>>> -                */
>>> -               if (likely(base->running_timer != timer)) {
>>> -                       /* See the comment in lock_timer_base() */
>>> -                       timer_set_base(timer, NULL);
>>> -                       spin_unlock(&base->lock);
>>> -                       base = new_base;
>>> -                       spin_lock(&base->lock);
>>> -                       timer_set_base(timer, base);
>>> -               }
>>> +               if (base->running_timer == timer)
>>> +                       timer->wait_for_migration_to_complete = 1;
>>> +
>>> +               /* See the comment in lock_timer_base() */
>>> +               timer_set_base(timer, NULL);
>>> +               spin_unlock(&base->lock);
>>> +               base = new_base;
>>> +               spin_lock(&base->lock);
>>> +               timer_set_base(timer, base);
>>>         }
>>>
>>>         timer->expires = expires;
>>> @@ -990,7 +984,8 @@ int try_to_del_timer_sync(struct timer_list *timer)
>>>
>>>         base = lock_timer_base(timer, &flags);
>>>
>>> -       if (base->running_timer != timer) {
>>> +       if ((base->running_timer != timer) &&
>>> +                       !timer->wait_for_migration_to_complete) {
>>>                 timer_stats_timer_clear_start_info(timer);
>>>                 ret = detach_if_pending(timer, base, true);
>>>         }
>>> @@ -1183,6 +1178,8 @@ static inline void __run_timers(struct tvec_base *base)
>>>                                 call_timer_fn(timer, fn, data);
>>>                                 spin_lock_irq(&base->lock);
>>>                         }
>>> +                       if (timer->wait_for_migration_to_complete)
>>> +                               timer->wait_for_migration_to_complete = 0;
>>>                 }
>>>         }
>>>         base->running_timer = NULL;
>>>
>>>
>>> Please see if it is a junk idea or has some light of hope :)
>>
>> Ping!!
>
> Ping!!

Hi Thomas,

Do you still see some light of hope in this patch :) ?

--
viresh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-07-24  9:17                     ` Viresh Kumar
@ 2013-08-07  9:55                       ` Viresh Kumar
  2013-10-04 12:39                         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-08-07  9:55 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner
  Cc: pjt, paul.mckenney, tj, mingo, peterz, Arvind.Chauhan,
	linaro-dev, patches, pdsw-power-team, linux-kernel,
	linux-rt-users, John Stultz

On 24 July 2013 14:47, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 18 June 2013 10:21, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> Hi Thomas,
>
> Do you still see some light of hope in this patch :) ?

Ping!!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-08-07  9:55                       ` Viresh Kumar
@ 2013-10-04 12:39                         ` Sebastian Andrzej Siewior
  2013-10-23  5:55                           ` Viresh Kumar
  0 siblings, 1 reply; 48+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-04 12:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Thomas Gleixner, Viresh Kumar, paul.mckenney, tj,
	mingo, peterz, Arvind.Chauhan, linaro-dev, patches,
	pdsw-power-team, linux-kernel, linux-rt-users, John Stultz

* Viresh Kumar | 2013-08-07 15:25:55 [+0530]:

>On 24 July 2013 14:47, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 18 June 2013 10:21, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
>> Hi Thomas,
>>
>> Do you still see some light of hope in this patch :) ?
>
>Ping!!

tglx, could you please take a look at this patch / thread?

Sebastian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V2 Resend 4/4] timer: Migrate running timer
  2013-10-04 12:39                         ` Sebastian Andrzej Siewior
@ 2013-10-23  5:55                           ` Viresh Kumar
  0 siblings, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-10-23  5:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: Steven Rostedt, Paul McKenney, Tejun Heo, Ingo Molnar,
	Peter Zijlstra, Arvind Chauhan, Lists linaro-dev, Patch Tracking,
	PDSW-power-team, Linux Kernel Mailing List, linux-rt-users,
	John Stultz

On 4 October 2013 18:09, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> tglx, could you please take a look at this patch / thread?

Thomas, Ping!!

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2013-10-23  5:55 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-06 10:34 [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
2012-11-06 10:38 ` [PATCH V2 Resend 1/4] sched: Create sched_select_cpu() to give preferred CPU for power saving Viresh Kumar
2012-11-06 10:38 ` [PATCH V2 Resend 2/4] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target() Viresh Kumar
2012-11-06 10:38 ` [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one Viresh Kumar
2012-11-26 17:15   ` Tejun Heo
2012-11-27  5:19     ` Viresh Kumar
2012-11-27 12:54       ` Vincent Guittot
2013-01-04 11:11       ` Viresh Kumar
2013-01-04 15:09         ` Tejun Heo
2013-01-07  9:58           ` Viresh Kumar
2013-01-07 13:28             ` Steven Rostedt
2013-01-07 17:59               ` Viresh Kumar
2013-01-07 22:29                 ` Steven Rostedt
2013-01-08  4:03                   ` Viresh Kumar
2013-01-07 15:04             ` Tejun Heo
2013-01-07 15:40               ` Amit Kucheria
2013-01-07 18:07               ` Viresh Kumar
2013-01-09 18:49                 ` Tejun Heo
2013-01-10  5:04                   ` Viresh Kumar
2012-11-27 13:26   ` Steven Rostedt
2012-11-27 13:48     ` Viresh Kumar
2012-11-27 13:59       ` Steven Rostedt
2012-11-27 14:55         ` Vincent Guittot
2012-11-27 15:04           ` Steven Rostedt
2012-11-27 15:35             ` Vincent Guittot
2012-11-06 10:38 ` [PATCH V2 Resend 4/4] timer: Migrate running timer Viresh Kumar
2012-11-27 13:47   ` Steven Rostedt
2013-03-20 15:13     ` Viresh Kumar
2013-04-09 14:52       ` Viresh Kumar
2013-04-24 11:22         ` Viresh Kumar
2013-05-13  9:19           ` Viresh Kumar
2013-05-13 10:35             ` Thomas Gleixner
2013-05-22  8:34               ` Viresh Kumar
2013-05-22  9:06                 ` Peter Zijlstra
2013-05-22  9:23                   ` Viresh Kumar
2013-05-31 10:49                 ` Viresh Kumar
2013-06-18  4:51                   ` Viresh Kumar
2013-07-24  9:17                     ` Viresh Kumar
2013-08-07  9:55                       ` Viresh Kumar
2013-10-04 12:39                         ` Sebastian Andrzej Siewior
2013-10-23  5:55                           ` Viresh Kumar
2012-11-26 15:00 ` [PATCH V2 Resend 0/4] Create sched_select_cpu() and use it for workqueues and timers Viresh Kumar
2012-11-26 16:40   ` Steven Rostedt
2012-11-26 17:03     ` Paul E. McKenney
2012-11-26 17:35       ` Steven Rostedt
2012-11-26 19:03         ` Paul E. McKenney
2012-11-26 19:17           ` Steven Rostedt
2012-11-27  6:25     ` Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).