linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] Stop sched tick in idle injection task
@ 2016-11-23 20:13 Jacob Pan
  2016-11-23 20:13 ` [PATCH v3 1/3] idle: add support for tasks that inject idle Jacob Pan
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jacob Pan @ 2016-11-23 20:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki
  Cc: Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior, Jacob Pan

Changelog:
v3:	- rearrange idle.c change based on Rafael's suggestion.

v2:
	- moved duration timer from powerclamp driver to play_idle()
	- unexport cpuidle_use_deepest_state
	- indentation fix

Idle injection drivers today use RT threads to run the idle loop. There are
efficiency and accounting issues with the current intel_powerclamp.c
and acpi_pad.c. A while ago, I posted a CFS-based idle injection patch trying
to address them:
https://lkml.org/lkml/2015/11/13/576

Peter proposed another approach with the introduction of a PF_IDLE flag.
This patchset is based on his original posting:
https://lkml.org/lkml/2014/6/4/56

These patches apply on top of the kworker and cpu hotplug state machine
changes made to Intel powerclamp driver.
https://lkml.org/lkml/2016/10/17/362

Similar changes to the ACPI PAD driver are being developed along with other
enhancements; they will be posted after this patchset is accepted.

Jacob Pan (2):
  cpuidle: allow setting deepest idle
  thermal/powerclamp: stop sched tick in forced idle

Peter Zijlstra (1):
  idle: add support for tasks that inject idle

 drivers/cpuidle/cpuidle.c          |  11 +++
 drivers/thermal/intel_powerclamp.c |  35 +------
 include/linux/cpu.h                |   2 +
 include/linux/cpuidle.h            |   4 +-
 include/linux/sched.h              |   3 +-
 kernel/fork.c                      |   3 +
 kernel/sched/core.c                |   1 +
 kernel/sched/idle.c                | 183 +++++++++++++++++++++++--------------
 8 files changed, 139 insertions(+), 103 deletions(-)

-- 
1.9.1


* [PATCH v3 1/3] idle: add support for tasks that inject idle
  2016-11-23 20:13 [PATCH v3 0/3] Stop sched tick in idle injection task Jacob Pan
@ 2016-11-23 20:13 ` Jacob Pan
  2016-11-24  4:18   ` Ingo Molnar
  2016-11-24  9:50   ` Peter Zijlstra
  2016-11-23 20:13 ` [PATCH v3 2/3] cpuidle: allow setting deepest idle Jacob Pan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Jacob Pan @ 2016-11-23 20:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki
  Cc: Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior, Jacob Pan

From: Peter Zijlstra <peterz@infradead.org>

Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use
realtime tasks to take control of a CPU and then inject idle. There are two
issues with this approach:
 1. Low efficiency: the injected idle task is treated as busy, so sched ticks do
    not stop during the injected idle period; these unwanted wakeups can cause
    a ~20% loss in power savings.
 2. Idle accounting: injected idle time is presented to the user as busy time.

This patch addresses the issues by introducing a new PF_IDLE flag which
allows any given task to be treated as an idle task while the flag is set.
Therefore, idle injection tasks can run through the normal flow of NOHZ idle
enter/exit to get the correct accounting as well as tick stop when possible.

The implication is that the idle task is no longer limited to PID == 0.
Inheriting the PF_IDLE flag from a parent is also prevented.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/linux/cpu.h   |   2 +
 include/linux/sched.h |   3 +-
 kernel/fork.c         |   3 +
 kernel/sched/core.c   |   1 +
 kernel/sched/idle.c   | 174 +++++++++++++++++++++++++++++++-------------------
 5 files changed, 118 insertions(+), 65 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index b886dc1..ac0efae 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -245,6 +245,8 @@ static inline void enable_nonboot_cpus(void) {}
 int cpu_report_state(int cpu);
 int cpu_check_up_prepare(int cpu);
 void cpu_set_state_online(int cpu);
+void play_idle(unsigned long duration_ms);
+
 #ifdef CONFIG_HOTPLUG_CPU
 bool cpu_wait_death(unsigned int cpu, int seconds);
 bool cpu_report_death(void);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..114c7fc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2254,6 +2254,7 @@ static inline cputime_t task_gtime(struct task_struct *t)
 /*
  * Per process flags
  */
+#define PF_IDLE		0x00000002	/* I am an IDLE thread */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
 #define PF_VCPU		0x00000010	/* I'm a virtual CPU */
@@ -2609,7 +2610,7 @@ extern int sched_setattr(struct task_struct *,
  */
 static inline bool is_idle_task(const struct task_struct *p)
 {
-	return p->pid == 0;
+	return !!(p->flags & PF_IDLE);
 }
 extern struct task_struct *curr_task(int cpu);
 extern void ia64_set_curr_task(int cpu, struct task_struct *p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 997ac1d..73edbcd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1819,6 +1819,9 @@ static __latent_entropy struct task_struct *copy_process(
 	threadgroup_change_end(current);
 	perf_event_fork(p);
 
+	/* do not inherit idle task */
+	p->flags &= ~PF_IDLE;
+
 	trace_task_newtask(p, clone_flags);
 	uprobe_copy_process(p, clone_flags);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 154fd68..c95fbcd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5279,6 +5279,7 @@ void init_idle(struct task_struct *idle, int cpu)
 	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
+	idle->flags |= PF_IDLE;
 
 	kasan_unpoison_task_stack(idle);
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1d8718d..cb6442f 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -202,76 +202,68 @@ static void cpuidle_idle_call(void)
  *
  * Called with polling cleared.
  */
-static void cpu_idle_loop(void)
+static void do_idle(void)
 {
-	int cpu = smp_processor_id();
+	/*
+	 * If the arch has a polling bit, we maintain an invariant:
+	 *
+	 * Our polling bit is clear if we're not scheduled (i.e. if
+	 * rq->curr != rq->idle).  This means that, if rq->idle has
+	 * the polling bit set, then setting need_resched is
+	 * guaranteed to cause the cpu to reschedule.
+	 */
 
-	while (1) {
-		/*
-		 * If the arch has a polling bit, we maintain an invariant:
-		 *
-		 * Our polling bit is clear if we're not scheduled (i.e. if
-		 * rq->curr != rq->idle).  This means that, if rq->idle has
-		 * the polling bit set, then setting need_resched is
-		 * guaranteed to cause the cpu to reschedule.
-		 */
+	__current_set_polling();
+	tick_nohz_idle_enter();
 
-		__current_set_polling();
-		quiet_vmstat();
-		tick_nohz_idle_enter();
-
-		while (!need_resched()) {
-			check_pgt_cache();
-			rmb();
-
-			if (cpu_is_offline(cpu)) {
-				cpuhp_report_idle_dead();
-				arch_cpu_idle_dead();
-			}
-
-			local_irq_disable();
-			arch_cpu_idle_enter();
-
-			/*
-			 * In poll mode we reenable interrupts and spin.
-			 *
-			 * Also if we detected in the wakeup from idle
-			 * path that the tick broadcast device expired
-			 * for us, we don't want to go deep idle as we
-			 * know that the IPI is going to arrive right
-			 * away
-			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired())
-				cpu_idle_poll();
-			else
-				cpuidle_idle_call();
-
-			arch_cpu_idle_exit();
-		}
+	while (!need_resched()) {
+		check_pgt_cache();
+		rmb();
 
-		/*
-		 * Since we fell out of the loop above, we know
-		 * TIF_NEED_RESCHED must be set, propagate it into
-		 * PREEMPT_NEED_RESCHED.
-		 *
-		 * This is required because for polling idle loops we will
-		 * not have had an IPI to fold the state for us.
-		 */
-		preempt_set_need_resched();
-		tick_nohz_idle_exit();
-		__current_clr_polling();
+		if (cpu_is_offline(smp_processor_id())) {
+			cpuhp_report_idle_dead();
+			arch_cpu_idle_dead();
+		}
 
-		/*
-		 * We promise to call sched_ttwu_pending and reschedule
-		 * if need_resched is set while polling is set.  That
-		 * means that clearing polling needs to be visible
-		 * before doing these things.
-		 */
-		smp_mb__after_atomic();
+		local_irq_disable();
+		arch_cpu_idle_enter();
 
-		sched_ttwu_pending();
-		schedule_preempt_disabled();
+	/* In poll mode we reenable interrupts and spin.
+	 * Also if we detected in the wakeup from idle
+	 * path that the tick broadcast device expired
+	 * for us, we don't want to go deep idle as we
+	 * know that the IPI is going to arrive right
+	 * away
+	 */
+		if (cpu_idle_force_poll || tick_check_broadcast_expired())
+			cpu_idle_poll();
+		else
+			cpuidle_idle_call();
+		arch_cpu_idle_exit();
 	}
+
+	/*
+	 * Since we fell out of the loop above, we know
+	 * TIF_NEED_RESCHED must be set, propagate it into
+	 * PREEMPT_NEED_RESCHED.
+	 *
+	 * This is required because for polling idle loops we will
+	 * not have had an IPI to fold the state for us.
+	 */
+	preempt_set_need_resched();
+	tick_nohz_idle_exit();
+	__current_clr_polling();
+
+	/*
+	 * We promise to call sched_ttwu_pending and reschedule
+	 * if need_resched is set while polling is set.  That
+	 * means that clearing polling needs to be visible
+	 * before doing these things.
+	 */
+	smp_mb__after_atomic();
+
+	sched_ttwu_pending();
+	schedule_preempt_disabled();
 }
 
 bool cpu_in_idle(unsigned long pc)
@@ -280,6 +272,58 @@ bool cpu_in_idle(unsigned long pc)
 		pc < (unsigned long)__cpuidle_text_end;
 }
 
+static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *hrtimer)
+{
+	set_tsk_need_resched(current);
+	return HRTIMER_NORESTART;
+}
+
+void play_idle(unsigned long duration_ms)
+{
+	struct hrtimer timer;
+	unsigned long end_time;
+
+	/*
+	 * Only FIFO tasks can disable the tick since they don't need the forced
+	 * preemption.
+	 */
+	WARN_ON_ONCE(current->policy != SCHED_FIFO);
+	WARN_ON_ONCE(current->nr_cpus_allowed != 1);
+	WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
+	WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
+
+	rcu_sleep_check();
+	preempt_disable();
+	current->flags |= PF_IDLE;
+	cpuidle_use_deepest_state(true);
+
+	/*
+	 * If duration is 0, we will return after a natural wake event occurs. If
+	 * duration is none zero, we will go back to sleep if we were woken up earlier.
+	 * We also set up a timer to make sure we don't oversleep in deep idle.
+	 */
+	if (!duration_ms)
+		do_idle();
+	else {
+		hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		timer.function = idle_inject_timer_fn;
+		hrtimer_start(&timer, ms_to_ktime(duration_ms),
+			HRTIMER_MODE_REL_PINNED);
+		end_time = jiffies + msecs_to_jiffies(duration_ms);
+
+		while (time_after(end_time, jiffies))
+			do_idle();
+	}
+	hrtimer_cancel(&timer);
+
+	cpuidle_use_deepest_state(false);
+	current->flags &= ~PF_IDLE;
+
+	preempt_fold_need_resched();
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(play_idle);
+
 void cpu_startup_entry(enum cpuhp_state state)
 {
 	/*
@@ -299,5 +343,7 @@ void cpu_startup_entry(enum cpuhp_state state)
 #endif
 	arch_cpu_idle_prepare();
 	cpuhp_online_idle(state);
-	cpu_idle_loop();
+	while (1)
+		do_idle();
+
 }
-- 
1.9.1


* [PATCH v3 2/3] cpuidle: allow setting deepest idle
  2016-11-23 20:13 [PATCH v3 0/3] Stop sched tick in idle injection task Jacob Pan
  2016-11-23 20:13 ` [PATCH v3 1/3] idle: add support for tasks that inject idle Jacob Pan
@ 2016-11-23 20:13 ` Jacob Pan
  2016-11-23 20:13 ` [PATCH v3 3/3] thermal/powerclamp: stop sched tick in forced idle Jacob Pan
  2016-11-23 21:12 ` [PATCH v3 0/3] Stop sched tick in idle injection task Rafael J. Wysocki
  3 siblings, 0 replies; 14+ messages in thread
From: Jacob Pan @ 2016-11-23 20:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki
  Cc: Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior, Jacob Pan

When idle injection is used to cap power, we need to override the
governor's choice of idle states. This patch allows the caller to select
the deepest idle state on a CPU and therefore achieve the maximum
potential power savings.
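
The call pattern this enables (this is what play_idle() in patch 1/3 does
around the idle loop; shown here only as a sketch, with the preemption and
warning checks omitted):

	cpuidle_use_deepest_state(true);	/* override the governor on this CPU */
	do_idle();				/* cpuidle now enters the deepest state */
	cpuidle_use_deepest_state(false);	/* back to normal governor selection */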

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/cpuidle/cpuidle.c | 11 +++++++++++
 include/linux/cpuidle.h   |  4 +++-
 kernel/sched/idle.c       | 13 ++++++++-----
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index c73207a..887a52a 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -97,6 +97,17 @@ static int find_deepest_state(struct cpuidle_driver *drv,
 	return ret;
 }
 
+/* Set the current cpu to use the deepest idle state, override governors */
+void cpuidle_use_deepest_state(bool enable)
+{
+	struct cpuidle_device *dev;
+
+	preempt_disable();
+	dev = cpuidle_get_device();
+	dev->use_deepest_state = enable;
+	preempt_enable();
+}
+
 #ifdef CONFIG_SUSPEND
 /**
  * cpuidle_find_deepest_state - Find the deepest available idle state.
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index bb31373..97c169b 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -74,6 +74,7 @@ struct cpuidle_state {
 struct cpuidle_device {
 	unsigned int		registered:1;
 	unsigned int		enabled:1;
+	unsigned int		use_deepest_state:1;
 	unsigned int		cpu;
 
 	int			last_residency;
@@ -192,11 +193,12 @@ static inline struct cpuidle_driver *cpuidle_get_cpu_driver(
 static inline struct cpuidle_device *cpuidle_get_device(void) {return NULL; }
 #endif
 
-#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_SUSPEND)
+#ifdef CONFIG_CPU_IDLE
 extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 				      struct cpuidle_device *dev);
 extern int cpuidle_enter_freeze(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev);
+extern void cpuidle_use_deepest_state(bool enable);
 #else
 static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 					     struct cpuidle_device *dev)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index cb6442f..2729da5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -164,11 +164,14 @@ static void cpuidle_idle_call(void)
 	 * timekeeping to prevent timer interrupts from kicking us out of idle
 	 * until a proper wakeup interrupt happens.
 	 */
-	if (idle_should_freeze()) {
-		entered_state = cpuidle_enter_freeze(drv, dev);
-		if (entered_state > 0) {
-			local_irq_enable();
-			goto exit_idle;
+
+	if (idle_should_freeze() || dev->use_deepest_state) {
+		if (idle_should_freeze()) {
+			entered_state = cpuidle_enter_freeze(drv, dev);
+			if (entered_state > 0) {
+				local_irq_enable();
+				goto exit_idle;
+			}
 		}
 
 		next_state = cpuidle_find_deepest_state(drv, dev);
-- 
1.9.1


* [PATCH v3 3/3] thermal/powerclamp: stop sched tick in forced idle
  2016-11-23 20:13 [PATCH v3 0/3] Stop sched tick in idle injection task Jacob Pan
  2016-11-23 20:13 ` [PATCH v3 1/3] idle: add support for tasks that inject idle Jacob Pan
  2016-11-23 20:13 ` [PATCH v3 2/3] cpuidle: allow setting deepest idle Jacob Pan
@ 2016-11-23 20:13 ` Jacob Pan
  2016-11-23 21:12 ` [PATCH v3 0/3] Stop sched tick in idle injection task Rafael J. Wysocki
  3 siblings, 0 replies; 14+ messages in thread
From: Jacob Pan @ 2016-11-23 20:13 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki
  Cc: Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior, Jacob Pan

With the introduction of play_idle(), the idle injection kthread can
go through the normal idle-task processing to get correct accounting
and turn off the scheduler tick when possible.

An hrtimer is used for the wakeup since the timeout is most likely to occur
during idle injection.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/thermal/intel_powerclamp.c | 35 +----------------------------------
 1 file changed, 1 insertion(+), 34 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 745fcec..68dd963 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -93,7 +93,6 @@ struct powerclamp_worker_data {
 	struct kthread_worker *worker;
 	struct kthread_work balancing_work;
 	struct kthread_delayed_work idle_injection_work;
-	struct timer_list wakeup_timer;
 	unsigned int cpu;
 	unsigned int count;
 	unsigned int guard;
@@ -278,11 +277,6 @@ static u64 pkg_state_counter(void)
 	return count;
 }
 
-static void noop_timer(unsigned long foo)
-{
-	/* empty... just the fact that we get the interrupt wakes us up */
-}
-
 static unsigned int get_compensation(int ratio)
 {
 	unsigned int comp = 0;
@@ -432,7 +426,6 @@ static void clamp_balancing_func(struct kthread_work *work)
 static void clamp_idle_injection_func(struct kthread_work *work)
 {
 	struct powerclamp_worker_data *w_data;
-	unsigned long target_jiffies;
 
 	w_data = container_of(work, struct powerclamp_worker_data,
 			      idle_injection_work.work);
@@ -453,31 +446,7 @@ static void clamp_idle_injection_func(struct kthread_work *work)
 	if (should_skip)
 		goto balance;
 
-	target_jiffies = jiffies + w_data->duration_jiffies;
-	mod_timer(&w_data->wakeup_timer, target_jiffies);
-	if (unlikely(local_softirq_pending()))
-		goto balance;
-	/*
-	 * stop tick sched during idle time, interrupts are still
-	 * allowed. thus jiffies are updated properly.
-	 */
-	preempt_disable();
-	/* mwait until target jiffies is reached */
-	while (time_before(jiffies, target_jiffies)) {
-		unsigned long ecx = 1;
-		unsigned long eax = target_mwait;
-
-		/*
-		 * REVISIT: may call enter_idle() to notify drivers who
-		 * can save power during cpu idle. same for exit_idle()
-		 */
-		local_touch_nmi();
-		stop_critical_timings();
-		mwait_idle_with_hints(eax, ecx);
-		start_critical_timings();
-		atomic_inc(&idle_wakeup_counter);
-	}
-	preempt_enable();
+	play_idle(jiffies_to_msecs(w_data->duration_jiffies));
 
 balance:
 	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
@@ -540,7 +509,6 @@ static void start_power_clamp_worker(unsigned long cpu)
 	w_data->cpu = cpu;
 	w_data->clamping = true;
 	set_bit(cpu, cpu_clamping_mask);
-	setup_timer(&w_data->wakeup_timer, noop_timer, 0);
 	sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
 	kthread_init_work(&w_data->balancing_work, clamp_balancing_func);
 	kthread_init_delayed_work(&w_data->idle_injection_work,
@@ -572,7 +540,6 @@ static void stop_power_clamp_worker(unsigned long cpu)
 	 * a big deal. The balancing work is fast and destroy kthread
 	 * will wait for it.
 	 */
-	del_timer_sync(&w_data->wakeup_timer);
 	clear_bit(w_data->cpu, cpu_clamping_mask);
 	kthread_destroy_worker(w_data->worker);
 
-- 
1.9.1


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-23 20:13 [PATCH v3 0/3] Stop sched tick in idle injection task Jacob Pan
                   ` (2 preceding siblings ...)
  2016-11-23 20:13 ` [PATCH v3 3/3] thermal/powerclamp: stop sched tick in forced idle Jacob Pan
@ 2016-11-23 21:12 ` Rafael J. Wysocki
  2016-11-23 21:45   ` Peter Zijlstra
  2016-11-24  4:21   ` Ingo Molnar
  3 siblings, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2016-11-23 21:12 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior

On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan
<jacob.jun.pan@linux.intel.com> wrote:
> Changelog:
> v3:     - rearrange idle.c change based on Rafael's suggestion.
>
> v2:
>         - moved duration timer from powerclamp driver to play_idle()
>         - unexport cpuidle_use_deepest_state
>         - indentation fix
>
> Idle injection drivers today use RT threads to run idle loop. There are
> efficiency and accounting issues with the current intel_powerclamp.c
> and acpi_pad.c. A while ago, I posted CFS based idle injection patch trying
> to address them:
> https://lkml.org/lkml/2015/11/13/576
>
> Peter proposed another approach with the introduction of a PF_IDLE flag.
> This patchset is based on his original posting:
> https://lkml.org/lkml/2014/6/4/56
>
> These patches apply on top of the kworker and cpu hotplug state machine
> changes made to Intel powerclamp driver.
> https://lkml.org/lkml/2016/10/17/362
>
> Similar changes to ACPI PAD driver is developed along with other
> enhancements. It will be posted after this patchset is accepted.
>
> Jacob Pan (2):
>   cpuidle: allow setting deepest idle
>   thermal/powerclamp: stop sched tick in forced idle
>
> Peter Zijlstra (1):
>   idle: add support for tasks that inject idle
>
>  drivers/cpuidle/cpuidle.c          |  11 +++
>  drivers/thermal/intel_powerclamp.c |  35 +------
>  include/linux/cpu.h                |   2 +
>  include/linux/cpuidle.h            |   4 +-
>  include/linux/sched.h              |   3 +-
>  kernel/fork.c                      |   3 +
>  kernel/sched/core.c                |   1 +
>  kernel/sched/idle.c                | 183 +++++++++++++++++++++++--------------
>  8 files changed, 139 insertions(+), 103 deletions(-)

Any objections anyone?

If not, I'll queue up this series for 4.10.

Thanks,
Rafael


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-23 21:12 ` [PATCH v3 0/3] Stop sched tick in idle injection task Rafael J. Wysocki
@ 2016-11-23 21:45   ` Peter Zijlstra
  2016-11-23 22:38     ` Rafael J. Wysocki
  2016-11-24  4:21   ` Ingo Molnar
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-11-23 21:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jacob Pan, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior

On Wed, Nov 23, 2016 at 10:12:46PM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan

> 
> Any objections anyone?
> 
> If not, I'll queue up this series for 4.10.

1h30 is a bit short to expect people to have even seen this, let alone
reviewed.


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-23 21:45   ` Peter Zijlstra
@ 2016-11-23 22:38     ` Rafael J. Wysocki
  2016-11-24  0:25       ` Jacob Pan
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2016-11-23 22:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Jacob Pan, Ingo Molnar, Thomas Gleixner, LKML,
	Linux PM, Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada,
	Len Brown, Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior

On Wed, Nov 23, 2016 at 10:45 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Nov 23, 2016 at 10:12:46PM +0100, Rafael J. Wysocki wrote:
>> On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan
>
>>
>> Any objections anyone?
>>
>> If not, I'll queue up this series for 4.10.
>
> 1h30 is a bit short to expect people to have even seen this, let alone
> reviewed.

I didn't say I'd queue it up right away. :-)


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-24  0:25       ` Jacob Pan
@ 2016-11-24  0:24         ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2016-11-24  0:24 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	LKML, Linux PM, Rafael Wysocki, Arjan van de Ven,
	Srinivas Pandruvada, Len Brown, Eduardo Valentin, Zhang Rui,
	Petr Mladek, Sebastian Andrzej Siewior

On Thu, Nov 24, 2016 at 1:25 AM, Jacob Pan
<jacob.jun.pan@linux.intel.com> wrote:
> On Wed, 23 Nov 2016 23:38:39 +0100
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
>> On Wed, Nov 23, 2016 at 10:45 PM, Peter Zijlstra
>> <peterz@infradead.org> wrote:
>> > On Wed, Nov 23, 2016 at 10:12:46PM +0100, Rafael J. Wysocki wrote:
>> >> On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan
>> >
>> >>
>> >> Any objections anyone?
>> >>
>> >> If not, I'll queue up this series for 4.10.
>> >
>> > 1h30 is a bit short to expect people to have even seen this, let
>> > alone reviewed.
>>
>> I didn't say I'd queue it up right away. :-)
>
> The powerclamp change is on top of Petr and Sebastian's kworker and cpu
> hp state machine patches. I plan to put all powerclamp patches in one
> series after 1 and 2 are accepted.

OK

Thanks,
Rafael


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-23 22:38     ` Rafael J. Wysocki
@ 2016-11-24  0:25       ` Jacob Pan
  2016-11-24  0:24         ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Jacob Pan @ 2016-11-24  0:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior, jacob.jun.pan

On Wed, 23 Nov 2016 23:38:39 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Nov 23, 2016 at 10:45 PM, Peter Zijlstra
> <peterz@infradead.org> wrote:
> > On Wed, Nov 23, 2016 at 10:12:46PM +0100, Rafael J. Wysocki wrote:  
> >> On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan  
> >  
> >>
> >> Any objections anyone?
> >>
> >> If not, I'll queue up this series for 4.10.  
> >
> > 1h30 is a bit short to expect people to have even seen this, let
> > alone reviewed.  
> 
> I didn't say I'd queue it up right away. :-)

The powerclamp change is on top of Petr and Sebastian's kworker and cpu
hp state machine patches. I plan to put all powerclamp patches in one
series after 1 and 2 are accepted.

Thanks,

Jacob


* Re: [PATCH v3 1/3] idle: add support for tasks that inject idle
  2016-11-23 20:13 ` [PATCH v3 1/3] idle: add support for tasks that inject idle Jacob Pan
@ 2016-11-24  4:18   ` Ingo Molnar
  2016-11-24  9:50   ` Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2016-11-24  4:18 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior


* Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> From: Peter Zijlstra <peterz@infradead.org>
> 
> Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use
> realtime tasks to take control of CPU then inject idle. There are two issues
> with this approach:

'CPU' is capitalized properly here.

And here:

>  #define PF_VCPU		0x00000010	/* I'm a virtual CPU */

But not later in the patch.

> +++ b/kernel/fork.c
> @@ -1819,6 +1819,9 @@ static __latent_entropy struct task_struct *copy_process(
>  	threadgroup_change_end(current);
>  	perf_event_fork(p);
>  
> +	/* do not inherit idle task */
> +	p->flags &= ~PF_IDLE;

Sloppy formulation: how can the idle task be inherited? Also, capitalize comments 
please.

> +	/*
> +	 * If the arch has a polling bit, we maintain an invariant:
> +	 *
> +	 * Our polling bit is clear if we're not scheduled (i.e. if
> +	 * rq->curr != rq->idle).  This means that, if rq->idle has
> +	 * the polling bit set, then setting need_resched is
> +	 * guaranteed to cause the cpu to reschedule.
> +	 */

'CPU' is not properly capitalized here.

> +	/* In poll mode we reenable interrupts and spin.
> +	 * Also if we detected in the wakeup from idle
> +	 * path that the tick broadcast device expired
> +	 * for us, we don't want to go deep idle as we
> +	 * know that the IPI is going to arrive right
> +	 * away
> +	 */
> +		if (cpu_idle_force_poll || tick_check_broadcast_expired())
> +			cpu_idle_poll();
> +		else
> +			cpuidle_idle_call();
> +		arch_cpu_idle_exit();

Sigh: that's not standard comment style, nor is it standard coding style.

Also, multi-sentence comments should end with a full stop.

> +	/*
> +	 * We promise to call sched_ttwu_pending and reschedule
> +	 * if need_resched is set while polling is set.  That
> +	 * means that clearing polling needs to be visible
> +	 * before doing these things.
> +	 */

When referring to functions in comments please spell them as "fn()", i.e. with 
parentheses.

> +	/*
> +	 * If duration is 0, we will return after a natural wake event occurs. If
> +	 * duration is none zero, we will go back to sleep if we were woken up earlier.
> +	 * We also set up a timer to make sure we don't oversleep in deep idle.
> +	 */
> +	if (!duration_ms)
> +		do_idle();
> +	else {
> +		hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +		timer.function = idle_inject_timer_fn;
> +		hrtimer_start(&timer, ms_to_ktime(duration_ms),
> +			HRTIMER_MODE_REL_PINNED);
> +		end_time = jiffies + msecs_to_jiffies(duration_ms);
> +
> +		while (time_after(end_time, jiffies))
> +			do_idle();
> +	}

Curly braces should be balanced and nonsensical line breaks should be omitted.

Also, comments should be read and grammar should be fixed.
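
For example, with balanced braces and the continuation line aligned, the
quoted hunk would read (style illustration only, untested):

	if (!duration_ms) {
		do_idle();
	} else {
		hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
		timer.function = idle_inject_timer_fn;
		hrtimer_start(&timer, ms_to_ktime(duration_ms),
			      HRTIMER_MODE_REL_PINNED);
		end_time = jiffies + msecs_to_jiffies(duration_ms);

		while (time_after(end_time, jiffies))
			do_idle();
	}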

Thanks,

	Ingo


* Re: [PATCH v3 0/3] Stop sched tick in idle injection task
  2016-11-23 21:12 ` [PATCH v3 0/3] Stop sched tick in idle injection task Rafael J. Wysocki
  2016-11-23 21:45   ` Peter Zijlstra
@ 2016-11-24  4:21   ` Ingo Molnar
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2016-11-24  4:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jacob Pan, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML,
	Linux PM, Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada,
	Len Brown, Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior


* Rafael J. Wysocki <rafael@kernel.org> wrote:

> On Wed, Nov 23, 2016 at 9:13 PM, Jacob Pan
> <jacob.jun.pan@linux.intel.com> wrote:
> > Changelog:
> > v3:     - rearrange idle.c change based on Rafael's suggestion.
> >
> > v2:
> >         - moved duration timer from powerclamp driver to play_idle()
> >         - unexport cpuidle_use_deepest_state
> >         - indentation fix
> >
> > Idle injection drivers today use RT threads to run idle loop. There are
> > efficiency and accounting issues with the current intel_powerclamp.c
> > and acpi_pad.c. A while ago, I posted CFS based idle injection patch trying
> > to address them:
> > https://lkml.org/lkml/2015/11/13/576
> >
> > Peter proposed another approach with the introduction of a PF_IDLE flag.
> > This patchset is based on his original posting:
> > https://lkml.org/lkml/2014/6/4/56
> >
> > These patches apply on top of the kworker and cpu hotplug state machine
> > changes made to Intel powerclamp driver.
> > https://lkml.org/lkml/2016/10/17/362
> >
> > Similar changes to ACPI PAD driver is developed along with other
> > enhancements. It will be posted after this patchset is accepted.
> >
> > Jacob Pan (2):
> >   cpuidle: allow setting deepest idle
> >   thermal/powerclamp: stop sched tick in forced idle
> >
> > Peter Zijlstra (1):
> >   idle: add support for tasks that inject idle
> >
> >  drivers/cpuidle/cpuidle.c          |  11 +++
> >  drivers/thermal/intel_powerclamp.c |  35 +------
> >  include/linux/cpu.h                |   2 +
> >  include/linux/cpuidle.h            |   4 +-
> >  include/linux/sched.h              |   3 +-
> >  kernel/fork.c                      |   3 +
> >  kernel/sched/core.c                |   1 +
> >  kernel/sched/idle.c                | 183 +++++++++++++++++++++++--------------
> >  8 files changed, 139 insertions(+), 103 deletions(-)
> 
> Any objections anyone?
> 
> If not, I'll queue up this series for 4.10.

NAK. The first patch alone has a dozen trivial problems - very sloppy.

Guys, please rework, double check and actually _READ_ the fine patches you are 
promoting and then in a couple of days I'll review this series again to see 
whether the NAK can be lifted ...

Thanks,

	Ingo


* Re: [PATCH v3 1/3] idle: add support for tasks that inject idle
  2016-11-23 20:13 ` [PATCH v3 1/3] idle: add support for tasks that inject idle Jacob Pan
  2016-11-24  4:18   ` Ingo Molnar
@ 2016-11-24  9:50   ` Peter Zijlstra
  2016-11-24 11:04     ` Ingo Molnar
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-11-24  9:50 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Ingo Molnar, Thomas Gleixner, LKML, Linux PM, Rafael Wysocki,
	Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior

On Wed, Nov 23, 2016 at 12:13:08PM -0800, Jacob Pan wrote:
> @@ -280,6 +272,58 @@ bool cpu_in_idle(unsigned long pc)
>  		pc < (unsigned long)__cpuidle_text_end;
>  }
>  
> +static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *hrtimer)
> +{
> +	set_tsk_need_resched(current);
> +	return HRTIMER_NORESTART;
> +}
> +
> +void play_idle(unsigned long duration_ms)
> +{
> +	struct hrtimer timer;
> +	unsigned long end_time;
> +
> +	/*
> +	 * Only FIFO tasks can disable the tick since they don't need the forced
> +	 * preemption.
> +	 */
> +	WARN_ON_ONCE(current->policy != SCHED_FIFO);
> +	WARN_ON_ONCE(current->nr_cpus_allowed != 1);
> +	WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
> +	WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
> +
> +	rcu_sleep_check();
> +	preempt_disable();
> +	current->flags |= PF_IDLE;
> +	cpuidle_use_deepest_state(true);
> +
> +	/*
> +	 * If duration is 0, we will return after a natural wake event occurs. If
> +	 * duration is none zero, we will go back to sleep if we were woken up earlier.
> +	 * We also set up a timer to make sure we don't oversleep in deep idle.
> +	 */
> +	if (!duration_ms)
> +		do_idle();

OK, so that doesn't make any sense, you should not be calling this
without a timeout.

> +	else {
> +		hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +		timer.function = idle_inject_timer_fn;
> +		hrtimer_start(&timer, ms_to_ktime(duration_ms),
> +			HRTIMER_MODE_REL_PINNED);
> +		end_time = jiffies + msecs_to_jiffies(duration_ms);
> +
> +		while (time_after(end_time, jiffies))
> +			do_idle();
> +	}
> +	hrtimer_cancel(&timer);
> +
> +	cpuidle_use_deepest_state(false);
> +	current->flags &= ~PF_IDLE;
> +
> +	preempt_fold_need_resched();
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(play_idle);


How about something like so... (since I had to edit, I fixed up most
things Ingo complained about as well).

Note that it doesn't build because of a distinct lack of
cpuidle_use_deepest_state() in my kernel tree.

---
Subject: idle: add support for tasks that inject idle
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 23 Nov 2016 12:13:08 -0800

Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use
realtime tasks to take control of a CPU and then inject idle. There are two
issues with this approach:

 1. Low efficiency: the injected idle task is treated as busy, so sched ticks
    do not stop during the injected idle period; these unwanted wakeups can
    cause a ~20% loss in power savings.

 2. Idle accounting: injected idle time is presented to the user as busy time.

This patch addresses the issues by introducing a new PF_IDLE flag which
allows any given task to be treated as an idle task while the flag is set.
Therefore, idle injection tasks can run through the normal flow of NOHZ
idle enter/exit to get the correct accounting as well as tick stop when
possible.

The implication is that the idle task is no longer limited to PID == 0.

Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cpu.h   |    2 
 include/linux/sched.h |    3 
 kernel/fork.c         |    2 
 kernel/sched/core.c   |    1 
 kernel/sched/idle.c   |  165 +++++++++++++++++++++++++++++++-------------------
 5 files changed, 109 insertions(+), 64 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -245,6 +245,8 @@ void arch_cpu_idle_dead(void);
 int cpu_report_state(int cpu);
 int cpu_check_up_prepare(int cpu);
 void cpu_set_state_online(int cpu);
+void play_idle(unsigned long duration_ms);
+
 #ifdef CONFIG_HOTPLUG_CPU
 bool cpu_wait_death(unsigned int cpu, int seconds);
 bool cpu_report_death(void);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2284,6 +2284,7 @@ extern void thread_group_cputime_adjuste
 /*
  * Per process flags
  */
+#define PF_IDLE		0x00000002	/* I am an IDLE thread */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
 #define PF_VCPU		0x00000010	/* I'm a virtual CPU */
@@ -2645,7 +2646,7 @@ extern struct task_struct *idle_task(int
  */
 static inline bool is_idle_task(const struct task_struct *p)
 {
-	return p->pid == 0;
+	return !!(p->flags & PF_IDLE);
 }
 extern struct task_struct *curr_task(int cpu);
 extern void ia64_set_curr_task(int cpu, struct task_struct *p);
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1542,7 +1542,7 @@ static __latent_entropy struct task_stru
 		goto bad_fork_cleanup_count;
 
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
-	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER);
+	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
 	p->flags |= PF_FORKNOEXEC;
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5298,6 +5298,7 @@ void init_idle(struct task_struct *idle,
 	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
+	idle->flags |= PF_IDLE;
 
 	kasan_unpoison_task_stack(idle);
 
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -202,76 +202,65 @@ static void cpuidle_idle_call(void)
  *
  * Called with polling cleared.
  */
-static void cpu_idle_loop(void)
+static void do_idle(void)
 {
-	int cpu = smp_processor_id();
-
-	while (1) {
-		/*
-		 * If the arch has a polling bit, we maintain an invariant:
-		 *
-		 * Our polling bit is clear if we're not scheduled (i.e. if
-		 * rq->curr != rq->idle).  This means that, if rq->idle has
-		 * the polling bit set, then setting need_resched is
-		 * guaranteed to cause the cpu to reschedule.
-		 */
+	/*
+	 * If the arch has a polling bit, we maintain an invariant:
+	 *
+	 * Our polling bit is clear if we're not scheduled (i.e. if rq->curr !=
+	 * rq->idle). This means that, if rq->idle has the polling bit set,
+	 * then setting need_resched is guaranteed to cause the CPU to
+	 * reschedule.
+	 */
 
-		__current_set_polling();
-		quiet_vmstat();
-		tick_nohz_idle_enter();
-
-		while (!need_resched()) {
-			check_pgt_cache();
-			rmb();
-
-			if (cpu_is_offline(cpu)) {
-				cpuhp_report_idle_dead();
-				arch_cpu_idle_dead();
-			}
-
-			local_irq_disable();
-			arch_cpu_idle_enter();
-
-			/*
-			 * In poll mode we reenable interrupts and spin.
-			 *
-			 * Also if we detected in the wakeup from idle
-			 * path that the tick broadcast device expired
-			 * for us, we don't want to go deep idle as we
-			 * know that the IPI is going to arrive right
-			 * away
-			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired())
-				cpu_idle_poll();
-			else
-				cpuidle_idle_call();
+	__current_set_polling();
+	tick_nohz_idle_enter();
 
-			arch_cpu_idle_exit();
+	while (!need_resched()) {
+		check_pgt_cache();
+		rmb();
+
+		if (cpu_is_offline(smp_processor_id())) {
+			cpuhp_report_idle_dead();
+			arch_cpu_idle_dead();
 		}
 
-		/*
-		 * Since we fell out of the loop above, we know
-		 * TIF_NEED_RESCHED must be set, propagate it into
-		 * PREEMPT_NEED_RESCHED.
-		 *
-		 * This is required because for polling idle loops we will
-		 * not have had an IPI to fold the state for us.
-		 */
-		preempt_set_need_resched();
-		tick_nohz_idle_exit();
-		__current_clr_polling();
+		local_irq_disable();
+		arch_cpu_idle_enter();
 
 		/*
-		 * We promise to call sched_ttwu_pending and reschedule
-		 * if need_resched is set while polling is set.  That
-		 * means that clearing polling needs to be visible
-		 * before doing these things.
+		 * In poll mode we reenable interrupts and spin. Also if we
+		 * detected in the wakeup from idle path that the tick
+		 * broadcast device expired for us, we don't want to go deep
+		 * idle as we know that the IPI is going to arrive right away.
 		 */
-		smp_mb__after_atomic();
-
-		sched_ttwu_pending();
-		schedule_preempt_disabled();
+		if (cpu_idle_force_poll || tick_check_broadcast_expired())
+			cpu_idle_poll();
+		else
+			cpuidle_idle_call();
+		arch_cpu_idle_exit();
 	}
+
+	/*
+	 * Since we fell out of the loop above, we know TIF_NEED_RESCHED must
+	 * be set, propagate it into PREEMPT_NEED_RESCHED.
+	 *
+	 * This is required because for polling idle loops we will not have had
+	 * an IPI to fold the state for us.
+	 */
+	preempt_set_need_resched();
+	tick_nohz_idle_exit();
+	__current_clr_polling();
+
+	/*
+	 * We promise to call sched_ttwu_pending() and reschedule if
+	 * need_resched() is set while polling is set. That means that clearing
+	 * polling needs to be visible before doing these things.
+	 */
+	smp_mb__after_atomic();
+
+	sched_ttwu_pending();
+	schedule_preempt_disabled();
 }
 
 bool cpu_in_idle(unsigned long pc)
@@ -280,6 +269,56 @@ bool cpu_in_idle(unsigned long pc)
 		pc < (unsigned long)__cpuidle_text_end;
 }
 
+struct idle_timer {
+	struct hrtimer timer;
+	int done;
+};
+
+static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer)
+{
+	struct idle_timer *it = container_of(timer, struct idle_timer, timer);
+
+	WRITE_ONCE(it->done, 1);
+	set_tsk_need_resched(current);
+
+	return HRTIMER_NORESTART;
+}
+
+void play_idle(unsigned long duration_ms)
+{
+	struct idle_timer it;
+
+	/*
+	 * Only FIFO tasks can disable the tick since they don't need the forced
+	 * preemption.
+	 */
+	WARN_ON_ONCE(current->policy != SCHED_FIFO);
+	WARN_ON_ONCE(current->nr_cpus_allowed != 1);
+	WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
+	WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
+	WARN_ON_ONCE(!duration_ms);
+
+	rcu_sleep_check();
+	preempt_disable();
+	current->flags |= PF_IDLE;
+	cpuidle_use_deepest_state(true);
+
+	it.done = 0;
+	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	it.timer.function = idle_inject_timer_fn;
+	hrtimer_start(&it.timer, ms_to_ktime(duration_ms), HRTIMER_MODE_REL_PINNED);
+
+	while (!READ_ONCE(it.done))
+		do_idle();
+
+	cpuidle_use_deepest_state(false);
+	current->flags &= ~PF_IDLE;
+
+	preempt_fold_need_resched();
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(play_idle);
+
 void cpu_startup_entry(enum cpuhp_state state)
 {
 	/*
@@ -299,5 +338,6 @@ void cpu_startup_entry(enum cpuhp_state
 #endif
 	arch_cpu_idle_prepare();
 	cpuhp_online_idle(state);
-	cpu_idle_loop();
+	while (1)
+		do_idle();
 }


* Re: [PATCH v3 1/3] idle: add support for tasks that inject idle
  2016-11-24  9:50   ` Peter Zijlstra
@ 2016-11-24 11:04     ` Ingo Molnar
  2016-11-24 13:44       ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2016-11-24 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jacob Pan, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Nov 23, 2016 at 12:13:08PM -0800, Jacob Pan wrote:
> > @@ -280,6 +272,58 @@ bool cpu_in_idle(unsigned long pc)
> >  		pc < (unsigned long)__cpuidle_text_end;
> >  }
> >  
> > +static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *hrtimer)
> > +{
> > +	set_tsk_need_resched(current);
> > +	return HRTIMER_NORESTART;
> > +}
> > +
> > +void play_idle(unsigned long duration_ms)
> > +{
> > +	struct hrtimer timer;
> > +	unsigned long end_time;
> > +
> > +	/*
> > +	 * Only FIFO tasks can disable the tick since they don't need the forced
> > +	 * preemption.
> > +	 */
> > +	WARN_ON_ONCE(current->policy != SCHED_FIFO);
> > +	WARN_ON_ONCE(current->nr_cpus_allowed != 1);
> > +	WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
> > +	WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
> > +
> > +	rcu_sleep_check();
> > +	preempt_disable();
> > +	current->flags |= PF_IDLE;
> > +	cpuidle_use_deepest_state(true);
> > +
> > +	/*
> > +	 * If duration is 0, we will return after a natural wake event occurs. If
> > +	 * duration is none zero, we will go back to sleep if we were woken up earlier.
> > +	 * We also set up a timer to make sure we don't oversleep in deep idle.
> > +	 */
> > +	if (!duration_ms)
> > +		do_idle();
> 
> OK, so that doesn't make any sense, you should not be calling this
> without a timeout.
> 
> > +	else {
> > +		hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > +		timer.function = idle_inject_timer_fn;
> > +		hrtimer_start(&timer, ms_to_ktime(duration_ms),
> > +			HRTIMER_MODE_REL_PINNED);
> > +		end_time = jiffies + msecs_to_jiffies(duration_ms);
> > +
> > +		while (time_after(end_time, jiffies))
> > +			do_idle();
> > +	}
> > +	hrtimer_cancel(&timer);
> > +
> > +	cpuidle_use_deepest_state(false);
> > +	current->flags &= ~PF_IDLE;
> > +
> > +	preempt_fold_need_resched();
> > +	preempt_enable();
> > +}
> > +EXPORT_SYMBOL_GPL(play_idle);
> 
> 
> How about something like so... (since I had to edit, I fixed up most
> things Ingo complained about as well).
> 
> Note that it doesn't build because of a distinct lack of
> cpuidle_use_deepest_state() in my kernel tree.

Ok, I really like this one, it's so much cleaner!

If the patch builds & works:

  Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


* Re: [PATCH v3 1/3] idle: add support for tasks that inject idle
  2016-11-24 11:04     ` Ingo Molnar
@ 2016-11-24 13:44       ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2016-11-24 13:44 UTC (permalink / raw)
  To: Ingo Molnar, Jacob Pan
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, LKML, Linux PM,
	Rafael Wysocki, Arjan van de Ven, Srinivas Pandruvada, Len Brown,
	Eduardo Valentin, Zhang Rui, Petr Mladek,
	Sebastian Andrzej Siewior

On Thu, Nov 24, 2016 at 12:04 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Wed, Nov 23, 2016 at 12:13:08PM -0800, Jacob Pan wrote:
>> > @@ -280,6 +272,58 @@ bool cpu_in_idle(unsigned long pc)
>> >             pc < (unsigned long)__cpuidle_text_end;
>> >  }
>> >
>> > +static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *hrtimer)
>> > +{
>> > +   set_tsk_need_resched(current);
>> > +   return HRTIMER_NORESTART;
>> > +}
>> > +
>> > +void play_idle(unsigned long duration_ms)
>> > +{
>> > +   struct hrtimer timer;
>> > +   unsigned long end_time;
>> > +
>> > +   /*
>> > +    * Only FIFO tasks can disable the tick since they don't need the forced
>> > +    * preemption.
>> > +    */
>> > +   WARN_ON_ONCE(current->policy != SCHED_FIFO);
>> > +   WARN_ON_ONCE(current->nr_cpus_allowed != 1);
>> > +   WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
>> > +   WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
>> > +
>> > +   rcu_sleep_check();
>> > +   preempt_disable();
>> > +   current->flags |= PF_IDLE;
>> > +   cpuidle_use_deepest_state(true);
>> > +
>> > +   /*
>> > +    * If duration is 0, we will return after a natural wake event occurs. If
>> > +    * duration is none zero, we will go back to sleep if we were woken up earlier.
>> > +    * We also set up a timer to make sure we don't oversleep in deep idle.
>> > +    */
>> > +   if (!duration_ms)
>> > +           do_idle();
>>
>> OK, so that doesn't make any sense, you should not be calling this
>> without a timeout.
>>
>> > +   else {
>> > +           hrtimer_init_on_stack(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> > +           timer.function = idle_inject_timer_fn;
>> > +           hrtimer_start(&timer, ms_to_ktime(duration_ms),
>> > +                   HRTIMER_MODE_REL_PINNED);
>> > +           end_time = jiffies + msecs_to_jiffies(duration_ms);
>> > +
>> > +           while (time_after(end_time, jiffies))
>> > +                   do_idle();
>> > +   }
>> > +   hrtimer_cancel(&timer);
>> > +
>> > +   cpuidle_use_deepest_state(false);
>> > +   current->flags &= ~PF_IDLE;
>> > +
>> > +   preempt_fold_need_resched();
>> > +   preempt_enable();
>> > +}
>> > +EXPORT_SYMBOL_GPL(play_idle);
>>
>>
>> How about something like so... (since I had to edit, I fixed up most
>> things Ingo complained about as well).
>>
>> Note that it doesn't build because of a distinct lack of
>> cpuidle_use_deepest_state() in my kernel tree.
>
> Ok, I really like this one, it so much cleaner!
>
> If the patch builds & works:
>
>   Acked-by: Ingo Molnar <mingo@kernel.org>

OK, thanks!

Jacob, can you please test this one along with the rest of the series?

Thanks,
Rafael

