* [RFC 0/3] sched/idle: run-time support for setting idle polling
@ 2015-09-22 20:34 Luiz Capitulino
  2015-09-22 20:34 ` [RFC 1/3] sched/idle: cpu_idle_poll(): drop unused return code Luiz Capitulino
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-22 20:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: riel, rafael.j.wysocki, mingo

Hi,

Some archs allow the system administrator to set the
idle thread behavior to spin instead of entering
sleep states. The x86 arch, for example, has an idle=
command-line parameter for this purpose.

However, the command-line parameter has two problems:

 1. You have to reboot if you change your mind
 2. This setting affects all system cores

The second point is relevant for systems where cores
are partitioned into bookkeeping and low-latency cores.
Usually, it's OK for bookkeeping cores to enter deeper
sleep states. It's only the low-latency cores that should
poll when entering idle.

This series adds the following file:

 /sys/devices/system/cpu/cpu_idle

This file outputs and stores a cpumask of the cores
which will have idle polling behavior.
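
As a rough usage sketch, the mask is written as a hex bitmap (the
patch parses it with cpumask_parse() and prints it with %*pb); note
that the attribute name varies across this series -- the code in
patch 3 registers it as "idle_loop", which is what this hypothetical
userspace snippet assumes:

#include <stdio.h>

int main(void)
{
	/* Path is an assumption based on DEVICE_ATTR(idle_loop, ...)
	 * in patch 3; cpumask_parse() expects a hex bitmap, so CPUs
	 * 2-3 are written as "c".
	 */
	FILE *f = fopen("/sys/devices/system/cpu/idle_loop", "w");

	if (!f)
		return 1;
	fprintf(f, "c\n");	/* bits 2 and 3 set: CPUs 2-3 poll */
	return fclose(f) != 0;
}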

This implementation seems to work fine on x86, however
it's an RFC because of the following points (for which
feedback is greatly appreciated):

 o I believe this implementation should work for all archs,
   but I can't confirm it, as my machines and experience are
   limited to x86

 o Some x86 cpufreq drivers explicitly check if idle=poll
   was passed. Does anyone know if this is an optimization
   or is there actually a conflict between idle=poll and
   driver operation?

 o This series maintains cpu_idle_poll_ctrl() semantics,
   which led to a more complex implementation. That is, today
   cpu_idle_poll_ctrl() increments or decrements a counter.
   A lot of arch code seems to rely on this semantic, where
   cpu_idle_poll_ctrl(true) and cpu_idle_poll_ctrl(false) calls
   have to be paired to enable or disable idle polling (see the
   sketch below)
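
For reference, the paired-call pattern looks roughly like this; only
cpu_idle_poll_ctrl() itself is the real interface, the surrounding
function is made up for illustration:

#include <linux/cpu.h>

/*
 * Sketch of the pairing arch code relies on: because the current
 * implementation keeps a counter rather than a flag, every enable
 * must eventually be matched by a disable.
 */
static void example_latency_sensitive_section(void)
{
	cpu_idle_poll_ctrl(true);	/* counter++, idle threads poll */

	/* ... work that cannot tolerate deep idle states ... */

	cpu_idle_poll_ctrl(false);	/* counter--, matching disable */
}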

Luiz Capitulino (3):
  sched/idle: cpu_idle_poll(): drop unused return code
  sched/idle: make cpu_idle_force_poll per-cpu
  sched/idle: run-time support for setting idle polling

 drivers/base/cpu.c  | 44 ++++++++++++++++++++++++
 include/linux/cpu.h |  2 ++
 kernel/sched/idle.c | 96 +++++++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 129 insertions(+), 13 deletions(-)

-- 
2.1.0


* [RFC 1/3] sched/idle: cpu_idle_poll(): drop unused return code
  2015-09-22 20:34 [RFC 0/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
@ 2015-09-22 20:34 ` Luiz Capitulino
  2015-09-22 20:34 ` [RFC 2/3] sched/idle: make cpu_idle_force_poll per-cpu Luiz Capitulino
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-22 20:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: riel, rafael.j.wysocki, mingo

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 kernel/sched/idle.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8f177c7..93d0657 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,7 +52,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
+static inline void cpu_idle_poll(void)
 {
 	rcu_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -62,7 +62,6 @@ static inline int cpu_idle_poll(void)
 		cpu_relax();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	rcu_idle_exit();
-	return 1;
 }
 
 /* Weak implementations for optional arch specific functions */
-- 
2.1.0



* [RFC 2/3] sched/idle: make cpu_idle_force_poll per-cpu
  2015-09-22 20:34 [RFC 0/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
  2015-09-22 20:34 ` [RFC 1/3] sched/idle: cpu_idle_poll(): drop unused return code Luiz Capitulino
@ 2015-09-22 20:34 ` Luiz Capitulino
  2015-09-22 20:34 ` [RFC 3/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
  2015-09-23  1:17 ` [RFC 0/3] " Rafael J. Wysocki
  3 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-22 20:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: riel, rafael.j.wysocki, mingo

In preparation to support setting idle polling behavior
at run-time, this commit makes the cpu_idle_force_poll
global counter per-cpu data.

The new per-cpu data is actually a struct, and new
helper functions are added in order to maintain the
same semantics cpu_idle_force_poll used to have.

This change should not be visible to arch code calling
cpu_idle_poll_ctrl().

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 kernel/sched/idle.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 93d0657..3060977 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -24,29 +24,49 @@ void sched_idle_set_state(struct cpuidle_state *idle_state)
 	idle_set_state(this_rq(), idle_state);
 }
 
-static int __read_mostly cpu_idle_force_poll;
+struct idle_poll {
+	int force_poll;
+};
+
+static DEFINE_PER_CPU(struct idle_poll, idle_poll) = {
+	.force_poll = 0,
+};
+
+static bool this_cpu_idle_poll(void)
+{
+	return per_cpu(idle_poll, smp_processor_id()).force_poll > 0;
+}
+
+static void cpu_idle_poll_set_all(int v)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(idle_poll, cpu).force_poll = v;
+}
 
 void cpu_idle_poll_ctrl(bool enable)
 {
-	if (enable) {
-		cpu_idle_force_poll++;
-	} else {
-		cpu_idle_force_poll--;
-		WARN_ON_ONCE(cpu_idle_force_poll < 0);
+	int *p, cpu;
+
+	for_each_possible_cpu(cpu) {
+		p = &per_cpu(idle_poll, cpu).force_poll;
+		*p += enable ? 1 : -1;
+		WARN_ON_ONCE(*p < 0);
 	}
 }
 
 #ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
 static int __init cpu_idle_poll_setup(char *__unused)
 {
-	cpu_idle_force_poll = 1;
+	cpu_idle_poll_set_all(1);
 	return 1;
 }
 __setup("nohlt", cpu_idle_poll_setup);
 
 static int __init cpu_idle_nopoll_setup(char *__unused)
 {
-	cpu_idle_force_poll = 0;
+	cpu_idle_poll_set_all(0);
 	return 1;
 }
 __setup("hlt", cpu_idle_nopoll_setup);
@@ -58,7 +78,7 @@ static inline void cpu_idle_poll(void)
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
 	while (!tif_need_resched() &&
-		(cpu_idle_force_poll || tick_check_broadcast_expired()))
+		(this_cpu_idle_poll() || tick_check_broadcast_expired()))
 		cpu_relax();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	rcu_idle_exit();
@@ -71,7 +91,7 @@ void __weak arch_cpu_idle_exit(void) { }
 void __weak arch_cpu_idle_dead(void) { }
 void __weak arch_cpu_idle(void)
 {
-	cpu_idle_force_poll = 1;
+	cpu_idle_poll_set_all(1);
 	local_irq_enable();
 }
 
@@ -242,7 +262,7 @@ static void cpu_idle_loop(void)
 			 * know that the IPI is going to arrive right
 			 * away
 			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired())
+			if (this_cpu_idle_poll() || tick_check_broadcast_expired())
 				cpu_idle_poll();
 			else
 				cpuidle_idle_call();
-- 
2.1.0



* [RFC 3/3] sched/idle: run-time support for setting idle polling
  2015-09-22 20:34 [RFC 0/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
  2015-09-22 20:34 ` [RFC 1/3] sched/idle: cpu_idle_poll(): drop unused return code Luiz Capitulino
  2015-09-22 20:34 ` [RFC 2/3] sched/idle: make cpu_idle_force_poll per-cpu Luiz Capitulino
@ 2015-09-22 20:34 ` Luiz Capitulino
  2015-09-23  1:17 ` [RFC 0/3] " Rafael J. Wysocki
  3 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-22 20:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: riel, rafael.j.wysocki, mingo

Some archs allow the system administrator to set the
idle thread behavior to spin instead of entering
sleep states. The x86 arch, for example, has an idle=
command-line parameter for this purpose.

However, the command-line parameter has two problems:

 1. You have to reboot if you change your mind
 2. This setting affects all system cores

The second point is relevant for systems where cores
are partitioned into bookkeeping and isolated,
low-latency cores. Usually, it's OK for bookkeeping
cores to enter deeper sleep states. It's only the
isolated ones that should poll when entering idle.

This commit solves both problems by allowing the
system administrator to set the idle thread behavior
at run-time by writing to the following file:

 /sys/devices/system/cpu/idle_loop

This file stores and outputs the cpumask which will
have idle polling behavior.

It's important to note that the system administrator
can't undo what was set by arch code. That is, if arch
code enables idle polling by calling cpu_idle_poll_ctrl(true),
the system administrator won't be able to clear it via sysfs.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 drivers/base/cpu.c  | 44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cpu.h |  2 ++
 kernel/sched/idle.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 91bbb19..ac0dc3c 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -277,6 +277,49 @@ static ssize_t print_cpus_isolated(struct device *dev,
 }
 static DEVICE_ATTR(isolated, 0444, print_cpus_isolated, NULL);
 
+static ssize_t print_cpus_idle_loop(struct device *dev,
+                                struct device_attribute *attr, char *buf)
+{
+	int ret, len = PAGE_SIZE-2;
+	cpumask_var_t mask;
+
+	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
+	if (!ret)
+		return -ENOMEM;
+
+	cpu_idle_loop_to_cpumask(mask);
+	ret = scnprintf(buf, len, "%*pb\n", cpumask_pr_args(mask));
+
+	free_cpumask_var(mask);
+	return ret;
+}
+
+static ssize_t store_cpus_idle_loop(struct device *dev,
+                                struct device_attribute *attr,
+                                const char *buf,
+                                size_t count)
+{
+	cpumask_var_t mask;
+	int ret;
+
+	ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
+	if (!ret)
+		return -ENOMEM;
+
+	ret = cpumask_parse(buf, mask);
+	if (ret < 0)
+		goto out;
+
+	ret = cpu_idle_loop_from_cpumask(mask);
+
+out:
+	free_cpumask_var(mask);
+	return ret < 0 ? ret : count;
+}
+
+static DEVICE_ATTR(idle_loop, 0644, print_cpus_idle_loop,
+                              store_cpus_idle_loop);
+
 #ifdef CONFIG_NO_HZ_FULL
 static ssize_t print_cpus_nohz_full(struct device *dev,
 				  struct device_attribute *attr, char *buf)
@@ -457,6 +500,7 @@ static struct attribute *cpu_root_attrs[] = {
 	&dev_attr_kernel_max.attr,
 	&dev_attr_offline.attr,
 	&dev_attr_isolated.attr,
+	&dev_attr_idle_loop.attr,
 #ifdef CONFIG_NO_HZ_FULL
 	&dev_attr_nohz_full.attr,
 #endif
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 23c30bd..8744507 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -274,6 +274,8 @@ enum cpuhp_state {
 
 void cpu_startup_entry(enum cpuhp_state state);
 
+void cpu_idle_loop_to_cpumask(struct cpumask *mask);
+int cpu_idle_loop_from_cpumask(const struct cpumask *mask);
 void cpu_idle_poll_ctrl(bool enable);
 
 void arch_cpu_idle(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 3060977..03567d6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -26,10 +26,12 @@ void sched_idle_set_state(struct cpuidle_state *idle_state)
 
 struct idle_poll {
 	int force_poll;
+	bool sysfs_set;
 };
 
 static DEFINE_PER_CPU(struct idle_poll, idle_poll) = {
 	.force_poll = 0,
+	.sysfs_set = false,
 };
 
 static bool this_cpu_idle_poll(void)
@@ -56,6 +58,55 @@ void cpu_idle_poll_ctrl(bool enable)
 	}
 }
 
+void cpu_idle_loop_to_cpumask(struct cpumask *mask)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		if (per_cpu(idle_poll, cpu).force_poll)
+			cpumask_set_cpu(cpu, mask);
+}
+
+/*
+ * This function enforces two rules:
+ *
+ * 1. force_poll can be incremented only once for a CPU
+ *    via sysfs
+ * 2. force_poll can only be decremented for a CPU if it
+ *    was previously incremented via sysfs
+ *
+ * This ensures that sysfs changes are independent of
+ * cpu_idle_poll_ctrl().
+ */
+int cpu_idle_loop_from_cpumask(const struct cpumask *mask)
+{
+	struct idle_poll *p;
+	int is_set, cpu;
+
+	for_each_possible_cpu(cpu) {
+		p = &per_cpu(idle_poll, cpu);
+		is_set = cpumask_test_cpu(cpu, mask);
+		if (!is_set && p->force_poll > 0 && !p->sysfs_set) {
+			pr_err("idle_loop: trying to undo arch setting\n");
+			return -EINVAL;
+		}
+	}
+
+	for_each_possible_cpu(cpu) {
+		p = &per_cpu(idle_poll, cpu);
+		is_set = cpumask_test_cpu(cpu, mask);
+		if (is_set && !p->sysfs_set) {
+			p->force_poll++;
+			p->sysfs_set = true;
+		} else if (!is_set && p->sysfs_set) {
+			p->force_poll--;
+			p->sysfs_set = false;
+		}
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
 static int __init cpu_idle_poll_setup(char *__unused)
 {
-- 
2.1.0



* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-22 20:34 [RFC 0/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
                   ` (2 preceding siblings ...)
  2015-09-22 20:34 ` [RFC 3/3] sched/idle: run-time support for setting idle polling Luiz Capitulino
@ 2015-09-23  1:17 ` Rafael J. Wysocki
  2015-09-23 13:21   ` Luiz Capitulino
  2015-09-30 15:48   ` Peter Zijlstra
  3 siblings, 2 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2015-09-23  1:17 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: linux-kernel, riel, rafael.j.wysocki, mingo, Linux PM list,
	Len Brown, Peter Zijlstra

On Tuesday, September 22, 2015 04:34:19 PM Luiz Capitulino wrote:
> Hi,

Hi,

Please always CC patches related to power management to linux-pm@vger.kernel.org.

Also CCing Len Brown who's the maintainer of the intel_idle driver and Peter Z.

> Some archs allow the system administrator to set the
> idle thread behavior to spin instead of entering
> sleep states. The x86 arch, for example, has an idle=
> command-line parameter for this purpose.
> 
> However, the command-line parameter has two problems:
> 
>  1. You have to reboot if you change your mind
>  2. This setting affects all system cores
> 
> The second point is relevant for systems where cores
> are partitioned into bookkeeping and low-latency cores.
> Usually, it's OK for bookkeeping cores to enter deeper
> sleep states. It's only the low-latency cores that should
> poll when entering idle.

This looks like a use case for PM QoS to me rather.  You'd need to make it
work per-CPU rather than globally, but that really is about asking for
minimum latency.
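
For context, a minimal sketch of a kernel-side PM QoS latency request
as it exists today -- system-wide, not per-CPU, which is the part that
would need extending; the request variable and the two wrapper
functions are hypothetical:

#include <linux/pm_qos.h>

/*
 * Sketch only: a zero CPU-latency request via the current global
 * PM QoS API keeps cpuidle out of deep states on all CPUs.
 */
static struct pm_qos_request example_latency_req;

static void example_request_min_latency(void)
{
	pm_qos_add_request(&example_latency_req,
			   PM_QOS_CPU_DMA_LATENCY, 0);
}

static void example_release_min_latency(void)
{
	pm_qos_remove_request(&example_latency_req);
}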

> This series adds the following file:
> 
>  /sys/devices/system/cpu/cpu_idle
> 
> This file outputs and stores a cpumask of the cores
> which will have idle polling behavior.

I don't like this interface at all.

You have a cpuidle directory per core already, so what's the reason to add an
extra mask file really? 

> This implementation seems to work fine on x86, however
> it's an RFC because of the following points (for which
> feedback is greatly appreciated):
> 
>  o I believe this implementation should work for all archs,
>    but I can't confirm it, as my machines and experience are
>    limited to x86
> 
>  o Some x86 cpufreq drivers explicitly check if idle=poll
>    was passed. Does anyone know if this is an optimization
>    or is there actually a conflict between idle=poll and
>    driver operation?

idle=poll is used as a workaround for platform defects on some systems IIRC.

>  o This series maintains cpu_idle_poll_ctrl() semantics
>    which led to a more complex implementation. That is, today
>    cpu_idle_poll_ctrl() increments or decrements a counter.
>    A lot of arch code seems to rely on this semantic, where
>    cpu_idle_poll_ctrl(true) and cpu_idle_poll_ctrl(false) calls
>    have to be paired to enable or disable idle polling
> 
> Luiz Capitulino (3):
>   sched/idle: cpu_idle_poll(): drop unused return code
>   sched/idle: make cpu_idle_force_poll per-cpu
>   sched/idle: run-time support for setting idle polling
> 
>  drivers/base/cpu.c  | 44 ++++++++++++++++++++++++
>  include/linux/cpu.h |  2 ++
>  kernel/sched/idle.c | 96 +++++++++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 129 insertions(+), 13 deletions(-)

Thanks,
Rafael



* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-23  1:17 ` [RFC 0/3] " Rafael J. Wysocki
@ 2015-09-23 13:21   ` Luiz Capitulino
  2015-09-30 15:48   ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-23 13:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, riel, rafael.j.wysocki, mingo, Linux PM list,
	Len Brown, Peter Zijlstra

On Wed, 23 Sep 2015 03:17:59 +0200
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:

> On Tuesday, September 22, 2015 04:34:19 PM Luiz Capitulino wrote:
> > Hi,
> 
> Hi,
> 
> Please always CC patches related to power management to linux-pm@vger.kernel.org.
> 
> Also CCing Len Brown who's the maintainer of the intel_idle driver and Peter Z.
> 
> > Some archs allow the system administrator to set the
> > idle thread behavior to spin instead of entering
> > sleep states. The x86 arch, for example, has an idle=
> > command-line parameter for this purpose.
> > 
> > However, the command-line parameter has two problems:
> > 
> >  1. You have to reboot if you change your mind
> >  2. This setting affects all system cores
> > 
> > The second point is relevant for systems where cores
> > are partitioned into bookkeeping and low-latency cores.
> > Usually, it's OK for bookkeeping cores to enter deeper
> > sleep states. It's only the low-latency cores that should
> > poll when entering idle.
> 
> This looks like a use case for PM QoS to me rather.  You'd need to make it
> work per-CPU rather than globally, but that really is about asking for
> minimum latency.

Yes, wake-up latency. But that feature is already there; I'm just
making it a run-time tunable.

> > This series adds the following file:
> > 
> >  /sys/devices/system/cpu/cpu_idle
> > 
> > This file outputs and stores a cpumask of the cores
> > which will have idle polling behavior.
> 
> I don't like this interface at all.
> 
> You have a cpuidle directory per core already, so what's the reason to add an
> extra mask file really? 

If there's consensus that this is the right thing to do, I'd do it.

However, idle polling behavior is an idle thread parameter, and
the idle thread is a core kernel component not tied to drivers.
In this case, it would make more sense to add an idle_thread dir
to sysfs so that future idle thread parameters can be added there.

> > This implementation seems to work fine on x86, however
> > it's an RFC because of the following points (for which
> > feedback is greatly appreciated):
> > 
> >  o I believe this implementation should work for all archs,
> >    but I can't confirm it, as my machines and experience are
> >    limited to x86
> > 
> >  o Some x86 cpufreq drivers explicitly check if idle=poll
> >    was passed. Does anyone know if this is an optimization
> >    or is there actually a conflict between idle=poll and
> >    driver operation?
> 
> idle=poll is used as a workaround for platform defects on some systems IIRC.

Oh, makes sense.

> 
> >  o This series maintains cpu_idle_poll_ctrl() semantics
> >    which led to a more complex implementation. That is, today
> >    cpu_idle_poll_ctrl() increments or decrements a counter.
> >    A lot of arch code seems to rely on this semantic, where
> >    cpu_idle_poll_ctrl(true) and cpu_idle_poll_ctrl(false) calls
> >    have to be paired to enable or disable idle polling
> > 
> > Luiz Capitulino (3):
> >   sched/idle: cpu_idle_poll(): drop unused return code
> >   sched/idle: make cpu_idle_force_poll per-cpu
> >   sched/idle: run-time support for setting idle polling
> > 
> >  drivers/base/cpu.c  | 44 ++++++++++++++++++++++++
> >  include/linux/cpu.h |  2 ++
> >  kernel/sched/idle.c | 96 +++++++++++++++++++++++++++++++++++++++++++++--------
> >  3 files changed, 129 insertions(+), 13 deletions(-)
> 
> Thanks,
> Rafael
> 



* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-23  1:17 ` [RFC 0/3] " Rafael J. Wysocki
  2015-09-23 13:21   ` Luiz Capitulino
@ 2015-09-30 15:48   ` Peter Zijlstra
  2015-09-30 17:16     ` Luiz Capitulino
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2015-09-30 15:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Luiz Capitulino, linux-kernel, riel, rafael.j.wysocki, mingo,
	Linux PM list, Len Brown, Thomas Gleixner

On Wed, Sep 23, 2015 at 03:17:59AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, September 22, 2015 04:34:19 PM Luiz Capitulino wrote:
> > Hi,
> 
> Hi,
> 
> Please always CC patches related to power management to linux-pm@vger.kernel.org.
>
> Also CCing Len Brown who's the maintainer of the intel_idle driver and Peter Z.

And the sched people for touching kernel/sched (thanks Rafael)!

> > Some archs allow the system administrator to set the
> > idle thread behavior to spin instead of entering
> > sleep states. The x86 arch, for example, has an idle=
> > command-line parameter for this purpose.
> > 
> > However, the command-line parameter has two problems:
> > 
> >  1. You have to reboot if you change your mind
> >  2. This setting affects all system cores
> > 
> > The second point is relevant for systems where cores
> > are partitioned into bookkeeping and low-latency cores.
> > Usually, it's OK for bookkeeping cores to enter deeper
> > sleep states. It's only the low-latency cores that should
> > poll when entering idle.
> 
> This looks like a use case for PM QoS to me rather.  You'd need to make it
> work per-CPU rather than globally, but that really is about asking for
> minimum latency.

Agreed, this should very much hook into the PM QoS stuff.

> 
> > This series adds the following file:
> > 
> >  /sys/devices/system/cpu/cpu_idle
> > 
> > This file outputs and stores a cpumask of the cores
> > which will have idle polling behavior.
> 
> I don't like this interface at all.
> 
> You have a cpuidle directory per core already, so what's the reason to add an

Yeah this should very much not be exposed like this.

Ideally every cpuidle driver would get 'polling' as a
default state and the QoS constraints might select it if nothing else
matches.

And no mucking about with the force polling flag at _all_.


* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-30 15:48   ` Peter Zijlstra
@ 2015-09-30 17:16     ` Luiz Capitulino
  2015-09-30 17:27       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-30 17:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, linux-kernel, riel, rafael.j.wysocki, mingo,
	Linux PM list, Len Brown, Thomas Gleixner

On Wed, 30 Sep 2015 17:48:35 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Sep 23, 2015 at 03:17:59AM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, September 22, 2015 04:34:19 PM Luiz Capitulino wrote:
> > > Hi,
> > 
> > Hi,
> > 
> > Please always CC patches related to power management to linux-pm@vger.kernel.org.
> >
> > Also CCing Len Brown who's the maintainer of the intel_idle driver and Peter Z.
> 
> And the sched people for touching kernel/sched (thanks Rafael)!
> 
> > > Some archs allow the system administrator to set the
> > > idle thread behavior to spin instead of entering
> > > sleep states. The x86 arch, for example, has an idle=
> > > command-line parameter for this purpose.
> > > 
> > > However, the command-line parameter has two problems:
> > > 
> > >  1. You have to reboot if you change your mind
> > >  2. This setting affects all system cores
> > > 
> > > The second point is relevant for systems where cores
> > > are partitioned into bookkeeping and low-latency cores.
> > > Usually, it's OK for bookkeeping cores to enter deeper
> > > sleep states. It's only the low-latency cores that should
> > > poll when entering idle.
> > 
> > This looks like a use case for PM QoS to me rather.  You'd need to make it
> > work per-CPU rather than globally, but that really is about asking for
> > minimum latency.
> 
> Agreed, this should very much hook into the PM QoS stuff.
> 
> > 
> > > This series adds the following file:
> > > 
> > >  /sys/devices/system/cpu/cpu_idle
> > > 
> > > This file outputs and stores a cpumask of the cores
> > > which will have idle polling behavior.
> > 
> > I don't like this interface at all.
> > 
> > You have a cpuidle directory per core already, so what's the reason to add an
> 
> Yeah this should very much not be exposed like this.
> 
> Ideally every cpuidle driver would get 'polling' as a
> default state and the QoS constraints might select it if nothing else
> matches.

So, I didn't know we had support for polling from the idle
thread via cpuidle. This solves the first part of the problem.

The second part is having this available in KVM guests. Looks
like there's work being done to have PV idle support in KVM,
so this solves the second problem.

No need for this series then.

> 
> And no mucking about with the force polling flag at _all_.
> 



* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-30 17:16     ` Luiz Capitulino
@ 2015-09-30 17:27       ` Peter Zijlstra
  2015-09-30 18:01         ` Luiz Capitulino
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2015-09-30 17:27 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Rafael J. Wysocki, linux-kernel, riel, rafael.j.wysocki, mingo,
	Linux PM list, Len Brown, Thomas Gleixner

On Wed, Sep 30, 2015 at 01:16:10PM -0400, Luiz Capitulino wrote:
> On Wed, 30 Sep 2015 17:48:35 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > Ideally every cpuidle driver would get 'polling' as a
> > default state and the QoS constraints might select it if nothing else
> > matches.
> 
> So, I didn't know we had support for polling from the idle
> thread via cpuidle. This solves the first part of the problem.

I'm not sure we do; I'm saying that's what we should do ;-)



* Re: [RFC 0/3] sched/idle: run-time support for setting idle polling
  2015-09-30 17:27       ` Peter Zijlstra
@ 2015-09-30 18:01         ` Luiz Capitulino
  0 siblings, 0 replies; 10+ messages in thread
From: Luiz Capitulino @ 2015-09-30 18:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, linux-kernel, riel, rafael.j.wysocki, mingo,
	Linux PM list, Len Brown, Thomas Gleixner

On Wed, 30 Sep 2015 19:27:13 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Sep 30, 2015 at 01:16:10PM -0400, Luiz Capitulino wrote:
> > On Wed, 30 Sep 2015 17:48:35 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Ideally every cpuidle driver would get 'polling' as a
> > > default state and the QoS constraints might select it if nothing else
> > > matches.
> > 
> > So, I didn't know we had support for polling from the idle
> > thread via cpuidle. This solves the first part of the problem.
> 
> I'm not sure we do; I'm saying that's what we should do ;-)

Yeah, but I think it's done already. Although I haven't tested it yet:

[root@localhost ~]# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name 
POLL
[root@localhost ~]# 

This is installed by drivers/cpuidle/driver.c:poll_idle_init().
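
For reference, a sketch of how a latency-driven selection could fall
back to that POLL state, along the lines Peter suggests -- illustration
only, not any governor's actual code; it just walks the driver's state
table and keeps the deepest state whose exit latency fits the bound:

#include <linux/cpuidle.h>

/*
 * Hypothetical helper: with latency_req == 0, only state 0 -- the
 * POLL state installed by poll_idle_init(), with exit_latency == 0 --
 * remains eligible.
 */
static int example_pick_state(struct cpuidle_driver *drv, s32 latency_req)
{
	int i, pick = 0;	/* state 0 is the POLL state */

	for (i = 1; i < drv->state_count; i++) {
		if (drv->states[i].disabled)
			continue;
		if (drv->states[i].exit_latency > latency_req)
			break;
		pick = i;
	}
	return pick;
}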

