linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
@ 2018-03-12  9:36 Rafael J. Wysocki
  2018-03-14 11:24 ` Rafael J. Wysocki
  2018-03-14 12:04 ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-12  9:36 UTC (permalink / raw)
  To: Linux PM
  Cc: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Paul McKenney, Thomas Ilsche, Doug Smythies, Rik van Riel,
	Aubrey Li, Mike Galbraith, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

If poll_idle() is allowed to spin until need_resched() returns 'true',
it may actually spin for a much longer time than expected by the idle
governor, since set_tsk_need_resched() is not always called by the
timer interrupt handler.  If that happens, the CPU may spend much
more time than anticipated in the "polling" state.

To prevent that from happening, limit the time of the spinning loop
in poll_idle().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: After additional testing reduce POLL_IDLE_TIME_CHECK_COUNT to 1000.

---
 drivers/cpuidle/poll_state.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpuidle/poll_state.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/poll_state.c
+++ linux-pm/drivers/cpuidle/poll_state.c
@@ -5,16 +5,31 @@
  */
 
 #include <linux/cpuidle.h>
+#include <linux/ktime.h>
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 
+#define POLL_IDLE_TIME_CHECK_COUNT	1000
+#define POLL_IDLE_TIME_LIMIT		(TICK_NSEC / 16)
+
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
 			       struct cpuidle_driver *drv, int index)
 {
+	ktime_t start = ktime_get();
+
 	local_irq_enable();
 	if (!current_set_polling_and_test()) {
-		while (!need_resched())
+		unsigned int time_check_counter = 0;
+
+		while (!need_resched()) {
 			cpu_relax();
+			if (time_check_counter++ < POLL_IDLE_TIME_CHECK_COUNT)
+				continue;
+
+			time_check_counter = 0;
+			if (ktime_sub(ktime_get(), start) > POLL_IDLE_TIME_LIMIT)
+				break;
+		}
 	}
 	current_clr_polling();
 


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-12  9:36 [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle() Rafael J. Wysocki
@ 2018-03-14 11:24 ` Rafael J. Wysocki
  2018-03-14 12:04 ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-14 11:24 UTC (permalink / raw)
  To: Linux PM
  Cc: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Paul McKenney, Thomas Ilsche, Doug Smythies, Rik van Riel,
	Aubrey Li, Mike Galbraith, LKML

On Monday, March 12, 2018 10:36:27 AM CET Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> If poll_idle() is allowed to spin until need_resched() returns 'true',
> it may actually spin for a much longer time than expected by the idle
> governor, since set_tsk_need_resched() is not always called by the
> timer interrupt handler.  If that happens, the CPU may spend much
> more time than anticipated in the "polling" state.
> 
> To prevent that from happening, limit the time of the spinning loop
> in poll_idle().
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2: After additional testing reduce POLL_IDLE_TIME_CHECK_COUNT to 1000.
> 
> ---
>  drivers/cpuidle/poll_state.c |   17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpuidle/poll_state.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/poll_state.c
> +++ linux-pm/drivers/cpuidle/poll_state.c
> @@ -5,16 +5,31 @@
>   */
>  
>  #include <linux/cpuidle.h>
> +#include <linux/ktime.h>
>  #include <linux/sched.h>
>  #include <linux/sched/idle.h>
>  
> +#define POLL_IDLE_TIME_CHECK_COUNT	1000
> +#define POLL_IDLE_TIME_LIMIT		(TICK_NSEC / 16)
> +
>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>  			       struct cpuidle_driver *drv, int index)
>  {
> +	ktime_t start = ktime_get();
> +
>  	local_irq_enable();
>  	if (!current_set_polling_and_test()) {
> -		while (!need_resched())
> +		unsigned int time_check_counter = 0;
> +
> +		while (!need_resched()) {
>  			cpu_relax();
> +			if (time_check_counter++ < POLL_IDLE_TIME_CHECK_COUNT)
> +				continue;
> +
> +			time_check_counter = 0;
> +			if (ktime_sub(ktime_get(), start) > POLL_IDLE_TIME_LIMIT)
> +				break;
> +		}
>  	}
>  	current_clr_polling();
>  

No comments, so I'm assuming no objections or concerns.

I've seen reports indicating that this patch alone may sometimes reduce
the CPU package power by as much as 30%.

Thanks!


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-12  9:36 [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle() Rafael J. Wysocki
  2018-03-14 11:24 ` Rafael J. Wysocki
@ 2018-03-14 12:04 ` Peter Zijlstra
  2018-03-14 12:08   ` Rafael J. Wysocki
                     ` (3 more replies)
  1 sibling, 4 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-03-14 12:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Frederic Weisbecker, Thomas Gleixner, Paul McKenney,
	Thomas Ilsche, Doug Smythies, Rik van Riel, Aubrey Li,
	Mike Galbraith, LKML

On Mon, Mar 12, 2018 at 10:36:27AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> If poll_idle() is allowed to spin until need_resched() returns 'true',
> it may actually spin for a much longer time than expected by the idle
> governor, since set_tsk_need_resched() is not always called by the
> timer interrupt handler.  If that happens, the CPU may spend much
> more time than anticipated in the "polling" state.
> 
> To prevent that from happening, limit the time of the spinning loop
> in poll_idle().
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2: After additional testing reduce POLL_IDLE_TIME_CHECK_COUNT to 1000.
> 
> ---
>  drivers/cpuidle/poll_state.c |   17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpuidle/poll_state.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/poll_state.c
> +++ linux-pm/drivers/cpuidle/poll_state.c
> @@ -5,16 +5,31 @@
>   */
>  
>  #include <linux/cpuidle.h>
> +#include <linux/ktime.h>
>  #include <linux/sched.h>
>  #include <linux/sched/idle.h>
>  
> +#define POLL_IDLE_TIME_CHECK_COUNT	1000
> +#define POLL_IDLE_TIME_LIMIT		(TICK_NSEC / 16)
> +
>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>  			       struct cpuidle_driver *drv, int index)
>  {
> +	ktime_t start = ktime_get();

I would recommend not using ktime_get(); imagine the 'joy' if that
happens to be the HPET.

>  	local_irq_enable();
>  	if (!current_set_polling_and_test()) {
> +		unsigned int time_check_counter = 0;
> +
> +		while (!need_resched()) {
>  			cpu_relax();
> +			if (time_check_counter++ < POLL_IDLE_TIME_CHECK_COUNT)
> +				continue;
> +
> +			time_check_counter = 0;
> +			if (ktime_sub(ktime_get(), start) > POLL_IDLE_TIME_LIMIT)
> +				break;
> +		}
>  	}
>  	current_clr_polling();

Since the idle loop is strictly per-cpu, you can use regular
sched_clock() here. Something like:

	u64 start = sched_clock();

	local_irq_enable();
	if (!current_set_polling_and_test()) {
		while (!need_resched()) {
			cpu_relax();

			if (sched_clock() - start > POLL_IDLE_TIME_LIMIT)
				break;
		}
	}
	current_clr_polling();

On x86 we don't have to use that time_check_counter thing, sched_clock()
is really cheap, not sure if it makes sense on other platforms.


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-14 12:04 ` Peter Zijlstra
@ 2018-03-14 12:08   ` Rafael J. Wysocki
  2018-03-22 16:32   ` Rik van Riel
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-14 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux PM, Frederic Weisbecker, Thomas Gleixner, Paul McKenney,
	Thomas Ilsche, Doug Smythies, Rik van Riel, Aubrey Li,
	Mike Galbraith, LKML

On Wednesday, March 14, 2018 1:04:50 PM CET Peter Zijlstra wrote:
> On Mon, Mar 12, 2018 at 10:36:27AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > If poll_idle() is allowed to spin until need_resched() returns 'true',
> > it may actually spin for a much longer time than expected by the idle
> > governor, since set_tsk_need_resched() is not always called by the
> > timer interrupt handler.  If that happens, the CPU may spend much
> > more time than anticipated in the "polling" state.
> > 
> > To prevent that from happening, limit the time of the spinning loop
> > in poll_idle().
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > 
> > -> v2: After additional testing reduce POLL_IDLE_TIME_CHECK_COUNT to 1000.
> > 
> > ---
> >  drivers/cpuidle/poll_state.c |   17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > Index: linux-pm/drivers/cpuidle/poll_state.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/poll_state.c
> > +++ linux-pm/drivers/cpuidle/poll_state.c
> > @@ -5,16 +5,31 @@
> >   */
> >  
> >  #include <linux/cpuidle.h>
> > +#include <linux/ktime.h>
> >  #include <linux/sched.h>
> >  #include <linux/sched/idle.h>
> >  
> > +#define POLL_IDLE_TIME_CHECK_COUNT	1000
> > +#define POLL_IDLE_TIME_LIMIT		(TICK_NSEC / 16)
> > +
> >  static int __cpuidle poll_idle(struct cpuidle_device *dev,
> >  			       struct cpuidle_driver *drv, int index)
> >  {
> > +	ktime_t start = ktime_get();
> 
> I would recommend not using ktime_get(); imagine the 'joy' if that
> happens to be the HPET.
> 
> >  	local_irq_enable();
> >  	if (!current_set_polling_and_test()) {
> > +		unsigned int time_check_counter = 0;
> > +
> > +		while (!need_resched()) {
> >  			cpu_relax();
> > +			if (time_check_counter++ < POLL_IDLE_TIME_CHECK_COUNT)
> > +				continue;
> > +
> > +			time_check_counter = 0;
> > +			if (ktime_sub(ktime_get(), start) > POLL_IDLE_TIME_LIMIT)
> > +				break;
> > +		}
> >  	}
> >  	current_clr_polling();
> 
> Since the idle loop is strictly per-cpu, you can use regular
> sched_clock() here. Something like:
> 
> 	u64 start = sched_clock();
> 
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched()) {
> 			cpu_relax();
> 
> 			if (sched_clock() - start > POLL_IDLE_TIME_LIMIT)
> 				break;
> 		}
> 	}
> 	current_clr_polling();

Good idea!

> On x86 we don't have to use that time_check_counter thing, sched_clock()
> is really cheap, not sure if it makes sense on other platforms.

Let's do it the way you suggested; if there are issues with it, we
can still add the counter later.

Thanks!


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-14 12:04 ` Peter Zijlstra
  2018-03-14 12:08   ` Rafael J. Wysocki
@ 2018-03-22 16:32   ` Rik van Riel
  2018-03-22 19:11     ` Peter Zijlstra
  2018-03-27 16:42     ` Rafael J. Wysocki
  2018-03-22 19:11   ` Doug Smythies
  2018-03-23  3:19   ` Doug Smythies
  3 siblings, 2 replies; 15+ messages in thread
From: Rik van Riel @ 2018-03-22 16:32 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J. Wysocki
  Cc: Linux PM, Frederic Weisbecker, Thomas Gleixner, Paul McKenney,
	Thomas Ilsche, Doug Smythies, Aubrey Li, Mike Galbraith, LKML


On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:

> On x86 we don't have to use that time_check_counter thing,
> sched_clock()
> is really cheap, not sure if it makes sense on other platforms.

Are you sure? I saw a 5-10% increase in CPU use,
for a constant query rate to a memcache style
workload, with v3 of this patch.

-- 
All Rights Reversed.



* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-22 16:32   ` Rik van Riel
@ 2018-03-22 19:11     ` Peter Zijlstra
  2018-03-27 16:42     ` Rafael J. Wysocki
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-03-22 19:11 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Rafael J. Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Paul McKenney, Thomas Ilsche, Doug Smythies,
	Aubrey Li, Mike Galbraith, LKML

On Thu, Mar 22, 2018 at 12:32:18PM -0400, Rik van Riel wrote:
> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
> 
> > On x86 we don't have to use that time_check_counter thing,
> > sched_clock()
> > is really cheap, not sure if it makes sense on other platforms.
> 
> Are you sure? I saw a 5-10% increase in CPU use,
> for a constant query rate to a memcache style
> workload, with v3 of this patch.

Well, I expected it to be really cheap, but if you have numbers saying
otherwise, clearly I'll need to adjust expectations ;-)


* RE: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-14 12:04 ` Peter Zijlstra
  2018-03-14 12:08   ` Rafael J. Wysocki
  2018-03-22 16:32   ` Rik van Riel
@ 2018-03-22 19:11   ` Doug Smythies
  2018-03-23  3:19   ` Doug Smythies
  3 siblings, 0 replies; 15+ messages in thread
From: Doug Smythies @ 2018-03-22 19:11 UTC (permalink / raw)
  To: 'Rik van Riel'
  Cc: 'Linux PM', 'Frederic Weisbecker',
	'Thomas Gleixner', 'Paul McKenney',
	'Thomas Ilsche', 'Aubrey Li',
	'Mike Galbraith', 'LKML',
	'Peter Zijlstra', 'Rafael J. Wysocki',
	Doug Smythies

On 2018.03.22 09:32 Rik van Riel wrote:
> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>
>> On x86 we don't have to use that time_check_counter thing,
>> sched_clock()
>> is really cheap, not sure if it makes sense on other platforms.
>
> Are you sure? I saw a 5-10% increase in CPU use,
> for a constant query rate to a memcache style
> workload, with v3 of this patch.

I would very much like to be able to repeat your test results.
However, I am not sure what you mean by "memcache style workload".
Is there a test you can point me to? Say a Phoronix type test, for example.

All of my tests with the V3 of this patch have been fine.

... Doug


* RE: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-14 12:04 ` Peter Zijlstra
                     ` (2 preceding siblings ...)
  2018-03-22 19:11   ` Doug Smythies
@ 2018-03-23  3:19   ` Doug Smythies
  2018-03-23  8:57     ` Rafael J. Wysocki
  3 siblings, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2018-03-23  3:19 UTC (permalink / raw)
  To: 'Rik van Riel', 'Rafael J. Wysocki'
  Cc: 'Linux PM', 'Frederic Weisbecker',
	'Thomas Gleixner', 'Paul McKenney',
	'Thomas Ilsche', 'Aubrey Li',
	'Mike Galbraith', 'LKML',
	'Peter Zijlstra',
	Doug Smythies

On 2018.03.22 12:12 Doug Smythies wrote: 
>On 2018.03.22 09:32 Rik van Riel wrote:
>> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>>
>>> On x86 we don't have to use that time_check_counter thing,
>>> sched_clock()
>>> is really cheap, not sure if it makes sense on other platforms.
>>
>> Are you sure? I saw a 5-10% increase in CPU use,
>> for a constant query rate to a memcache style
>> workload, with v3 of this patch.
>
> I would very much like to be able to repeat your test results.
> However, I am not sure what you mean by "memcache style workload".
> Is there a test you can point me to? Say a Phoronix type test, for example.
>
> All of my tests with the V3 of this patch have been fine.

What is the difference between the sched_clock() talked about herein
and the local_clock() used in the patch?

I'm not sure how good it is but I made a test. I didn't believe
the results, so I did it 3 times.

V7.3 is as from the git branch.
V7.3p is plus the patch adding the counter loop to poll_state.c

The test is a tight loop (about 19600 loops per second) running
on all 8 CPUs. I can not seem to get my system to use Idle State
0, so I disabled Idle States 1 and 2 to force use of Idle State 0.
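(The disabling can be done at run time through sysfs, by writing 1 to
each CPU's /sys/devices/system/cpu/cpuN/cpuidle/state1/disable and
state2/disable attributes.)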

V7.3 uses a processor package power of 62.5 Watts
V7.3p uses a processor package power of 53.4 Watts, or 14.6% less power.

The loop times do not change.
The Idle state 0 residency per unit time does not change.

... Doug


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-23  3:19   ` Doug Smythies
@ 2018-03-23  8:57     ` Rafael J. Wysocki
  2018-03-23  9:07       ` Rafael J. Wysocki
  2018-03-23 21:30       ` Doug Smythies
  0 siblings, 2 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-23  8:57 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rik van Riel, Rafael J. Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Paul McKenney, Thomas Ilsche, Aubrey Li,
	Mike Galbraith, LKML, Peter Zijlstra

On Fri, Mar 23, 2018 at 4:19 AM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2018.03.22 12:12 Doug Smythies wrote:
>>On 2018.03.22 09:32 Rik van Riel wrote:
>>> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>>>
>>>> On x86 we don't have to use that time_check_counter thing,
>>>> sched_clock()
>>>> is really cheap, not sure if it makes sense on other platforms.
>>>
>>> Are you sure? I saw a 5-10% increase in CPU use,
>>> for a constant query rate to a memcache style
>>> workload, with v3 of this patch.
>>
>> I would very much like to be able to repeat your test results.
>> However, I am not sure what you mean by "memcache style workload".
>> Is there a test you can point me to? Say a Phoronix type test, for example.
>>
>> All of my tests with the V3 of this patch have been fine.
>
> What is the difference between the sched_clock() talked about herein
> and the local_clock() used in the patch?

It is almost the same (modulo offset) unless sched_clock_stable()
returns 'false'.
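
IOW, roughly this (a sketch of the definitions in
include/linux/sched/clock.h and kernel/sched/clock.c; the exact
details depend on the kernel configuration):

	/* Sketch only, not the verbatim kernel source. */
	#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
	static inline u64 local_clock(void)
	{
		/*
		 * Per-CPU filtered clock; when sched_clock_stable()
		 * returns 'true', this reduces to sched_clock() plus a
		 * constant offset.
		 */
		return sched_clock_cpu(raw_smp_processor_id());
	}
	#else
	static inline u64 local_clock(void)
	{
		return sched_clock();
	}
	#endif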

> I'm not sure how good it is but I made a test. I didn't believe
> the results, so I did it 3 times.
>
> V7.3 is as from the git branch.
> V7.3p is plus the patch adding the counter loop to poll_state.c
>
> The test is a tight loop (about 19600 loops per second) running
> on all 8 CPUs. I can not seem to get my system to use Idle State
> 0, so I disabled Idle States 1 and 2 to force use of Idle State 0.
>
> V7.3 uses a processor package power of 62.5 Watts
> V7.3p uses a processor package power of 53.4 Watts, or 14.6% less power.
>
> The loop times do not change.
> The Idle state 0 residency per unit time does not change.

OK, so this means that the results should improve for Rik with this
patch too. :-)

Thanks!


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-23  8:57     ` Rafael J. Wysocki
@ 2018-03-23  9:07       ` Rafael J. Wysocki
  2018-03-23 21:30       ` Doug Smythies
  1 sibling, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-23  9:07 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rik van Riel, Rafael J. Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Paul McKenney, Thomas Ilsche, Aubrey Li,
	Mike Galbraith, LKML, Peter Zijlstra

On Fri, Mar 23, 2018 at 9:57 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 4:19 AM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2018.03.22 12:12 Doug Smythies wrote:
>>>On 2018.03.22 09:32 Rik van Riel wrote:
>>>> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>>>>
>>>>> On x86 we don't have to use that time_check_counter thing,
>>>>> sched_clock()
>>>>> is really cheap, not sure if it makes sense on other platforms.
>>>>
>>>> Are you sure? I saw a 5-10% increase in CPU use,
>>>> for a constant query rate to a memcache style
>>>> workload, with v3 of this patch.
>>>
>>> I would very much like to be able to repeat your test results.
>>> However, I am not sure what you mean by "memcache style workload".
>>> Is there a test you can point me to? Say a Phoronix type test, for example.
>>>
>>> All of my tests with the V3 of this patch have been fine.
>>
>> What is the difference between the sched_clock() talked about herein
>> and the local_clock() used in the patch?
>
> It is almost the same (modulo offset) unless sched_clock_stable()
> returns 'false'.
>
>> I'm not sure how good it is but I made a test. I didn't believe
>> the results, so I did it 3 times.
>>
>> V7.3 is as from the git branch.
>> V7.3p is plus the patch adding the counter loop to poll_state.c
>>
>> The test is a tight loop (about 19600 loops per second) running
>> on all 8 CPUs. I can not seem to get my system to use Idle State
>> 0, so I disabled Idle States 1 and 2 to force use of Idle State 0.
>>
>> V7.3 uses a processor package power of 62.5 Watts
>> V7.3p uses a processor package power of 53.4 Watts, or 14.6% less power.
>>
>> The loop times do not change.
>> The Idle state 0 residency per unit time does not change.
>
> OK, so this means that the results should improve for Rik with this
> patch too. :-)

BTW, can you possibly check how much of a difference it makes to
reduce POLL_IDLE_COUNT in the patch to, say, 500 or even lower?

The lower it is, the less noise it will introduce AFAICS.


* RE: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-23  8:57     ` Rafael J. Wysocki
  2018-03-23  9:07       ` Rafael J. Wysocki
@ 2018-03-23 21:30       ` Doug Smythies
  2018-03-24 11:25         ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: Doug Smythies @ 2018-03-23 21:30 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Rik van Riel', 'Rafael J. Wysocki',
	'Linux PM', 'Frederic Weisbecker',
	'Thomas Gleixner', 'Paul McKenney',
	'Thomas Ilsche', 'Aubrey Li',
	'Mike Galbraith', 'LKML',
	'Peter Zijlstra',
	Doug Smythies

On 2018.03.23 02:08 Rafael J. Wysocki wrote:
> On Fri, Mar 23, 2018 at 9:57 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, Mar 23, 2018 at 4:19 AM, Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2018.03.22 12:12 Doug Smythies wrote:

...[snip]...

>>
>>> I'm not sure how good it is but I made a test. I didn't believe
>>> the results, so I did it 3 times.
>>>
>>> V7.3 is as from the git branch.
>>> V7.3p is plus the patch adding the counter loop to poll_state.c
>>>
>>> The test is a tight loop (about 19600 loops per second) running
>>> on all 8 CPUs. I can not seem to get my system to use Idle State
>>> 0, so I disabled Idle States 1 and 2 to force use of Idle State 0.
>>>
>>> V7.3 uses a processor package power of 62.5 Watts
>>> V7.3p uses a processor package power of 53.4 Watts, or 14.6% less power.
>>>
>>> The loop times do not change.
>>> The Idle state 0 residency per unit time does not change.
>>
>> OK, so this means that the results should improve for Rik with this
>> patch too. :-)

I hope so.

> BTW, can you possibly check how much of a difference it makes to
> reduce POLL_IDLE_COUNT in the patch to, say, 500 or even lower?
> 
> The lower it is, the less noise it will introduce AFAICS.

Well, we would expect the curve to be something like a typical 1/x curve:

Power = 53.4 + k1/(k2* POLL_IDLE_COUNT + k3)

I did some runs and did a crude fit:

Power ~= 53.4 + 35/(POLL_IDLE_COUNT + 3)

From that, an allowed error can be calculated. A count of 100 gives
back only 0.64% of the power, so I suggest it would be a reasonable
number.
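(Sanity check: with the fit above, a count of 100 gives 35/(100 + 3) ~=
0.34 W of extra power, i.e. 0.34/53.4 ~= 0.64%.)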

That being said, my test is quite crude and we should first see what
others, including Rik, get.

These two graphs might help explain what I did:

http://fast.smythies.com/v73p_vary_count.png
http://fast.smythies.com/v73p_extra_power.png

It is just my opinion, but I think users with very stringent
idle state 0 exit latency requirements should test with
POLL_IDLE_COUNT set to 1. Then they know the worst case works,
whereas with a larger count they might never hit the worst case
in testing, since it only occurs with 1/POLL_IDLE_COUNT probability.
Once happy that the worst case works, they can use the nominal
(T.B.D.) POLL_IDLE_COUNT for the power savings.

... Doug


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-23 21:30       ` Doug Smythies
@ 2018-03-24 11:25         ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-24 11:25 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Rik van Riel', 'Linux PM',
	'Frederic Weisbecker', 'Thomas Gleixner',
	'Paul McKenney', 'Thomas Ilsche',
	'Aubrey Li', 'Mike Galbraith', 'LKML',
	'Peter Zijlstra'

On Friday, March 23, 2018 10:30:03 PM CET Doug Smythies wrote:
> On 2018.03.23 02:08 Rafael J. Wysocki wrote:
> > On Fri, Mar 23, 2018 at 9:57 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> On Fri, Mar 23, 2018 at 4:19 AM, Doug Smythies <dsmythies@telus.net> wrote:
> >>> On 2018.03.22 12:12 Doug Smythies wrote:
> 
> ...[snip]...
> 
> >>
> >>> I'm not sure how good it is but I made a test. I didn't believe
> >>> the results, so I did it 3 times.
> >>>
> >>> V7.3 is as from the git branch.
> >>> V7.3p is plus the patch adding the counter loop to poll_state.c
> >>>
> >>> The test is a tight loop (about 19600 loops per second) running
> >>> on all 8 CPUs. I can not seem to get my system to use Idle State
> >>> 0, so I disabled Idle States 1 and 2 to force use of Idle State 0.
> >>>
> >>> V7.3 uses a processor package power of 62.5 Watts
> >>> V7.3p uses a processor package power of 53.4 Watts, or 14.6% less power.
> >>>
> >>> The loop times do not change.
> >>> The Idle state 0 residency per unit time does not change.
> >>
> >> OK, so this means that the results should improve for Rik with this
> >> patch too. :-)
> 
> I hope so.
> 
> > BTW, can you possibly check how much of a difference it makes to
> > reduce POLL_IDLE_COUNT in the patch to, say, 500 or even more?
> > 
> > The lower it is, the less noise it will introduce AFAICS.
> 
> Well, we would expect the curve to be something like a typical 1/x curve:
> 
> Power = 53.4 + k1/(k2* POLL_IDLE_COUNT + k3)
> 
> I did some runs and did a crude fit:
> 
> Power ~= 53.4 + 35/(POLL_IDLE_COUNT + 3)
> 
> From that, an allowed error can be calculated. A count of 100 gives
> back only 0.64% of the power, so I suggest it would be a reasonable
> number.

I agree.

> That being said, my test is quite crude and we should first see what
> others, including Rik, get.
> 
> These two graphs might help explain what I did:
> 
> http://fast.smythies.com/v73p_vary_count.png
> http://fast.smythies.com/v73p_extra_power.png
> 
> It is just my opinion, but I think users with very stringent
> idle state 0 exit latency requirements should test with
> POLL_IDLE_COUNT set to 1. Then they know the worst case works,
> whereas with a larger count they might never hit the worst case
> in testing, since it only occurs with 1/POLL_IDLE_COUNT probability.
> Once happy that the worst case works, they can use the nominal
> (T.B.D.) POLL_IDLE_COUNT for the power savings.

Well, the exit latency is sort of a different story. :-)

Obviously, cpuidle_poll_state_init() lies about the exit latency of the
polling state (which is supposed to be the worst case), but that is needed
for the polling state to fit into the "state 0" slot.  Ideally, the latency
should really be 0, meaning that the polling loop should terminate as soon
as an interrupt occurs (regardless of the need_resched() status).

I have a prototype (appended below) to make that happen which seems to be
working here.

---
 drivers/cpuidle/poll_state.c |    4 +++-
 include/linux/sched.h        |    1 +
 include/linux/sched/idle.h   |   15 +++++++++++++++
 kernel/softirq.c             |   10 ++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)

Index: linux-pm/include/linux/sched.h
===================================================================
--- linux-pm.orig/include/linux/sched.h
+++ linux-pm/include/linux/sched.h
@@ -1319,6 +1319,7 @@ extern struct pid *cad_pid;
 #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_allowed */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
+#define PF_IDLE_POLL		0x10000000	/* I am an idle task in a polling loop */
 #define PF_MUTEX_TESTER		0x20000000	/* Thread belongs to the rt mutex tester */
 #define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
 #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
Index: linux-pm/include/linux/sched/idle.h
===================================================================
--- linux-pm.orig/include/linux/sched/idle.h
+++ linux-pm/include/linux/sched/idle.h
@@ -84,4 +84,19 @@ static inline void current_clr_polling(v
 	preempt_fold_need_resched();
 }
 
+static inline bool current_may_poll(void)
+{
+	return !!(current->flags & PF_IDLE_POLL);
+}
+
+static inline void current_enable_polling(void)
+{
+	current->flags |= PF_IDLE_POLL;
+}
+
+static inline void current_disable_polling(void)
+{
+	current->flags &= ~PF_IDLE_POLL;
+}
+
 #endif /* _LINUX_SCHED_IDLE_H */
Index: linux-pm/drivers/cpuidle/poll_state.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/poll_state.c
+++ linux-pm/drivers/cpuidle/poll_state.c
@@ -17,11 +17,12 @@ static int __cpuidle poll_idle(struct cp
 {
 	u64 time_start = local_clock();
 
+	current_enable_polling();
 	local_irq_enable();
 	if (!current_set_polling_and_test()) {
 		unsigned int loop_count = 0;
 
-		while (!need_resched()) {
+		while (!need_resched() && current_may_poll()) {
 			cpu_relax();
 			if (loop_count++ < POLL_IDLE_COUNT)
 				continue;
@@ -32,6 +33,7 @@ static int __cpuidle poll_idle(struct cp
 		}
 	}
 	current_clr_polling();
+	current_disable_polling();
 
 	return index;
 }
Index: linux-pm/kernel/softirq.c
===================================================================
--- linux-pm.orig/kernel/softirq.c
+++ linux-pm/kernel/softirq.c
@@ -22,6 +22,7 @@
 #include <linux/kthread.h>
 #include <linux/rcupdate.h>
 #include <linux/ftrace.h>
+#include <linux/sched/idle.h>
 #include <linux/smp.h>
 #include <linux/smpboot.h>
 #include <linux/tick.h>
@@ -389,6 +390,14 @@ static inline void tick_irq_exit(void)
 #endif
 }
 
+static inline void idle_irq_exit(void)
+{
+#ifdef CONFIG_CPU_IDLE
+	if (is_idle_task(current))
+		current_disable_polling();
+#endif
+}
+
 /*
  * Exit an interrupt context. Process softirqs if needed and possible:
  */
@@ -405,6 +414,7 @@ void irq_exit(void)
 		invoke_softirq();
 
 	tick_irq_exit();
+	idle_irq_exit();
 	rcu_irq_exit();
 	trace_hardirq_exit(); /* must be last! */
 }


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-22 16:32   ` Rik van Riel
  2018-03-22 19:11     ` Peter Zijlstra
@ 2018-03-27 16:42     ` Rafael J. Wysocki
  2018-03-27 18:02       ` Rik van Riel
  1 sibling, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-27 16:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, Rafael J. Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Paul McKenney, Thomas Ilsche, Doug Smythies,
	Aubrey Li, Mike Galbraith, LKML

On Thu, Mar 22, 2018 at 5:32 PM, Rik van Riel <riel@surriel.com> wrote:
> On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>
>> On x86 we don't have to use that time_check_counter thing,
>> sched_clock()
>> is really cheap, not sure if it makes sense on other platforms.
>
> Are you sure? I saw a 5-10% increase in CPU use,
> for a constant query rate to a memcache style
> workload, with v3 of this patch.

I think I know what's going on.

Increased utilization with the same amount of work per time unit (and
I guess that's the case, given the lack of specific information about
the workload) means more non-idle time relative to total time, which
implies reduced frequency (e.g. less turbo).

Now, combine that with Doug's observation that limiting the rate of
local_clock() invocations in the poll_idle() loop significantly reduces
power draw in his experiments, and with the other observation that in
both cases local_clock() most likely ends up being rdtsc().

What this implies to me is that invoking rdtsc() at a high rate on
multiple logical CPUs in parallel makes chips run hot.  They may
actually get so hot that they hit power/thermal (e.g. RAPL) limits and
have their frequency reduced as a result.

Limiting the rate of local_clock() invocations obviously avoids this issue.


* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-27 16:42     ` Rafael J. Wysocki
@ 2018-03-27 18:02       ` Rik van Riel
  2018-03-27 21:09         ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 2018-03-27 18:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Rafael J. Wysocki, Linux PM, Frederic Weisbecker,
	Thomas Gleixner, Paul McKenney, Thomas Ilsche, Doug Smythies,
	Aubrey Li, Mike Galbraith, LKML


On Tue, 2018-03-27 at 18:42 +0200, Rafael J. Wysocki wrote:
> On Thu, Mar 22, 2018 at 5:32 PM, Rik van Riel <riel@surriel.com>
> wrote:
> > On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
> > 
> > > On x86 we don't have to use that time_check_counter thing,
> > > sched_clock()
> > > is really cheap, not sure if it makes sense on other platforms.
> > 
> > Are you sure? I saw a 5-10% increase in CPU use,
> > for a constant query rate to a memcache style
> > workload, with v3 of this patch.
> 
> I think I know what's going on.

I ran my tests wrong, and the script never propagated
errors back to me. Sigh.

However, the poll_idle() that reads the TSC at a
reduced rate seems to perform better than the one
that reads the TSC every time it goes around the
loop.

The size of the idle loop seems to make a slight
difference, too. Having just one cpu_relax() in
the entire loop seems to be better than having
them all over the place.

-- 
All Rights Reversed.



* Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()
  2018-03-27 18:02       ` Rik van Riel
@ 2018-03-27 21:09         ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2018-03-27 21:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Rafael J. Wysocki, Peter Zijlstra, Rafael J. Wysocki, Linux PM,
	Frederic Weisbecker, Thomas Gleixner, Paul McKenney,
	Thomas Ilsche, Doug Smythies, Aubrey Li, Mike Galbraith, LKML

On Tue, Mar 27, 2018 at 8:02 PM, Rik van Riel <riel@surriel.com> wrote:
> On Tue, 2018-03-27 at 18:42 +0200, Rafael J. Wysocki wrote:
>> On Thu, Mar 22, 2018 at 5:32 PM, Rik van Riel <riel@surriel.com>
>> wrote:
>> > On Wed, 2018-03-14 at 13:04 +0100, Peter Zijlstra wrote:
>> >
>> > > On x86 we don't have to use that time_check_counter thing,
>> > > sched_clock()
>> > > is really cheap, not sure if it makes sense on other platforms.
>> >
>> > Are you sure? I saw a 5-10% increase in CPU use,
>> > for a constant query rate to a memcache style
>> > workload, with v3 of this patch.
>>
>> I think I know what's going on.
>
> I ran my tests wrong, and the script never propagated
> errors back to me. Sigh.

Again, no worries. :-)

> However, the poll_idle() that reads the TSC at a
> reduced rate seems to perform better than the one
> that reads the TSC every time it goes around the
> loop.

OK

> The size of the idle loop seems to make a slight
> difference, too. Having just one cpu_relax() in
> the entire loop seems to be better than having
> them all over the place.

OK

Thanks for the above observations; they match my understanding of
what's happening.

I'll resend the patch that reads the TSC at a reduced rate, with a
proper changelog, and tentatively add it to my pm-cpuidle branch.
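
That would be roughly the v2 loop with ktime_get() replaced by
local_clock() and with the reduced-rate check retained, i.e. something
like this (a sketch; the final POLL_IDLE_COUNT value is still TBD):

	#define POLL_IDLE_TIME_LIMIT	(TICK_NSEC / 16)
	#define POLL_IDLE_COUNT		1000	/* tentative, see above */

	static int __cpuidle poll_idle(struct cpuidle_device *dev,
				       struct cpuidle_driver *drv, int index)
	{
		u64 time_start = local_clock();

		local_irq_enable();
		if (!current_set_polling_and_test()) {
			unsigned int loop_count = 0;

			while (!need_resched()) {
				cpu_relax();
				if (loop_count++ < POLL_IDLE_COUNT)
					continue;

				loop_count = 0;
				/*
				 * Read the clock only occasionally to keep
				 * the loop itself cheap.
				 */
				if (local_clock() - time_start > POLL_IDLE_TIME_LIMIT)
					break;
			}
		}
		current_clr_polling();

		return index;
	}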

Thanks!


Thread overview: 15 messages
2018-03-12  9:36 [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle() Rafael J. Wysocki
2018-03-14 11:24 ` Rafael J. Wysocki
2018-03-14 12:04 ` Peter Zijlstra
2018-03-14 12:08   ` Rafael J. Wysocki
2018-03-22 16:32   ` Rik van Riel
2018-03-22 19:11     ` Peter Zijlstra
2018-03-27 16:42     ` Rafael J. Wysocki
2018-03-27 18:02       ` Rik van Riel
2018-03-27 21:09         ` Rafael J. Wysocki
2018-03-22 19:11   ` Doug Smythies
2018-03-23  3:19   ` Doug Smythies
2018-03-23  8:57     ` Rafael J. Wysocki
2018-03-23  9:07       ` Rafael J. Wysocki
2018-03-23 21:30       ` Doug Smythies
2018-03-24 11:25         ` Rafael J. Wysocki
