All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24  8:23 Shreyas B. Prabhu
  2016-06-24  9:00   ` David Laight
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-24  8:23 UTC (permalink / raw)
  To: rjw
  Cc: daniel.lezcano, linux-pm, linuxppc-dev, anton, mpe, bsingharora,
	Shreyas B. Prabhu

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than target
residency of the next available idle state, then it would exit thereby
giving chance to the cpuidle governor to re-evaluate and
promote the cpu to a deeper idle state. Therefore whenever snooze exits
due to this timeout, its last_residency will be target_residency of next
deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around last_residency calculation. Specifically, while
converting last_residency value from nanoseconds to microseconds it does
right shift by 10. Due to this, in snooze timeout exit scenarios
last_residency calculated is roughly 2.3% less than target_residency of
next available state. This pattern is picked up get_typical_interval()
in the menu governor and therefore expected_interval in menu_select() is
frequently less than the target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by replacing right shift by 10 with /1000 while calculating
last_residency.

Reported-by: Anton Blanchard <anton@samba.org>
Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v2
=============
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..30d67a8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		local_irq_enable();
 
 	/*
-	 * local_clock() returns the time in nanosecond, let's shift
-	 * by 10 (divide by 1024) to have microsecond based time.
+	 * local_clock() returns the time in nanosecond, let's
+	 * divide by 1000 to have microsecond based time.
 	 */
-	diff = (time_end - time_start) >> 10;
+	diff = (time_end - time_start) / 1000;
 	if (diff > INT_MAX)
 		diff = INT_MAX;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24  9:00   ` David Laight
  2016-06-24  9:30   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2016-06-24  9:00 UTC (permalink / raw)
  To: 'Shreyas B. Prabhu', rjw
  Cc: daniel.lezcano, linuxppc-dev, anton, linux-pm

From: Shreyas B. Prabhu
> Sent: 24 June 2016 09:24
>
> Snooze is a poll idle state in powernv and pseries platforms. Snooze
> has a timeout so that if a cpu stays in snooze for more than target
> residency of the next available idle state, then it would exit thereby
> giving chance to the cpuidle governor to re-evaluate and
> promote the cpu to a deeper idle state. Therefore whenever snooze exits
> due to this timeout, its last_residency will be target_residency of next
> deeper state.
> 
> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
> changed the math around last_residency calculation. Specifically, while
> converting last_residency value from nanoseconds to microseconds it does
> right shift by 10. Due to this, in snooze timeout exit scenarios
> last_residency calculated is roughly 2.3% less than target_residency of
> next available state. This pattern is picked up get_typical_interval()
> in the menu governor and therefore expected_interval in menu_select() is
> frequently less than the target_residency of any state but snooze.
> 
> Due to this we are entering snooze at a higher rate, thereby affecting
> the single thread performance.
> 
> Fix this by replacing right shift by 10 with /1000 while calculating
> last_residency.
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---
> Changes in v2
> =============
>  - Fixing it in the cpuidle core code instead of driver code.
> 
>  drivers/cpuidle/cpuidle.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a4d0059..30d67a8 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  		local_irq_enable();
> 
>  	/*
> -	 * local_clock() returns the time in nanosecond, let's shift
> -	 * by 10 (divide by 1024) to have microsecond based time.
> +	 * local_clock() returns the time in nanosecond, let's
> +	 * divide by 1000 to have microsecond based time.
>  	 */
> -	diff = (time_end - time_start) >> 10;
> +	diff = (time_end - time_start) / 1000;
>  	if (diff > INT_MAX)
>  		diff = INT_MAX;

The intent of the >> 10 was probably to avoid an expensive 64bit divide.
So maybe something like:
	diff = time_end - time_start;
	if (diff >= INT_MAX/2)
		diff_32 = INT_MAX/2/1000;
	else
		diff_32 = diff;
		diff_32 += diff_32 >> 6;
		diff_32 >>= 10;
	}

Adding an extra 1/32 makes the division by be something slightly below 1000.

	David

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24  9:00   ` David Laight
  0 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2016-06-24  9:00 UTC (permalink / raw)
  To: 'Shreyas B. Prabhu', rjw
  Cc: linux-pm, daniel.lezcano, anton, linuxppc-dev

RnJvbTogU2hyZXlhcyBCLiBQcmFiaHUNCj4gU2VudDogMjQgSnVuZSAyMDE2IDA5OjI0DQo+DQo+
IFNub296ZSBpcyBhIHBvbGwgaWRsZSBzdGF0ZSBpbiBwb3dlcm52IGFuZCBwc2VyaWVzIHBsYXRm
b3Jtcy4gU25vb3plDQo+IGhhcyBhIHRpbWVvdXQgc28gdGhhdCBpZiBhIGNwdSBzdGF5cyBpbiBz
bm9vemUgZm9yIG1vcmUgdGhhbiB0YXJnZXQNCj4gcmVzaWRlbmN5IG9mIHRoZSBuZXh0IGF2YWls
YWJsZSBpZGxlIHN0YXRlLCB0aGVuIGl0IHdvdWxkIGV4aXQgdGhlcmVieQ0KPiBnaXZpbmcgY2hh
bmNlIHRvIHRoZSBjcHVpZGxlIGdvdmVybm9yIHRvIHJlLWV2YWx1YXRlIGFuZA0KPiBwcm9tb3Rl
IHRoZSBjcHUgdG8gYSBkZWVwZXIgaWRsZSBzdGF0ZS4gVGhlcmVmb3JlIHdoZW5ldmVyIHNub296
ZSBleGl0cw0KPiBkdWUgdG8gdGhpcyB0aW1lb3V0LCBpdHMgbGFzdF9yZXNpZGVuY3kgd2lsbCBi
ZSB0YXJnZXRfcmVzaWRlbmN5IG9mIG5leHQNCj4gZGVlcGVyIHN0YXRlLg0KPiANCj4gY29tbWl0
IGU5M2U1OWNlNWI4NSAoImNwdWlkbGU6IFJlcGxhY2Uga3RpbWVfZ2V0KCkgd2l0aCBsb2NhbF9j
bG9jaygpIikNCj4gY2hhbmdlZCB0aGUgbWF0aCBhcm91bmQgbGFzdF9yZXNpZGVuY3kgY2FsY3Vs
YXRpb24uIFNwZWNpZmljYWxseSwgd2hpbGUNCj4gY29udmVydGluZyBsYXN0X3Jlc2lkZW5jeSB2
YWx1ZSBmcm9tIG5hbm9zZWNvbmRzIHRvIG1pY3Jvc2Vjb25kcyBpdCBkb2VzDQo+IHJpZ2h0IHNo
aWZ0IGJ5IDEwLiBEdWUgdG8gdGhpcywgaW4gc25vb3plIHRpbWVvdXQgZXhpdCBzY2VuYXJpb3MN
Cj4gbGFzdF9yZXNpZGVuY3kgY2FsY3VsYXRlZCBpcyByb3VnaGx5IDIuMyUgbGVzcyB0aGFuIHRh
cmdldF9yZXNpZGVuY3kgb2YNCj4gbmV4dCBhdmFpbGFibGUgc3RhdGUuIFRoaXMgcGF0dGVybiBp
cyBwaWNrZWQgdXAgZ2V0X3R5cGljYWxfaW50ZXJ2YWwoKQ0KPiBpbiB0aGUgbWVudSBnb3Zlcm5v
ciBhbmQgdGhlcmVmb3JlIGV4cGVjdGVkX2ludGVydmFsIGluIG1lbnVfc2VsZWN0KCkgaXMNCj4g
ZnJlcXVlbnRseSBsZXNzIHRoYW4gdGhlIHRhcmdldF9yZXNpZGVuY3kgb2YgYW55IHN0YXRlIGJ1
dCBzbm9vemUuDQo+IA0KPiBEdWUgdG8gdGhpcyB3ZSBhcmUgZW50ZXJpbmcgc25vb3plIGF0IGEg
aGlnaGVyIHJhdGUsIHRoZXJlYnkgYWZmZWN0aW5nDQo+IHRoZSBzaW5nbGUgdGhyZWFkIHBlcmZv
cm1hbmNlLg0KPiANCj4gRml4IHRoaXMgYnkgcmVwbGFjaW5nIHJpZ2h0IHNoaWZ0IGJ5IDEwIHdp
dGggLzEwMDAgd2hpbGUgY2FsY3VsYXRpbmcNCj4gbGFzdF9yZXNpZGVuY3kuDQo+IA0KPiBSZXBv
cnRlZC1ieTogQW50b24gQmxhbmNoYXJkIDxhbnRvbkBzYW1iYS5vcmc+DQo+IEJpc2VjdGVkLWJ5
OiBTaGlscGFzcmkgRyBCaGF0IDxzaGlscGEuYmhhdEBsaW51eC52bmV0LmlibS5jb20+DQo+IFNp
Z25lZC1vZmYtYnk6IFNocmV5YXMgQi4gUHJhYmh1IDxzaHJleWFzQGxpbnV4LnZuZXQuaWJtLmNv
bT4NCj4gLS0tDQo+IENoYW5nZXMgaW4gdjINCj4gPT09PT09PT09PT09PQ0KPiAgLSBGaXhpbmcg
aXQgaW4gdGhlIGNwdWlkbGUgY29yZSBjb2RlIGluc3RlYWQgb2YgZHJpdmVyIGNvZGUuDQo+IA0K
PiAgZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYyB8IDYgKysrLS0tDQo+ICAxIGZpbGUgY2hhbmdl
ZCwgMyBpbnNlcnRpb25zKCspLCAzIGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAtLWdpdCBhL2Ry
aXZlcnMvY3B1aWRsZS9jcHVpZGxlLmMgYi9kcml2ZXJzL2NwdWlkbGUvY3B1aWRsZS5jDQo+IGlu
ZGV4IGE0ZDAwNTkuLjMwZDY3YTggMTAwNjQ0DQo+IC0tLSBhL2RyaXZlcnMvY3B1aWRsZS9jcHVp
ZGxlLmMNCj4gKysrIGIvZHJpdmVycy9jcHVpZGxlL2NwdWlkbGUuYw0KPiBAQCAtMjE4LDEwICsy
MTgsMTAgQEAgaW50IGNwdWlkbGVfZW50ZXJfc3RhdGUoc3RydWN0IGNwdWlkbGVfZGV2aWNlICpk
ZXYsIHN0cnVjdCBjcHVpZGxlX2RyaXZlciAqZHJ2LA0KPiAgCQlsb2NhbF9pcnFfZW5hYmxlKCk7
DQo+IA0KPiAgCS8qDQo+IC0JICogbG9jYWxfY2xvY2soKSByZXR1cm5zIHRoZSB0aW1lIGluIG5h
bm9zZWNvbmQsIGxldCdzIHNoaWZ0DQo+IC0JICogYnkgMTAgKGRpdmlkZSBieSAxMDI0KSB0byBo
YXZlIG1pY3Jvc2Vjb25kIGJhc2VkIHRpbWUuDQo+ICsJICogbG9jYWxfY2xvY2soKSByZXR1cm5z
IHRoZSB0aW1lIGluIG5hbm9zZWNvbmQsIGxldCdzDQo+ICsJICogZGl2aWRlIGJ5IDEwMDAgdG8g
aGF2ZSBtaWNyb3NlY29uZCBiYXNlZCB0aW1lLg0KPiAgCSAqLw0KPiAtCWRpZmYgPSAodGltZV9l
bmQgLSB0aW1lX3N0YXJ0KSA+PiAxMDsNCj4gKwlkaWZmID0gKHRpbWVfZW5kIC0gdGltZV9zdGFy
dCkgLyAxMDAwOw0KPiAgCWlmIChkaWZmID4gSU5UX01BWCkNCj4gIAkJZGlmZiA9IElOVF9NQVg7
DQoNClRoZSBpbnRlbnQgb2YgdGhlID4+IDEwIHdhcyBwcm9iYWJseSB0byBhdm9pZCBhbiBleHBl
bnNpdmUgNjRiaXQgZGl2aWRlLg0KU28gbWF5YmUgc29tZXRoaW5nIGxpa2U6DQoJZGlmZiA9IHRp
bWVfZW5kIC0gdGltZV9zdGFydDsNCglpZiAoZGlmZiA+PSBJTlRfTUFYLzIpDQoJCWRpZmZfMzIg
PSBJTlRfTUFYLzIvMTAwMDsNCgllbHNlDQoJCWRpZmZfMzIgPSBkaWZmOw0KCQlkaWZmXzMyICs9
IGRpZmZfMzIgPj4gNjsNCgkJZGlmZl8zMiA+Pj0gMTA7DQoJfQ0KDQpBZGRpbmcgYW4gZXh0cmEg
MS8zMiBtYWtlcyB0aGUgZGl2aXNpb24gYnkgYmUgc29tZXRoaW5nIHNsaWdodGx5IGJlbG93IDEw
MDAuDQoNCglEYXZpZA0KDQo=

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24  9:30   ` kbuild test robot
  2016-06-24  9:30   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24  9:30 UTC (permalink / raw)
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24898 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24  9:30   ` kbuild test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24  9:30 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 900 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> (.text+0x31e0d2): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 24898 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  9:00   ` David Laight
  (?)
@ 2016-06-24 10:05   ` Daniel Lezcano
  -1 siblings, 0 replies; 16+ messages in thread
From: Daniel Lezcano @ 2016-06-24 10:05 UTC (permalink / raw)
  To: David Laight, 'Shreyas B. Prabhu', rjw
  Cc: linux-pm, anton, linuxppc-dev

On 06/24/2016 11:00 AM, David Laight wrote:
> From: Shreyas B. Prabhu
>> Sent: 24 June 2016 09:24
>>
>> Snooze is a poll idle state in powernv and pseries platforms. Snooze
>> has a timeout so that if a cpu stays in snooze for more than target
>> residency of the next available idle state, then it would exit thereby
>> giving chance to the cpuidle governor to re-evaluate and
>> promote the cpu to a deeper idle state. Therefore whenever snooze exits
>> due to this timeout, its last_residency will be target_residency of next
>> deeper state.
>>
>> commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
>> changed the math around last_residency calculation. Specifically, while
>> converting last_residency value from nanoseconds to microseconds it does
>> right shift by 10. Due to this, in snooze timeout exit scenarios
>> last_residency calculated is roughly 2.3% less than target_residency of
>> next available state. This pattern is picked up get_typical_interval()
>> in the menu governor and therefore expected_interval in menu_select() is
>> frequently less than the target_residency of any state but snooze.
>>
>> Due to this we are entering snooze at a higher rate, thereby affecting
>> the single thread performance.
>>
>> Fix this by replacing right shift by 10 with /1000 while calculating
>> last_residency.
>>
>> Reported-by: Anton Blanchard <anton@samba.org>
>> Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>> ---
>> Changes in v2
>> =============
>>   - Fixing it in the cpuidle core code instead of driver code.
>>
>>   drivers/cpuidle/cpuidle.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index a4d0059..30d67a8 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -218,10 +218,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>   		local_irq_enable();
>>
>>   	/*
>> -	 * local_clock() returns the time in nanosecond, let's shift
>> -	 * by 10 (divide by 1024) to have microsecond based time.
>> +	 * local_clock() returns the time in nanosecond, let's
>> +	 * divide by 1000 to have microsecond based time.
>>   	 */
>> -	diff = (time_end - time_start) >> 10;
>> +	diff = (time_end - time_start) / 1000;

do_div ?

>>   	if (diff > INT_MAX)
>>   		diff = INT_MAX;
>
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
> 	diff = time_end - time_start;
> 	if (diff >= INT_MAX/2)
> 		diff_32 = INT_MAX/2/1000;
> 	else
> 		diff_32 = diff;
> 		diff_32 += diff_32 >> 6;
> 		diff_32 >>= 10;
> 	}
>
> Adding an extra 1/32 makes the division by be something slightly below 1000.



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  9:00   ` David Laight
  (?)
  (?)
@ 2016-06-24 10:11   ` Arnd Bergmann
  2016-06-24 16:01     ` Shreyas B Prabhu
  -1 siblings, 1 reply; 16+ messages in thread
From: Arnd Bergmann @ 2016-06-24 10:11 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: David Laight, 'Shreyas B. Prabhu',
	rjw, daniel.lezcano, anton, linux-pm

On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
> So maybe something like:
>         diff = time_end - time_start;
>         if (diff >= INT_MAX/2)
>                 diff_32 = INT_MAX/2/1000;
>         else
>                 diff_32 = diff;
>                 diff_32 += diff_32 >> 6;
>                 diff_32 >>= 10;
>         }
> 
> Adding an extra 1/32 makes the division by be something slightly below 1000.

Why not change the definition of the time to nanoseconds and update
the users accordingly?

I see that cpuidle_enter_state() writes to the last_residency value,
and it is read in three places: ladder_select_state(), menu_update()
and show_state_time() (for sysfs).

If those functions are called less often than cpuidle_enter_state(),
we could just move the division there. Since the divisor is constant,
do_div() can convert it into a multiply and shift, or we could use
your the code you suggest above, or use a 32-bit division most of
the time:

	if (diff <= UINT_MAX)
		diff_32 = (u32)diff / NSECS_PER_USEC;
	else
		diff_32 = div_u64(diff, NSECS_PER_USEC;

which gcc itself will turn into a multiplication or series of
shifts on CPUs on which that is faster.

	Arnd


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24 10:27   ` kbuild test robot
  2016-06-24  9:30   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24 10:27 UTC (permalink / raw)
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'

vim +242 drivers/cpuidle/cpuidle.c

56cfbf74a Colin Cross    2012-05-07  236  		dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross    2012-05-07  237  	} else {
56cfbf74a Colin Cross    2012-05-07  238  		dev->last_residency = 0;
56cfbf74a Colin Cross    2012-05-07  239  	}
56cfbf74a Colin Cross    2012-05-07  240  
56cfbf74a Colin Cross    2012-05-07  241  	return entered_state;
56cfbf74a Colin Cross    2012-05-07 @242  }
56cfbf74a Colin Cross    2012-05-07  243  
56cfbf74a Colin Cross    2012-05-07  244  /**
907e30f1b Daniel Lezcano 2014-03-03  245   * cpuidle_select - ask the cpuidle framework to choose an idle state

:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state

:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16948 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24 10:27   ` kbuild test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24 10:27 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160624]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: sh-sh7785lcr_32bit_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:242: undefined reference to `__udivdi3'

vim +242 drivers/cpuidle/cpuidle.c

56cfbf74a Colin Cross    2012-05-07  236  		dev->states_usage[entered_state].usage++;
56cfbf74a Colin Cross    2012-05-07  237  	} else {
56cfbf74a Colin Cross    2012-05-07  238  		dev->last_residency = 0;
56cfbf74a Colin Cross    2012-05-07  239  	}
56cfbf74a Colin Cross    2012-05-07  240  
56cfbf74a Colin Cross    2012-05-07  241  	return entered_state;
56cfbf74a Colin Cross    2012-05-07 @242  }
56cfbf74a Colin Cross    2012-05-07  243  
56cfbf74a Colin Cross    2012-05-07  244  /**
907e30f1b Daniel Lezcano 2014-03-03  245   * cpuidle_select - ask the cpuidle framework to choose an idle state

:::::: The code at line 242 was first introduced by commit
:::::: 56cfbf74a17c40f3a741398103c9f5d5a6806715 cpuidle: refactor out cpuidle_enter_state

:::::: TO: Colin Cross <ccross@android.com>
:::::: CC: Len Brown <len.brown@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 16948 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
@ 2016-06-24 12:10   ` kbuild test robot
  2016-06-24  9:30   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24 12:10 UTC (permalink / raw)
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod'

vim +224 drivers/cpuidle/cpuidle.c

   218			local_irq_enable();
   219	
   220		/*
   221		 * local_clock() returns the time in nanosecond, let's
   222		 * divide by 1000 to have microsecond based time.
   223		 */
 > 224		diff = (time_end - time_start) / 1000;
   225		if (diff > INT_MAX)
   226			diff = INT_MAX;
   227	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23385 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-24 12:10   ` kbuild test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-06-24 12:10 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: kbuild-all, rjw, daniel.lezcano, linux-pm, linuxppc-dev, anton,
	mpe, bsingharora, Shreyas B. Prabhu

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

Hi,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Shreyas-B-Prabhu/cpuidle-Fix-last_residency-division/20160624-162633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `cpuidle_enter_state':
>> drivers/cpuidle/cpuidle.c:224: undefined reference to `__aeabi_uldivmod'

vim +224 drivers/cpuidle/cpuidle.c

   218			local_irq_enable();
   219	
   220		/*
   221		 * local_clock() returns the time in nanosecond, let's
   222		 * divide by 1000 to have microsecond based time.
   223		 */
 > 224		diff = (time_end - time_start) / 1000;
   225		if (diff > INT_MAX)
   226			diff = INT_MAX;
   227	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23385 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 10:11   ` Arnd Bergmann
@ 2016-06-24 16:01     ` Shreyas B Prabhu
  2016-06-24 19:43       ` Arnd Bergmann
  0 siblings, 1 reply; 16+ messages in thread
From: Shreyas B Prabhu @ 2016-06-24 16:01 UTC (permalink / raw)
  To: Arnd Bergmann, linuxppc-dev
  Cc: David Laight, rjw, daniel.lezcano, anton, linux-pm



On 06/24/2016 03:41 PM, Arnd Bergmann wrote:
> On Friday, June 24, 2016 9:00:48 AM CEST David Laight wrote:
>> The intent of the >> 10 was probably to avoid an expensive 64bit divide.
>> So maybe something like:
>>         diff = time_end - time_start;
>>         if (diff >= INT_MAX/2)
>>                 diff_32 = INT_MAX/2/1000;
>>         else
>>                 diff_32 = diff;
>>                 diff_32 += diff_32 >> 6;
>>                 diff_32 >>= 10;
>>         }
>>
>> Adding an extra 1/32 makes the division by be something slightly below 1000.
> 
> Why not change the definition of the time to nanoseconds and update
> the users accordingly?
> 
> I see that cpuidle_enter_state() writes to the last_residency value,
> and it is read in three places: ladder_select_state(), menu_update()
> and show_state_time() (for sysfs).
> 

Updating show_state_time() to convert ns to us gets bit messy since
show_state_##_name which is used for disable, usage and time simply
returns state_usage->_name. And state_usage->time updation happens in
cpuidle_enter_state() itself.

> If those functions are called less often than cpuidle_enter_state(),
> we could just move the division there. Since the divisor is constant,
> do_div() can convert it into a multiply and shift, or we could use
> your the code you suggest above, or use a 32-bit division most of
> the time:
> 
> 	if (diff <= UINT_MAX)
> 		diff_32 = (u32)diff / NSECS_PER_USEC;
> 	else
> 		diff_32 = div_u64(diff, NSECS_PER_USEC;
> 
> which gcc itself will turn into a multiplication or series of
> shifts on CPUs on which that is faster.
>
I'm not sure which division method of the three suggested here to use.
Does anyone have a strong preference?

--Shreyas


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 16:01     ` Shreyas B Prabhu
@ 2016-06-24 19:43       ` Arnd Bergmann
  2016-06-27  8:59           ` David Laight
  0 siblings, 1 reply; 16+ messages in thread
From: Arnd Bergmann @ 2016-06-24 19:43 UTC (permalink / raw)
  To: Shreyas B Prabhu
  Cc: linuxppc-dev, David Laight, rjw, daniel.lezcano, anton, linux-pm

On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > If those functions are called less often than cpuidle_enter_state(),
> > we could just move the division there. Since the divisor is constant,
> > do_div() can convert it into a multiply and shift, or we could use
> > your the code you suggest above, or use a 32-bit division most of
> > the time:
> > 
> >       if (diff <= UINT_MAX)
> >               diff_32 = (u32)diff / NSECS_PER_USEC;
> >       else
> >               diff_32 = div_u64(diff, NSECS_PER_USEC;
> > 
> > which gcc itself will turn into a multiplication or series of
> > shifts on CPUs on which that is faster.
> >
> I'm not sure which division method of the three suggested here to use.
> Does anyone have a strong preference?
> 

It depends on how accurate we want it and how long we expect
the times to be. The optimization for the 4.2 second cutoff
for doing a 32-bit division only makes sense if the majority
of the sleep times are below that.

	Arnd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-24 19:43       ` Arnd Bergmann
@ 2016-06-27  8:59           ` David Laight
  0 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2016-06-27  8:59 UTC (permalink / raw)
  To: 'Arnd Bergmann', Shreyas B Prabhu
  Cc: rjw, daniel.lezcano, linuxppc-dev, anton, linux-pm

From: Arnd Bergmann
> Sent: 24 June 2016 20:43
> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > > If those functions are called less often than cpuidle_enter_state(),
> > > we could just move the division there. Since the divisor is constant,
> > > do_div() can convert it into a multiply and shift, or we could use
> > > your the code you suggest above, or use a 32-bit division most of
> > > the time:
> > >
> > >       if (diff <= UINT_MAX)
> > >               diff_32 = (u32)diff / NSECS_PER_USEC;
> > >       else
> > >               diff_32 = div_u64(diff, NSECS_PER_USEC;
> > >
> > > which gcc itself will turn into a multiplication or series of
> > > shifts on CPUs on which that is faster.
> > >
> > I'm not sure which division method of the three suggested here to use.
> > Does anyone have a strong preference?
> >
> 
> It depends on how accurate we want it and how long we expect
> the times to be. The optimization for the 4.2 second cutoff
> for doing a 32-bit division only makes sense if the majority
> of the sleep times are below that.

It also depends if the code actually cares about the length of 'long' sleeps.
I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
result could be treated as 4.2 seconds without causing grief.

Actually the cost of a 64bit divide after a 4 second sleep will be noise.
OTOH a 64bit divide after a sleep that lasted a few ns will be significant.

	David

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] cpuidle: Fix last_residency division
@ 2016-06-27  8:59           ` David Laight
  0 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2016-06-27  8:59 UTC (permalink / raw)
  To: 'Arnd Bergmann', Shreyas B Prabhu
  Cc: linuxppc-dev, rjw, daniel.lezcano, anton, linux-pm

From: Arnd Bergmann
> Sent: 24 June 2016 20:43
> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
> > > If those functions are called less often than cpuidle_enter_state(),
> > > we could just move the division there. Since the divisor is constant,
> > > do_div() can convert it into a multiply and shift, or we could use
> > > your the code you suggest above, or use a 32-bit division most of
> > > the time:
> > >
> > >       if (diff <=3D UINT_MAX)
> > >               diff_32 =3D (u32)diff / NSECS_PER_USEC;
> > >       else
> > >               diff_32 =3D div_u64(diff, NSECS_PER_USEC;
> > >
> > > which gcc itself will turn into a multiplication or series of
> > > shifts on CPUs on which that is faster.
> > >
> > I'm not sure which division method of the three suggested here to use.
> > Does anyone have a strong preference?
> >
>=20
> It depends on how accurate we want it and how long we expect
> the times to be. The optimization for the 4.2 second cutoff
> for doing a 32-bit division only makes sense if the majority
> of the sleep times are below that.

It also depends if the code actually cares about the length of 'long' sleep=
s.
I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
result could be treated as 4.2 seconds without causing grief.

Actually the cost of a 64bit divide after a 4 second sleep will be noise.
OTOH a 64bit divide after a sleep that lasted a few ns will be significant.

	David

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] cpuidle: Fix last_residency division
  2016-06-27  8:59           ` David Laight
  (?)
@ 2016-06-29  7:00           ` Shreyas B Prabhu
  -1 siblings, 0 replies; 16+ messages in thread
From: Shreyas B Prabhu @ 2016-06-29  7:00 UTC (permalink / raw)
  To: David Laight, 'Arnd Bergmann'
  Cc: linuxppc-dev, rjw, daniel.lezcano, anton, linux-pm



On 06/27/2016 02:29 PM, David Laight wrote:
> From: Arnd Bergmann
>> Sent: 24 June 2016 20:43
>> On Friday, June 24, 2016 9:31:35 PM CEST Shreyas B Prabhu wrote:
>>>> If those functions are called less often than cpuidle_enter_state(),
>>>> we could just move the division there. Since the divisor is constant,
>>>> do_div() can convert it into a multiply and shift, or we could use
>>>> your the code you suggest above, or use a 32-bit division most of
>>>> the time:
>>>>
>>>>       if (diff <= UINT_MAX)
>>>>               diff_32 = (u32)diff / NSECS_PER_USEC;
>>>>       else
>>>>               diff_32 = div_u64(diff, NSECS_PER_USEC;
>>>>
>>>> which gcc itself will turn into a multiplication or series of
>>>> shifts on CPUs on which that is faster.
>>>>
>>> I'm not sure which division method of the three suggested here to use.
>>> Does anyone have a strong preference?
>>>
>>
>> It depends on how accurate we want it and how long we expect
>> the times to be. The optimization for the 4.2 second cutoff
>> for doing a 32-bit division only makes sense if the majority
>> of the sleep times are below that.
> 
> It also depends if the code actually cares about the length of 'long' sleeps.
> I'd guess that for cpu idle 4.2 seconds is 'a long time', so the div_u64()
> result could be treated as 4.2 seconds without causing grief.
> 
> Actually the cost of a 64bit divide after a 4 second sleep will be noise.
> OTOH a 64bit divide after a sleep that lasted a few ns will be significant.
> 
Agreed. I'll use the code you suggested, with a small change-
Using diff_32 += diff_32 >> 5 instead of diff_32 += diff_32 >> 6
since I want to err on the side of last_residency being more than actual.

And for long sleep cases, I'll use div_u64().

Thanks,
Shreyas


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-06-29  7:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-24  8:23 [PATCH v2] cpuidle: Fix last_residency division Shreyas B. Prabhu
2016-06-24  9:00 ` David Laight
2016-06-24  9:00   ` David Laight
2016-06-24 10:05   ` Daniel Lezcano
2016-06-24 10:11   ` Arnd Bergmann
2016-06-24 16:01     ` Shreyas B Prabhu
2016-06-24 19:43       ` Arnd Bergmann
2016-06-27  8:59         ` David Laight
2016-06-27  8:59           ` David Laight
2016-06-29  7:00           ` Shreyas B Prabhu
2016-06-24  9:30 ` kbuild test robot
2016-06-24  9:30   ` kbuild test robot
2016-06-24 10:27 ` kbuild test robot
2016-06-24 10:27   ` kbuild test robot
2016-06-24 12:10 ` kbuild test robot
2016-06-24 12:10   ` kbuild test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.