* [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
@ 2015-06-10 13:18 Prarit Bhargava
2015-06-10 23:45 ` Rafael J. Wysocki
0 siblings, 1 reply; 7+ messages in thread
From: Prarit Bhargava @ 2015-06-10 13:18 UTC (permalink / raw)
To: linux-kernel
Cc: Prarit Bhargava, Kristen Carlson Accardi, Rafael J. Wysocki,
Viresh Kumar, linux-pm
I looked into switching to div64_s64() instead of the 32-bit version in
div_fp(), however, this would result in sample_ratio and core_busy returning
0 which is something we don't want.
P.
---8<---
The kernel may delay interrupts for a long time which can result in timers
being delayed. If this occurs the intel_pstate driver will crash with
a divide by zero error:
divide error: 0000 [#1] SMP
Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206
RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
Call Trace:
<IRQ>
[<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
[<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107df56>] call_timer_fn+0x36/0x110
[<ffffffff814a9100>] ? pid_param_set+0x130/0x130
[<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
[<ffffffff81077b2f>] __do_softirq+0xef/0x280
[<ffffffff816156dc>] call_softirq+0x1c/0x30
[<ffffffff81015d95>] do_softirq+0x65/0xa0
[<ffffffff81077ec5>] irq_exit+0x115/0x120
[<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
<EOI>
[<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
[<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
[<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
[<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
[<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
[<ffffffff8104228a>] start_secondary+0x1ba/0x230
Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP <ffff883fff4e3db8>
The kernel values for cpudata for CPU 113 were:
struct cpudata {
cpu = 113,
timer = {
entry = {
next = 0x0,
prev = 0xdead000000200200
},
expires = 8357799745,
base = 0xffff883fe84ec001,
function = 0xffffffff814a9100 <intel_pstate_timer_func>,
data = 18446612406765768960,
<snip>
i_gain = 0,
d_gain = 0,
deadband = 0,
last_err = 22489
},
last_sample_time = {
tv64 = 4063132438017305
},
prev_aperf = 287326796397463,
prev_mperf = 251427432090198,
sample = {
core_pct_busy = 23081,
aperf = 2937407,
mperf = 3257884,
freq = 2524484,
time = {
tv64 = 4063149215234118
}
}
}
which results in the time between samples = last_sample_time - sample.time
= 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
The duration between reads of the APERF and MPERF registers overflowed a s32
sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
While the kernel shouldn't be delaying for a long time, it can and does
happen, and the intel_pstate driver should not panic in this situation. This
patch checks for an overflow and ignores the current calculation cycle by
returning -EINVAL. Since intel_pstate_sample() has been called, subsequent
timer function calls will then again pick up the correct calculations and the
system will continue functioning properly.
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
drivers/cpufreq/intel_pstate.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6414661..6a145e5 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -823,6 +823,9 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
sample_time = pid_params.sample_rate_ms * USEC_PER_MSEC;
duration_us = (u32) ktime_us_delta(cpu->sample.time,
cpu->last_sample_time);
+ if (duration_us > S32_MAX)
+ return -EINVAL;
+
if (duration_us > sample_time * 3) {
sample_ratio = div_fp(int_tofp(sample_time),
int_tofp(duration_us));
@@ -832,7 +835,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
return core_busy;
}
-static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
+static inline int32_t intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
{
int32_t busy_scaled;
struct _pid *pid;
@@ -840,11 +843,15 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
pid = &cpu->pid;
busy_scaled = intel_pstate_get_scaled_busy(cpu);
+ if (busy_scaled < 0)
+ return S32_MAX;
ctl = pid_calc(pid, busy_scaled);
/* Negative values of ctl increase the pstate and vice versa */
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
+
+ return busy_scaled;
}
static void intel_hwp_timer_func(unsigned long __data)
@@ -859,15 +866,16 @@ static void intel_pstate_timer_func(unsigned long __data)
{
struct cpudata *cpu = (struct cpudata *) __data;
struct sample *sample;
+ int32_t busy_scaled;
intel_pstate_sample(cpu);
sample = &cpu->sample;
- intel_pstate_adjust_busy_pstate(cpu);
+ busy_scaled = intel_pstate_adjust_busy_pstate(cpu);
trace_pstate_sample(fp_toint(sample->core_pct_busy),
- fp_toint(intel_pstate_get_scaled_busy(cpu)),
+ fp_toint(busy_scaled),
cpu->pstate.current_pstate,
sample->mperf,
sample->aperf,
--
1.7.9.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-10 13:18 [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay Prarit Bhargava
@ 2015-06-10 23:45 ` Rafael J. Wysocki
2015-06-11 14:51 ` Doug Smythies
0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2015-06-10 23:45 UTC (permalink / raw)
To: Prarit Bhargava
Cc: linux-kernel, Kristen Carlson Accardi, Viresh Kumar, linux-pm
On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
> I looked into switching to div64_s64() instead of the 32-bit version in
> div_fp(), however, this would result in sample_ratio and core_busy returning
> 0 which is something we don't want.
>
> P.
>
> ---8<---
>
> The kernel may delay interrupts for a long time which can result in timers
> being delayed. If this occurs the intel_pstate driver will crash with
> a divide by zero error:
>
> divide error: 0000 [#1] SMP
> Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
> crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
> CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
> Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
> task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
> RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
> RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206
> RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
> RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
> R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
> R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
> FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
> ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
> ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
> ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
> Call Trace:
> <IRQ>
>
> [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
> [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
> [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
> [<ffffffff8107df56>] call_timer_fn+0x36/0x110
> [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
> [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
> [<ffffffff81077b2f>] __do_softirq+0xef/0x280
> [<ffffffff816156dc>] call_softirq+0x1c/0x30
> [<ffffffff81015d95>] do_softirq+0x65/0xa0
> [<ffffffff81077ec5>] irq_exit+0x115/0x120
> [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
> <EOI>
>
> [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
> [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
> [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
> [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
> [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
> [<ffffffff8104228a>] start_secondary+0x1ba/0x230
> Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
> RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
> RSP <ffff883fff4e3db8>
>
> The kernel values for cpudata for CPU 113 were:
>
> struct cpudata {
> cpu = 113,
> timer = {
> entry = {
> next = 0x0,
> prev = 0xdead000000200200
> },
> expires = 8357799745,
> base = 0xffff883fe84ec001,
> function = 0xffffffff814a9100 <intel_pstate_timer_func>,
> data = 18446612406765768960,
> <snip>
> i_gain = 0,
> d_gain = 0,
> deadband = 0,
> last_err = 22489
> },
> last_sample_time = {
> tv64 = 4063132438017305
> },
> prev_aperf = 287326796397463,
> prev_mperf = 251427432090198,
> sample = {
> core_pct_busy = 23081,
> aperf = 2937407,
> mperf = 3257884,
> freq = 2524484,
> time = {
> tv64 = 4063149215234118
> }
> }
> }
>
> which results in the time between samples = last_sample_time - sample.time
> = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
>
> The duration between reads of the APERF and MPERF registers overflowed a s32
> sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
> is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
>
> While the kernel shouldn't be delaying for a long time, it can and does
> happen, and the intel_pstate driver should not panic in this situation. This
> patch checks for an overflow and ignores the current calculation cycle by
> returning -EINVAL. Since intel_pstate_sample() has been called, subsequent
> timer function calls will then again pick up the correct calculations and the
> system will continue functioning properly.
>
> Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-pm@vger.kernel.org
>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Looks OK to me, but I need Kristen to review it too.
> ---
> drivers/cpufreq/intel_pstate.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 6414661..6a145e5 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -823,6 +823,9 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> sample_time = pid_params.sample_rate_ms * USEC_PER_MSEC;
> duration_us = (u32) ktime_us_delta(cpu->sample.time,
> cpu->last_sample_time);
> + if (duration_us > S32_MAX)
> + return -EINVAL;
> +
> if (duration_us > sample_time * 3) {
> sample_ratio = div_fp(int_tofp(sample_time),
> int_tofp(duration_us));
> @@ -832,7 +835,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> return core_busy;
> }
>
> -static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> +static inline int32_t intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
> {
> int32_t busy_scaled;
> struct _pid *pid;
> @@ -840,11 +843,15 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>
> pid = &cpu->pid;
> busy_scaled = intel_pstate_get_scaled_busy(cpu);
> + if (busy_scaled < 0)
> + return S32_MAX;
>
> ctl = pid_calc(pid, busy_scaled);
>
> /* Negative values of ctl increase the pstate and vice versa */
> intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
> +
> + return busy_scaled;
> }
>
> static void intel_hwp_timer_func(unsigned long __data)
> @@ -859,15 +866,16 @@ static void intel_pstate_timer_func(unsigned long __data)
> {
> struct cpudata *cpu = (struct cpudata *) __data;
> struct sample *sample;
> + int32_t busy_scaled;
>
> intel_pstate_sample(cpu);
>
> sample = &cpu->sample;
>
> - intel_pstate_adjust_busy_pstate(cpu);
> + busy_scaled = intel_pstate_adjust_busy_pstate(cpu);
>
> trace_pstate_sample(fp_toint(sample->core_pct_busy),
> - fp_toint(intel_pstate_get_scaled_busy(cpu)),
> + fp_toint(busy_scaled),
> cpu->pstate.current_pstate,
> sample->mperf,
> sample->aperf,
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-10 23:45 ` Rafael J. Wysocki
@ 2015-06-11 14:51 ` Doug Smythies
2015-06-11 14:58 ` Prarit Bhargava
2015-06-11 15:00 ` Prarit Bhargava
0 siblings, 2 replies; 7+ messages in thread
From: Doug Smythies @ 2015-06-11 14:51 UTC (permalink / raw)
To: 'Rafael J. Wysocki', 'Prarit Bhargava'
Cc: linux-kernel, 'Kristen Carlson Accardi',
'Viresh Kumar',
linux-pm, Doug Smythies
On 2015.06.10 16:46 Rafael J. Wysocki wrote:
> On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
>> I looked into switching to div64_s64() instead of the 32-bit version in
>> div_fp(), however, this would result in sample_ratio and core_busy returning
>> 0 which is something we don't want.
???
Due to a great many overflow related issues, div_fp() was changed to div64_s64()
a long time ago.
I have not found the actual commit to reference, but it was about a year ago.
And the math in general was all changed to 64 bit, over a few commits.
>
> P.
>
> ---8<---
>
> The kernel may delay interrupts for a long time which can result in timers
> being delayed. If this occurs the intel_pstate driver will crash with
> a divide by zero error:
More recent versions will not crash.
Long timer delays are extremely common, and this is a fundamental flaw
in the duration method. Patch sets have been submitting dealing with this,
and other, issues.
>>
>> which results in the time between samples = last_sample_time - sample.time
>> = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
I have never seen anything over 4 seconds before, and I study this stuff
(with respect to the intel_pstate driver operation) a lot. Due to help
from others, I have data from a variety of processors.
4 seconds not unusual, even under load.
>>
>> The duration between reads of the APERF and MPERF registers overflowed a s32
>> sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
>> is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
>>
>> While the kernel shouldn't be delaying for a long time, it can and does
>> happen, and the intel_pstate driver should not panic in this situation. This
>> patch checks for an overflow and ignores the current calculation cycle by
>> returning -EINVAL. Since intel_pstate_sample() has been called, subsequent
>> timer function calls will then again pick up the correct calculations and the
>> system will continue functioning properly.
That would run the risk that the correct calculation would never be done.
It is fairly easy (I do it all the time) to create a scenario where
there is high load on a CPU, but also a very very high duration value,
for each and every duration. (and O.K., in that scenario the calculation is always
wrong anyhow, due to the long duration check engaging.)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-11 14:51 ` Doug Smythies
@ 2015-06-11 14:58 ` Prarit Bhargava
2015-06-11 15:00 ` Prarit Bhargava
1 sibling, 0 replies; 7+ messages in thread
From: Prarit Bhargava @ 2015-06-11 14:58 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Rafael J. Wysocki',
linux-kernel, 'Kristen Carlson Accardi',
'Viresh Kumar',
linux-pm
On 06/11/2015 10:51 AM, Doug Smythies wrote:
>
> On 2015.06.10 16:46 Rafael J. Wysocki wrote:
>> On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
>>> I looked into switching to div64_s64() instead of the 32-bit version in
>>> div_fp(), however, this would result in sample_ratio and core_busy returning
>>> 0 which is something we don't want.
>
> ???
> Due to a great many overflow related issues, div_fp() was changed to div64_s64()
> a long time ago.
/me scratches head ... Perhaps I was looking at the wrong tree (RHEL vs
linux.git) but I'll go back & double check.
P.
>
> I have not found the actual commit to reference, but it was about a year ago.
> And the math in general was all changed to 64 bit, over a few commits.
>
>>
>> P.
>>
>> ---8<---
>>
>> The kernel may delay interrupts for a long time which can result in timers
>> being delayed. If this occurs the intel_pstate driver will crash with
>> a divide by zero error:
>
> More recent versions will not crash.
> Long timer delays are extremely common, and this is a fundamental flaw
> in the duration method. Patch sets have been submitting dealing with this,
> and other, issues.
>
>>>
>>> which results in the time between samples = last_sample_time - sample.time
>>> = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
>
> I have never seen anything over 4 seconds before, and I study this stuff
> (with respect to the intel_pstate driver operation) a lot. Due to help
> from others, I have data from a variety of processors.
> 4 seconds not unusual, even under load.
>
>>>
>>> The duration between reads of the APERF and MPERF registers overflowed a s32
>>> sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
>>> is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
>>>
>>> While the kernel shouldn't be delaying for a long time, it can and does
>>> happen, and the intel_pstate driver should not panic in this situation. This
>>> patch checks for an overflow and ignores the current calculation cycle by
>>> returning -EINVAL. Since intel_pstate_sample() has been called, subsequent
>>> timer function calls will then again pick up the correct calculations and the
>>> system will continue functioning properly.
>
> That would run the risk that the correct calculation would never be done.
> It is fairly easy (I do it all the time) to create a scenario where
> there is high load on a CPU, but also a very very high duration value,
> for each and every duration. (and O.K., in that scenario the calculation is always
> wrong anyhow, due to the long duration check engaging.)
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-11 14:51 ` Doug Smythies
2015-06-11 14:58 ` Prarit Bhargava
@ 2015-06-11 15:00 ` Prarit Bhargava
2015-06-11 16:17 ` Doug Smythies
1 sibling, 1 reply; 7+ messages in thread
From: Prarit Bhargava @ 2015-06-11 15:00 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Rafael J. Wysocki',
linux-kernel, 'Kristen Carlson Accardi',
'Viresh Kumar',
linux-pm
On 06/11/2015 10:51 AM, Doug Smythies wrote:
>
> On 2015.06.10 16:46 Rafael J. Wysocki wrote:
>> On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
>>> I looked into switching to div64_s64() instead of the 32-bit version in
>>> div_fp(), however, this would result in sample_ratio and core_busy returning
>>> 0 which is something we don't want.
>
> ???
> Due to a great many overflow related issues, div_fp() was changed to div64_s64()
> a long time ago.
Doug,
Nope -- in a linux.git tree (up-to-date as of 7:00AM ET this AM)
static inline int32_t div_fp(int32_t x, int32_t y)
{
return div_s64((int64_t)x << FRAC_BITS, y);
}
If we do want this to be div64_s64, I can make that change, however, I feel that
a long delay like this should be ignored in the performance calculations in the
driver and that's why I chose to go the direction I did.
P.
>
> I have not found the actual commit to reference, but it was about a year ago.
> And the math in general was all changed to 64 bit, over a few commits.
>
>>
>> P.
>>
>> ---8<---
>>
>> The kernel may delay interrupts for a long time which can result in timers
>> being delayed. If this occurs the intel_pstate driver will crash with
>> a divide by zero error:
>
> More recent versions will not crash.
> Long timer delays are extremely common, and this is a fundamental flaw
> in the duration method. Patch sets have been submitting dealing with this,
> and other, issues.
>
>>>
>>> which results in the time between samples = last_sample_time - sample.time
>>> = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.
>
> I have never seen anything over 4 seconds before, and I study this stuff
> (with respect to the intel_pstate driver operation) a lot. Due to help
> from others, I have data from a variety of processors.
> 4 seconds not unusual, even under load.
>
>>>
>>> The duration between reads of the APERF and MPERF registers overflowed a s32
>>> sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result
>>> is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
>>>
>>> While the kernel shouldn't be delaying for a long time, it can and does
>>> happen, and the intel_pstate driver should not panic in this situation. This
>>> patch checks for an overflow and ignores the current calculation cycle by
>>> returning -EINVAL. Since intel_pstate_sample() has been called, subsequent
>>> timer function calls will then again pick up the correct calculations and the
>>> system will continue functioning properly.
>
> That would run the risk that the correct calculation would never be done.
> It is fairly easy (I do it all the time) to create a scenario where
> there is high load on a CPU, but also a very very high duration value,
> for each and every duration. (and O.K., in that scenario the calculation is always
> wrong anyhow, due to the long duration check engaging.)
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-11 15:00 ` Prarit Bhargava
@ 2015-06-11 16:17 ` Doug Smythies
2015-06-11 16:59 ` Prarit Bhargava
0 siblings, 1 reply; 7+ messages in thread
From: Doug Smythies @ 2015-06-11 16:17 UTC (permalink / raw)
To: 'Prarit Bhargava'
Cc: 'Rafael J. Wysocki',
linux-kernel, 'Kristen Carlson Accardi',
'Viresh Kumar',
linux-pm
On 2015.06.11 08:01 Prarit Bhargava wrote:
> On 06/11/2015 10:51 AM, Doug Smythies wrote:
>>
>> On 2015.06.10 16:46 Rafael J. Wysocki wrote:
>>> On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
>>>> I looked into switching to div64_s64() instead of the 32-bit version in
>>>> div_fp(), however, this would result in sample_ratio and core_busy returning
>>>> 0 which is something we don't want.
>>
>> ???
>> Due to a great many overflow related issues, div_fp() was changed to div64_s64()
>> a long time ago.
> Doug,
>
> Nope -- in a linux.git tree (up-to-date as of 7:00AM ET this AM)
>
> static inline int32_t div_fp(int32_t x, int32_t y)
> {
> return div_s64((int64_t)x << FRAC_BITS, y);
> }
> If we do want this to be div64_s64, I can make that change, however, I feel that
> a long delay like this should be ignored in the performance calculations in the
> driver and that's why I chose to go the direction I did.
Prarit,
Apologies to you and the list for the distraction. I mis-read "div_s64" as "div64_s64".
Your suggestion is a good one.
I do maintain that my point about the duration method being flawed is valid.
I proposed a fix for that some time ago.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay
2015-06-11 16:17 ` Doug Smythies
@ 2015-06-11 16:59 ` Prarit Bhargava
0 siblings, 0 replies; 7+ messages in thread
From: Prarit Bhargava @ 2015-06-11 16:59 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Rafael J. Wysocki',
linux-kernel, 'Kristen Carlson Accardi',
'Viresh Kumar',
linux-pm
On 06/11/2015 12:17 PM, Doug Smythies wrote:
>
> On 2015.06.11 08:01 Prarit Bhargava wrote:
>> On 06/11/2015 10:51 AM, Doug Smythies wrote:
>>>
>>> On 2015.06.10 16:46 Rafael J. Wysocki wrote:
>>>> On Wednesday, June 10, 2015 09:18:45 AM Prarit Bhargava wrote:
>>>>> I looked into switching to div64_s64() instead of the 32-bit version in
>>>>> div_fp(), however, this would result in sample_ratio and core_busy returning
>>>>> 0 which is something we don't want.
>>>
>>> ???
>>> Due to a great many overflow related issues, div_fp() was changed to div64_s64()
>>> a long time ago.
>
>> Doug,
>>
>> Nope -- in a linux.git tree (up-to-date as of 7:00AM ET this AM)
>>
>> static inline int32_t div_fp(int32_t x, int32_t y)
>> {
>> return div_s64((int64_t)x << FRAC_BITS, y);
>> }
>
>> If we do want this to be div64_s64, I can make that change, however, I feel that
>> a long delay like this should be ignored in the performance calculations in the
>> driver and that's why I chose to go the direction I did.
>
> Prarit,
>
> Apologies to you and the list for the distraction. I mis-read "div_s64" as "div64_s64".
> Your suggestion is a good one.
Thanks and no problems. It's always good to have someone make me go back and
double check ;)
>
> I do maintain that my point about the duration method being flawed is valid.
> I proposed a fix for that some time ago.
Okay, I'll go back and look at switching to div64_s64() and do some additional
testing.
P.
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-06-11 16:59 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-10 13:18 [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay Prarit Bhargava
2015-06-10 23:45 ` Rafael J. Wysocki
2015-06-11 14:51 ` Doug Smythies
2015-06-11 14:58 ` Prarit Bhargava
2015-06-11 15:00 ` Prarit Bhargava
2015-06-11 16:17 ` Doug Smythies
2015-06-11 16:59 ` Prarit Bhargava
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).