* intel_pstate_timer_func divide by zero oops
@ 2013-03-28 1:49 ` Parag Warudkar
0 siblings, 0 replies; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28 1:49 UTC (permalink / raw)
To: rjw, cpufreq, linux-pm; +Cc: linux-kernel, torvalds
I get this same oops occasionally - the machine freezes and there doesn't
seem to be any record of the oops on disk.
I captured it on camera -
https://lh3.googleusercontent.com/-K0lNbJrZBMQ/UVOU1vv1vvI/AAAAAAAANqI/pY92mWm3caE/s800/20130327_205245.jpg
If I am reading this right, it dies on this instruction -
0xffffffff8145792d <+349>: divq 0x18(%rcx)
From the lst file that *seems* to be this inline function -
static inline void intel_pstate_calc_busy(struct cpudata *cpu,
struct sample *sample)
{
u64 core_pct;
sample->pstate_pct_busy = 100 - div64_u64(
ffffffff8145791d: 48 8b 41 20 mov 0x20(%rcx),%rax
ffffffff81457921: 48 8d 04 80 lea (%rax,%rax,4),%rax
ffffffff81457925: 48 8d 04 80 lea (%rax,%rax,4),%rax
ffffffff81457929: 48 c1 e0 02 shl $0x2,%rax
ffffffff8145792d: 48 f7 71 18 divq 0x18(%rcx)
That is -
sample->pstate_pct_busy = 100 - div64_u64(
sample->idletime_us * 100,
sample->duration_us);
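For readers following the listing, here is a small userspace sketch of what the three instructions ahead of the divq appear to compute. It assumes, from the source expression only, that 0x20(%rcx) holds idletime_us and 0x18(%rcx) holds duration_us; the function name is made up for illustration and is not from the driver.

```c
#include <stdint.h>

/*
 * Mirrors the arithmetic instructions that precede the divq:
 *   lea (%rax,%rax,4),%rax   rax = rax + rax*4   (x * 5)
 *   lea (%rax,%rax,4),%rax   rax = rax + rax*4   (x * 25)
 *   shl $0x2,%rax            rax = rax << 2      (x * 100)
 * The name is hypothetical; only the arithmetic comes from the listing.
 */
static uint64_t times_100_like_listing(uint64_t idletime_us)
{
    uint64_t rax = idletime_us;  /* mov 0x20(%rcx),%rax (assumed field) */

    rax = rax + rax * 4;
    rax = rax + rax * 4;
    rax <<= 2;
    return rax;
}
```

So the sequence is consistent with idletime_us * 100 as the dividend, which leaves the memory operand of divq as the divisor.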
So looks like sample->duration_us is 0? If so, that implies that
ktime_us_delta(now, cpu->prev_sample) is zero. I am not entirely sure how
to handle this case - return if sampling too early, or if there is some
other bug making the delta calculation go poof.
Thanks,
Parag
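One way a zero delta could arise even when the two timestamps differ is truncation to microseconds. The sketch below is a userspace stand-in, assuming the delta is a nanosecond difference divided by 1000 with integer division; us_delta is a hypothetical helper, not the kernel's ktime_us_delta itself.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000LL

/*
 * Hypothetical stand-in for a microsecond delta between two
 * nanosecond timestamps. Integer division drops any remainder,
 * so two samples less than 1000 ns apart give a delta of 0.
 */
static int64_t us_delta(int64_t later_ns, int64_t earlier_ns)
{
    return (later_ns - earlier_ns) / NSEC_PER_USEC;
}
```

With this model, us_delta(1000999, 1000000) is 0 while us_delta(1001000, 1000000) is 1, so two samples taken within the same microsecond would be enough to produce a zero divisor.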
* Re: intel_pstate_timer_func divide by zero oops
2013-03-28 1:49 ` Parag Warudkar
(?)
@ 2013-03-28 2:51 ` Dirk Brandewie
2013-03-28 3:13 ` Parag Warudkar
-1 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2013-03-28 2:51 UTC (permalink / raw)
To: Parag Warudkar
Cc: rjw, cpufreq, linux-pm, linux-kernel, torvalds, dirk.brandewie
Is there any way to capture the beginning of this trace?
pid_param_set() is on the stack which means that something is changing
the debugfs parameters or the stack is FUBAR.
On 03/27/2013 06:49 PM, Parag Warudkar wrote:
> I get this same oops occasionally - the machine freezes and there doesn't
> seem to be any record of the oops on disk.
>
>
> That is -
> sample->pstate_pct_busy = 100 - div64_u64(
> sample->idletime_us * 100,
> sample->duration_us);
>
I don't see how duration_us can be zero unless somehow I am getting back-to-back
timer callbacks, which seems unlikely since the timer is not re-armed until
the timer function is about to return and the driver has done all its work
for the sample period.
--Dirk
> So looks like sample->duration_us is 0? If so, that implies that
> ktime_us_delta(now, cpu->prev_sample) is zero. I am not entirely sure how
> to handle this case - return if sampling too early, or if there is some
> other bug making the delta calculation go poof.
>
>
> Thanks,
>
> Parag
* Re: intel_pstate_timer_func divide by zero oops
2013-03-28 2:51 ` Dirk Brandewie
@ 2013-03-28 3:13 ` Parag Warudkar
2013-03-28 15:35 ` Dirk Brandewie
0 siblings, 1 reply; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28 3:13 UTC (permalink / raw)
To: Dirk Brandewie; +Cc: Rafael J. Wysocki, cpufreq, linux-pm, LKML, Linus Torvalds
On Wed, Mar 27, 2013 at 10:51 PM, Dirk Brandewie
<dirk.brandewie@gmail.com> wrote:
>
> Is there any way to capture the beginning of this trace?
I tried but since the oops scrolls fast followed by a hard freeze, I
wasn't able to capture it completely.
Maybe I can try netconsole and see if that helps.
>
> pid_param_set() is on the stack which means that something is changing
> the debugfs parameters or the stack is FUBAR.
>
I somehow doubt the stack is messed up as the call traces are always identical.
(pid_param_set() seems to be in first trace as well.)
>
> I don't see how duration_us can be zero unless somehow I am getting
> back-to-back
> timer callbacks which seems unlikely since the timer is not re-armed until
> the timer function is about to return and the driver has done all its work
> for the sample period
Do the two oops with common call stack suggest back to back callbacks?
I will add some debugging checks tomorrow to see what is going on. But
sounds like a minimal fix would be to guard against callbacks in quick
succession?
i.e. return from sample if ktime_us_delta(now, cpu->prev_sample) is zero?
Thanks,
Parag
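The guard proposed above can be sketched in userspace as follows. This is only an illustration under assumptions: the struct carries just the three fields named in this thread with guessed types, calc_busy_guarded is a hypothetical name, and plain 64-bit division stands in for div64_u64. It is not the fix that went into the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down sample; field types are assumptions. */
struct sample {
    uint64_t idletime_us;
    uint64_t duration_us;
    uint64_t pstate_pct_busy;
};

/*
 * Returns false and leaves the sample untouched when the duration
 * is zero, so the caller can skip this sampling period instead of
 * dividing by zero. Otherwise computes the busy percentage the same
 * way as the expression quoted in this thread.
 */
static bool calc_busy_guarded(struct sample *s)
{
    if (s->duration_us == 0)
        return false;

    s->pstate_pct_busy = 100 - (s->idletime_us * 100) / s->duration_us;
    return true;
}
```

A caller would return early from the sampling path when this reports false. Note that this only hides the symptom; it does not explain why two samples would land within the same microsecond.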
* Re: intel_pstate_timer_func divide by zero oops
2013-03-28 3:13 ` Parag Warudkar
@ 2013-03-28 15:35 ` Dirk Brandewie
2013-03-28 18:25 ` Parag Warudkar
0 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2013-03-28 15:35 UTC (permalink / raw)
To: Parag Warudkar
Cc: Dirk Brandewie, Rafael J. Wysocki, cpufreq, linux-pm, LKML,
Linus Torvalds
On 03/27/2013 08:13 PM, Parag Warudkar wrote:
> On Wed, Mar 27, 2013 at 10:51 PM, Dirk Brandewie
> <dirk.brandewie@gmail.com> wrote:
>>
>> Is there any way to capture the beginning of this trace?
>
> I tried but since the oops scrolls fast followed by a hard freeze, I
> wasn't able to capture it completely.
> Maybe I can try netconsole and see if that helps.
>
>>
>> pid_param_set() is on the stack which means that something is changing
>> the debugfs parameters or the stack is FUBAR.
>>
> I somehow doubt the stack is messed up as the call traces are always identical.
> (pid_param_set() seems to be in first trace as well.)
>
I agree that the two oops are likely the same but unless something is crawling
through debugfs writing random values to the files there pid_param_set()
should not be on any stack anywhere.
There was a similar bug reported by fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=920289
This bug has not shown up again since rc3. Can you try the current rc to see if
you still see the problem?
>>
>> I don't see how duration_us can be zero unless somehow I am getting
>> back-to-back
>> timer callbacks which seems unlikely since the timer is not re-armed until
>> the timer function is about to return and the driver has done all its work
>> for the sample period
>
> Do the two oops with common call stack suggest back to back callbacks?
>
> I will add some debugging checks tomorrow to see what is going on. But
> sounds like a minimal fix would be to guard against callbacks in quick
> succession?
> i.e. return from sample if ktime_us_delta(now, cpu->prev_sample) is zero?
>
> Thanks,
> Parag
>
* Re: intel_pstate_timer_func divide by zero oops
2013-03-28 15:35 ` Dirk Brandewie
@ 2013-03-28 18:25 ` Parag Warudkar
0 siblings, 0 replies; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28 18:25 UTC (permalink / raw)
To: Dirk Brandewie; +Cc: Rafael J. Wysocki, cpufreq, linux-pm, LKML, Linus Torvalds
On Thu, Mar 28, 2013 at 11:35 AM, Dirk Brandewie
<dirk.brandewie@gmail.com> wrote:
>>> pid_param_set() is on the stack which means that something is changing
>>> the debugfs parameters or the stack is FUBAR.
>>>
>> I somehow doubt the stack is messed up as the call traces are always
>> identical.
>> (pid_param_set() seems to be in first trace as well.)
>>
>
> I agree that the two oops are likely the same but unless something is
> crawling
> through debugfs writing random values to the files there pid_param_set()
> should not be on any stack anywhere.
Ok.
>
> There was a similar bug reported by fedora:
> https://bugzilla.redhat.com/show_bug.cgi?id=920289
>
> This bug has not shown up again since rc3. Can you try the current rc to see
> if you still see the problem?
>
I updated the Fedora BZ - the oops I got is from latest git as of
yesterday. So I think the bug is still there post -rc3.
Parag