* intel_pstate_timer_func divide by zero oops
@ 2013-03-28  1:49 ` Parag Warudkar
  0 siblings, 0 replies; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28  1:49 UTC (permalink / raw)
  To: rjw, cpufreq, linux-pm; +Cc: linux-kernel, torvalds

I get this same oops occasionally - the machine freezes and there doesn't 
seem to be any record of the oops on disk.

I captured it on camera - 
https://lh3.googleusercontent.com/-K0lNbJrZBMQ/UVOU1vv1vvI/AAAAAAAANqI/pY92mWm3caE/s800/20130327_205245.jpg

If I am reading this right, it dies on this instruction -

   0xffffffff8145792d <+349>:   divq   0x18(%rcx)

From the .lst file, that *seems* to be this inline function -

static inline void intel_pstate_calc_busy(struct cpudata *cpu,
                                        struct sample *sample)
{
        u64 core_pct;
        sample->pstate_pct_busy = 100 - div64_u64(
ffffffff8145791d:       48 8b 41 20             mov    0x20(%rcx),%rax
ffffffff81457921:       48 8d 04 80             lea    (%rax,%rax,4),%rax
ffffffff81457925:       48 8d 04 80             lea    (%rax,%rax,4),%rax
ffffffff81457929:       48 c1 e0 02             shl    $0x2,%rax
ffffffff8145792d:       48 f7 71 18             divq   0x18(%rcx)


That is -
	sample->pstate_pct_busy = 100 - div64_u64(
					sample->idletime_us * 100,
					sample->duration_us);

So it looks like sample->duration_us is 0? If so, that implies that 
ktime_us_delta(now, cpu->prev_sample) is zero. I am not entirely sure how 
to handle this case: return early if we are sampling too soon, or look for 
some other bug that is making the delta calculation go poof.


Thanks,

Parag

^ permalink raw reply	[flat|nested] 6+ messages in thread


* Re: intel_pstate_timer_func divide by zero oops
  2013-03-28  1:49 ` Parag Warudkar
@ 2013-03-28  2:51 ` Dirk Brandewie
  2013-03-28  3:13   ` Parag Warudkar
  -1 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2013-03-28  2:51 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: rjw, cpufreq, linux-pm, linux-kernel, torvalds, dirk.brandewie


Is there any way to capture the beginning of this trace?

pid_param_set() is on the stack, which means that something is changing
the debugfs parameters or the stack is FUBAR.

On 03/27/2013 06:49 PM, Parag Warudkar wrote:
> I get this same oops occasionally - the machine freezes and there doesn't
> seem to be any record of the oops on disk.
>

>
> That is -
> 	sample->pstate_pct_busy = 100 - div64_u64(
> 					sample->idletime_us * 100,
> 					sample->duration_us);
>

I don't see how duration_us can be zero unless somehow I am getting back-to-back
timer callbacks, which seems unlikely since the timer is not re-armed until
the timer function is about to return and the driver has done all its work
for the sample period.

--Dirk

> So looks like sample->duration_us is 0? If so, that implies that
> ktime_us_delta(now, cpu->prev_sample) is zero. I am not entirely sure how
> to handle this case - return if sampling too early, or if there is some
> other bug making the delta calculation go poof.
>
>
> Thanks,
>
> Parag



* Re: intel_pstate_timer_func divide by zero oops
  2013-03-28  2:51 ` Dirk Brandewie
@ 2013-03-28  3:13   ` Parag Warudkar
  2013-03-28 15:35     ` Dirk Brandewie
  0 siblings, 1 reply; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28  3:13 UTC (permalink / raw)
  To: Dirk Brandewie; +Cc: Rafael J. Wysocki, cpufreq, linux-pm, LKML, Linus Torvalds

On Wed, Mar 27, 2013 at 10:51 PM, Dirk Brandewie
<dirk.brandewie@gmail.com> wrote:
>
> Is there any way to capture the beginning of this trace?

I tried, but since the oops scrolls by fast and is followed by a hard freeze, I
wasn't able to capture it completely.
Maybe I can try netconsole and see if that helps.

>
> pid_param_set() is on the stack which means that something is changing
> the debugfs parameters or the stack is FUBAR.
>
I somehow doubt the stack is messed up, as the call traces are always identical.
(pid_param_set() seems to be in the first trace as well.)

>
> I don't see how duration_us can be zero unless somehow I am getting back-to-back
> timer callbacks which seems unlikely since the timer is not re-armed until
> the timer function is about to return and the driver has done all its work
> for the sample period

Do the two oopses with a common call stack suggest back-to-back callbacks?

I will add some debugging checks tomorrow to see what is going on. But it
sounds like a minimal fix would be to guard against callbacks in quick
succession, i.e. return from sampling if ktime_us_delta(now, cpu->prev_sample)
is zero?

Thanks,
Parag


* Re: intel_pstate_timer_func divide by zero oops
  2013-03-28  3:13   ` Parag Warudkar
@ 2013-03-28 15:35     ` Dirk Brandewie
  2013-03-28 18:25       ` Parag Warudkar
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2013-03-28 15:35 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Dirk Brandewie, Rafael J. Wysocki, cpufreq, linux-pm, LKML,
	Linus Torvalds

On 03/27/2013 08:13 PM, Parag Warudkar wrote:
> On Wed, Mar 27, 2013 at 10:51 PM, Dirk Brandewie
> <dirk.brandewie@gmail.com> wrote:
>>
>> Is there any way to capture the beginning of this trace?
>
> I tried but since the oops scrolls fast followed by a hard freeze, I
> wasn't able to capture it completely.
> Maybe I can try netconsole and see if that helps.
>
>>
>> pid_param_set() is on the stack which means that something is changing
>> the debugfs parameters or the stack is FUBAR.
>>
> I somehow doubt the stack is messed up as the call traces are always identical.
> (pid_param_set() seems to be in first trace as well.)
>

I agree that the two oopses are likely the same, but unless something is crawling
through debugfs writing random values to the files there, pid_param_set()
should not be on any stack anywhere.

There was a similar bug reported by Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=920289

This bug has not shown up again since -rc3. Can you try the current -rc to see if
you still see the problem?

>>
>> I don't see how duration_us can be zero unless somehow I am getting back-to-back
>> timer callbacks which seems unlikely since the timer is not re-armed until
>> the timer function is about to return and the driver has done all its work
>> for the sample period
>
> Do the two oops with common call stack suggest back to back callbacks?
>
> I will add some debugging checks tomorrow to see what is going on. But
> sounds like a minimal fix would be to guard against callbacks in quick
> succession?
> i.e. return from sample if ktime_us_delta(now, cpu->prev_sample) is zero?
>
> Thanks,
> Parag
>



* Re: intel_pstate_timer_func divide by zero oops
  2013-03-28 15:35     ` Dirk Brandewie
@ 2013-03-28 18:25       ` Parag Warudkar
  0 siblings, 0 replies; 6+ messages in thread
From: Parag Warudkar @ 2013-03-28 18:25 UTC (permalink / raw)
  To: Dirk Brandewie; +Cc: Rafael J. Wysocki, cpufreq, linux-pm, LKML, Linus Torvalds

On Thu, Mar 28, 2013 at 11:35 AM, Dirk Brandewie
<dirk.brandewie@gmail.com> wrote:
>>> pid_param_set() is on the stack which means that something is changing
>>> the debugfs parameters or the stack is FUBAR.
>>>
>> I somehow doubt the stack is messed up as the call traces are always identical.
>> (pid_param_set() seems to be in first trace as well.)
>>
>
> I agree that the two oops are likely the same but unless something is crawling
> through debugfs writing random values to the files there pid_param_set()
> should not be on any stack anywhere.

Ok.

>
> There was a similar bug reported by fedora:
> https://bugzilla.redhat.com/show_bug.cgi?id=920289
>
> This bug has not showed up again since rc3 can you try the current rc to see if
> you still see the problem?
>

I updated the Fedora BZ - the oops I got is from the latest git as of
yesterday. So I think the bug is still there post -rc3.

Parag


Thread overview: 6+ messages
2013-03-28  1:49 intel_pstate_timer_func divide by zero oops Parag Warudkar
2013-03-28  2:51 ` Dirk Brandewie
2013-03-28  3:13   ` Parag Warudkar
2013-03-28 15:35     ` Dirk Brandewie
2013-03-28 18:25       ` Parag Warudkar
