* [BUG] oops in cpufreq driver with AMD Kaveri CPU
@ 2014-08-04 21:39 Oleksandr Natalenko
  2014-08-07 20:53 ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-04 21:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-pm

Hello.

Occasionally my machine hangs completely. Fortunately, I managed to capture an
oops listing over netconsole before the hang, and here it is [1].

Here is a short excerpt of the oops from the link above:

===
[15051.270461] BUG: unable to handle kernel paging request at 00000000ff5ae8e4
[15051.271583] IP: [<ffffffff8109ae6e>] srcu_notifier_call_chain+0xe/0x20
…
[15051.956205] Call Trace:
[15051.980641]  [<ffffffff81606085>] ? __cpufreq_notify_transition+0x95/0x1e0
[15052.005640]  [<ffffffff816081ee>] cpufreq_notify_transition+0x3e/0x70
[15052.030240]  [<ffffffff816083d8>] cpufreq_freq_transition_begin+0xe8/0x130
[15052.054522]  [<ffffffff813b8940>] ? ucs2_strncmp+0x70/0x70
[15052.078208]  [<ffffffff816089bf>] __target_index+0xbf/0x1a0
[15052.101348]  [<ffffffff81608b9c>] __cpufreq_driver_target+0xfc/0x160
[15052.124250]  [<ffffffff8160b0d4>] od_check_cpu+0xa4/0xb0
[15052.146789]  [<ffffffff8160c9ec>] dbs_check_cpu+0x16c/0x1c0
[15052.168935]  [<ffffffff8160b4dd>] od_dbs_timer+0x11d/0x180
[15052.190607]  [<ffffffff8108e6ff>] process_one_work+0x17f/0x4c0
[15052.211825]  [<ffffffff8108f46b>] worker_thread+0x11b/0x3f0
[15052.232490]  [<ffffffff8108f350>] ? create_and_start_worker+0x80/0x80
[15052.253127]  [<ffffffff81096479>] kthread+0xc9/0xe0
[15052.273292]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
[15052.293487]  [<ffffffff81793efc>] ret_from_fork+0x7c/0xb0
[15052.313544]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
…
===
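
For anyone reading the trace: each entry has the form symbol+offset/size, i.e. the offset into the function and the function's total length, both in hex. A minimal user-space sketch that splits such a token is below; demo_parse_frame() is a made-up helper name for illustration, not part of the kernel or of any existing tool.

```c
/* Illustrative user-space helper; not part of the kernel or any tool. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a trace token such as "srcu_notifier_call_chain+0xe/0x20" into
 * the symbol name, the offset into the function and the function size.
 * `sym` must have room for at least 64 bytes.
 * Returns 1 on success, 0 if the token does not match the format.
 */
static int demo_parse_frame(const char *token, char *sym,
			    unsigned long *offset, unsigned long *size)
{
	assert(token && sym && offset && size);
	/* %63[^+] reads the name up to '+'; %lx accepts the 0x prefix */
	return sscanf(token, "%63[^+]+%lx/%lx", sym, offset, size) == 3;
}
```

Fed the faulting IP from the excerpt above, this yields the symbol srcu_notifier_call_chain with offset 0xe and size 0x20, i.e. the fault is 14 bytes into a 32-byte function.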

Here are my lspci [2] and cpuinfo [3] as well.

Vanilla 3.15.8 and 3.16.0 are affected, as is the latest Ubuntu 3.13 kernel.

There is no visible trigger for the bug. After the hang the machine doesn't
respond over the network, there's no disk I/O, and it doesn't react to the
power button for a soft power-off.

[1] https://gist.github.com/085af9da81197faf6637
[2] https://gist.github.com/318ebda5576b099590b8
[3] https://gist.github.com/9c1307463c7ad6835b2d
-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-04 21:39 [BUG] oops in cpufreq driver with AMD Kaveri CPU Oleksandr Natalenko
@ 2014-08-07 20:53 ` Oleksandr Natalenko
  2014-08-08 17:26   ` Oleksandr Natalenko
  2014-08-12  5:52   ` Viresh Kumar
  0 siblings, 2 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-07 20:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-pm

Disabling the cpufreq code in the kernel config works around this issue.

Is this bug related to sleeping in atomic context, caused by improper use of
GFP_KERNEL instead of GFP_ATOMIC? Should I test that patch, or will there be
another fix?

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-07 20:53 ` Oleksandr Natalenko
@ 2014-08-08 17:26   ` Oleksandr Natalenko
  2014-08-12  5:52   ` Viresh Kumar
  1 sibling, 0 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-08 17:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-pm

Filed a detailed bug report on bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=81701

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-07 20:53 ` Oleksandr Natalenko
  2014-08-08 17:26   ` Oleksandr Natalenko
@ 2014-08-12  5:52   ` Viresh Kumar
  2014-08-12  5:55     ` Oleksandr Natalenko
  1 sibling, 1 reply; 23+ messages in thread
From: Viresh Kumar @ 2014-08-12  5:52 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On Fri, Aug 8, 2014 at 2:23 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Disabling cpufreq code in kernel config works around this issue.

Yeah, because this is happening while the notifiers are being served.
Can you debug it a bit to find the exact notifier routine where it crashes?

> Is this bug related to sleeping in atomic context, which is caused by improper
> GFP_KERNEL usage instead of GFP_ATOMIC? Should I test that patch, or there will
> be another fix?

Which patch are you talking about here?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12  5:52   ` Viresh Kumar
@ 2014-08-12  5:55     ` Oleksandr Natalenko
  2014-08-12  6:16       ` Viresh Kumar
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-12  5:55 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

On Tuesday 12 August 2014 11:22:28 Viresh Kumar wrote:
> Yeah, because this is happening while the notifiers are being served.
> Can you debug it a bit to go to the exact notifier routine where this
> crashes?

What should I do to debug it? Is it necessary to recompile the kernel with full
debug?

> Which patch are you talking about here?

I thought about this one [1], but I guess that's not my case.

[1] https://lkml.org/lkml/2014/7/16/815

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12  5:55     ` Oleksandr Natalenko
@ 2014-08-12  6:16       ` Viresh Kumar
  2014-08-12  7:26         ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Viresh Kumar @ 2014-08-12  6:16 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On Tue, Aug 12, 2014 at 11:25 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> What should I do to debug it? Is that necessary to recompile kernel with full
> debug?

Yeah, you need to recompile the kernel for sure, but not necessarily with debug
support.

Some background:

In the cpufreq framework, we manage a CPU's frequency based on the load on
that CPU. You are using the ondemand governor, which tries to increase or
decrease the frequency continuously at fixed intervals.

Now, when we change the frequency we *may* need to communicate this to a few
drivers which *may* depend on the CPU's frequency for their functioning.

This is handled via notifications.

Other drivers are required to do this:
cpufreq_register_notifier(&<some-local-struct>, CPUFREQ_TRANSITION_NOTIFIER);

to register themselves for frequency changes, and then a routine of theirs is
called from the cpufreq core.

This is exactly where it is crashing for you, i.e. while calling the
notifier list.

So, you need to check which notifiers are registered and which one of them
is crashing.

The first parameter of the above register-call should be declared this way:

static struct notifier_block xyz_notifier_block = {
    .notifier_call = xyz_freq_notifier,
};

I would have done it this way:

- Add a print in cpufreq_register_notifier() to print the address of the routine
present in .notifier_call for the CPUFREQ_TRANSITION_NOTIFIER case.

- Then add prints to all the notifier callbacks registered in your
configuration; there shouldn't be many, only 4-5 I believe.

- Then see the sequence/order in which they are called normally, when we
don't crash, and compare that with what you see when it crashes.

- You will be able to make out which notifier is crashing. And then we can
see why.

- You can add something like this to the notifier routines:
pr_info("%s\n", __func__);

This will print the function name.
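
To make the idea concrete, here is a small user-space sketch of the steps above. It is only a model, not the kernel's notifier implementation: every demo_* name is hypothetical, the chain is a plain linked list walked in registration order, and printf() stands in for pr_info(). Each callback prints its own name on entry, so the last name in the log is the last callback that was entered.

```c
/* User-space model of a notifier chain; all demo_* names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_notifier_block {
	int (*notifier_call)(struct demo_notifier_block *nb,
			     unsigned long event, void *data);
	struct demo_notifier_block *next;
};

static struct demo_notifier_block *demo_chain_head;
static const char *demo_last_entered;	/* name of the last callback entered */

/* Append to the tail so callbacks run in registration order (a simplification). */
static void demo_register(struct demo_notifier_block *nb)
{
	struct demo_notifier_block **p = &demo_chain_head;

	assert(nb && nb->notifier_call);
	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Two stand-in callbacks; each logs its name first, like pr_info("%s\n", __func__). */
static int demo_ppc_notifier(struct demo_notifier_block *nb,
			     unsigned long event, void *data)
{
	(void)nb; (void)event; (void)data;
	demo_last_entered = __func__;
	printf("%s\n", __func__);
	return 0;
}

static int demo_stats_notifier(struct demo_notifier_block *nb,
			       unsigned long event, void *data)
{
	(void)nb; (void)event; (void)data;
	demo_last_entered = __func__;
	printf("%s\n", __func__);
	return 0;
}

/* Walk the chain and invoke every callback; returns how many were called. */
static int demo_call_chain(unsigned long event, void *data)
{
	struct demo_notifier_block *nb;
	int called = 0;

	for (nb = demo_chain_head; nb; nb = nb->next) {
		nb->notifier_call(nb, event, data);
		called++;
	}
	return called;
}
```

Registering both blocks and calling demo_call_chain(0, NULL) prints the two function names in order. In the real kernel the same trick tells you which callback was entered last before the oops; if the walk itself faults on a bad pointer, no new name appears after the previous callback's line.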

>> Which patch are you talking about here?
>
> I thought about this one [1], but I guess that's not my case.
>
> [1] https://lkml.org/lkml/2014/7/16/815

I guessed so, and I don't think it will help, as the crash reported in this
bug log is something different.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12  6:16       ` Viresh Kumar
@ 2014-08-12  7:26         ` Oleksandr Natalenko
  2014-08-12  7:46           ` Viresh Kumar
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-12  7:26 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

Got that and will try to investigate ASAP.

Just to note: I tried the powersave governor and got the same result. I believe
notifiers are used with the powersave governor as well. Am I wrong?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12  7:26         ` Oleksandr Natalenko
@ 2014-08-12  7:46           ` Viresh Kumar
  2014-08-12 18:04             ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Viresh Kumar @ 2014-08-12  7:46 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On Tue, Aug 12, 2014 at 12:56 PM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Got that and will try to investigate ASAP.
>
> Just to note: I tried to use powersave governor and got the same result. I believe notifiers are used with powersave governor as well. Am I wrong?

Notifiers are used from the cpufreq core when frequency is changed.
And the powersave governor should have changed the frequency only
ONCE, i.e. to go to lowest frequency.

So, it probably happened at the first & only change at that time as well..
But it shouldn't happen any later. So, see what the log looks like on
that crash.
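
A minimal user-space sketch of that point, assuming the core skips the notification when the requested frequency equals the current one. The struct, the function name and the frequencies used with it are made up for illustration; this is not the cpufreq core code.

```c
/* User-space model only; demo_policy and demo_set_target are hypothetical. */
#include <assert.h>

struct demo_policy {
	unsigned int cur;		/* current frequency, kHz */
	unsigned int min;		/* lowest allowed frequency, kHz */
	unsigned int max;		/* highest allowed frequency, kHz */
	unsigned int transitions;	/* times the notifiers would have run */
};

static void demo_set_target(struct demo_policy *p, unsigned int target)
{
	assert(p);
	/* clamp the request into the policy limits */
	if (target < p->min)
		target = p->min;
	if (target > p->max)
		target = p->max;
	if (target == p->cur)
		return;			/* nothing changes, nobody is notified */
	p->transitions++;		/* stands in for the notifier chain call */
	p->cur = target;
}
```

With a policy that starts at its maximum, a powersave-style caller that keeps requesting p->min causes exactly one transition, while an ondemand-style caller that alternates between min and max causes one on every call. So a crash in the notifier path long after boot would be surprising with powersave.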

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12  7:46           ` Viresh Kumar
@ 2014-08-12 18:04             ` Oleksandr Natalenko
  2014-08-12 18:18               ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-12 18:04 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

Well, I've added the following code to the CPUFREQ_TRANSITION_NOTIFIER case of
cpufreq_register_notifier() (drivers/cpufreq/cpufreq.c):

===
pr_info("Registered transition notifier: %p, (%p)\n", nb->notifier_call,
	&nb->notifier_call);
===

And got the following in my dmesg (only 1 line):

===
$ cat /var/log/kern.log | grep "Registered transition notifier"
Aug 12 17:23:45 defiant kernel: [    3.084977] cpufreq: Registered transition notifier: ffffffff81590378, (ffffffff81cb0ba0)
===

System.map tells me that ffffffff81590378 corresponds to
cpufreq_stat_notifier_trans() (in drivers/cpufreq/cpufreq_stats.c).
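
The lookup itself is simple: System.map lines have the form "<hex address> <type> <name>", and an address belongs to the symbol with the greatest start address not above it. A rough user-space sketch follows; demo_symbol_lookup() and the two neighbouring demo_symbol_* entries are invented for illustration, only the cpufreq_stat_notifier_trans address comes from my log, and its 't' type letter is an assumption.

```c
/* Illustrative System.map-style lookup; helper and sample data are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sample map: only the middle address is taken from the log above. */
static const char demo_map[] =
	"ffffffff81590200 t demo_symbol_before\n"
	"ffffffff81590378 t cpufreq_stat_notifier_trans\n"
	"ffffffff81590500 t demo_symbol_after\n";

/*
 * Find the symbol with the greatest start address <= addr.
 * Returns 1 and fills name/offset on success, 0 if no symbol matches.
 */
static int demo_symbol_lookup(const char *map, unsigned long long addr,
			      char *name, size_t name_len,
			      unsigned long long *offset)
{
	unsigned long long best = 0, sym_addr;
	char type, sym[128];
	int found = 0, consumed;
	const char *p = map;

	assert(map && name && name_len > 0 && offset);
	/* %n records how many bytes this line consumed so we can advance */
	while (sscanf(p, "%llx %c %127s%n", &sym_addr, &type, sym, &consumed) == 3) {
		if (sym_addr <= addr && (!found || sym_addr >= best)) {
			best = sym_addr;
			snprintf(name, name_len, "%s", sym);
			found = 1;
		}
		p += consumed;
	}
	if (found)
		*offset = addr - best;
	return found;
}
```

Looking up 0xffffffff81590378 in demo_map gives cpufreq_stat_notifier_trans with offset 0, which matches what I read from System.map by hand.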

Also I've added

===
pr_info("%s\n", __func__);
===

to all callbacks (hello, LXR :)), and got the following occurrences in dmesg:

===
$ cat /var/log/kern.log | egrep -e '(acpi_|cpufreq|notifier)' | awk '{print($7)}' | sort -u
acpi_processor_ppc_notifier
acpi_thermal_cpufreq_notifier
cpufreq_stat_notifier_policy
===

No extra notifiers are involved so far, but I also haven't caught a new hang
yet, as it occurs randomly. I'm still waiting for it, but I hope my little
investigation helps somehow.

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 18:04             ` Oleksandr Natalenko
@ 2014-08-12 18:18               ` Oleksandr Natalenko
  2014-08-12 18:54                 ` Oleksandr Natalenko
  2014-08-13  4:32                 ` Viresh Kumar
  0 siblings, 2 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-12 18:18 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

Hmm, looks like I've put the pr_info into the policy notifier instead of the
transition notifier. Will fix it now and post new logs.

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 18:18               ` Oleksandr Natalenko
@ 2014-08-12 18:54                 ` Oleksandr Natalenko
       [not found]                   ` <CAOjmkp_mrMYJJfEqqKtPVrbMuaoJ9W6212LKHETeUsOsJryh-Q@mail.gmail.com>
                                     ` (2 more replies)
  2014-08-13  4:32                 ` Viresh Kumar
  1 sibling, 3 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-12 18:54 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

Updated logs.

Registered notifiers:

===
pf@defiant:~$ cat /var/log/kern.log | grep "Registered transition notifier"
Aug 12 21:50:13 defiant kernel: [    3.081759] cpufreq: Registered transition notifier: ffffffff81590378, (ffffffff81cb0ba0)
===

Triggered notifiers:

===
pf@defiant:~$ dmesg | grep CPUFREQ_NOTIFIER | awk '{ s = ""; for (i = 3; i <= NF; i++) s = s $i " "; print s }' | sort -u
CPUFREQ_NOTIFIER: acpi_processor_ppc_notifier 
CPUFREQ_NOTIFIER: acpi_thermal_cpufreq_notifier 
CPUFREQ_NOTIFIER: cpufreq_stat_notifier_trans 
===

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Fwd: [BUG] oops in cpufreq driver with AMD Kaveri CPU
       [not found]                     ` <CAOjmkp8h19de3bYaLpqmXxEynKi--gHBf6MxXuuNoDzZXw=O8Q@mail.gmail.com>
@ 2014-08-12 23:39                         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 23+ messages in thread
From: Aravind Gopalakrishnan @ 2014-08-12 23:39 UTC (permalink / raw)
  To: oleksandr; +Cc: viresh.kumar, LKML, linux-pm

Hi,

I noticed this ping yesterday and tried to reproduce your issue on a 
similar system I have (btw, this is a 'Kabini' processor and not a 
'Kaveri') without success.

/proc/cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 22
model           : 0
model name      : AMD Opteron(tm) X2150 APU
stepping        : 1
microcode       : 0x7000106
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext 
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc 
extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 cx16 sse4_1 
sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic 
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt 
topoext perfctr_nb perfctr_l2 arat xsaveopt hw_pstate proc_feedback npt 
lbrv svm_lock nrip_save tsc_scale flushbyasid decodeassists pausefilter 
pfthreshold bmi1
bogomips        : 3793.19
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [11]

Since the BUG happens on a frequency transition, I tried this: I periodically
ramped up the CPU frequency by running a workload that keeps all cores busy
for some time, and then let the frequency drop again by killing the load.
I repeated this cycle overnight yesterday but did not hit the BUG.
(Using the ondemand governor, with uname -r: 3.16-rc4)
(I think you mentioned you were able to reproduce on 3.16, so I'm assuming
-rc will be affected too.)

Are you noticing this BUG when you are running any particular load?
I could help with the debug effort or test patches to fix the issue (whenever
necessary) if I have some way to reproduce this.

-Aravind

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Fwd: [BUG] oops in cpufreq driver with AMD Kaveri CPU
@ 2014-08-12 23:39                         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 23+ messages in thread
From: Aravind Gopalakrishnan @ 2014-08-12 23:39 UTC (permalink / raw)
  To: oleksandr; +Cc: viresh.kumar, LKML, linux-pm

On 8/12/2014 2:51 PM, Aravind Gopalakrishnan wrote:
>
>
> Hello.
>
> Occasionally I get my machine hung completely. Fortunately, I've got 
> and saved
> oops listing using netconsole before hang, and here it is [1].
>
> Here is little piece of oops from the link above:
>
> ===
> [15051.270461] BUG: unable to handle kernel paging request at 
> 00000000ff5ae8e4
> [15051.271583] IP: [<ffffffff8109ae6e>] srcu_notifier_call_chain+0xe/0x20
> …
> [15051.956205] Call Trace:
> [15051.980641]  [<ffffffff81606085>] ? 
> __cpufreq_notify_transition+0x95/0x1e0
> [15052.005640]  [<ffffffff816081ee>] cpufreq_notify_transition+0x3e/0x70
> [15052.030240]  [<ffffffff816083d8>] 
> cpufreq_freq_transition_begin+0xe8/0x130
> [15052.054522]  [<ffffffff813b8940>] ? ucs2_strncmp+0x70/0x70
> [15052.078208]  [<ffffffff816089bf>] __target_index+0xbf/0x1a0
> [15052.101348]  [<ffffffff81608b9c>] __cpufreq_driver_target+0xfc/0x160
> [15052.124250]  [<ffffffff8160b0d4>] od_check_cpu+0xa4/0xb0
> [15052.146789]  [<ffffffff8160c9ec>] dbs_check_cpu+0x16c/0x1c0
> [15052.168935]  [<ffffffff8160b4dd>] od_dbs_timer+0x11d/0x180
> [15052.190607]  [<ffffffff8108e6ff>] process_one_work+0x17f/0x4c0
> [15052.211825]  [<ffffffff8108f46b>] worker_thread+0x11b/0x3f0
> [15052.232490]  [<ffffffff8108f350>] ? create_and_start_worker+0x80/0x80
> [15052.253127]  [<ffffffff81096479>] kthread+0xc9/0xe0
> [15052.273292]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
> [15052.293487]  [<ffffffff81793efc>] ret_from_fork+0x7c/0xb0
> [15052.313544]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
> …
> ===
>
> Also here is my lspci [2] and cpuinfo [3] as well.
>
> Vanilla 3.15.8 and 3.16.0 are affected as well as latest Ubuntu 3.13 
> kernel.
>
> No visible reason to trigger the bug. After hang machine doesn't 
> respond via
> network, there's no disk IO, and also it doesn't respond to pressing power
> button in order to perform soft off.
>
> [1] https://gist.github.com/085af9da81197faf6637
> [2] https://gist.github.com/318ebda5576b099590b8
> [3] https://gist.github.com/9c1307463c7ad6835b2d
>
>

Hi,

I noticed this ping yesterday and tried to reproduce your issue on a 
similar system I have (btw, this is a 'Kabini' processor and not a 
'Kaveri') without success.

/proc/cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 22
model           : 0
model name      : AMD Opteron(tm) X2150 APU
stepping        : 1
microcode       : 0x7000106
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext 
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc 
extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 cx16 sse4_1 
sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic 
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt 
topoext perfctr_nb perfctr_l2 arat xsaveopt hw_pstate proc_feedback npt 
lbrv svm_lock nrip_save tsc_scale flushbyasid decodeassists pausefilter 
pfthreshold bmi1
bogomips        : 3793.19
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [11]

Since the BUG happens on a frequency transition, I tried the following:
periodically ramped up the CPU frequency by running a workload that keeps
all cores busy for some time, then let the CPU frequency drop by killing
the load.
I repeated this cycle overnight yesterday but did not notice the BUG.
(Using ondemand governor, with uname -r: 3.16-rc4)
(I think you mentioned you were able to reproduce on 3.16. So assuming 
-rc will be affected too)
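
The load cycle described above could be scripted roughly as follows. This is
a minimal sketch, not the script actually used for the test: the function
name load_cycle, its parameters, and the busy-loop workers are all
assumptions made for illustration.

```python
import os
import subprocess
import sys
import time


def load_cycle(busy_s, idle_s, cycles, workers=None):
    """Alternate between loading the cores and letting them idle.

    Hypothetical helper: each cycle starts one busy-looping child process
    per worker, keeps them running for busy_s seconds, kills them, and then
    sleeps for idle_s seconds so the governor can lower the frequency again.
    Returns the number of completed cycles.
    """
    workers = workers or os.cpu_count() or 1
    # A child interpreter that spins forever; it is killed explicitly below.
    spin = [sys.executable, "-c", "while True: pass"]
    completed = 0
    for _ in range(cycles):
        procs = [subprocess.Popen(spin) for _ in range(workers)]
        try:
            time.sleep(busy_s)
        finally:
            # Kill the load so the CPU frequency can drop again.
            for proc in procs:
                proc.kill()
            for proc in procs:
                proc.wait()
        time.sleep(idle_s)
        completed += 1
    return completed
```

For example, load_cycle(busy_s=60, idle_s=60, cycles=600) would alternate
one minute of load with one minute of idle for about 20 hours.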

Are you noticing this BUG when you are running any particular load?
I could help debug effort or test patches to fix issue(whenever 
necessary) if I have some way to reproduce this..

-Aravind

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 18:18               ` Oleksandr Natalenko
  2014-08-12 18:54                 ` Oleksandr Natalenko
@ 2014-08-13  4:32                 ` Viresh Kumar
  2014-08-13  5:56                   ` Oleksandr Natalenko
  1 sibling, 1 reply; 23+ messages in thread
From: Viresh Kumar @ 2014-08-13  4:32 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On 12 August 2014 23:48, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> Hmm, looks like I've put pr_info to policy notifier instead of transition
> notifier. Will fix it now and post new logs.

That's why you should have copied the diff here, so that I could have
confirmed if what you have done is correct or not.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 18:54                 ` Oleksandr Natalenko
       [not found]                   ` <CAOjmkp_mrMYJJfEqqKtPVrbMuaoJ9W6212LKHETeUsOsJryh-Q@mail.gmail.com>
@ 2014-08-13  4:36                   ` Viresh Kumar
  2014-08-13  4:42                   ` Viresh Kumar
  2 siblings, 0 replies; 23+ messages in thread
From: Viresh Kumar @ 2014-08-13  4:36 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On 13 August 2014 00:24, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> Updated logs.
>
> Registered notifiers:
>
> ===
> pf@defiant:~$ cat /var/log/kern.log | grep "Registered transition notifier"
> Aug 12 21:50:13 defiant kernel: [    3.081759] cpufreq: Registered transition
> notifier: ffffffff81590378, (ffffffff81cb0ba0)
> ===
>
> Triggered notifiers:
>
> ===
> pf@defiant:~$ dmesg | grep CPUFREQ_NOTIFIER | awk '{ s = ""; for (i = 3; i <=
> NF; i++) s = s $i " "; print s }' | sort -u
> CPUFREQ_NOTIFIER: acpi_processor_ppc_notifier
> CPUFREQ_NOTIFIER: acpi_thermal_cpufreq_notifier
> CPUFREQ_NOTIFIER: cpufreq_stat_notifier_trans

I don't know why there are three lines here. The first two are registered
as policy notifiers, not transition ones.

So, I think it's just about cpufreq-stat's notifier.
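
For reference, the awk field slicing in the quoted pipeline relies on the
dmesg timestamp being split into two whitespace-separated fields, which
appears to hold only while the timestamp is space-padded. A sketch that
matches on the CPUFREQ_NOTIFIER prefix instead is shown below; the function
name unique_notifiers is hypothetical, and the prefix is the one added by
the debug pr_info calls.

```python
import re

# Prefix printed by the debug pr_info() calls, followed by the function name.
NOTIFIER_RE = re.compile(r"CPUFREQ_NOTIFIER:\s*(\S+)")


def unique_notifiers(lines):
    """Return the sorted set of notifier function names seen in log lines."""
    names = set()
    for line in lines:
        match = NOTIFIER_RE.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)
```

Feeding it the lines of the dmesg output should give the same list as the
awk and sort -u pipeline, regardless of how the timestamp is padded.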

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 18:54                 ` Oleksandr Natalenko
       [not found]                   ` <CAOjmkp_mrMYJJfEqqKtPVrbMuaoJ9W6212LKHETeUsOsJryh-Q@mail.gmail.com>
  2014-08-13  4:36                   ` Viresh Kumar
@ 2014-08-13  4:42                   ` Viresh Kumar
  2014-08-13  5:43                     ` Oleksandr Natalenko
  2014-11-11 10:41                     ` Oleksandr Natalenko
  2 siblings, 2 replies; 23+ messages in thread
From: Viresh Kumar @ 2014-08-13  4:42 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, Linux PM list

On 13 August 2014 00:24, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> Updated logs.

Also, will it be possible for you to find the last working kernel after
which this happened?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-13  4:42                   ` Viresh Kumar
@ 2014-08-13  5:43                     ` Oleksandr Natalenko
  2014-11-11 10:41                     ` Oleksandr Natalenko
  1 sibling, 0 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-13  5:43 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

On Wednesday 13 August 2014 10:12:06 Viresh Kumar wrote: 
> Also will it be possible for you to find the last working kernel after
> which this
> happened?

I'll try to reproduce the issue with older kernels, but that could be 
problematic and may take some time. So far, I can say that 3.13 and 3.15 are 
also affected (haven't tried 3.14, though).

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-13  4:32                 ` Viresh Kumar
@ 2014-08-13  5:56                   ` Oleksandr Natalenko
  2014-08-13 12:45                     ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-13  5:56 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

On Wednesday 13 August 2014 10:02:26 Viresh Kumar wrote:
> That's why you should have copied the diff here, so that I could have
> confirmed if what you have done is correct or not.

You may check my modifications here [1] or attached to this email.

[1] https://gist.github.com/bdcc7d41883b5e077974

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

[-- Attachment #2: 0001-add-cpufreq-debug.patch --]
[-- Type: text/x-patch, Size: 18907 bytes --]

From 493cd190339201c264cdc4cbc1c96c70c786a776 Mon Sep 17 00:00:00 2001
From: Oleksandr Natalenko <oleksandr@natalenko.name>
Date: Wed, 13 Aug 2014 08:54:09 +0300
Subject: [PATCH] add cpufreq debug

---
 arch/arm/kernel/smp.c                            | 2 ++
 arch/arm/kernel/smp_twd.c                        | 2 ++
 arch/arm/mach-pxa/viper.c                        | 2 ++
 arch/arm/mach-s3c24xx/mach-osiris-dvs.c          | 2 ++
 arch/blackfin/mach-common/dpmc.c                 | 2 ++
 arch/cris/arch-v32/kernel/time.c                 | 3 +++
 arch/powerpc/oprofile/op_model_cell.c            | 3 +++
 arch/sparc/kernel/time_64.c                      | 2 ++
 arch/x86/kernel/tsc.c                            | 2 ++
 arch/x86/kvm/x86.c                               | 2 ++
 drivers/acpi/processor_perflib.c                 | 2 ++
 drivers/acpi/processor_thermal.c                 | 2 ++
 drivers/cpufreq/cpufreq.c                        | 1 +
 drivers/cpufreq/cpufreq_stats.c                  | 2 ++
 drivers/cpufreq/cris-artpec3-cpufreq.c           | 3 +++
 drivers/cpufreq/cris-etraxfs-cpufreq.c           | 3 +++
 drivers/cpufreq/loongson2_cpufreq.c              | 2 ++
 drivers/cpufreq/ppc_cbe_cpufreq_pmi.c            | 2 ++
 drivers/gpu/drm/tilcdc/tilcdc_drv.c              | 3 +++
 drivers/i2c/busses/i2c-davinci.c                 | 2 ++
 drivers/i2c/busses/i2c-s3c2410.c                 | 2 ++
 drivers/macintosh/windfarm_cpufreq_clamp.c       | 2 ++
 drivers/mmc/host/davinci_mmc.c                   | 2 ++
 drivers/mmc/host/s3cmci.c                        | 2 ++
 drivers/mtd/nand/s3c2410.c                       | 2 ++
 drivers/pcmcia/soc_common.c                      | 2 ++
 drivers/staging/tidspbridge/rmgr/drv_interface.c | 2 ++
 drivers/thermal/cpu_cooling.c                    | 2 ++
 drivers/tty/serial/samsung.c                     | 2 ++
 drivers/tty/serial/sh-sci.c                      | 2 ++
 drivers/video/fbdev/da8xx-fb.c                   | 2 ++
 drivers/video/fbdev/nuc900fb.c                   | 1 +
 drivers/video/fbdev/pxafb.c                      | 2 ++
 drivers/video/fbdev/s3c2410fb.c                  | 2 ++
 drivers/video/fbdev/sa1100fb.c                   | 2 ++
 drivers/watchdog/s3c2410_wdt.c                   | 2 ++
 36 files changed, 75 insertions(+)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 7c4fada..57dff15 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -660,6 +660,8 @@ static int cpufreq_callback(struct notifier_block *nb,
 	struct cpufreq_freqs *freq = data;
 	int cpu = freq->cpu;
 
+	pr_info("%s\n", __func__);
+
 	if (freq->flags & CPUFREQ_CONST_LOOPS)
 		return NOTIFY_OK;
 
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index dfc3213..986df64 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -161,6 +161,8 @@ static int twd_cpufreq_transition(struct notifier_block *nb,
 {
 	struct cpufreq_freqs *freqs = data;
 
+	pr_info("%s\n", __func__);
+
 	/*
 	 * The twd clock events must be reprogrammed to account for the new
 	 * frequency.  The timer is local to a cpu, so cross-call to the
diff --git a/arch/arm/mach-pxa/viper.c b/arch/arm/mach-pxa/viper.c
index 41f27f6..c047617 100644
--- a/arch/arm/mach-pxa/viper.c
+++ b/arch/arm/mach-pxa/viper.c
@@ -868,6 +868,8 @@ static int viper_cpufreq_notifier(struct notifier_block *nb,
 {
 	struct cpufreq_freqs *freq = data;
 
+	pr_info("%s\n", __func__);
+
 	/* TODO: Adjust timings??? */
 
 	switch (val) {
diff --git a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
index 33afb91..1caf694 100644
--- a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
+++ b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
@@ -61,6 +61,8 @@ static int osiris_dvs_notify(struct notifier_block *nb,
 	bool new_dvs = is_dvs(&freqs->new);
 	int ret = 0;
 
+	pr_info("%s\n", __func__);
+
 	if (!dvs_en)
 		return 0;
 
diff --git a/arch/blackfin/mach-common/dpmc.c b/arch/blackfin/mach-common/dpmc.c
index 724a8c5..1fd9a05 100644
--- a/arch/blackfin/mach-common/dpmc.c
+++ b/arch/blackfin/mach-common/dpmc.c
@@ -103,6 +103,8 @@ vreg_cpufreq_notifier(struct notifier_block *nb, unsigned long val, void *data)
 {
 	struct cpufreq_freqs *freq = data;
 
+	pr_info("%s\n", __func__);
+
 	if (freq->cpu != CPUFREQ_CPU)
 		return 0;
 
diff --git a/arch/cris/arch-v32/kernel/time.c b/arch/cris/arch-v32/kernel/time.c
index ee66866..b7cb448 100644
--- a/arch/cris/arch-v32/kernel/time.c
+++ b/arch/cris/arch-v32/kernel/time.c
@@ -303,6 +303,9 @@ cris_time_freq_notifier(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
 	struct cpufreq_freqs *freqs = data;
+
+	pr_info("%s\n", __func__);
+
 	if (val == CPUFREQ_POSTCHANGE) {
 		reg_timer_r_tmr0_data data;
 		reg_timer_rw_tmr0_div div = (freqs->new * 500) / HZ;
diff --git a/arch/powerpc/oprofile/op_model_cell.c b/arch/powerpc/oprofile/op_model_cell.c
index 863d893..283c5d1 100644
--- a/arch/powerpc/oprofile/op_model_cell.c
+++ b/arch/powerpc/oprofile/op_model_cell.c
@@ -1120,6 +1120,9 @@ oprof_cpufreq_notify(struct notifier_block *nb, unsigned long val, void *data)
 {
 	int ret = 0;
 	struct cpufreq_freqs *frq = data;
+
+	pr_info("%s\n", __func__);
+
 	if ((val == CPUFREQ_PRECHANGE && frq->old < frq->new) ||
 	    (val == CPUFREQ_POSTCHANGE && frq->old > frq->new))
 		set_spu_profiling_frequency(frq->new, spu_cycle_reset);
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index 3fddf64..af62a1a 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -654,6 +654,8 @@ static int sparc64_cpufreq_notifier(struct notifier_block *nb, unsigned long val
 	unsigned int cpu = freq->cpu;
 	struct freq_table *ft = &per_cpu(sparc64_freq_table, cpu);
 
+	pr_info("%s\n", __func__);
+
 	if (!ft->ref_freq) {
 		ft->ref_freq = freq->old;
 		ft->clock_tick_ref = cpu_data(cpu).clock_tick;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ea03031..81b9bbc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -899,6 +899,8 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *freq = data;
 	unsigned long *lpj;
 
+	pr_info("%s\n", __func__);
+
 	if (cpu_has(&cpu_data(freq->cpu), X86_FEATURE_CONSTANT_TSC))
 		return 0;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef432f8..dcba683 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5402,6 +5402,8 @@ static int kvmclock_cpu_notifier(struct notifier_block *nfb,
 {
 	unsigned int cpu = (unsigned long)hcpu;
 
+	pr_info("%s\n", __func__);
+
 	switch (action) {
 		case CPU_ONLINE:
 		case CPU_DOWN_FAILED:
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index cfc8aba..eb858f7 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -79,6 +79,8 @@ static int acpi_processor_ppc_notifier(struct notifier_block *nb,
 	struct acpi_processor *pr;
 	unsigned int ppc = 0;
 
+	pr_info("CPUFREQ_NOTIFIER: %s\n", __func__);
+
 	if (event == CPUFREQ_START && ignore_ppc <= 0) {
 		ignore_ppc = 0;
 		return 0;
diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c
index e003663..ef8fcec 100644
--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -89,6 +89,8 @@ static int acpi_thermal_cpufreq_notifier(struct notifier_block *nb,
 	struct cpufreq_policy *policy = data;
 	unsigned long max_freq = 0;
 
+	pr_info("CPUFREQ_NOTIFIER: %s\n", __func__);
+
 	if (event != CPUFREQ_ADJUST)
 		goto out;
 
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6f02485..9255321 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1764,6 +1764,7 @@ int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list)
 
 	switch (list) {
 	case CPUFREQ_TRANSITION_NOTIFIER:
+		pr_info("Registered transition notifier: %p, (%p)\n", nb->notifier_call, &nb->notifier_call);
 		ret = srcu_notifier_chain_register(
 				&cpufreq_transition_notifier_list, nb);
 		break;
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 0cd9b4d..aed067c 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -298,6 +298,8 @@ static int cpufreq_stat_notifier_trans(struct notifier_block *nb,
 	struct cpufreq_stats *stat;
 	int old_index, new_index;
 
+	pr_info("CPUFREQ_NOTIFIER: %s\n", __func__);
+
 	if (val != CPUFREQ_POSTCHANGE)
 		return 0;
 
diff --git a/drivers/cpufreq/cris-artpec3-cpufreq.c b/drivers/cpufreq/cris-artpec3-cpufreq.c
index 601b88c..357bc51 100644
--- a/drivers/cpufreq/cris-artpec3-cpufreq.c
+++ b/drivers/cpufreq/cris-artpec3-cpufreq.c
@@ -76,6 +76,9 @@ cris_sdram_freq_notifier(struct notifier_block *nb, unsigned long val,
 {
 	int i;
 	struct cpufreq_freqs *freqs = data;
+
+	pr_info("%s\n", __func__);
+
 	if (val == CPUFREQ_PRECHANGE) {
 		reg_ddr2_rw_cfg cfg =
 		  REG_RD(ddr2, regi_ddr2_ctrl, rw_cfg);
diff --git a/drivers/cpufreq/cris-etraxfs-cpufreq.c b/drivers/cpufreq/cris-etraxfs-cpufreq.c
index 22b2cdd..acc859d 100644
--- a/drivers/cpufreq/cris-etraxfs-cpufreq.c
+++ b/drivers/cpufreq/cris-etraxfs-cpufreq.c
@@ -76,6 +76,9 @@ cris_sdram_freq_notifier(struct notifier_block *nb, unsigned long val,
 {
 	int i;
 	struct cpufreq_freqs *freqs = data;
+
+	pr_info("%s\n", __func__);
+
 	if (val == CPUFREQ_PRECHANGE) {
 		reg_bif_core_rw_sdram_timing timing =
 		    REG_RD(bif_core, regi_bif_core, rw_sdram_timing);
diff --git a/drivers/cpufreq/loongson2_cpufreq.c b/drivers/cpufreq/loongson2_cpufreq.c
index d4add86..42446f4 100644
--- a/drivers/cpufreq/loongson2_cpufreq.c
+++ b/drivers/cpufreq/loongson2_cpufreq.c
@@ -36,6 +36,8 @@ static struct notifier_block loongson2_cpufreq_notifier_block = {
 static int loongson2_cpu_freq_notifier(struct notifier_block *nb,
 					unsigned long val, void *data)
 {
+	pr_info("%s\n", __func__);
+
 	if (val == CPUFREQ_POSTCHANGE)
 		current_cpu_data.udelay_val = loops_per_jiffy;
 
diff --git a/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c b/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
index d29e8da..0f1f2b1 100644
--- a/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
+++ b/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
@@ -97,6 +97,8 @@ static int pmi_notifier(struct notifier_block *nb,
 	struct cpufreq_frequency_table *cbe_freqs;
 	u8 node;
 
+	pr_info("%s\n", __func__);
+
 	/* Should this really be called for CPUFREQ_ADJUST, CPUFREQ_INCOMPATIBLE
 	 * and CPUFREQ_NOTIFY policy events?)
 	 */
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
index b20b694..6defd4b 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
@@ -102,6 +102,9 @@ static int cpufreq_transition(struct notifier_block *nb,
 {
 	struct tilcdc_drm_private *priv = container_of(nb,
 			struct tilcdc_drm_private, freq_transition);
+
+	pr_info("%s\n", __func__);
+
 	if (val == CPUFREQ_POSTCHANGE) {
 		if (priv->lcd_fck_rate != clk_get_rate(priv->clk)) {
 			priv->lcd_fck_rate = clk_get_rate(priv->clk);
diff --git a/drivers/i2c/busses/i2c-davinci.c b/drivers/i2c/busses/i2c-davinci.c
index 389bc68..8228027 100644
--- a/drivers/i2c/busses/i2c-davinci.c
+++ b/drivers/i2c/busses/i2c-davinci.c
@@ -589,6 +589,8 @@ static int i2c_davinci_cpufreq_transition(struct notifier_block *nb,
 {
 	struct davinci_i2c_dev *dev;
 
+	pr_info("%s\n", __func__);
+
 	dev = container_of(nb, struct davinci_i2c_dev, freq_transition);
 	if (val == CPUFREQ_PRECHANGE) {
 		wait_for_completion(&dev->xfr_complete);
diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
index e828a1d..0c0ee6b 100644
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -928,6 +928,8 @@ static int s3c24xx_i2c_cpufreq_transition(struct notifier_block *nb,
 	int delta_f;
 	int ret;
 
+	pr_info("%s\n", __func__);
+
 	delta_f = clk_get_rate(i2c->clk) - i2c->clkrate;
 
 	/* if we're post-change and the input clock has slowed down
diff --git a/drivers/macintosh/windfarm_cpufreq_clamp.c b/drivers/macintosh/windfarm_cpufreq_clamp.c
index 72d1fdf..8342ae1 100644
--- a/drivers/macintosh/windfarm_cpufreq_clamp.c
+++ b/drivers/macintosh/windfarm_cpufreq_clamp.c
@@ -22,6 +22,8 @@ static int clamp_notifier_call(struct notifier_block *self,
 	struct cpufreq_policy *p = data;
 	unsigned long max_freq;
 
+	pr_info("%s\n", __func__);
+
 	if (event != CPUFREQ_ADJUST)
 		return 0;
 
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 5d4c5e0..bbb5a0e 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -1110,6 +1110,8 @@ static int mmc_davinci_cpufreq_transition(struct notifier_block *nb,
 	struct mmc_host *mmc;
 	unsigned long flags;
 
+	pr_info("%s\n", __func__);
+
 	host = container_of(nb, struct mmc_davinci_host, freq_transition);
 	mmc = host->mmc;
 	mmc_pclk = clk_get_rate(host->clk);
diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c
index f237826..6bbdc6e 100644
--- a/drivers/mmc/host/s3cmci.c
+++ b/drivers/mmc/host/s3cmci.c
@@ -1451,6 +1451,8 @@ static int s3cmci_cpufreq_transition(struct notifier_block *nb,
 	unsigned long newclk;
 	unsigned long flags;
 
+	pr_info("%s\n", __func__);
+
 	host = container_of(nb, struct s3cmci_host, freq_transition);
 	newclk = clk_get_rate(host->clk);
 	mmc = host->mmc;
diff --git a/drivers/mtd/nand/s3c2410.c b/drivers/mtd/nand/s3c2410.c
index 79acbb8..d8eb2ee 100644
--- a/drivers/mtd/nand/s3c2410.c
+++ b/drivers/mtd/nand/s3c2410.c
@@ -686,6 +686,8 @@ static int s3c2410_nand_cpufreq_transition(struct notifier_block *nb,
 	struct s3c2410_nand_info *info;
 	unsigned long newclk;
 
+	pr_info("%s\n", __func__);
+
 	info = container_of(nb, struct s3c2410_nand_info, freq_transition);
 	newclk = clk_get_rate(info->clk);
 
diff --git a/drivers/pcmcia/soc_common.c b/drivers/pcmcia/soc_common.c
index a2bc6ee..e67eb30 100644
--- a/drivers/pcmcia/soc_common.c
+++ b/drivers/pcmcia/soc_common.c
@@ -644,6 +644,8 @@ soc_pcmcia_notifier(struct notifier_block *nb, unsigned long val, void *data)
 	struct cpufreq_freqs *freqs = data;
 	int ret = 0;
 
+	pr_info("%s\n", __func__);
+
 	mutex_lock(&soc_pcmcia_sockets_lock);
 	list_for_each_entry(skt, &soc_pcmcia_sockets, node)
 		if (skt->ops->frequency_change)
diff --git a/drivers/staging/tidspbridge/rmgr/drv_interface.c b/drivers/staging/tidspbridge/rmgr/drv_interface.c
index 74d31da..a93e2e4 100644
--- a/drivers/staging/tidspbridge/rmgr/drv_interface.c
+++ b/drivers/staging/tidspbridge/rmgr/drv_interface.c
@@ -380,6 +380,8 @@ static int dspbridge_scale_notification(struct notifier_block *op,
 	struct omap_dsp_platform_data *pdata =
 	    omap_dspbridge_dev->dev.platform_data;
 
+	pr_info("%s\n", __func__);
+
 	if (CPUFREQ_POSTCHANGE == val && pdata->dsp_get_opp)
 		pwr_pm_post_scale(PRCM_VDD1, pdata->dsp_get_opp());
 
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 84a75f8..6f42abc 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -317,6 +317,8 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 	struct cpufreq_policy *policy = data;
 	unsigned long max_freq = 0;
 
+	pr_info("%s\n", __func__);
+
 	if (event != CPUFREQ_ADJUST || notify_device == NOTIFY_INVALID)
 		return 0;
 
diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
index c1d3ebd..0dabb16 100644
--- a/drivers/tty/serial/samsung.c
+++ b/drivers/tty/serial/samsung.c
@@ -1066,6 +1066,8 @@ static int s3c24xx_serial_cpufreq_transition(struct notifier_block *nb,
 	struct s3c24xx_uart_port *port;
 	struct uart_port *uport;
 
+	pr_info("%s\n", __func__);
+
 	port = container_of(nb, struct s3c24xx_uart_port, freq_transition);
 	uport = &port->port;
 
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 88236da..aaa78fa 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -1039,6 +1039,8 @@ static int sci_notifier(struct notifier_block *self,
 	struct sci_port *sci_port;
 	unsigned long flags;
 
+	pr_info("%s\n", __func__);
+
 	sci_port = container_of(self, struct sci_port, freq_transition);
 
 	if (phase == CPUFREQ_POSTCHANGE) {
diff --git a/drivers/video/fbdev/da8xx-fb.c b/drivers/video/fbdev/da8xx-fb.c
index a8484f7..4edf809 100644
--- a/drivers/video/fbdev/da8xx-fb.c
+++ b/drivers/video/fbdev/da8xx-fb.c
@@ -1048,6 +1048,8 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
 {
 	struct da8xx_fb_par *par;
 
+	pr_info("%s\n", __func__);
+
 	par = container_of(nb, struct da8xx_fb_par, freq_transition);
 	if (val == CPUFREQ_POSTCHANGE) {
 		if (par->lcdc_clk_rate != clk_get_rate(par->lcdc_clk)) {
diff --git a/drivers/video/fbdev/nuc900fb.c b/drivers/video/fbdev/nuc900fb.c
index 478f980..a2d62bc 100644
--- a/drivers/video/fbdev/nuc900fb.c
+++ b/drivers/video/fbdev/nuc900fb.c
@@ -484,6 +484,7 @@ static inline void nuc900fb_cpufreq_deregister(struct nuc900fb_info *fbi)
 static inline int nuc900fb_cpufreq_transition(struct notifier_block *nb,
 				       unsigned long val, void *data)
 {
+	pr_info("%s\n", __func__);
 	return 0;
 }
 
diff --git a/drivers/video/fbdev/pxafb.c b/drivers/video/fbdev/pxafb.c
index 1ecd9ce..227eec7 100644
--- a/drivers/video/fbdev/pxafb.c
+++ b/drivers/video/fbdev/pxafb.c
@@ -1640,6 +1640,8 @@ pxafb_freq_transition(struct notifier_block *nb, unsigned long val, void *data)
 	/* TODO struct cpufreq_freqs *f = data; */
 	u_int pcd;
 
+	pr_info("%s\n", __func__);
+
 	switch (val) {
 	case CPUFREQ_PRECHANGE:
 #ifdef CONFIG_FB_PXA_OVERLAY
diff --git a/drivers/video/fbdev/s3c2410fb.c b/drivers/video/fbdev/s3c2410fb.c
index 81af5a6..eae9fc5 100644
--- a/drivers/video/fbdev/s3c2410fb.c
+++ b/drivers/video/fbdev/s3c2410fb.c
@@ -776,6 +776,8 @@ static int s3c2410fb_cpufreq_transition(struct notifier_block *nb,
 	struct fb_info *fbinfo;
 	long delta_f;
 
+	pr_info("%s\n", __func__);
+
 	info = container_of(nb, struct s3c2410fb_info, freq_transition);
 	fbinfo = platform_get_drvdata(to_platform_device(info->dev));
 
diff --git a/drivers/video/fbdev/sa1100fb.c b/drivers/video/fbdev/sa1100fb.c
index 580c444..da092bd 100644
--- a/drivers/video/fbdev/sa1100fb.c
+++ b/drivers/video/fbdev/sa1100fb.c
@@ -1006,6 +1006,8 @@ sa1100fb_freq_transition(struct notifier_block *nb, unsigned long val,
 	struct cpufreq_freqs *f = data;
 	u_int pcd;
 
+	pr_info("%s\n", __func__);
+
 	switch (val) {
 	case CPUFREQ_PRECHANGE:
 		set_ctrlr_state(fbi, C_DISABLE_CLKCHANGE);
diff --git a/drivers/watchdog/s3c2410_wdt.c b/drivers/watchdog/s3c2410_wdt.c
index 7c6ccd0..a01b11b 100644
--- a/drivers/watchdog/s3c2410_wdt.c
+++ b/drivers/watchdog/s3c2410_wdt.c
@@ -379,6 +379,8 @@ static int s3c2410wdt_cpufreq_transition(struct notifier_block *nb,
 	int ret;
 	struct s3c2410_wdt *wdt = freq_to_wdt(nb);
 
+	pr_info("%s\n", __func__);
+
 	if (!s3c2410wdt_is_running(wdt))
 		goto done;
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Fwd: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-12 23:39                         ` Aravind Gopalakrishnan
  (?)
@ 2014-08-13  8:02                         ` Oleksandr Natalenko
  -1 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-13  8:02 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: viresh.kumar, LKML, linux-pm

On Tuesday 12 August 2014 18:39:39 Aravind Gopalakrishnan wrote:
> (btw, this is a 'Kabini' processor and not a
> 'Kaveri')

Right, sorry.

> (I think you mentioned you were able to reproduce on 3.16. So assuming
> -rc will be affected too)

At least, 3.16-rc7 was affected too.

> Are you noticing this BUG when you are running any particular load?
> I could help debug effort or test patches to fix issue(whenever
> necessary) if I have some way to reproduce this..

Well, there's no particular workload type. The machine in question is my home
router running an x2go session, torrents, i2p and tor. The hardest part is
waiting for the hang to occur, as it happens about once per 24 hours (it could
be much sooner or a little later).

Also, I've noticed that there's no need to load CPU intentionally with some 
test scripts to trigger this bug as CPU frequencies are jumping constantly:

===
pf@defiant:~$ cat /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/stats/trans_table
   From  :    To
         :   1600000   1400000   1200000   1000000    800000 
  1600000:         0     21936     14210      9997     61789 
  1400000:     12195         0      2149      1026     20635 
  1200000:     10557       881         0      1631     21333 
  1000000:     10292      1021      1596         0     25018 
   800000:     74888     12167     16447     25273         0 
   From  :    To
         :   1600000   1400000   1200000   1000000    800000 
  1600000:         0     23610     14510      9827     61768 
  1400000:     13856         0      2445      1086     20849 
  1200000:     10925       885         0      1773     21581 
  1000000:     10206      1037      1683         0     24775 
   800000:     74727     12704     16526     25015         0 
   From  :    To
         :   1600000   1400000   1200000   1000000    800000 
  1600000:         0     26721     15016     10040     61981 
  1400000:     16249         0      2902      1150     21444 
  1200000:     11811       958         0      1939     23065 
  1000000:     11310      1159      1958         0     26525 
   800000:     74387     12907     17897     27823         0 
   From  :    To
         :   1600000   1400000   1200000   1000000    800000 
  1600000:         0     27715     13060     10143     62328 
  1400000:     18215         0      3379      1881     24753 
  1200000:     12920      1459         0      3529     27835 
  1000000:     13193      1915      3372         0     39240 
   800000:     68917     17139     25932     42167         0
===
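
To compare such tables between runs, the trans_table text could be parsed and
summed with something like the sketch below. The function names
parse_trans_table and total_transitions are hypothetical, and the parser
assumes exactly the layout shown above (a "From : To" line, a header row of
target frequencies, then one row per source frequency).

```python
def parse_trans_table(text):
    """Parse one cpufreq stats trans_table into {from_khz: {to_khz: count}}."""
    targets = []
    table = {}
    for line in text.splitlines():
        left, sep, right = line.partition(":")
        if not sep:
            continue  # not a table line
        left, right = left.strip(), right.strip()
        if left == "From":
            continue  # the "From : To" caption line
        if left == "":
            # Header row listing the target frequencies.
            targets = [int(freq) for freq in right.split()]
            continue
        counts = [int(count) for count in right.split()]
        table[int(left)] = dict(zip(targets, counts))
    return table


def total_transitions(table):
    """Sum every transition count in a parsed table."""
    return sum(sum(row.values()) for row in table.values())
```

Reading /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table and passing
its contents to parse_trans_table would give the per-CPU counts for cpu0.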

-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-13  5:56                   ` Oleksandr Natalenko
@ 2014-08-13 12:45                     ` Oleksandr Natalenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-08-13 12:45 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list

Also, I've found a similar oops report with an upstreamed fix (commit
46a310b80bc2c9ccc019649c9da91194cbc10944).

On Wednesday 13 August 2014 08:56:02 Oleksandr Natalenko wrote:
> On Wednesday 13 August 2014 10:02:26 Viresh Kumar wrote:
> > That's why you should have copied the diff here, so that I could have
> > confirmed if what you have done is correct or not.
> 
> You may check my modifications here [1] or attached to this email.
> 
> [1] https://gist.github.com/bdcc7d41883b5e077974
-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-08-13  4:42                   ` Viresh Kumar
  2014-08-13  5:43                     ` Oleksandr Natalenko
@ 2014-11-11 10:41                     ` Oleksandr Natalenko
  2014-11-18 19:07                       ` Oleksandr Natalenko
  1 sibling, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-11-11 10:41 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list, linux-acpi

It seems that this bug is not in the acpi-cpufreq code but in another ACPI
area.

With ACPI enabled, the kernel may hang within a day or a week (it has never
survived more than approx. 2 weeks). With acpi=off it seems to work OK. For
instance, I had to boot the Ubuntu installer with ACPI disabled to finish the
installation successfully.

Usually the hang is not accompanied by a panic log. Only small vertical red
lines appear on the screen near letters (I tried a plaintext 80x25 console
without radeon and got the same issue).

I am still observing this with the 3.16 kernel.

On Wednesday 13 August 2014 10:12:06 Viresh Kumar wrote:
> On 13 August 2014 00:24, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> > Updated logs.
> 
> Also will it be possible for you to find the last working kernel after
> which this
> happened?
-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-11-11 10:41                     ` Oleksandr Natalenko
@ 2014-11-18 19:07                       ` Oleksandr Natalenko
  2015-04-14 16:07                         ` Oleksandr Natalenko
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Natalenko @ 2014-11-18 19:07 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list, linux-acpi

Hmmm, weird: acpi=off, as well as disabling ASPM and the NMI watchdog, didn't
help :(.

Now trying to update the BIOS.

P.S. Still affected with the 3.17.2 kernel.

On Tuesday 11 November 2014 12:41:18 Oleksandr Natalenko wrote:
> It seems that this bug has nothing to do with acpi-cpufreq code but with
> another ACPI area.
> 
> With ACPI enabled kernel may hang in a day or in a week (never survived more
> than approx. 2 weeks). With acpi=off it seems to work OK. For instance, I
> had to boot Ubuntu installer with acpi disabled to finish it successfully.
> 
> Usually, hanging is not accompanied by panic log. Only small vertical red
> lines appear on the screen near letters (tried to use plaintext 80x25
> console without radeon and got the same issue).
> 
> Still observing this for 3.16 kernel.
> 
> On Wednesday 13 August 2014 10:12:06 Viresh Kumar wrote:
> > On 13 August 2014 00:24, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> > > Updated logs.
> > 
> > Also will it be possible for you to find the last working kernel after
> > which this happened?
-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/


* Re: [BUG] oops in cpufreq driver with AMD Kaveri CPU
  2014-11-18 19:07                       ` Oleksandr Natalenko
@ 2015-04-14 16:07                         ` Oleksandr Natalenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Natalenko @ 2015-04-14 16:07 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-kernel, Linux PM list, linux-acpi

Cross-posting from kernel bugzilla:

===
Definitely not a kernel bug.

I've replaced the RAM module with another one and the issue went away.

No idea why the oopses referred to ACPI, but they seem to be the result of a
simple hardware incompatibility.
===

On Tuesday 18 November 2014 21:07:51 Oleksandr Natalenko wrote:
> Hmmm, weird, acpi=off as well as disabling ASPM and NMI watchdog didn't help
> :(.
> 
> Now trying to update BIOS.
> 
> P.S. Still affected while using 3.17.2 kernel.
> 
> On Tuesday 11 November 2014 12:41:18 Oleksandr Natalenko wrote:
> > It seems that this bug has nothing to do with acpi-cpufreq code but with
> > another ACPI area.
> > 
> > With ACPI enabled kernel may hang in a day or in a week (never survived
> > more than approx. 2 weeks). With acpi=off it seems to work OK. For
> > instance, I had to boot Ubuntu installer with acpi disabled to finish it
> > successfully.
> > 
> > Usually, hanging is not accompanied by panic log. Only small vertical red
> > lines appear on the screen near letters (tried to use plaintext 80x25
> > console without radeon and got the same issue).
> > 
> > Still observing this for 3.16 kernel.
> > 
> > On Wednesday 13 August 2014 10:12:06 Viresh Kumar wrote:
> > > On 13 August 2014 00:24, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> > > > Updated logs.
> > > 
> > > Also will it be possible for you to find the last working kernel after
> > > which this happened?
-- 
Oleksandr post-factum Natalenko, MSc
pf-kernel community
https://natalenko.name/


end of thread, other threads:[~2015-04-14 16:17 UTC | newest]

Thread overview: 23+ messages
2014-08-04 21:39 [BUG] oops in cpufreq driver with AMD Kaveri CPU Oleksandr Natalenko
2014-08-07 20:53 ` Oleksandr Natalenko
2014-08-08 17:26   ` Oleksandr Natalenko
2014-08-12  5:52   ` Viresh Kumar
2014-08-12  5:55     ` Oleksandr Natalenko
2014-08-12  6:16       ` Viresh Kumar
2014-08-12  7:26         ` Oleksandr Natalenko
2014-08-12  7:46           ` Viresh Kumar
2014-08-12 18:04             ` Oleksandr Natalenko
2014-08-12 18:18               ` Oleksandr Natalenko
2014-08-12 18:54                 ` Oleksandr Natalenko
     [not found]                   ` <CAOjmkp_mrMYJJfEqqKtPVrbMuaoJ9W6212LKHETeUsOsJryh-Q@mail.gmail.com>
     [not found]                     ` <CAOjmkp8h19de3bYaLpqmXxEynKi--gHBf6MxXuuNoDzZXw=O8Q@mail.gmail.com>
2014-08-12 23:39                       ` Fwd: " Aravind Gopalakrishnan
2014-08-12 23:39                         ` Aravind Gopalakrishnan
2014-08-13  8:02                         ` Oleksandr Natalenko
2014-08-13  4:36                   ` Viresh Kumar
2014-08-13  4:42                   ` Viresh Kumar
2014-08-13  5:43                     ` Oleksandr Natalenko
2014-11-11 10:41                     ` Oleksandr Natalenko
2014-11-18 19:07                       ` Oleksandr Natalenko
2015-04-14 16:07                         ` Oleksandr Natalenko
2014-08-13  4:32                 ` Viresh Kumar
2014-08-13  5:56                   ` Oleksandr Natalenko
2014-08-13 12:45                     ` Oleksandr Natalenko
