* [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done @ 2014-02-24 6:57 Saravana Kannan 2014-02-24 7:42 ` Srivatsa S. Bhat 0 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-24 6:57 UTC (permalink / raw) To: Rafael J. Wysocki, Viresh Kumar Cc: linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan The existing code sets the per CPU policy to a non-NULL value before all the steps performed during the hotplug online path is done. Specifically, this is done before the policy min/max, governors, etc are initialized for the policy. This in turn means that calls to cpufreq_cpu_get() return a non-NULL policy before the policy/CPU is ready to be used. To fix this, move the update of per CPU policy to a valid value after all the initialization steps for the policy are completed. Example kernel panic without this fix: [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020 [ 512.146195] pgd = c0003000 [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM <snip> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 <snip> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- [ 512.149761] Kernel panic - not syncing: Fatal exception [ 513.152016] CPU0: stopping [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 <snip> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) In this specific case, CPU0 set's CPU1's policy->governor in cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in __cpufreq_governor(). Signed-off-by: Saravana Kannan <skannan@codeaurora.org> --- drivers/cpufreq/cpufreq.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cb003a6..d5ceb43 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, goto err_set_policy_cpu; } - write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_cpu_data, j) = policy; - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - if (cpufreq_driver->get) { policy->cur = cpufreq_driver->get(policy->cpu); if (!policy->cur) { @@ -1207,6 +1202,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, policy->user_policy.governor = policy->governor; } + write_lock_irqsave(&cpufreq_driver_lock, flags); + for_each_cpu(j, policy->cpus) + per_cpu(cpufreq_cpu_data, j) = policy; + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + kobject_uevent(&policy->kobj, KOBJ_ADD); up_read(&cpufreq_rwsem); -- 1.8.2.1 The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 6:57 [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan @ 2014-02-24 7:42 ` Srivatsa S. Bhat 2014-02-24 8:11 ` Viresh Kumar 2014-02-24 8:39 ` skannan 0 siblings, 2 replies; 27+ messages in thread From: Srivatsa S. Bhat @ 2014-02-24 7:42 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel On 02/24/2014 12:27 PM, Saravana Kannan wrote: > The existing code sets the per CPU policy to a non-NULL value before all > the steps performed during the hotplug online path is done. Specifically, > this is done before the policy min/max, governors, etc are initialized for > the policy. This in turn means that calls to cpufreq_cpu_get() return a > non-NULL policy before the policy/CPU is ready to be used. > > To fix this, move the update of per CPU policy to a valid value after all > the initialization steps for the policy are completed. > > Example kernel panic without this fix: > [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020 > [ 512.146195] pgd = c0003000 > [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 > [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM > <snip> > [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac > [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 > <snip> > [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- > [ 512.149761] Kernel panic - not syncing: Fatal exception > [ 513.152016] CPU0: stopping > [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 > <snip> > [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) > [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) > [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) > [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) > [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) > [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) > [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) > [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) > [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) > [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) > [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) > [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) > [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) > [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) > [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) > > In this specific case, CPU0 set's CPU1's policy->governor in > cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in > __cpufreq_governor(). > Whoa! That's horrible! Can you please explain how exactly you triggered this? I'm curious... > Signed-off-by: Saravana Kannan <skannan@codeaurora.org> > --- > drivers/cpufreq/cpufreq.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index cb003a6..d5ceb43 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > goto err_set_policy_cpu; > } > > - write_lock_irqsave(&cpufreq_driver_lock, flags); > - for_each_cpu(j, policy->cpus) > - per_cpu(cpufreq_cpu_data, j) = policy; > - write_unlock_irqrestore(&cpufreq_driver_lock, flags); > - > if (cpufreq_driver->get) { > policy->cur = cpufreq_driver->get(policy->cpu); If you move the per-cpu init further down, then what happens to the cpufreq_generic_get() that gets invoked here by some of the drivers? It will almost always fail (because policy will be NULL) and hence CPU online will be unsuccessful (though you wont observe it because the error code is not bubbled up to the CPU hotplug core; perhaps we should). IMHO, we should fix the synchronization in cpufreq_init_policy(). I notice cpufreq_update_policy() as well as __cpufreq_governor() in your stack trace, which means cpufreq_set_policy() was involved. cpufreq_update_policy() takes the policy->rwsem for write, whereas cpufreq_init_policy() doesn't take that lock at all, which is clearly wrong. My guess is that, if we fix that locking, everything will be fine and you won't hit the bug. Would you like to give that a shot? Regards, Srivatsa S. Bhat > if (!policy->cur) { > @@ -1207,6 +1202,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, > policy->user_policy.governor = policy->governor; > } > > + write_lock_irqsave(&cpufreq_driver_lock, flags); > + for_each_cpu(j, policy->cpus) > + per_cpu(cpufreq_cpu_data, j) = policy; > + write_unlock_irqrestore(&cpufreq_driver_lock, flags); > + > kobject_uevent(&policy->kobj, KOBJ_ADD); > up_read(&cpufreq_rwsem); > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 7:42 ` Srivatsa S. Bhat @ 2014-02-24 8:11 ` Viresh Kumar 2014-02-24 8:41 ` skannan 2014-02-24 8:39 ` skannan 1 sibling, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-24 8:11 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Saravana Kannan, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 24 February 2014 13:12, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 02/24/2014 12:27 PM, Saravana Kannan wrote: >> The existing code sets the per CPU policy to a non-NULL value before all >> the steps performed during the hotplug online path is done. Specifically, >> this is done before the policy min/max, governors, etc are initialized for >> the policy. This in turn means that calls to cpufreq_cpu_get() return a >> non-NULL policy before the policy/CPU is ready to be used. >> >> To fix this, move the update of per CPU policy to a valid value after all >> the initialization steps for the policy are completed. >> >> Example kernel panic without this fix: >> [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020 >> [ 512.146195] pgd = c0003000 >> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 >> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM >> <snip> >> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac >> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 >> <snip> >> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- >> [ 512.149761] Kernel panic - not syncing: Fatal exception >> [ 513.152016] CPU0: stopping >> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 >> <snip> >> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) >> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) >> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) >> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) >> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) >> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) >> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) >> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) >> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) >> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) >> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) >> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) >> >> In this specific case, CPU0 set's CPU1's policy->governor in >> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in >> __cpufreq_governor(). >> > > Whoa! That's horrible! Can you please explain how exactly you > triggered this? I'm curious... :) Actually I didn't understood his problem very well until now. I couldn't get from his trace that which pointer was actually set to NULL here? And hence what caused this crash? Also, what Saravana just wrote here didn't looked like relevant at all. i.e.: policy->governor being set to NULL. So what? It was set to NULL with a reason and it shouldn't cause any crashes I hope. And also its sort of wrong to say that CPU0 has set governor for CPU1, as governor was set for a policy and both CPUs were part of the same policy. >> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >> --- >> drivers/cpufreq/cpufreq.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index cb003a6..d5ceb43 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, >> goto err_set_policy_cpu; >> } >> >> - write_lock_irqsave(&cpufreq_driver_lock, flags); >> - for_each_cpu(j, policy->cpus) >> - per_cpu(cpufreq_cpu_data, j) = policy; >> - write_unlock_irqrestore(&cpufreq_driver_lock, flags); >> - >> if (cpufreq_driver->get) { >> policy->cur = cpufreq_driver->get(policy->cpu); > > If you move the per-cpu init further down, then what happens to the > cpufreq_generic_get() that gets invoked here by some of the drivers? > It will almost always fail (because policy will be NULL) and hence > CPU online will be unsuccessful Thanks, I was about to point that as well. > (though you wont observe it because > the error code is not bubbled up to the CPU hotplug core; perhaps we > should). Don't know :) > IMHO, we should fix the synchronization in cpufreq_init_policy(). > I notice cpufreq_update_policy() as well as __cpufreq_governor() in > your stack trace, which means cpufreq_set_policy() was involved. > cpufreq_update_policy() takes the policy->rwsem for write, whereas > cpufreq_init_policy() doesn't take that lock at all, which is clearly > wrong. > > My guess is that, if we fix that locking, everything will be fine and > you won't hit the bug. Would you like to give that a shot? I am not sure about that guess as well. I first want to know what went bad exactly.. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:11 ` Viresh Kumar @ 2014-02-24 8:41 ` skannan 2014-02-24 8:43 ` Viresh Kumar 0 siblings, 1 reply; 27+ messages in thread From: skannan @ 2014-02-24 8:41 UTC (permalink / raw) To: Viresh Kumar Cc: Srivatsa S. Bhat, Saravana Kannan, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel Viresh Kumar wrote: > On 24 February 2014 13:12, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> On 02/24/2014 12:27 PM, Saravana Kannan wrote: >>> The existing code sets the per CPU policy to a non-NULL value before >>> all >>> the steps performed during the hotplug online path is done. >>> Specifically, >>> this is done before the policy min/max, governors, etc are initialized >>> for >>> the policy. This in turn means that calls to cpufreq_cpu_get() return >>> a >>> non-NULL policy before the policy/CPU is ready to be used. >>> >>> To fix this, move the update of per CPU policy to a valid value after >>> all >>> the initialization steps for the policy are completed. >>> >>> Example kernel panic without this fix: >>> [ 512.146185] Unable to handle kernel NULL pointer dereference at >>> virtual address 00000020 >>> [ 512.146195] pgd = c0003000 >>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 >>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM >>> <snip> >>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac >>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 >>> <snip> >>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- >>> [ 512.149761] Kernel panic - not syncing: Fatal exception >>> [ 513.152016] CPU0: stopping >>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W >>> 3.10.0-gd727407-00074-g979ede8 #396 >>> <snip> >>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) >>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from >>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) >>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from >>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >>> from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) >>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from >>> [<c0afe180>] (notifier_call_chain+0x40/0x68) >>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>> [<c02812dc>] (__cpu_notify+0x28/0x44) >>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] >>> (_cpu_up+0xf4/0x1dc) >>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] >>> (cpu_up+0x5c/0x78) >>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] >>> (store_online+0x44/0x74) >>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] >>> (sysfs_write_file+0x108/0x14c) >>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from >>> [<c03517d4>] (vfs_write+0xd0/0x180) >>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] >>> (SyS_write+0x38/0x68) >>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] >>> (ret_fast_syscall+0x0/0x30) >>> >>> In this specific case, CPU0 set's CPU1's policy->governor in >>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor >>> in >>> __cpufreq_governor(). >>> >> >> Whoa! That's horrible! Can you please explain how exactly you >> triggered this? I'm curious... > > :) > > Actually I didn't understood his problem very well until now. I couldn't > get from his trace that which pointer was actually set to NULL here? > And hence what caused this crash? > > Also, what Saravana just wrote here didn't looked like relevant at all. > i.e.: policy->governor being set to NULL. > > So what? It was set to NULL with a reason and it shouldn't cause > any crashes I hope. And also its sort of wrong to say that CPU0 > has set governor for CPU1, as governor was set for a policy and > both CPUs were part of the same policy. > >>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >>> --- >>> drivers/cpufreq/cpufreq.c | 10 +++++----- >>> 1 file changed, 5 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> index cb003a6..d5ceb43 100644 >>> --- a/drivers/cpufreq/cpufreq.c >>> +++ b/drivers/cpufreq/cpufreq.c >>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, >>> struct subsys_interface *sif, >>> goto err_set_policy_cpu; >>> } >>> >>> - write_lock_irqsave(&cpufreq_driver_lock, flags); >>> - for_each_cpu(j, policy->cpus) >>> - per_cpu(cpufreq_cpu_data, j) = policy; >>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags); >>> - >>> if (cpufreq_driver->get) { >>> policy->cur = cpufreq_driver->get(policy->cpu); >> >> If you move the per-cpu init further down, then what happens to the >> cpufreq_generic_get() that gets invoked here by some of the drivers? >> It will almost always fail (because policy will be NULL) and hence >> CPU online will be unsuccessful > > Thanks, I was about to point that as well. > >> (though you wont observe it because >> the error code is not bubbled up to the CPU hotplug core; perhaps we >> should). > > Don't know :) > >> IMHO, we should fix the synchronization in cpufreq_init_policy(). >> I notice cpufreq_update_policy() as well as __cpufreq_governor() in >> your stack trace, which means cpufreq_set_policy() was involved. >> cpufreq_update_policy() takes the policy->rwsem for write, whereas >> cpufreq_init_policy() doesn't take that lock at all, which is clearly >> wrong. >> >> My guess is that, if we fix that locking, everything will be fine and >> you won't hit the bug. Would you like to give that a shot? > > I am not sure about that guess as well. I first want to know what > went bad exactly.. > I just replied to the other email. I think I answered both your questions there. Sorry about mixing up CPU and policy. In my case, each CPU is independently scalable -- so for now take CPU == policy. I'll fix it up once we agree on the fix. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:41 ` skannan @ 2014-02-24 8:43 ` Viresh Kumar 2014-02-24 8:47 ` skannan 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-24 8:43 UTC (permalink / raw) To: Saravana Kannan Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 24 February 2014 14:11, <skannan@codeaurora.org> wrote: > I just replied to the other email. I think I answered both your questions > there. Sorry about mixing up CPU and policy. In my case, each CPU is > independently scalable -- so for now take CPU == policy. I'll fix it up > once we agree on the fix. But why do you say this then? >>> In this specific case, CPU0 set's CPU1's policy->governor in >>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor >>> in __cpufreq_governor(). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:43 ` Viresh Kumar @ 2014-02-24 8:47 ` skannan 2014-02-24 8:50 ` Viresh Kumar 0 siblings, 1 reply; 27+ messages in thread From: skannan @ 2014-02-24 8:47 UTC (permalink / raw) To: Viresh Kumar Cc: Saravana Kannan, Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel Viresh Kumar wrote: > On 24 February 2014 14:11, <skannan@codeaurora.org> wrote: >> I just replied to the other email. I think I answered both your >> questions >> there. Sorry about mixing up CPU and policy. In my case, each CPU is >> independently scalable -- so for now take CPU == policy. I'll fix it up >> once we agree on the fix. > > But why do you say this then? Sorry, not sure I understand what you mean. I agree, wording in my commit text might be unclear. I'll fix it after we agree on the code fix. In the MSM case, each CPU has it's own policy. I'm assuming your original complaint was about my confusing wording. Maybe that's not what you were pointing out? -Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:47 ` skannan @ 2014-02-24 8:50 ` Viresh Kumar 2014-02-24 9:00 ` skannan 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-24 8:50 UTC (permalink / raw) To: Saravana Kannan Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 24 February 2014 14:17, <skannan@codeaurora.org> wrote: > Sorry, not sure I understand what you mean. > > I agree, wording in my commit text might be unclear. I'll fix it after we > agree on the code fix. In the MSM case, each CPU has it's own policy. > > I'm assuming your original complaint was about my confusing wording. Maybe > that's not what you were pointing out? In your case each CPU has a separate policy structure as they have separately controllable clocks. But you also said that CPU0 is setting CPU1's governor to NULL. I don't see that happening. Each CPU sets its own governor to NULL on init(). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:50 ` Viresh Kumar @ 2014-02-24 9:00 ` skannan 0 siblings, 0 replies; 27+ messages in thread From: skannan @ 2014-02-24 9:00 UTC (permalink / raw) To: Viresh Kumar Cc: Saravana Kannan, Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel Viresh Kumar wrote: > On 24 February 2014 14:17, <skannan@codeaurora.org> wrote: >> Sorry, not sure I understand what you mean. >> >> I agree, wording in my commit text might be unclear. I'll fix it after >> we >> agree on the code fix. In the MSM case, each CPU has it's own policy. >> >> I'm assuming your original complaint was about my confusing wording. >> Maybe >> that's not what you were pointing out? > > In your case each CPU has a separate policy structure as they have > separately > controllable clocks. But you also said that CPU0 is setting CPU1's > governor to > NULL. I don't see that happening. Each CPU sets its own governor to NULL > on > init(). When I said "CPU0 is setting CPU1's governor to NULL", I meant thread/context running in CPU0 is setting CPU1's governor as part of CPU1's ONLINE notifier. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 7:42 ` Srivatsa S. Bhat 2014-02-24 8:11 ` Viresh Kumar @ 2014-02-24 8:39 ` skannan 2014-02-24 10:55 ` Viresh Kumar 1 sibling, 1 reply; 27+ messages in thread From: skannan @ 2014-02-24 8:39 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Saravana Kannan, Rafael J. Wysocki, Viresh Kumar, linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel Srivatsa S. Bhat wrote: > On 02/24/2014 12:27 PM, Saravana Kannan wrote: >> The existing code sets the per CPU policy to a non-NULL value before all >> the steps performed during the hotplug online path is done. >> Specifically, >> this is done before the policy min/max, governors, etc are initialized >> for >> the policy. This in turn means that calls to cpufreq_cpu_get() return a >> non-NULL policy before the policy/CPU is ready to be used. >> >> To fix this, move the update of per CPU policy to a valid value after >> all >> the initialization steps for the policy are completed. >> >> Example kernel panic without this fix: >> [ 512.146185] Unable to handle kernel NULL pointer dereference at >> virtual address 00000020 >> [ 512.146195] pgd = c0003000 >> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 >> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM >> <snip> >> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac >> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 >> <snip> >> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- >> [ 512.149761] Kernel panic - not syncing: Fatal exception >> [ 513.152016] CPU0: stopping >> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W >> 3.10.0-gd727407-00074-g979ede8 #396 >> <snip> >> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) >> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from >> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) >> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from >> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from >> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) >> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from >> [<c0afe180>] (notifier_call_chain+0x40/0x68) >> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >> [<c02812dc>] (__cpu_notify+0x28/0x44) >> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] >> (_cpu_up+0xf4/0x1dc) >> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] >> (cpu_up+0x5c/0x78) >> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] >> (store_online+0x44/0x74) >> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] >> (sysfs_write_file+0x108/0x14c) >> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from >> [<c03517d4>] (vfs_write+0xd0/0x180) >> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] >> (SyS_write+0x38/0x68) >> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] >> (ret_fast_syscall+0x0/0x30) >> >> In this specific case, CPU0 set's CPU1's policy->governor in >> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor >> in >> __cpufreq_governor(). >> > > Whoa! That's horrible! Can you please explain how exactly you > triggered this? I'm curious... Just call cpufreq_update_policy(cpu) on a CPU and have another thread online/offline it. > >> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >> --- >> drivers/cpufreq/cpufreq.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index cb003a6..d5ceb43 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, >> struct subsys_interface *sif, >> goto err_set_policy_cpu; >> } >> >> - write_lock_irqsave(&cpufreq_driver_lock, flags); >> - for_each_cpu(j, policy->cpus) >> - per_cpu(cpufreq_cpu_data, j) = policy; >> - write_unlock_irqrestore(&cpufreq_driver_lock, flags); >> - >> if (cpufreq_driver->get) { >> policy->cur = cpufreq_driver->get(policy->cpu); > > If you move the per-cpu init further down, then what happens to the > cpufreq_generic_get() that gets invoked here by some of the drivers? While cpufreq_generic_get() was a good refactor, I think it's causing unnecessary cyclic dependency. You need that function to not fail for a policy to get added properly and you need a proper policy for the function to work. I care more about fixing the panic than trying to keep cpufreq_generic_get(). > It will almost always fail (because policy will be NULL) and hence > CPU online will be unsuccessful (though you wont observe it because > the error code is not bubbled up to the CPU hotplug core; perhaps we > should). Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq code base. Since this new function isn't there at that point, I missed it. Even if I did use the latest kernel, I wouldn't have hit this issue, because the MSM cpufreq driver doesn't use this function. > > IMHO, we should fix the synchronization in cpufreq_init_policy(). > I notice cpufreq_update_policy() as well as __cpufreq_governor() in > your stack trace, which means cpufreq_set_policy() was involved. > cpufreq_update_policy() takes the policy->rwsem for write, whereas > cpufreq_init_policy() doesn't take that lock at all, which is clearly > wrong. > > My guess is that, if we fix that locking, everything will be fine and > you won't hit the bug. Would you like to give that a shot? While locking might need fixing, this is not about the locking though. Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy that they consider valid and also use it as a way to check and reject API calls trying to manipulate an offline CPU. Without this fix, the framework is advertising an incomplete policy object as available. I think that breaks the CPUfreq framework APIs in a very fundamental sense. This is a "no-no" in programming. It's like trying to register a CPUfreq driver before getting the clocks to control the CPU. As for the ->get() issue, I think the framework should just call clk_get_rate() instead of calling .get() if policy->clk is not NULL. No point in doing framework -> driver -> framework. Thanks, Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 8:39 ` skannan @ 2014-02-24 10:55 ` Viresh Kumar 2014-02-24 20:23 ` Saravana Kannan 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-24 10:55 UTC (permalink / raw) To: Saravana Kannan Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 24 February 2014 14:09, <skannan@codeaurora.org> wrote: > > Srivatsa S. Bhat wrote: >> On 02/24/2014 12:27 PM, Saravana Kannan wrote: >>> The existing code sets the per CPU policy to a non-NULL value before all >>> the steps performed during the hotplug online path is done. >>> Specifically, >>> this is done before the policy min/max, governors, etc are initialized >>> for >>> the policy. This in turn means that calls to cpufreq_cpu_get() return a >>> non-NULL policy before the policy/CPU is ready to be used. >>> >>> To fix this, move the update of per CPU policy to a valid value after >>> all >>> the initialization steps for the policy are completed. >>> >>> Example kernel panic without this fix: >>> [ 512.146185] Unable to handle kernel NULL pointer dereference at >>> virtual address 00000020 >>> [ 512.146195] pgd = c0003000 >>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 >>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM >>> <snip> >>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac >>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 >>> <snip> >>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- >>> [ 512.149761] Kernel panic - not syncing: Fatal exception >>> [ 513.152016] CPU0: stopping >>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W >>> 3.10.0-gd727407-00074-g979ede8 #396 >>> <snip> >>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) >>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from >>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) >>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from >>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from >>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) >>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from >>> [<c0afe180>] (notifier_call_chain+0x40/0x68) >>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>> [<c02812dc>] (__cpu_notify+0x28/0x44) >>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] >>> (_cpu_up+0xf4/0x1dc) >>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] >>> (cpu_up+0x5c/0x78) >>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] >>> (store_online+0x44/0x74) >>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] >>> (sysfs_write_file+0x108/0x14c) >>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from >>> [<c03517d4>] (vfs_write+0xd0/0x180) >>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] >>> (SyS_write+0x38/0x68) >>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] >>> (ret_fast_syscall+0x0/0x30) >>> >>> In this specific case, CPU0 set's CPU1's policy->governor in >>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor >>> in >>> __cpufreq_governor(). >>> >> >> Whoa! That's horrible! Can you please explain how exactly you >> triggered this? I'm curious... > > Just call cpufreq_update_policy(cpu) on a CPU and have another thread > online/offline it. But would you do that? Means why would you call them this way? cpufreq_update_policy() shouldn't be called that way I believe.. >>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >>> --- >>> drivers/cpufreq/cpufreq.c | 10 +++++----- >>> 1 file changed, 5 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> index cb003a6..d5ceb43 100644 >>> --- a/drivers/cpufreq/cpufreq.c >>> +++ b/drivers/cpufreq/cpufreq.c >>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, >>> struct subsys_interface *sif, >>> goto err_set_policy_cpu; >>> } >>> >>> - write_lock_irqsave(&cpufreq_driver_lock, flags); >>> - for_each_cpu(j, policy->cpus) >>> - per_cpu(cpufreq_cpu_data, j) = policy; >>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags); >>> - >>> if (cpufreq_driver->get) { >>> policy->cur = cpufreq_driver->get(policy->cpu); >> >> If you move the per-cpu init further down, then what happens to the >> cpufreq_generic_get() that gets invoked here by some of the drivers? > > While cpufreq_generic_get() was a good refactor, I think it's causing > unnecessary cyclic dependency. You need that function to not fail for a > policy to get added properly and you need a proper policy for the function > to work. I care more about fixing the panic than trying to keep > cpufreq_generic_get(). Surely, I will like a solution which would keep this as is :).. cpufreq_generic_get() should pass.. >> It will almost always fail (because policy will be NULL) and hence >> CPU online will be unsuccessful (though you wont observe it because >> the error code is not bubbled up to the CPU hotplug core; perhaps we >> should). > > Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq > code base. Since this new function isn't there at that point, I missed it. > Even if I did use the latest kernel, I wouldn't have hit this issue, > because the MSM cpufreq driver doesn't use this function. > >> >> IMHO, we should fix the synchronization in cpufreq_init_policy(). >> I notice cpufreq_update_policy() as well as __cpufreq_governor() in >> your stack trace, which means cpufreq_set_policy() was involved. >> cpufreq_update_policy() takes the policy->rwsem for write, whereas >> cpufreq_init_policy() doesn't take that lock at all, which is clearly >> wrong. >> >> My guess is that, if we fix that locking, everything will be fine and >> you won't hit the bug. Would you like to give that a shot? > > While locking might need fixing, this is not about the locking though. > Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy > that they consider valid and also use it as a way to check and reject API > calls trying to manipulate an offline CPU. > > Without this fix, the framework is advertising an incomplete policy object > as available. I think that breaks the CPUfreq framework APIs in a very > fundamental sense. This is a "no-no" in programming. > > It's like trying to register a CPUfreq driver before getting the clocks to > control the CPU. So the real question here is: What fields of 'policy' do we need to guarantee to be initialized before policy is made available for others? And probably that is what we need to fix. Even your current solution would break things. For example, below will happen before policy is set in per-cpu variable: - CPUFREQ_CREATE_POLICY notifier will be fired and calls to do cpufreq_cpu_get() there will fail. And hence cpufreq-stats sysfs will break. - Governors also use cpufreq_cpu_get() and so those would also fail as they are started from cpufreq_init_policy().. .. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 10:55 ` Viresh Kumar @ 2014-02-24 20:23 ` Saravana Kannan 2014-02-25 8:50 ` Viresh Kumar 0 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-24 20:23 UTC (permalink / raw) To: Viresh Kumar Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 02/24/2014 02:55 AM, Viresh Kumar wrote: > On 24 February 2014 14:09, <skannan@codeaurora.org> wrote: >> >> Srivatsa S. Bhat wrote: >>> On 02/24/2014 12:27 PM, Saravana Kannan wrote: >>>> The existing code sets the per CPU policy to a non-NULL value before all >>>> the steps performed during the hotplug online path is done. >>>> Specifically, >>>> this is done before the policy min/max, governors, etc are initialized >>>> for >>>> the policy. This in turn means that calls to cpufreq_cpu_get() return a >>>> non-NULL policy before the policy/CPU is ready to be used. >>>> >>>> To fix this, move the update of per CPU policy to a valid value after >>>> all >>>> the initialization steps for the policy are completed. >>>> >>>> Example kernel panic without this fix: >>>> [ 512.146185] Unable to handle kernel NULL pointer dereference at >>>> virtual address 00000020 >>>> [ 512.146195] pgd = c0003000 >>>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 >>>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM >>>> <snip> >>>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac >>>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 >>>> <snip> >>>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- >>>> [ 512.149761] Kernel panic - not syncing: Fatal exception >>>> [ 513.152016] CPU0: stopping >>>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W >>>> 3.10.0-gd727407-00074-g979ede8 #396 >>>> <snip> >>>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) >>>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) >>>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) >>>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from >>>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) >>>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from >>>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) >>>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from >>>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) >>>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from >>>> [<c0afe180>] (notifier_call_chain+0x40/0x68) >>>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from >>>> [<c02812dc>] (__cpu_notify+0x28/0x44) >>>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] >>>> (_cpu_up+0xf4/0x1dc) >>>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] >>>> (cpu_up+0x5c/0x78) >>>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] >>>> (store_online+0x44/0x74) >>>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] >>>> (sysfs_write_file+0x108/0x14c) >>>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from >>>> [<c03517d4>] (vfs_write+0xd0/0x180) >>>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] >>>> (SyS_write+0x38/0x68) >>>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] >>>> (ret_fast_syscall+0x0/0x30) >>>> >>>> In this specific case, CPU0 set's CPU1's policy->governor in >>>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor >>>> in >>>> __cpufreq_governor(). >>>> >>> >>> Whoa! That's horrible! Can you please explain how exactly you >>> triggered this? I'm curious... >> >> Just call cpufreq_update_policy(cpu) on a CPU and have another thread >> online/offline it. > > But would you do that? Means why would you call them this way? > cpufreq_update_policy() shouldn't be called that way I believe.. I was simplifying the scenario that causes it. We change the min/max using ADJUST notifiers for multiple reasons -- thermal being one of them. thermal/cpu_cooling is one example of it. So, cpufreq_update_policy() can be called on any CPU. If that races with someone offlining a CPU and onlining it, you'll get this crash. >>>> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >>>> --- >>>> drivers/cpufreq/cpufreq.c | 10 +++++----- >>>> 1 file changed, 5 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>>> index cb003a6..d5ceb43 100644 >>>> --- a/drivers/cpufreq/cpufreq.c >>>> +++ b/drivers/cpufreq/cpufreq.c >>>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, >>>> struct subsys_interface *sif, >>>> goto err_set_policy_cpu; >>>> } >>>> >>>> - write_lock_irqsave(&cpufreq_driver_lock, flags); >>>> - for_each_cpu(j, policy->cpus) >>>> - per_cpu(cpufreq_cpu_data, j) = policy; >>>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags); >>>> - >>>> if (cpufreq_driver->get) { >>>> policy->cur = cpufreq_driver->get(policy->cpu); >>> >>> If you move the per-cpu init further down, then what happens to the >>> cpufreq_generic_get() that gets invoked here by some of the drivers? >> >> While cpufreq_generic_get() was a good refactor, I think it's causing >> unnecessary cyclic dependency. You need that function to not fail for a >> policy to get added properly and you need a proper policy for the function >> to work. I care more about fixing the panic than trying to keep >> cpufreq_generic_get(). > > Surely, I will like a solution which would keep this as is :).. > cpufreq_generic_get() should pass.. The idea would exist, but we can just call cpufreq_generic_get() and pass it policy->clk if it is not NULL. Does that work for you? >>> It will almost always fail (because policy will be NULL) and hence >>> CPU online will be unsuccessful (though you wont observe it because >>> the error code is not bubbled up to the CPU hotplug core; perhaps we >>> should). >> >> Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq >> code base. Since this new function isn't there at that point, I missed it. >> Even if I did use the latest kernel, I wouldn't have hit this issue, >> because the MSM cpufreq driver doesn't use this function. >> >>> >>> IMHO, we should fix the synchronization in cpufreq_init_policy(). >>> I notice cpufreq_update_policy() as well as __cpufreq_governor() in >>> your stack trace, which means cpufreq_set_policy() was involved. >>> cpufreq_update_policy() takes the policy->rwsem for write, whereas >>> cpufreq_init_policy() doesn't take that lock at all, which is clearly >>> wrong. >>> >>> My guess is that, if we fix that locking, everything will be fine and >>> you won't hit the bug. Would you like to give that a shot? >> >> While locking might need fixing, this is not about the locking though. >> Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy >> that they consider valid and also use it as a way to check and reject API >> calls trying to manipulate an offline CPU. >> >> Without this fix, the framework is advertising an incomplete policy object >> as available. I think that breaks the CPUfreq framework APIs in a very >> fundamental sense. This is a "no-no" in programming. >> >> It's like trying to register a CPUfreq driver before getting the clocks to >> control the CPU. > > So the real question here is: What fields of 'policy' do we need to guarantee > to be initialized before policy is made available for others? And probably > that is what we need to fix. That's going to be hard to define since future uses could be looking at different fields. What is the API semantics of cpufreq_cpu_get(). I can't ever imagine it being correct for any API to return a partially initialized data structure. > Even your current solution would break things. For example, below will happen > before policy is set in per-cpu variable: > - CPUFREQ_CREATE_POLICY notifier will be fired and calls to do cpufreq_cpu_get() > there will fail. And hence cpufreq-stats sysfs will break. I did this on 3.12. I'll fix it up to handle this one. > - Governors also use cpufreq_cpu_get() and so those would also fail as they > are started from cpufreq_init_policy().. The only use of this in governors is in update_sampling_rate() and the behavior after this fix would be correct in that case. If the policy is not fully initialized -- that update doesn't get to go through. All other uses of this API fall under one of these categories: * CPUs are already fully offline whenever it's called. * Already get the real policy as part of the notifier so don't need to call cpufreq_cpu_get If I find any that doesn't fit the above, I would be glad to fix that if it's pointed out. Regards, Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-24 20:23 ` Saravana Kannan @ 2014-02-25 8:50 ` Viresh Kumar 2014-02-25 13:04 ` Rafael J. Wysocki 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-25 8:50 UTC (permalink / raw) To: Saravana Kannan Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 25 February 2014 01:53, Saravana Kannan <skannan@codeaurora.org> wrote: > I was simplifying the scenario that causes it. We change the min/max using > ADJUST notifiers for multiple reasons -- thermal being one of them. > > thermal/cpu_cooling is one example of it. Just to understand the clear picture, you are actually hitting this bug? Or is this only a theoretical bug? > So, cpufreq_update_policy() can be called on any CPU. If that races with > someone offlining a CPU and onlining it, you'll get this crash. Then shouldn't that be fixed by locks? I think yes. That makes me agree with Srivatsa more here. Though I would say that your argument was also valid that 'policy' shouldn't be up for sale unless it is prepared to. And for that reason only I floated that question earlier: What exactly we need to make sure is initialized in policy? Because policy might keep changing in future as well and that needs locks to protect that stuff. Like min/max/governor/ etc.. So, probably a solution here might be a mix of both. Initialize policy to this minimum level and then make sure locking is used correctly.. > The idea would exist, but we can just call cpufreq_generic_get() and pass it > policy->clk if it is not NULL. Does that work for you? No. Not all drivers implement clk interface. And so clk doesn't look to be the right parameter. I thought maybe 'policy' can be the right parameter and then people can get use policy->cpu to get cpu id out of it. But even that doesn't look to be a great idea. X86 drivers may share policy structure for CPUs that don't actually share a clock line. And so they do need right CPU number as parameter instead of policy. As they might be doing some tricky stuff there. Also, we need to make sure that ->get() returns the frequency at which CPU x is running. >> So the real question here is: What fields of 'policy' do we need to >> guarantee >> to be initialized before policy is made available for others? And probably >> that is what we need to fix. > > That's going to be hard to define since future uses could be looking at > different fields. What is the API semantics of cpufreq_cpu_get(). I can't > ever imagine it being correct for any API to return a partially initialized > data structure. I do agree that we can't broadcast 'policy' before it is usable. And that's why I am asking about the right place where we are sure that it is ready.. >> Even your current solution would break things. For example, below will >> happen >> before policy is set in per-cpu variable: >> - CPUFREQ_CREATE_POLICY notifier will be fired and calls to do >> cpufreq_cpu_get() >> there will fail. And hence cpufreq-stats sysfs will break. > > I did this on 3.12. I'll fix it up to handle this one. This can be moved down without much issues I believe. >> - Governors also use cpufreq_cpu_get() and so those would also fail as >> they >> are started from cpufreq_init_policy().. > > The only use of this in governors is in update_sampling_rate() and the > behavior after this fix would be correct in that case. If the policy is not > fully initialized -- that update doesn't get to go through. hmm.. but even that looks odd. We have reached upto governors but policy isn't available to be used? It should have been ready by now? > All other uses of this API fall under one of these categories: > * CPUs are already fully offline whenever it's called. > * Already get the real policy as part of the notifier so don't need to > call cpufreq_cpu_get > > If I find any that doesn't fit the above, I would be glad to fix that if > it's pointed out. I have just sent out a patchset to you and others, would be great if you can give it a review/test .. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 8:50 ` Viresh Kumar @ 2014-02-25 13:04 ` Rafael J. Wysocki 2014-02-25 14:40 ` Viresh Kumar 2014-02-25 21:11 ` Saravana Kannan 0 siblings, 2 replies; 27+ messages in thread From: Rafael J. Wysocki @ 2014-02-25 13:04 UTC (permalink / raw) To: Viresh Kumar Cc: Saravana Kannan, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote: > On 25 February 2014 01:53, Saravana Kannan <skannan@codeaurora.org> wrote: > > I was simplifying the scenario that causes it. We change the min/max using > > ADJUST notifiers for multiple reasons -- thermal being one of them. > > > > thermal/cpu_cooling is one example of it. > > Just to understand the clear picture, you are actually hitting this bug? Or > is this only a theoretical bug? > > > So, cpufreq_update_policy() can be called on any CPU. If that races with > > someone offlining a CPU and onlining it, you'll get this crash. > > Then shouldn't that be fixed by locks? I think yes. That makes me agree with > Srivatsa more here. > > Though I would say that your argument was also valid that 'policy' shouldn't be > up for sale unless it is prepared to. And for that reason only I > floated that question > earlier: What exactly we need to make sure is initialized in policy? Because > policy might keep changing in future as well and that needs locks to protect > that stuff. Like min/max/governor/ etc.. Well, that depends on what the current users expect it to look like initially. It should be initialized to the point in which all of them can handle it correctly. > So, probably a solution here might be a mix of both. Initialize policy to this > minimum level and then make sure locking is used correctly.. Yes. > > The idea would exist, but we can just call cpufreq_generic_get() and pass it > > policy->clk if it is not NULL. Does that work for you? > > No. Not all drivers implement clk interface. And so clk doesn't look to be the > right parameter. I thought maybe 'policy' can be the right parameter and > then people can get use policy->cpu to get cpu id out of it. > > But even that doesn't look to be a great idea. X86 drivers may share policy > structure for CPUs that don't actually share a clock line. And so they do need > right CPU number as parameter instead of policy. As they might be doing > some tricky stuff there. Also, we need to make sure that ->get() returns > the frequency at which CPU x is running. That's not going to work in at least some cases anyway, because for some types of HW we simply can't retrieve the current frequency in a non-racy way. Thanks! -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 13:04 ` Rafael J. Wysocki @ 2014-02-25 14:40 ` Viresh Kumar 2014-02-25 21:11 ` Saravana Kannan 1 sibling, 0 replies; 27+ messages in thread From: Viresh Kumar @ 2014-02-25 14:40 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Saravana Kannan, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 25 February 2014 18:34, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > Well, that depends on what the current users expect it to look like initially. > It should be initialized to the point in which all of them can handle it > correctly. Exactly. > That's not going to work in at least some cases anyway, because for some types > of HW we simply can't retrieve the current frequency in a non-racy way. I agree. But this is all that we can guarantee here :) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 13:04 ` Rafael J. Wysocki 2014-02-25 14:40 ` Viresh Kumar @ 2014-02-25 21:11 ` Saravana Kannan 2014-02-25 22:41 ` Rafael J. Wysocki 2014-02-26 5:20 ` [PATCH] " Viresh Kumar 1 sibling, 2 replies; 27+ messages in thread From: Saravana Kannan @ 2014-02-25 21:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Viresh Kumar, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote: > On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote: >> On 25 February 2014 01:53, Saravana Kannan <skannan@codeaurora.org> wrote: >>> I was simplifying the scenario that causes it. We change the min/max using >>> ADJUST notifiers for multiple reasons -- thermal being one of them. >>> >>> thermal/cpu_cooling is one example of it. >> >> Just to understand the clear picture, you are actually hitting this bug? Or >> is this only a theoretical bug? >> This is a real bug. But the actual caller of cpufreq_update_policy() is a driver that's local to our tree. I'm just giving examples of upstream code that act in a similar way. >>> So, cpufreq_update_policy() can be called on any CPU. If that races with >>> someone offlining a CPU and onlining it, you'll get this crash. >> >> Then shouldn't that be fixed by locks? I think yes. That makes me agree with >> Srivatsa more here. >> >> Though I would say that your argument was also valid that 'policy' shouldn't be >> up for sale unless it is prepared to. And for that reason only I >> floated that question >> earlier: What exactly we need to make sure is initialized in policy? Because >> policy might keep changing in future as well and that needs locks to protect >> that stuff. Like min/max/governor/ etc.. > > Well, that depends on what the current users expect it to look like initially. > It should be initialized to the point in which all of them can handle it > correctly. Yes, so let's not make it available until all of it is initialized. I don't like the piece meal check. 6 months down the lane someone making changes might not remember this. The problem also applies for drivers that might not be upstreamed, etc. >> So, probably a solution here might be a mix of both. Initialize policy to this >> minimum level and then make sure locking is used correctly.. > > Yes. Rafael, It's not clear what you mean by the yes. Do you want to initialize it partly and make it available. I think that's always wrong. >>> The idea would exist, but we can just call cpufreq_generic_get() and pass it >>> policy->clk if it is not NULL. Does that work for you? >> >> No. Not all drivers implement clk interface. And so clk doesn't look to be the >> right parameter. I thought maybe 'policy' can be the right parameter and >> then people can get use policy->cpu to get cpu id out of it. >> >> But even that doesn't look to be a great idea. X86 drivers may share policy >> structure for CPUs that don't actually share a clock line. And so they do need >> right CPU number as parameter instead of policy. As they might be doing >> some tricky stuff there. Also, we need to make sure that ->get() returns >> the frequency at which CPU x is running. > > That's not going to work in at least some cases anyway, because for some types > of HW we simply can't retrieve the current frequency in a non-racy way. I think there's been a misunderstanding of what I'm proposing. The reference to policy->clk is only to get rid of the dependency that cpufreq_generic_get() has on the per cpu policy variable being filled. You can do that by just removing calls to get _IF_ clk is filled in. Viresh, I'll look at the patches later today and respond. Although, I would have been nice you had helped me fix any issues with my patch than coming up with new ones. Kinda removes to motivation for submitting patches upstream. Regards, Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 21:11 ` Saravana Kannan @ 2014-02-25 22:41 ` Rafael J. Wysocki 2014-02-26 1:48 ` Saravana Kannan ` (3 more replies) 2014-02-26 5:20 ` [PATCH] " Viresh Kumar 1 sibling, 4 replies; 27+ messages in thread From: Rafael J. Wysocki @ 2014-02-25 22:41 UTC (permalink / raw) To: Saravana Kannan Cc: Viresh Kumar, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On Tuesday, February 25, 2014 01:11:27 PM Saravana Kannan wrote: > On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote: > > On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote: > >> On 25 February 2014 01:53, Saravana Kannan <skannan@codeaurora.org> wrote: > >>> I was simplifying the scenario that causes it. We change the min/max using > >>> ADJUST notifiers for multiple reasons -- thermal being one of them. > >>> > >>> thermal/cpu_cooling is one example of it. > >> > >> Just to understand the clear picture, you are actually hitting this bug? Or > >> is this only a theoretical bug? > >> > This is a real bug. But the actual caller of cpufreq_update_policy() is > a driver that's local to our tree. I'm just giving examples of upstream > code that act in a similar way. > > >>> So, cpufreq_update_policy() can be called on any CPU. If that races with > >>> someone offlining a CPU and onlining it, you'll get this crash. > >> > >> Then shouldn't that be fixed by locks? I think yes. That makes me agree with > >> Srivatsa more here. > >> > >> Though I would say that your argument was also valid that 'policy' shouldn't be > >> up for sale unless it is prepared to. And for that reason only I > >> floated that question > >> earlier: What exactly we need to make sure is initialized in policy? Because > >> policy might keep changing in future as well and that needs locks to protect > >> that stuff. Like min/max/governor/ etc.. > > > > Well, that depends on what the current users expect it to look like initially. > > It should be initialized to the point in which all of them can handle it > > correctly. > > Yes, so let's not make it available until all of it is initialized. And is "fully initialized" actually well defined? > I don't like the piece meal check. 6 months down the lane someone making > changes might not remember this. The problem also applies for drivers > that might not be upstreamed, etc. Please don't bring up out-of-the-tree drivers argument in mainline discussions, it is meaningless here. > >> So, probably a solution here might be a mix of both. Initialize policy to this > >> minimum level and then make sure locking is used correctly.. > > > > Yes. > > Rafael, It's not clear what you mean by the yes. Do you want to > initialize it partly and make it available. I think that's always wrong. So what actually is your porposal? > >>> The idea would exist, but we can just call cpufreq_generic_get() and pass it > >>> policy->clk if it is not NULL. Does that work for you? > >> > >> No. Not all drivers implement clk interface. And so clk doesn't look to be the > >> right parameter. I thought maybe 'policy' can be the right parameter and > >> then people can get use policy->cpu to get cpu id out of it. > >> > >> But even that doesn't look to be a great idea. X86 drivers may share policy > >> structure for CPUs that don't actually share a clock line. And so they do need > >> right CPU number as parameter instead of policy. As they might be doing > >> some tricky stuff there. Also, we need to make sure that ->get() returns > >> the frequency at which CPU x is running. > > > > That's not going to work in at least some cases anyway, because for some types > > of HW we simply can't retrieve the current frequency in a non-racy way. > > I think there's been a misunderstanding of what I'm proposing. The > reference to policy->clk is only to get rid of the dependency that > cpufreq_generic_get() has on the per cpu policy variable being filled. > You can do that by just removing calls to get _IF_ clk is filled in. I still have a little problem understanding what you mean exactly. At least please explain the last sentence. > Viresh, > > I'll look at the patches later today and respond. Although, I would have > been nice you had helped me fix any issues with my patch than coming up > with new ones. Kinda removes to motivation for submitting patches upstream. Everyone is free to post patches at any time during the discussion. Viresh is as well as you are. Thanks! -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 22:41 ` Rafael J. Wysocki @ 2014-02-26 1:48 ` Saravana Kannan 2014-02-26 6:02 ` Viresh Kumar 2014-02-26 3:38 ` [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call Saravana Kannan ` (2 subsequent siblings) 3 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 1:48 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Viresh Kumar, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote: > On Tuesday, February 25, 2014 01:11:27 PM Saravana Kannan wrote: >> On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote: >>> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote: >>>> On 25 February 2014 01:53, Saravana Kannan <skannan@codeaurora.org> wrote: >>>>> I was simplifying the scenario that causes it. We change the min/max using >>>>> ADJUST notifiers for multiple reasons -- thermal being one of them. >>>>> >>>>> thermal/cpu_cooling is one example of it. >>>> >>>> Just to understand the clear picture, you are actually hitting this bug? Or >>>> is this only a theoretical bug? >>>> >> This is a real bug. But the actual caller of cpufreq_update_policy() is >> a driver that's local to our tree. I'm just giving examples of upstream >> code that act in a similar way. >> >>>>> So, cpufreq_update_policy() can be called on any CPU. If that races with >>>>> someone offlining a CPU and onlining it, you'll get this crash. >>>> >>>> Then shouldn't that be fixed by locks? I think yes. That makes me agree with >>>> Srivatsa more here. >>>> >>>> Though I would say that your argument was also valid that 'policy' shouldn't be >>>> up for sale unless it is prepared to. And for that reason only I >>>> floated that question >>>> earlier: What exactly we need to make sure is initialized in policy? Because >>>> policy might keep changing in future as well and that needs locks to protect >>>> that stuff. Like min/max/governor/ etc.. >>> >>> Well, that depends on what the current users expect it to look like initially. >>> It should be initialized to the point in which all of them can handle it >>> correctly. >> >> Yes, so let's not make it available until all of it is initialized. > > And is "fully initialized" actually well defined? The point in add dev/hot plug path after which we will no longer change policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU / CPUFRE_NOTIFY notifiers. Pretty much the end of __cpufreq_add_dev() so that it's after: - cpufreq_init_policy() - And the update of userpolicy fields that after thie init call >> I don't like the piece meal check. 6 months down the lane someone making >> changes might not remember this. The problem also applies for drivers >> that might not be upstreamed, etc. > > Please don't bring up out-of-the-tree drivers argument in mainline discussions, > it is meaningless here. Sure. This also applies to in-tree code. The 6 months down the lane, what fields are being looked at could still change and we might slip up on making sure that the policy is not made available before that field is initialized. >>>> So, probably a solution here might be a mix of both. Initialize policy to this >>>> minimum level and then make sure locking is used correctly.. >>> >>> Yes. >> >> Rafael, It's not clear what you mean by the yes. Do you want to >> initialize it partly and make it available. I think that's always wrong. > > So what actually is your porposal? > Same as reply to "fully initialized". We make the policy after it's fully initialized. After that point, any changes to the policy can be tracked by registering for CPUFREQ_UPDATE_POLICY_CPU / CPUFREQ_NOTIFY. >>>>> The idea would exist, but we can just call cpufreq_generic_get() and pass it >>>>> policy->clk if it is not NULL. Does that work for you? >>>> >>>> No. Not all drivers implement clk interface. And so clk doesn't look to be the >>>> right parameter. I thought maybe 'policy' can be the right parameter and >>>> then people can get use policy->cpu to get cpu id out of it. >>>> >>>> But even that doesn't look to be a great idea. X86 drivers may share policy >>>> structure for CPUs that don't actually share a clock line. And so they do need >>>> right CPU number as parameter instead of policy. As they might be doing >>>> some tricky stuff there. Also, we need to make sure that ->get() returns >>>> the frequency at which CPU x is running. >>> >>> That's not going to work in at least some cases anyway, because for some types >>> of HW we simply can't retrieve the current frequency in a non-racy way. >> >> I think there's been a misunderstanding of what I'm proposing. The >> reference to policy->clk is only to get rid of the dependency that >> cpufreq_generic_get() has on the per cpu policy variable being filled. >> You can do that by just removing calls to get _IF_ clk is filled in. > > I still have a little problem understanding what you mean exactly. At least > please explain the last sentence. Ok, here's some pseudo code to explain it better: Something like, replace all calls to cpufreq_driver->get with __cpufreq_driver_get() with the fn being something like: unsigned int __cpufreq_driver_get(cpufreq_policy *policy) { if (policy->clk) return clk_get_rate(policy->clk) / 1000; else return cpufreq_driver->get(policy->cpu); } That way, we don't have to depend on cpufreq_cpu_get() to return a policy in cpufreq_generic_get() for the existing add dev code path to complete without an error (after my patch to advertise the policy struct after it's fully initialized) This also avoid unnecessary function calls to ->get() when we just as well know that it's going to call back into cpufreq_generic_get() and get the clock rate. We can just do that in the framework. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-26 1:48 ` Saravana Kannan @ 2014-02-26 6:02 ` Viresh Kumar 2014-02-26 20:20 ` Saravana Kannan 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-26 6:02 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 26 February 2014 07:18, Saravana Kannan <skannan@codeaurora.org> wrote: > On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote: >> And is "fully initialized" actually well defined? > > The point in add dev/hot plug path after which we will no longer change > policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU / > CPUFRE_NOTIFY notifiers. Okay.. > Pretty much the end of __cpufreq_add_dev() so that it's after: > - cpufreq_init_policy() > - And the update of userpolicy fields that after thie init call No. In that case it can be considered initialized before cpufreq_init_policy(). As we do send CPUFREQ_NOTIFY after that from cpufreq_init_policy()-> cpufreq_set_policy(). There are two types of fields within policy, some are very basic: cpu/min/max/ affected_cpus/related_cpus some are advanced: sysfs/governors/.. And as a rule you have to get policy->rwsem lock before accessing policy members. We might not have followed it very well for small things like cpu. And so if you are doing anything over that, please use a lock and that is already present in cpufreq_update_policy(). With my latest patchset that I sent yesterday, locking is improved and now a policy will be usable only after the rwsem is released. And that should be fine. And so making it available in the per-cpu variable after all the necessary fields are filled looks fine to me. And so I don't think we need to move it after call to cpufreq_init_policy(maybe a better name to this function is required).. > Ok, here's some pseudo code to explain it better: > > Something like, replace all calls to cpufreq_driver->get with > __cpufreq_driver_get() with the fn being something like: > > unsigned int __cpufreq_driver_get(cpufreq_policy *policy) > { > if (policy->clk) > return clk_get_rate(policy->clk) / 1000; > else > return cpufreq_driver->get(policy->cpu); This part may still use cpufreq_cpu_get(). > } Drivers are free to have their implementation of ->get() even if they have a valid policy->clk field.. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-26 6:02 ` Viresh Kumar @ 2014-02-26 20:20 ` Saravana Kannan 0 siblings, 0 replies; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 20:20 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 02/25/2014 10:02 PM, Viresh Kumar wrote: > On 26 February 2014 07:18, Saravana Kannan <skannan@codeaurora.org> wrote: >> On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote: > >>> And is "fully initialized" actually well defined? >> >> The point in add dev/hot plug path after which we will no longer change >> policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU / >> CPUFRE_NOTIFY notifiers. > > Okay.. > >> Pretty much the end of __cpufreq_add_dev() so that it's after: >> - cpufreq_init_policy() >> - And the update of userpolicy fields that after thie init call > > No. In that case it can be considered initialized before cpufreq_init_policy(). > As we do send CPUFREQ_NOTIFY after that from cpufreq_init_policy()-> > cpufreq_set_policy(). Ok, valid hole in my definition of "fully initialized". > > There are two types of fields within policy, some are very basic: cpu/min/max/ > affected_cpus/related_cpus > > some are advanced: sysfs/governors/.. > > And as a rule you have to get policy->rwsem lock before accessing policy > members. We might not have followed it very well for small things like cpu. > > And so if you are doing anything over that, please use a lock and that is > already present in cpufreq_update_policy(). > > With my latest patchset that I sent yesterday, locking is improved and now > a policy will be usable only after the rwsem is released. And that should be > fine. And so making it available in the per-cpu variable after all the necessary > fields are filled looks fine to me. And so I don't think we need to move it > after call to cpufreq_init_policy(maybe a better name to this function is > required).. I'll take a closer look. Internal tree cpufreq code is in 3.12, so back-porting all the cpufreq changes and testing it can take a bit of time. Will get back on this. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call 2014-02-25 22:41 ` Rafael J. Wysocki 2014-02-26 1:48 ` Saravana Kannan @ 2014-02-26 3:38 ` Saravana Kannan 2014-02-26 5:06 ` Viresh Kumar 2014-02-26 3:38 ` [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() Saravana Kannan 2014-02-26 3:38 ` [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan 3 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 3:38 UTC (permalink / raw) To: Rafael J. Wysocki, Viresh Kumar Cc: linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan __cpufreq_stats_create_table always gets pass the valid and real policy struct. So, there's no need to call cpufreq_cpu_get() to get the policy again. Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71 Signed-off-by: Saravana Kannan <skannan@codeaurora.org> --- Viresh/Rafael, These 3 patches is the approximate code I have in mind. Approximate because: * I inserted one question as a comment into the code. * If the patch doesn't have any bugs, the plan is to remove cpufreq_generic_get() and references to it. This takes care of the "don't advertise before it's ready for use" rule. Viresh, I think the locking updates needs to be done in addition to this. Regards, Saravana drivers/cpufreq/cpufreq_stats.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index 5793e14..e4bd27f 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, { unsigned int i, j, count = 0, ret = 0; struct cpufreq_stats *stat; - struct cpufreq_policy *current_policy; unsigned int alloc_size; unsigned int cpu = policy->cpu; if (per_cpu(cpufreq_stats_table, cpu)) @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, if ((stat) == NULL) return -ENOMEM; - current_policy = cpufreq_cpu_get(cpu); - if (current_policy == NULL) { - ret = -EINVAL; - goto error_get_fail; - } - - ret = sysfs_create_group(¤t_policy->kobj, &stats_attr_group); + ret = sysfs_create_group(&policy->kobj, &stats_attr_group); if (ret) goto error_out; @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, stat->last_time = get_jiffies_64(); stat->last_index = freq_table_get_index(stat, policy->cur); spin_unlock(&cpufreq_stats_lock); - cpufreq_cpu_put(current_policy); return 0; error_out: - cpufreq_cpu_put(current_policy); -error_get_fail: kfree(stat); per_cpu(cpufreq_stats_table, cpu) = NULL; return ret; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call 2014-02-26 3:38 ` [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call Saravana Kannan @ 2014-02-26 5:06 ` Viresh Kumar 2014-02-26 20:04 ` Saravana Kannan 0 siblings, 1 reply; 27+ messages in thread From: Viresh Kumar @ 2014-02-26 5:06 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 26 February 2014 09:08, Saravana Kannan <skannan@codeaurora.org> wrote: > __cpufreq_stats_create_table always gets pass the valid and real policy > struct. So, there's no need to call cpufreq_cpu_get() to get the policy > again. > > Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71 ?? > Signed-off-by: Saravana Kannan <skannan@codeaurora.org> > --- > > Viresh/Rafael, > > These 3 patches is the approximate code I have in mind. > > Approximate because: > * I inserted one question as a comment into the code. > * If the patch doesn't have any bugs, the plan is to remove > cpufreq_generic_get() and references to it. > > This takes care of the "don't advertise before it's ready for use" rule. > > Viresh, > > I think the locking updates needs to be done in addition to this. > > Regards, > Saravana > > drivers/cpufreq/cpufreq_stats.c | 12 +----------- > 1 file changed, 1 insertion(+), 11 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c > index 5793e14..e4bd27f 100644 > --- a/drivers/cpufreq/cpufreq_stats.c > +++ b/drivers/cpufreq/cpufreq_stats.c > @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, > { > unsigned int i, j, count = 0, ret = 0; > struct cpufreq_stats *stat; > - struct cpufreq_policy *current_policy; > unsigned int alloc_size; > unsigned int cpu = policy->cpu; > if (per_cpu(cpufreq_stats_table, cpu)) > @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, > if ((stat) == NULL) > return -ENOMEM; > > - current_policy = cpufreq_cpu_get(cpu); > - if (current_policy == NULL) { > - ret = -EINVAL; > - goto error_get_fail; > - } > - > - ret = sysfs_create_group(¤t_policy->kobj, &stats_attr_group); > + ret = sysfs_create_group(&policy->kobj, &stats_attr_group); > if (ret) > goto error_out; > > @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, > stat->last_time = get_jiffies_64(); > stat->last_index = freq_table_get_index(stat, policy->cur); > spin_unlock(&cpufreq_stats_lock); > - cpufreq_cpu_put(current_policy); > return 0; > error_out: > - cpufreq_cpu_put(current_policy); > -error_get_fail: > kfree(stat); > per_cpu(cpufreq_stats_table, cpu) = NULL; > return ret; I was damn sure that this wasn't a waste of time. This was some meaningful code when I visited it earlier. And we absolutely required a new cpufreq_cpu_get().. Reason: Earlier tables were created for this notifier: CPUFREQ_NOTIFY and it used to come with another changed copy of 'policy' and so we were required to get the real copy of policy to get to the right kobj. But recently I have simplified stuff there and these tables are now added with CPUFREQ_CREATE_POLICY and so this replication isn't required anymore. So, Acked-by: Viresh Kumar <viresh.kumar@linaro.org> While you are at it please get this part into __cpufreq_stats_create_table() routine: table = cpufreq_frequency_get_table(cpu); if (!table) return 0; As it is replicated at two places currently. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call 2014-02-26 5:06 ` Viresh Kumar @ 2014-02-26 20:04 ` Saravana Kannan 0 siblings, 0 replies; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 20:04 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 02/25/2014 09:06 PM, Viresh Kumar wrote: > On 26 February 2014 09:08, Saravana Kannan <skannan@codeaurora.org> wrote: >> __cpufreq_stats_create_table always gets pass the valid and real policy >> struct. So, there's no need to call cpufreq_cpu_get() to get the policy >> again. >> >> Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71 > > ?? > >> Signed-off-by: Saravana Kannan <skannan@codeaurora.org> >> --- >> >> Viresh/Rafael, >> >> These 3 patches is the approximate code I have in mind. >> >> Approximate because: >> * I inserted one question as a comment into the code. >> * If the patch doesn't have any bugs, the plan is to remove >> cpufreq_generic_get() and references to it. >> >> This takes care of the "don't advertise before it's ready for use" rule. >> >> Viresh, >> >> I think the locking updates needs to be done in addition to this. >> >> Regards, >> Saravana >> >> drivers/cpufreq/cpufreq_stats.c | 12 +----------- >> 1 file changed, 1 insertion(+), 11 deletions(-) >> >> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c >> index 5793e14..e4bd27f 100644 >> --- a/drivers/cpufreq/cpufreq_stats.c >> +++ b/drivers/cpufreq/cpufreq_stats.c >> @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, >> { >> unsigned int i, j, count = 0, ret = 0; >> struct cpufreq_stats *stat; >> - struct cpufreq_policy *current_policy; >> unsigned int alloc_size; >> unsigned int cpu = policy->cpu; >> if (per_cpu(cpufreq_stats_table, cpu)) >> @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, >> if ((stat) == NULL) >> return -ENOMEM; >> >> - current_policy = cpufreq_cpu_get(cpu); >> - if (current_policy == NULL) { >> - ret = -EINVAL; >> - goto error_get_fail; >> - } >> - >> - ret = sysfs_create_group(¤t_policy->kobj, &stats_attr_group); >> + ret = sysfs_create_group(&policy->kobj, &stats_attr_group); >> if (ret) >> goto error_out; >> >> @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, >> stat->last_time = get_jiffies_64(); >> stat->last_index = freq_table_get_index(stat, policy->cur); >> spin_unlock(&cpufreq_stats_lock); >> - cpufreq_cpu_put(current_policy); >> return 0; >> error_out: >> - cpufreq_cpu_put(current_policy); >> -error_get_fail: >> kfree(stat); >> per_cpu(cpufreq_stats_table, cpu) = NULL; >> return ret; > > I was damn sure that this wasn't a waste of time. This was some meaningful > code when I visited it earlier. And we absolutely required a new > cpufreq_cpu_get().. > > Reason: Earlier tables were created for this notifier: CPUFREQ_NOTIFY and > it used to come with another changed copy of 'policy' and so we were required > to get the real copy of policy to get to the right kobj. > > But recently I have simplified stuff there and these tables are now added with > CPUFREQ_CREATE_POLICY and so this replication isn't required anymore. Agreed. I already knew it had a good reason. :) Just that it's not needed anymore. > So, Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Thanks. > > While you are at it please get this part into __cpufreq_stats_create_table() > routine: > > table = cpufreq_frequency_get_table(cpu); > if (!table) > return 0; > > As it is replicated at two places currently. > Doing it as a separate patch since it's technically unrelated to these changes. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() 2014-02-25 22:41 ` Rafael J. Wysocki 2014-02-26 1:48 ` Saravana Kannan 2014-02-26 3:38 ` [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call Saravana Kannan @ 2014-02-26 3:38 ` Saravana Kannan 2014-02-26 5:11 ` Viresh Kumar 2014-02-26 3:38 ` [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan 3 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 3:38 UTC (permalink / raw) To: Rafael J. Wysocki, Viresh Kumar Cc: linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan Remove sysfs group if __cpufreq_stats_create_table() fails after creating one. Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c Signed-off-by: Saravana Kannan <skannan@codeaurora.org> --- drivers/cpufreq/cpufreq_stats.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e4bd27f..c52b440 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL); if (!stat->time_in_state) { ret = -ENOMEM; - goto error_out; + goto error_alloc; } stat->freq_table = (unsigned int *)(stat->time_in_state + count); @@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, stat->last_index = freq_table_get_index(stat, policy->cur); spin_unlock(&cpufreq_stats_lock); return 0; +error_alloc: + sysfs_remove_group(&policy->kobj, &stats_attr_group); error_out: kfree(stat); per_cpu(cpufreq_stats_table, cpu) = NULL; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() 2014-02-26 3:38 ` [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() Saravana Kannan @ 2014-02-26 5:11 ` Viresh Kumar 0 siblings, 0 replies; 27+ messages in thread From: Viresh Kumar @ 2014-02-26 5:11 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 26 February 2014 09:08, Saravana Kannan <skannan@codeaurora.org> wrote: > Remove sysfs group if __cpufreq_stats_create_table() fails after creating > one. > > Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c > Signed-off-by: Saravana Kannan <skannan@codeaurora.org> > --- > drivers/cpufreq/cpufreq_stats.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c > index e4bd27f..c52b440 100644 > --- a/drivers/cpufreq/cpufreq_stats.c > +++ b/drivers/cpufreq/cpufreq_stats.c > @@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, > stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL); > if (!stat->time_in_state) { > ret = -ENOMEM; > - goto error_out; > + goto error_alloc; > } > stat->freq_table = (unsigned int *)(stat->time_in_state + count); > > @@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy, > stat->last_index = freq_table_get_index(stat, policy->cur); > spin_unlock(&cpufreq_stats_lock); > return 0; > +error_alloc: > + sysfs_remove_group(&policy->kobj, &stats_attr_group); > error_out: > kfree(stat); > per_cpu(cpufreq_stats_table, cpu) = NULL; Acked-by: Viresh Kumar <viresh.kumar@linaro.org> ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 22:41 ` Rafael J. Wysocki ` (2 preceding siblings ...) 2014-02-26 3:38 ` [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() Saravana Kannan @ 2014-02-26 3:38 ` Saravana Kannan 2014-02-26 6:14 ` Viresh Kumar 3 siblings, 1 reply; 27+ messages in thread From: Saravana Kannan @ 2014-02-26 3:38 UTC (permalink / raw) To: Rafael J. Wysocki, Viresh Kumar Cc: linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel, Saravana Kannan The existing code sets the per CPU policy to a non-NULL value before all the steps performed during the hotplug online path is done. Specifically, this is done before the policy min/max, governors, etc are initialized for the policy. This in turn means that calls to cpufreq_cpu_get() return a non-NULL policy before the policy/CPU is ready to be used. To fix this, move the update of per CPU policy to a valid value after all the initialization steps for the policy are completed. Example kernel panic without this fix: [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020 [ 512.146195] pgd = c0003000 [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000 [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM <snip> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150 <snip> [ 512.149740] ---[ end trace f23a8defea6cd706 ]--- [ 512.149761] Kernel panic - not syncing: Fatal exception [ 513.152016] CPU0: stopping [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 <snip> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) In this specific case, thread A set's CPU1's policy->governor in cpufreq_init_policy() to NULL while thread B is using the policy->governor in __cpufreq_governor(). Change-Id: I0f6f4e51ac3b7127a1ea56a1cb8e7ae1bcf8d6b6 Signed-off-by: Saravana Kannan <skannan@codeaurora.org> --- drivers/cpufreq/cpufreq.c | 52 ++++++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cb003a6..5caefa9 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -849,7 +849,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy, goto err_out_kobj_put; drv_attr++; } - if (cpufreq_driver->get) { + if (cpufreq_driver->get || policy->clk) { ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr); if (ret) goto err_out_kobj_put; @@ -877,6 +877,22 @@ err_out_kobj_put: return ret; } +static unsigned int __cpufreq_get_freq(struct cpufreq_policy *policy) +{ + unsigned long freq; + + if (policy->clk) { + freq = clk_get_rate(policy->clk); + if(!IS_ERR_VALUE(freq)) + return freq / 1000; + } + + if (cpufreq_driver->get) + return cpufreq_driver->get(policy->cpu); + + return 0; +} + static void cpufreq_init_policy(struct cpufreq_policy *policy) { struct cpufreq_policy new_policy; @@ -1109,17 +1125,10 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, goto err_set_policy_cpu; } - write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_cpu_data, j) = policy; - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - - if (cpufreq_driver->get) { - policy->cur = cpufreq_driver->get(policy->cpu); - if (!policy->cur) { - pr_err("%s: ->get() failed\n", __func__); - goto err_get_freq; - } + policy->cur = __cpufreq_get_freq(policy); + if (!policy->cur) { + pr_err("%s: get freq failed\n", __func__); + goto err_get_freq; } /* @@ -1207,6 +1216,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, policy->user_policy.governor = policy->governor; } + write_lock_irqsave(&cpufreq_driver_lock, flags); + for_each_cpu(j, policy->cpus) + per_cpu(cpufreq_cpu_data, j) = policy; + write_unlock_irqrestore(&cpufreq_driver_lock, flags); + kobject_uevent(&policy->kobj, KOBJ_ADD); up_read(&cpufreq_rwsem); @@ -1216,11 +1230,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, err_out_unregister: err_get_freq: - write_lock_irqsave(&cpufreq_driver_lock, flags); - for_each_cpu(j, policy->cpus) - per_cpu(cpufreq_cpu_data, j) = NULL; - write_unlock_irqrestore(&cpufreq_driver_lock, flags); - if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu: @@ -1482,6 +1491,8 @@ unsigned int cpufreq_quick_get(unsigned int cpu) struct cpufreq_policy *policy; unsigned int ret_freq = 0; + /* What's up with the setpolicy check? Why the call to get only for + * arch's that implement setpolicy? */ if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get) return cpufreq_driver->get(cpu); @@ -1520,11 +1531,10 @@ static unsigned int __cpufreq_get(unsigned int cpu) struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu); unsigned int ret_freq = 0; - if (!cpufreq_driver->get) + ret_freq = __cpufreq_get_freq(policy); + if (!ret_freq) return ret_freq; - ret_freq = cpufreq_driver->get(cpu); - if (ret_freq && policy->cur && !(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) { /* verify no discrepancy between actual and @@ -2148,7 +2158,7 @@ int cpufreq_update_policy(unsigned int cpu) * BIOS might change freq behind our back * -> ask driver for current freq and notify governors about a change */ - if (cpufreq_driver->get) { + if (cpufreq_driver->get || policy->clk) { new_policy.cur = cpufreq_driver->get(cpu); if (!policy->cur) { pr_debug("Driver did not initialize current freq"); -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-26 3:38 ` [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan @ 2014-02-26 6:14 ` Viresh Kumar 0 siblings, 0 replies; 27+ messages in thread From: Viresh Kumar @ 2014-02-26 6:14 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 26 February 2014 09:08, Saravana Kannan <skannan@codeaurora.org> wrote: > The existing code sets the per CPU policy to a non-NULL value before all > the steps performed during the hotplug online path is done. Specifically, > this is done before the policy min/max, governors, etc are initialized for > the policy. This in turn means that calls to cpufreq_cpu_get() return a > non-NULL policy before the policy/CPU is ready to be used. First two patches are fine but I would still say that take the three patches I posted instead of this. Reasoning already given in other mails. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done 2014-02-25 21:11 ` Saravana Kannan 2014-02-25 22:41 ` Rafael J. Wysocki @ 2014-02-26 5:20 ` Viresh Kumar 1 sibling, 0 replies; 27+ messages in thread From: Viresh Kumar @ 2014-02-26 5:20 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Srivatsa S. Bhat, linux-pm, Linux Kernel Mailing List, linux-arm-msm, linux-arm-kernel On 26 February 2014 02:41, Saravana Kannan <skannan@codeaurora.org> wrote: > On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote: >> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote: > I think there's been a misunderstanding of what I'm proposing. The reference > to policy->clk is only to get rid of the dependency that > cpufreq_generic_get() has on the per cpu policy variable being filled. You > can do that by just removing calls to get _IF_ clk is filled in. cpufreq_cpu_get() can be called by other drivers as well, which may not have clock interface implemented for them. We can't stop them from calling it. > I'll look at the patches later today and respond. Although, I would have > been nice you had helped me fix any issues with my patch than coming up with > new ones. Kinda removes to motivation for submitting patches upstream. Sorry if I demotivated you at all :) I just wanted to get to a quick-fix and wasn't interested in getting my patch count up. Thought that isn't always bad :) I sent my patches as they were very different then your approach. Obviously, I wouldn't have done this otherwise :) ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-02-26 20:20 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-02-24 6:57 [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan 2014-02-24 7:42 ` Srivatsa S. Bhat 2014-02-24 8:11 ` Viresh Kumar 2014-02-24 8:41 ` skannan 2014-02-24 8:43 ` Viresh Kumar 2014-02-24 8:47 ` skannan 2014-02-24 8:50 ` Viresh Kumar 2014-02-24 9:00 ` skannan 2014-02-24 8:39 ` skannan 2014-02-24 10:55 ` Viresh Kumar 2014-02-24 20:23 ` Saravana Kannan 2014-02-25 8:50 ` Viresh Kumar 2014-02-25 13:04 ` Rafael J. Wysocki 2014-02-25 14:40 ` Viresh Kumar 2014-02-25 21:11 ` Saravana Kannan 2014-02-25 22:41 ` Rafael J. Wysocki 2014-02-26 1:48 ` Saravana Kannan 2014-02-26 6:02 ` Viresh Kumar 2014-02-26 20:20 ` Saravana Kannan 2014-02-26 3:38 ` [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call Saravana Kannan 2014-02-26 5:06 ` Viresh Kumar 2014-02-26 20:04 ` Saravana Kannan 2014-02-26 3:38 ` [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table() Saravana Kannan 2014-02-26 5:11 ` Viresh Kumar 2014-02-26 3:38 ` [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done Saravana Kannan 2014-02-26 6:14 ` Viresh Kumar 2014-02-26 5:20 ` [PATCH] " Viresh Kumar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).