Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
       [not found]   ` <CAKohpon10=xemOZkvqMTV2JbdTMXPDiW+-v52Qzm8SsNymN-Kg@mail.gmail.com>
@ 2015-02-02  3:38     ` ethan zhao
  2015-02-02  3:43       ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02  3:38 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel


On 2015/2/2 11:24, Viresh Kumar wrote:
> On 2 February 2015 at 08:50, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   This seems couldn't prevent all the 'bad thing' from happening, E.G.
>>
>>
>>   Thread A: Workqueue: kacpi_notify
>>
>>   acpi_processor_notify()
>>     acpi_processor_ppc_has_changed()
>>           cpufreq_update_policy()
>>             cpufreq_cpu_get()
> We take cpufreq_driver_lock() here, and so this will
> block thread B.
  No, there is no  cpufreq_driver_lock acquired between

  cpufreq_cpu_get()  and cpufreq_cpu_put()


>>             beginning the deference of policy        Thread B:
>>             ... ... __cpufreq_remove_dev_finish()
>> cpufreq_policy_free(policy);
>>
>>
>> Perhaps move policy->rwsem out side the policy structure is a way to avoid
>> it completely.
>> and you could stopping the PPC thread stepping forward as my patch as
>> temporary workaround.
> I couldn't understand your problem completely. Apart from giving a detailed
> look of what's going on both threads, always specify where the BUG actually
> is..
The problem is you are using a rwsem inside policy structure to protect its
assessment, that is bad design.

Thanks,
Ethan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  3:38     ` [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject ethan zhao
@ 2015-02-02  3:43       ` Viresh Kumar
  2015-02-02  3:56         ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-02-02  3:43 UTC (permalink / raw)
  To: ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel

On 2 February 2015 at 09:08, ethan zhao <ethan.zhao@oracle.com> wrote:
>> We take cpufreq_driver_lock() here, and so this will
>> block thread B.
>
>  No, there is no  cpufreq_driver_lock acquired between
>
>  cpufreq_cpu_get()  and cpufreq_cpu_put()

I am not saying that the lock is taken between them. But within
cpufreq_cpu_get() to make sure policy doesn't get freed while we
are doing kobject_get().

>>>             beginning the deference of policy        Thread B:
>>>             ... ... __cpufreq_remove_dev_finish()
>>> cpufreq_policy_free(policy);
>>>
>>>
>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>> avoid
>>> it completely.
>>> and you could stopping the PPC thread stepping forward as my patch as
>>> temporary workaround.
>>
>> I couldn't understand your problem completely. Apart from giving a
>> detailed
>> look of what's going on both threads, always specify where the BUG
>> actually
>> is..
>
> The problem is you are using a rwsem inside policy structure to protect its
> assessment, that is bad design.

What is the current bug you are facing right now, I haven't understood it well.
Also a lock within the structure isn't new. Its all over the kernel. I
don't understand
why you say its a bad design.

--
viresh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  3:43       ` Viresh Kumar
@ 2015-02-02  3:56         ` ethan zhao
  2015-02-02  3:59           ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02  3:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel


On 2015/2/2 11:43, Viresh Kumar wrote:
> On 2 February 2015 at 09:08, ethan zhao <ethan.zhao@oracle.com> wrote:
>>> We take cpufreq_driver_lock() here, and so this will
>>> block thread B.
>>   No, there is no  cpufreq_driver_lock acquired between
>>
>>   cpufreq_cpu_get()  and cpufreq_cpu_put()
> I am not saying that the lock is taken between them. But within
> cpufreq_cpu_get() to make sure policy doesn't get freed while we
> are doing kobject_get().
  How to prevent the policy to be freed between

cpufreq_cpu_get()  and cpufreq_cpu_put() ?

>>>>              beginning the deference of policy        Thread B:
>>>>              ... ... __cpufreq_remove_dev_finish()
>>>> cpufreq_policy_free(policy);
>>>>
>>>>
>>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>>> avoid
>>>> it completely.
>>>> and you could stopping the PPC thread stepping forward as my patch as
>>>> temporary workaround.
>>> I couldn't understand your problem completely. Apart from giving a
>>> detailed
>>> look of what's going on both threads, always specify where the BUG
>>> actually
>>> is..
>> The problem is you are using a rwsem inside policy structure to protect its
>> assessment, that is bad design.
> What is the current bug you are facing right now, I haven't understood it well.
> Also a lock within the structure isn't new. Its all over the kernel. I
> don't understand
> why you say its a bad design.
You are maxing up the water with sand ?

Thanks,
Ethan

>
> --
> viresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  3:56         ` ethan zhao
@ 2015-02-02  3:59           ` Viresh Kumar
  2015-02-02  4:06             ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-02-02  3:59 UTC (permalink / raw)
  To: ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel

On 2 February 2015 at 09:26, ethan zhao <ethan.zhao@oracle.com> wrote:
>  How to prevent the policy to be freed between
>
> cpufreq_cpu_get()  and cpufreq_cpu_put() ?

kobject_get() increases the reference count of a policy and the policy
will only be freed when this is zero. And it will only be zero once all
cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().

> You are maxing up the water with sand ?

:)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  3:59           ` Viresh Kumar
@ 2015-02-02  4:06             ` ethan zhao
  2015-02-02  4:09               ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02  4:06 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel



On 2015/2/2 11:59, Viresh Kumar wrote:
> On 2 February 2015 at 09:26, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   How to prevent the policy to be freed between
>>
>> cpufreq_cpu_get()  and cpufreq_cpu_put() ?
> kobject_get() increases the reference count of a policy and the policy
> will only be freed when this is zero. And it will only be zero once all
> cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().
  Is that an idea it supposed to be or fact ?

if (!cpufreq_suspended)
             cpufreq_policy_free(policy);

  static void cpufreq_policy_free(struct cpufreq_policy *policy)
{
     free_cpumask_var(policy->related_cpus);
     free_cpumask_var(policy->cpus);
     kfree(policy);
}

  It seems you just think about it ideally in mind.

  Thanks,
  Ethan
>
>> You are maxing up the water with sand ?
> :)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:06             ` ethan zhao
@ 2015-02-02  4:09               ` Viresh Kumar
  2015-02-02  4:16                 ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-02-02  4:09 UTC (permalink / raw)
  To: ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel

On 2 February 2015 at 09:36, ethan zhao <ethan.zhao@oracle.com> wrote:
>  Is that an idea it supposed to be or fact ?
>
> if (!cpufreq_suspended)
>             cpufreq_policy_free(policy);
>
>  static void cpufreq_policy_free(struct cpufreq_policy *policy)
> {
>     free_cpumask_var(policy->related_cpus);
>     free_cpumask_var(policy->cpus);
>     kfree(policy);
> }
>
>  It seems you just think about it ideally in mind.

if (!cpufreq_suspended)
        cpufreq_policy_put_kobj(policy);

static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
{
        ...

        kobject_put(kobj);

        /*
        * We need to make sure that the underlying kobj is
        * actually not referenced anymore by anybody before we
        * proceed with unloading.
        */
        pr_debug("waiting for dropping of refcount\n");
        wait_for_completion(cmp);
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:09               ` Viresh Kumar
@ 2015-02-02  4:16                 ` ethan zhao
  2015-02-02  4:26                   ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02  4:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel


On 2015/2/2 12:09, Viresh Kumar wrote:
> On 2 February 2015 at 09:36, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   Is that an idea it supposed to be or fact ?
>>
>> if (!cpufreq_suspended)
>>              cpufreq_policy_free(policy);
>>
>>   static void cpufreq_policy_free(struct cpufreq_policy *policy)
>> {
>>      free_cpumask_var(policy->related_cpus);
>>      free_cpumask_var(policy->cpus);
>>      kfree(policy);
>> }
>>
>>   It seems
>>   you just think about it ideally in mind.
  We am talking about the policy allocation and de-allocation. right ?
  I showed you the cpufreq_policy_free(policy) doesn't check kobject
  refcount  as above.

  Hmmm, you are still sleeping in the kobject, wake up and don't mix
  water anymore.

Thanks,
Ethan


> if (!cpufreq_suspended)
>          cpufreq_policy_put_kobj(policy);
>
> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> {
>          ...
>
>          kobject_put(kobj);
>
>          /*
>          * We need to make sure that the underlying kobj is
>          * actually not referenced anymore by anybody before we
>          * proceed with unloading.
>          */
>          pr_debug("waiting for dropping of refcount\n");
>          wait_for_completion(cmp);
> }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:16                 ` ethan zhao
@ 2015-02-02  4:26                   ` Viresh Kumar
  2015-02-02  4:45                     ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-02-02  4:26 UTC (permalink / raw)
  To: ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel

On 2 February 2015 at 09:46, ethan zhao <ethan.zhao@oracle.com> wrote:
>  We am talking about the policy allocation and de-allocation. right ?
>  I showed you the cpufreq_policy_free(policy) doesn't check kobject
>  refcount  as above.
>
>  Hmmm, you are still sleeping in the kobject, wake up and don't mix
>  water anymore.

It would be nice if we give each other the respect we deserve, And talk
about concrete points here.

>> if (!cpufreq_suspended)
>>          cpufreq_policy_put_kobj(policy);
>>
>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>> {
>>          ...
>>
>>          kobject_put(kobj);
>>
>>          /*
>>          * We need to make sure that the underlying kobj is
>>          * actually not referenced anymore by anybody before we
>>          * proceed with unloading.
>>          */
>>          pr_debug("waiting for dropping of refcount\n");
>>          wait_for_completion(cmp);
>> }

I gave you exactly what you wanted to go through, but it seems you
haven't tried enough.

Before freeing policy with cpufreq_policy_free(), we call
cpufreq_policy_put_kobj(). Now what does this function do? It waits
for the completion to fire (wait_for_completion()). This completion
will only fire when refcount of a kobject becomes zero.

Initially when we create the kobject, it is initialized to one. And the
last kobject_put() you see above in cpufreq_policy_put_kobj()
makes it zero. All other cpufreq_cpu_get() and put() should happen
in pairs, otherwise this refcount will never be zero again.

As soon as the refcount becomes zero, we are sure no one else is
using the policy structure anymore. And so we free it with
cpufreq_policy_free().

That routines doesn't have any tricks and simply frees the policy.
Because, before calling cpufreq_policy_put_kobj(), we have set
the per-cpu variable to NULL, nobody else will get the policy
structure by calling cpufreq_cpu_get(). And that's what my patch
tried to solve.

Let me know if I wasn't explanatory enough..

--
viresh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:26                   ` Viresh Kumar
@ 2015-02-02  4:45                     ` ethan zhao
  2015-02-02  4:54                       ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02  4:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel


On 2015/2/2 12:26, Viresh Kumar wrote:
> On 2 February 2015 at 09:46, ethan zhao <ethan.zhao@oracle.com> wrote:
>>   We am talking about the policy allocation and de-allocation. right ?
>>   I showed you the cpufreq_policy_free(policy) doesn't check kobject
>>   refcount  as above.
>>
>>   Hmmm, you are still sleeping in the kobject, wake up and don't mix
>>   water anymore.
> It would be nice if we give each other the respect we deserve, And talk
> about concrete points here.
  Welcome back to the right way.
>>> if (!cpufreq_suspended)
>>>           cpufreq_policy_put_kobj(policy);
>>>
>>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>>> {
>>>           ...
>>>
>>>           kobject_put(kobj);
>>>
>>>           /*
>>>           * We need to make sure that the underlying kobj is
>>>           * actually not referenced anymore by anybody before we
>>>           * proceed with unloading.
>>>           */
>>>           pr_debug("waiting for dropping of refcount\n");
>>>           wait_for_completion(cmp);
>>> }
> I gave you exactly what you wanted to go through, but it seems you
> haven't tried enough.
>
> Before freeing policy with cpufreq_policy_free(), we call
> cpufreq_policy_put_kobj(). Now what does this function do? It waits
> for the completion to fire (wait_for_completion()). This completion
> will only fire when refcount of a kobject becomes zero.
>
> Initially when we create the kobject, it is initialized to one. And the
> last kobject_put() you see above in cpufreq_policy_put_kobj()
> makes it zero. All other cpufreq_cpu_get() and put() should happen
> in pairs, otherwise this refcount will never be zero again.
>
> As soon as the refcount becomes zero, we are sure no one else is
> using the policy structure anymore. And so we free it with
> cpufreq_policy_free().
  But there is no checking against refcount in or before

  cpufreq_policy_free(), that is one issue I mentioned.

>
> That routines doesn't have any tricks and simply frees the policy.
> Because, before calling cpufreq_policy_put_kobj(), we have set
> the per-cpu variable to NULL, nobody else will get the policy
  It is possible cpufreq_cpu_get() within the PPC thread was called just
  before __cpufreq_remove_dev_finish() is to be called in another thread,
  so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
  the actions between cpufreq_cpu_get and cpufreq_cpu_put().

  And then the freeing happens in __cpufreq_remove_dev_finish().
> structure by calling cpufreq_cpu_get(). And that's what my patch
> tried to solve.
>
> Let me know if I wasn't explanatory enough..

Ethan
>
> --
> viresh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:45                     ` ethan zhao
@ 2015-02-02  4:54                       ` Viresh Kumar
  2015-02-02 15:04                         ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-02-02  4:54 UTC (permalink / raw)
  To: ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel

On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
> On 2015/2/2 12:26, Viresh Kumar wrote:

>  But there is no checking against refcount in or before
>
>  cpufreq_policy_free(), that is one issue I mentioned.

As I said earlier, the completion will only fire once the refcount
is zero. And so there is no need of any check here.

>> That routines doesn't have any tricks and simply frees the policy.
>> Because, before calling cpufreq_policy_put_kobj(), we have set
>> the per-cpu variable to NULL, nobody else will get the policy
>
>  It is possible cpufreq_cpu_get() within the PPC thread was called just
>  before __cpufreq_remove_dev_finish() is to be called in another thread,
>  so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>  the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>
>  And then the freeing happens in __cpufreq_remove_dev_finish().

It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
before remove-dev, then cpufreq_cpu_get() will take:

read_lock_irqsave(&cpufreq_driver_lock, flags);

And it will do:

read_unlock_irqrestore(&cpufreq_driver_lock, flags);

only after increasing the refcount with kobject_get().

While on the other side __cpufreq_remove_dev_finish() will do this:

       write_lock_irqsave(&cpufreq_driver_lock, flags);
       policy = per_cpu(cpufreq_cpu_data, cpu);
       per_cpu(cpufreq_cpu_data, cpu) = NULL;
       write_unlock_irqrestore(&cpufreq_driver_lock, flags);

So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
setting per-cpu variable to NULL. And so, after kobject_put() in
cpufreq_policy_put_kobj(), we will wait for the completion to fire
and that will only happen once a corresponding cpufreq_cpu_put()
is issued.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02  4:54                       ` Viresh Kumar
@ 2015-02-02 15:04                         ` ethan zhao
  2015-02-25  3:24                           ` Ethan Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: ethan zhao @ 2015-02-02 15:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel


On 2015/2/2 12:54, Viresh Kumar wrote:
> On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
>> On 2015/2/2 12:26, Viresh Kumar wrote:
>>   But there is no checking against refcount in or before
>>
>>   cpufreq_policy_free(), that is one issue I mentioned.
> As I said earlier, the completion will only fire once the refcount
> is zero. And so there is no need of any check here.
>
>>> That routines doesn't have any tricks and simply frees the policy.
>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>> the per-cpu variable to NULL, nobody else will get the policy
>>   It is possible cpufreq_cpu_get() within the PPC thread was called just
>>   before __cpufreq_remove_dev_finish() is to be called in another thread,
>>   so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>>   the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>
>>   And then the freeing happens in __cpufreq_remove_dev_finish().
> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
> before remove-dev, then cpufreq_cpu_get() will take:
>
> read_lock_irqsave(&cpufreq_driver_lock, flags);
>
> And it will do:
>
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> only after increasing the refcount with kobject_get().
>
> While on the other side __cpufreq_remove_dev_finish() will do this:
>
>         write_lock_irqsave(&cpufreq_driver_lock, flags);
>         policy = per_cpu(cpufreq_cpu_data, cpu);
>         per_cpu(cpufreq_cpu_data, cpu) = NULL;
>         write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
> setting per-cpu variable to NULL. And so, after kobject_put() in
> cpufreq_policy_put_kobj(), we will wait for the completion to fire
  Closely enough this time, understood, thanks for your explanation.


  Ethan
> and that will only happen once a corresponding cpufreq_cpu_put()
> is issued.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-02 15:04                         ` ethan zhao
@ 2015-02-25  3:24                           ` Ethan Zhao
  2015-02-25  4:35                             ` viresh kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Ethan Zhao @ 2015-02-25  3:24 UTC (permalink / raw)
  To: ethan zhao
  Cc: Viresh Kumar, Rafael Wysocki, santosh shilimkar,
	Linaro Kernel Mailman List, linux-pm, linux-kernel, guangyu.sun,
	sriharsha.devdas

Viresh,
  With this patch applied, still got the following warning and panic,
seems it needs more care.

     54.474618] ------------[ cut here ]------------
[   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
kobject_get+0x41/0x50()
[   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
3.18.5
[   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
   /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
[   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
[   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
ffffffff816b7edb
[   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
ffffffff81078ae1
[   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
0000000000000000
[   55.577776] Call Trace:
[   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
[   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
[   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
[   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
[   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
[   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
[   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
[   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
[   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
[   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
[   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
[   56.365661]  [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
[   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
[   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
[   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
[   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
[   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
[   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
[   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
[   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
[   57.017742]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
[   57.162534]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[   57.241680] ---[ end trace dce06bb76f547de5 ]---


Any idea ?

Thanks,
Ethan

On Mon, Feb 2, 2015 at 11:04 PM, ethan zhao <ethan.zhao@oracle.com> wrote:
>
> On 2015/2/2 12:54, Viresh Kumar wrote:
>>
>> On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
>>>
>>> On 2015/2/2 12:26, Viresh Kumar wrote:
>>>   But there is no checking against refcount in or before
>>>
>>>   cpufreq_policy_free(), that is one issue I mentioned.
>>
>> As I said earlier, the completion will only fire once the refcount
>> is zero. And so there is no need of any check here.
>>
>>>> That routines doesn't have any tricks and simply frees the policy.
>>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>>> the per-cpu variable to NULL, nobody else will get the policy
>>>
>>>   It is possible cpufreq_cpu_get() within the PPC thread was called just
>>>   before __cpufreq_remove_dev_finish() is to be called in another thread,
>>>   so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>>>   the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>>
>>>   And then the freeing happens in __cpufreq_remove_dev_finish().
>>
>> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is
>> called just
>> before remove-dev, then cpufreq_cpu_get() will take:
>>
>> read_lock_irqsave(&cpufreq_driver_lock, flags);
>>
>> And it will do:
>>
>> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> only after increasing the refcount with kobject_get().
>>
>> While on the other side __cpufreq_remove_dev_finish() will do this:
>>
>>         write_lock_irqsave(&cpufreq_driver_lock, flags);
>>         policy = per_cpu(cpufreq_cpu_data, cpu);
>>         per_cpu(cpufreq_cpu_data, cpu) = NULL;
>>         write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
>> setting per-cpu variable to NULL. And so, after kobject_put() in
>> cpufreq_policy_put_kobj(), we will wait for the completion to fire
>
>  Closely enough this time, understood, thanks for your explanation.
>
>
>  Ethan
>
>> and that will only happen once a corresponding cpufreq_cpu_put()
>> is issued.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-25  3:24                           ` Ethan Zhao
@ 2015-02-25  4:35                             ` viresh kumar
  2015-02-25  5:47                               ` Ethan Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: viresh kumar @ 2015-02-25  4:35 UTC (permalink / raw)
  To: Ethan Zhao, ethan zhao
  Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
	linux-pm, linux-kernel, guangyu.sun, sriharsha.devdas

On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
> Viresh,
>   With this patch applied, still got the following warning and panic,
> seems it needs more care.
> 
>      54.474618] ------------[ cut here ]------------
> [   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
> kobject_get+0x41/0x50()
> [   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
> 3.18.5
> [   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
>    /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
> [   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
> [   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
> ffffffff816b7edb
> [   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
> ffffffff81078ae1
> [   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
> 0000000000000000
> [   55.577776] Call Trace:
> [   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
> [   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
> [   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
> [   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
> [   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
> [   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
> [   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
> [   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
> [   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
> [   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
> [   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
> [   56.365661]  [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
> [   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
> [   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
> [   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
> [   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
> [   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
> [   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
> [   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
> [   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
> [   57.017742]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
> [   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
> [   57.162534]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
> [   57.241680] ---[ end trace dce06bb76f547de5 ]---
> 
> 
> Any idea ?

No. Santosh reported this to me few days back, I asked him to perform some
testing but don't know what happened after that..

Can you give me full kernel logs along with the crash after this patch.
You will be required to do some testing this time as I don't have any clue
about the problem..


diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b4375021238f..230a59d2e0d7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -214,8 +214,10 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
        if (cpufreq_driver) {
                /* get the CPU */
                policy = per_cpu(cpufreq_cpu_data, cpu);
-               if (policy)
+               if (policy) {
                        kobject_get(&policy->kobj);
+                       pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
+               }
        }

        read_unlock_irqrestore(&cpufreq_driver_lock, flags);
@@ -233,6 +235,7 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
                return;

        kobject_put(&policy->kobj);
+       pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
        up_read(&cpufreq_rwsem);
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_put);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-25  4:35                             ` viresh kumar
@ 2015-02-25  5:47                               ` Ethan Zhao
  2015-02-25 16:31                                 ` santosh shilimkar
  0 siblings, 1 reply; 18+ messages in thread
From: Ethan Zhao @ 2015-02-25  5:47 UTC (permalink / raw)
  To: viresh kumar
  Cc: ethan zhao, Rafael Wysocki, santosh shilimkar,
	Linaro Kernel Mailman List, linux-pm, linux-kernel, guangyu.sun,
	sriharsha.devdas

Viresh,

    Will do that when I get the test box.

Thanks,
Ethan

On Wed, Feb 25, 2015 at 12:35 PM, viresh kumar <viresh.kumar@linaro.org> wrote:
> On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
>> Viresh,
>>   With this patch applied, still got the following warning and panic,
>> seems it needs more care.
>>
>>      54.474618] ------------[ cut here ]------------
>> [   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
>> kobject_get+0x41/0x50()
>> [   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
>> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
>> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
>> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
>> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
>> [   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
>> 3.18.5
>> [   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
>>    /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
>> [   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
>> [   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
>> ffffffff816b7edb
>> [   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
>> ffffffff81078ae1
>> [   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
>> 0000000000000000
>> [   55.577776] Call Trace:
>> [   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
>> [   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
>> [   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
>> [   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
>> [   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
>> [   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
>> [   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
>> [   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
>> [   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
>> [   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
>> [   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
>> [   56.365661]  [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
>> [   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
>> [   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
>> [   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
>> [   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
>> [   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
>> [   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
>> [   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
>> [   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
>> [   57.017742]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>> [   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
>> [   57.162534]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>> [   57.241680] ---[ end trace dce06bb76f547de5 ]---
>>
>>
>> Any idea ?
>
> No. Santosh reported this to me few days back, I asked him to perform some
> testing but don't know what happened after that..
>
> Can you give me full kernel logs along with the crash after this patch.
> You will be required to do some testing this time as I don't have any clue
> about the problem..
>
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index b4375021238f..230a59d2e0d7 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -214,8 +214,10 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
>         if (cpufreq_driver) {
>                 /* get the CPU */
>                 policy = per_cpu(cpufreq_cpu_data, cpu);
> -               if (policy)
> +               if (policy) {
>                         kobject_get(&policy->kobj);
> +                       pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
> +               }
>         }
>
>         read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> @@ -233,6 +235,7 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
>                 return;
>
>         kobject_put(&policy->kobj);
> +       pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
>         up_read(&cpufreq_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-25  5:47                               ` Ethan Zhao
@ 2015-02-25 16:31                                 ` santosh shilimkar
  2015-03-09  1:34                                   ` Ethan Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: santosh shilimkar @ 2015-02-25 16:31 UTC (permalink / raw)
  To: Ethan Zhao, viresh kumar
  Cc: ethan zhao, Rafael Wysocki, Linaro Kernel Mailman List, linux-pm,
	linux-kernel, guangyu.sun, sriharsha.devdas

On 2/24/2015 9:47 PM, Ethan Zhao wrote:
> Viresh,
>
>      Will do that when I get the test box.
>
Thanks Ethan.

>
> On Wed, Feb 25, 2015 at 12:35 PM, viresh kumar <viresh.kumar@linaro.org> wrote:
>> On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
>>> Viresh,
>>>    With this patch applied, still got the following warning and panic,
>>> seems it needs more care.
>>>
>>>       54.474618] ------------[ cut here ]------------
>>> [   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
>>> kobject_get+0x41/0x50()
>>> [   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
>>> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
>>> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
>>> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
>>> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
>>> [   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
>>> 3.18.5
>>> [   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
>>>     /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
>>> [   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
>>> [   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
>>> ffffffff816b7edb
>>> [   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
>>> ffffffff81078ae1
>>> [   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
>>> 0000000000000000
>>> [   55.577776] Call Trace:
>>> [   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
>>> [   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
>>> [   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
>>> [   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
>>> [   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
>>> [   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
>>> [   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
>>> [   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
>>> [   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
>>> [   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
>>> [   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
>>> [   56.365661]  [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
>>> [   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
>>> [   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
>>> [   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
>>> [   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
>>> [   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
>>> [   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
>>> [   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
>>> [   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
>>> [   57.017742]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
>>> [   57.162534]  [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [   57.241680] ---[ end trace dce06bb76f547de5 ]---
>>>
>>>
>>> Any idea ?
>>
>> No. Santosh reported this to me few days back, I asked him to perform some
>> testing but don't know what happened after that..
>>
I didn't get time to re-look into it so far.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-02-25 16:31                                 ` santosh shilimkar
@ 2015-03-09  1:34                                   ` Ethan Zhao
  2015-03-09  4:06                                     ` Viresh Kumar
  0 siblings, 1 reply; 18+ messages in thread
From: Ethan Zhao @ 2015-03-09  1:34 UTC (permalink / raw)
  To: viresh kumar, santosh shilimkar
  Cc: ethan zhao, Rafael Wysocki, Linaro Kernel Mailman List, linux-pm,
	linux-kernel, guangyu.sun, sriharsha.devdas

Viresh,
    Got that box and did some debug, found the policy->kobj is not initialized.
So the race happened between cpufreq_cpu_get() and
__cpufreq_add_dev(), and verified 'this' race could be fixed by commit

6d4e81e cpufreq: Ref the policy object sooner

    I have reboot the box with crond for more than 12 hours, no warning found.

    But obviously, the commit
    cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
    also fixed one of the possible race condition.


Thanks,
Ethan

On Thu, Feb 26, 2015 at 12:31 AM, santosh shilimkar
<santosh.shilimkar@oracle.com> wrote:
> On 2/24/2015 9:47 PM, Ethan Zhao wrote:
>>
>> Viresh,
>>
>>      Will do that when I get the test box.
>>
> Thanks Ethan.
>
>
>>
>> On Wed, Feb 25, 2015 at 12:35 PM, viresh kumar <viresh.kumar@linaro.org>
>> wrote:
>>>
>>> On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
>>>>
>>>> Viresh,
>>>>    With this patch applied, still got the following warning and panic,
>>>> seems it needs more care.
>>>>
>>>>       54.474618] ------------[ cut here ]------------
>>>> [   54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
>>>> kobject_get+0x41/0x50()
>>>> [   54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
>>>> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
>>>> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e
>>>> ahci
>>>> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
>>>> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
>>>> [   55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
>>>> 3.18.5
>>>> [   55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2
>>>> SERVER
>>>>     /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
>>>> [   55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
>>>> [   55.308598]  0000000000000000 00000000bd730b61 ffff88046742baf8
>>>> ffffffff816b7edb
>>>> [   55.398305]  0000000000000000 0000000000000000 ffff88046742bb38
>>>> ffffffff81078ae1
>>>> [   55.488040]  ffff88046742bbd8 ffff8806706b3000 0000000000000292
>>>> 0000000000000000
>>>> [   55.577776] Call Trace:
>>>> [   55.608228]  [<ffffffff816b7edb>] dump_stack+0x46/0x58
>>>> [   55.670895]  [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
>>>> [   55.743952]  [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
>>>> [   55.814929]  [<ffffffff8130d0d1>] kobject_get+0x41/0x50
>>>> [   55.878654]  [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
>>>> [   55.946528]  [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
>>>> [   56.021682]  [<ffffffff810bf9d2>] ? up+0x32/0x50
>>>> [   56.078126]  [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
>>>> [   56.148974]  [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
>>>> [   56.226066]  [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
>>>> [   56.295871]  [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
>>>> [   56.365661]  [<ffffffff813bbcd4>]
>>>> acpi_processor_ppc_has_changed+0x77/0x82
>>>> [   56.448956]  [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
>>>> [   56.520842]  [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
>>>> [   56.594807]  [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
>>>> [   56.670859]  [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
>>>> [   56.747936]  [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
>>>> [   56.818766]  [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
>>>> [   56.886486]  [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
>>>> [   56.957316]  [<ffffffff810984f1>] kthread+0xe1/0x100
>>>> [   57.017742]  [<ffffffff81098410>] ?
>>>> kthread_create_on_node+0x1b0/0x1b0
>>>> [   57.096903]  [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
>>>> [   57.162534]  [<ffffffff81098410>] ?
>>>> kthread_create_on_node+0x1b0/0x1b0
>>>> [   57.241680] ---[ end trace dce06bb76f547de5 ]---
>>>>
>>>>
>>>> Any idea ?
>>>
>>>
>>> No. Santosh reported this to me few days back, I asked him to perform
>>> some
>>> testing but don't know what happened after that..
>>>
> I didn't get time to re-look into it so far.
>
> Regards,
> Santosh
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-03-09  1:34                                   ` Ethan Zhao
@ 2015-03-09  4:06                                     ` Viresh Kumar
  2015-03-09  4:14                                       ` ethan zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Viresh Kumar @ 2015-03-09  4:06 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: santosh shilimkar, ethan zhao, Rafael Wysocki,
	Linaro Kernel Mailman List, linux-pm, linux-kernel, guangyu.sun,
	sriharsha.devdas

On 9 March 2015 at 07:04, Ethan Zhao <ethan.kernel@gmail.com> wrote:
> Viresh,
>     Got that box and did some debug, found the policy->kobj is not initialized.
> So the race happened between cpufreq_cpu_get() and
> __cpufreq_add_dev(), and verified 'this' race could be fixed by commit
>
> 6d4e81e cpufreq: Ref the policy object sooner
>
>     I have reboot the box with crond for more than 12 hours, no warning found.

Oh, great. Thanks for your work Ethan. You want this to be pushed for 3.18
stable kernel, right? I will see what I can do.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
  2015-03-09  4:06                                     ` Viresh Kumar
@ 2015-03-09  4:14                                       ` ethan zhao
  0 siblings, 0 replies; 18+ messages in thread
From: ethan zhao @ 2015-03-09  4:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ethan Zhao, santosh shilimkar, Rafael Wysocki,
	Linaro Kernel Mailman List, linux-pm, linux-kernel, guangyu.sun,
	sriharsha.devdas


On 2015/3/9 12:06, Viresh Kumar wrote:
> On 9 March 2015 at 07:04, Ethan Zhao <ethan.kernel@gmail.com> wrote:
>> Viresh,
>>      Got that box and did some debug, found the policy->kobj is not initialized.
>> So the race happened between cpufreq_cpu_get() and
>> __cpufreq_add_dev(), and verified 'this' race could be fixed by commit
>>
>> 6d4e81e cpufreq: Ref the policy object sooner
>>
>>      I have reboot the box with crond for more than 12 hours, no warning found.
> Oh, great. Thanks for your work Ethan. You want this to be pushed for 3.18
> stable kernel, right? I will see what I can do.
  Of course we are happy to see it in 3.18 branch.

  Thanks,
  Ethan



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-03-09  4:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422663249.git.viresh.kumar@linaro.org>
     [not found] ` <54CEECF7.7020504@oracle.com>
     [not found]   ` <CAKohpon10=xemOZkvqMTV2JbdTMXPDiW+-v52Qzm8SsNymN-Kg@mail.gmail.com>
2015-02-02  3:38     ` [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject ethan zhao
2015-02-02  3:43       ` Viresh Kumar
2015-02-02  3:56         ` ethan zhao
2015-02-02  3:59           ` Viresh Kumar
2015-02-02  4:06             ` ethan zhao
2015-02-02  4:09               ` Viresh Kumar
2015-02-02  4:16                 ` ethan zhao
2015-02-02  4:26                   ` Viresh Kumar
2015-02-02  4:45                     ` ethan zhao
2015-02-02  4:54                       ` Viresh Kumar
2015-02-02 15:04                         ` ethan zhao
2015-02-25  3:24                           ` Ethan Zhao
2015-02-25  4:35                             ` viresh kumar
2015-02-25  5:47                               ` Ethan Zhao
2015-02-25 16:31                                 ` santosh shilimkar
2015-03-09  1:34                                   ` Ethan Zhao
2015-03-09  4:06                                     ` Viresh Kumar
2015-03-09  4:14                                       ` ethan zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).