linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Could you please help to have a look a bug trace in pmu arm-cci.c
       [not found] <529F9A9100AE8045A7A5B5A00A39FBB862099B8E@ALA-MBD.corp.ad.wrs.com>
@ 2019-01-30 18:21 ` Will Deacon
  2019-01-30 19:09   ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2019-01-30 18:21 UTC (permalink / raw)
  To: Li, Meng
  Cc: mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose,
	robin.murphy

[+Suzuki and Robin]

On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
> When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is below trace
> during pmu arm cci driver probe phase.
> 
> [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> [ 1.983342] Preemption disabled at:
> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> [ 1.983364] Call trace:
> [ 1.983369] dump_backtrace+0x0/0x158
> [ 1.983372] show_stack+0x24/0x30
> [ 1.983378] dump_stack+0x80/0xa4
> [ 1.983383] ___might_sleep+0x138/0x160
> [ 1.983386] __might_sleep+0x58/0x90
> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> [ 1.983395] _mutex_lock+0x24/0x30
> [ 1.983400] perf_pmu_register+0x2c/0x388
> [ 1.983404] cci_pmu_probe+0x2bc/0x488
> [ 1.983409] platform_drv_probe+0x58/0xa8
> 
> Because get_cpu() is invoked, preempt is disable, finally, trace occurs when
> call might_sleep()

Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
fact that it might sleep, but also the assignment to g_cci_pmu is done after
we've re-enabled preemption, so there's a race with CPU hotplug there too.

I don't think we can simply register the hotplug notifier before registering
the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
has been registered. Perhaps we need to use the _cpuslocked() versions of
the hotplug notifier registration functions.

I tried looking at some other drivers, but they all look broken to me, so
there's a good chance I'm missing something. Anybody know how this is
supposed to work?

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Could you please help to have a look a bug trace in pmu arm-cci.c
  2019-01-30 18:21 ` Could you please help to have a look a bug trace in pmu arm-cci.c Will Deacon
@ 2019-01-30 19:09   ` Robin Murphy
  2019-02-01  3:23     ` Li, Meng
  2019-02-01 18:01     ` Will Deacon
  0 siblings, 2 replies; 6+ messages in thread
From: Robin Murphy @ 2019-01-30 19:09 UTC (permalink / raw)
  To: Will Deacon, Li, Meng
  Cc: mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose

On 2019-01-30 6:21 pm, Will Deacon wrote:
> [+Suzuki and Robin]
> 
> On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
>> When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is below trace
>> during pmu arm cci driver probe phase.
>>
>> [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
>> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
>> [ 1.983342] Preemption disabled at:
>> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
>> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
>> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
>> [ 1.983364] Call trace:
>> [ 1.983369] dump_backtrace+0x0/0x158
>> [ 1.983372] show_stack+0x24/0x30
>> [ 1.983378] dump_stack+0x80/0xa4
>> [ 1.983383] ___might_sleep+0x138/0x160
>> [ 1.983386] __might_sleep+0x58/0x90
>> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
>> [ 1.983395] _mutex_lock+0x24/0x30
>> [ 1.983400] perf_pmu_register+0x2c/0x388
>> [ 1.983404] cci_pmu_probe+0x2bc/0x488
>> [ 1.983409] platform_drv_probe+0x58/0xa8
>>
>> Because get_cpu() is invoked, preempt is disable, finally, trace occurs when
>> call might_sleep()
> 
> Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
> fact that it might sleep, but also the assignment to g_cci_pmu is done after
> we've re-enabled preemption, so there's a race with CPU hotplug there too.

Hmm, looks like I failed to appreciate that particular race at the time 
- indeed the global should probably be assigned immediately after 
cci_pmu_init() has succeeded.

> I don't think we can simply register the hotplug notifier before registering
> the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
> has been registered. Perhaps we need to use the _cpuslocked() versions of
> the hotplug notifier registration functions.
> 
> I tried looking at some other drivers, but they all look broken to me, so
> there's a good chance I'm missing something. Anybody know how this is
> supposed to work?

As I understand the general pattern, we register the notifier last to 
avoid taking a hotplug callback with a partly-initialised PMU state, 
however since the CPU we've picked is part of that PMU state, we also 
want to avoid getting migrated off that CPU before the notifier is in 
place lest things get out of sync, hence disabling preemption. As far as 
the correctness of implementing that logic, though, it was like that 
when I got here so I've always just assumed it was fine :)

I guess the question is whether we actually need to pick our nominal CPU 
before perf_pmu_register(), or if something like the below would suffice 
- what do you reckon?

Robin.

----->8-----
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb160c5b1..da9309ff80d7 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device 
*pdev)
         raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
         mutex_init(&cci_pmu->reserve_mutex);
         atomic_set(&cci_pmu->active_events, 0);
-       cci_pmu->cpu = get_cpu();
+       cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */

         ret = cci_pmu_init(cci_pmu, pdev);
-       if (ret) {
-               put_cpu();
+       if (ret)
                 return ret;
-       }
+       g_cci_pmu = cci_pmu;

         cpuhp_setup_state_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE,
                                   "perf/arm/cci:online", NULL,
                                   cci_pmu_offline_cpu);
-       put_cpu();
-       g_cci_pmu = cci_pmu;
+       cci_pmu->cpu = smp_processor_id();
+
         pr_info("ARM %s PMU driver probed", cci_pmu->model->name);
         return 0;
  }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: Could you please help to have a look a bug trace in pmu arm-cci.c
  2019-01-30 19:09   ` Robin Murphy
@ 2019-02-01  3:23     ` Li, Meng
  2019-02-01 18:01     ` Will Deacon
  1 sibling, 0 replies; 6+ messages in thread
From: Li, Meng @ 2019-02-01  3:23 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose



> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Thursday, January 31, 2019 3:10 AM
> To: Will Deacon; Li, Meng
> Cc: mark.rutland@arm.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; suzuki.poulose@arm.com
> Subject: Re: Could you please help to have a look a bug trace in pmu arm-cci.c
> 
> On 2019-01-30 6:21 pm, Will Deacon wrote:
> > [+Suzuki and Robin]
> >
> > On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
> >> When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is
> below trace
> >> during pmu arm cci driver probe phase.
> >>
> >> [ 1.983337] BUG: sleeping function called from invalid context at
> kernel/locking/rtmutex.c:2004
> >> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> >> [ 1.983342] Preemption disabled at:
> >> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> >> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-
> preempt-rt #1
> >> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> >> [ 1.983364] Call trace:
> >> [ 1.983369] dump_backtrace+0x0/0x158
> >> [ 1.983372] show_stack+0x24/0x30
> >> [ 1.983378] dump_stack+0x80/0xa4
> >> [ 1.983383] ___might_sleep+0x138/0x160
> >> [ 1.983386] __might_sleep+0x58/0x90
> >> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> >> [ 1.983395] _mutex_lock+0x24/0x30
> >> [ 1.983400] perf_pmu_register+0x2c/0x388
> >> [ 1.983404] cci_pmu_probe+0x2bc/0x488
> >> [ 1.983409] platform_drv_probe+0x58/0xa8
> >>
> >> Because get_cpu() is invoked, preempt is disable, finally, trace occurs
> when
> >> call might_sleep()
> >
> > Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
> > fact that it might sleep, but also the assignment to g_cci_pmu is done after
> > we've re-enabled preemption, so there's a race with CPU hotplug there
> too.
> 
> Hmm, looks like I failed to appreciate that particular race at the time
> - indeed the global should probably be assigned immediately after
> cci_pmu_init() has succeeded.
> 
> > I don't think we can simply register the hotplug notifier before registering
> > the PMU, because we can't call into perf_pmu_migrate_context() until the
> PMU
> > has been registered. Perhaps we need to use the _cpuslocked() versions
> of
> > the hotplug notifier registration functions.
> >
> > I tried looking at some other drivers, but they all look broken to me, so
> > there's a good chance I'm missing something. Anybody know how this is
> > supposed to work?
> 
> As I understand the general pattern, we register the notifier last to
> avoid taking a hotplug callback with a partly-initialised PMU state,
> however since the CPU we've picked is part of that PMU state, we also
> want to avoid getting migrated off that CPU before the notifier is in
> place lest things get out of sync, hence disabling preemption. As far as
> the correctness of implementing that logic, though, it was like that
> when I got here so I've always just assumed it was fine :)
> 
> I guess the question is whether we actually need to pick our nominal CPU
> before perf_pmu_register(), or if something like the below would suffice
> - what do you reckon?
> 
> Robin.
> 
> ----->8-----
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 1bfeb160c5b1..da9309ff80d7 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device
> *pdev)
>          raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
>          mutex_init(&cci_pmu->reserve_mutex);
>          atomic_set(&cci_pmu->active_events, 0);
> -       cci_pmu->cpu = get_cpu();
> +       cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */

If we set cci_pmu->cpu = -1 here, I think we should modify code of cci_pmu_event_init() as below:
//////// original ////////////////////////
	if (event->cpu < 0)
		return -EINVAL;
	event->cpu = cci_pmu->cpu;
//////// original ////////////////////////
Change into
//////// new ////////////////////////
	event->cpu = cci_pmu->cpu;
	if (event->cpu < 0)
		return -EINVAL;
//////// new ////////////////////////
Move this if condition after the assignment value operation.
Otherwise, when call perf_event_open()->find_get_context(), 
int cpu = event->cpu = -1;
-1 as a parameter passed into per_cpu_ptr(), and return a wrong cpuctx,
If we continue to execute code with the wrong cpuctx, kernel may crash.

I am not sure whether my analysis is reasonable, only just for your reference.

Thanks,
Limeng

> 
>          ret = cci_pmu_init(cci_pmu, pdev);
> -       if (ret) {
> -               put_cpu();
> +       if (ret)
>                  return ret;
> -       }
> +       g_cci_pmu = cci_pmu;
> 
>          cpuhp_setup_state_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE,
>                                    "perf/arm/cci:online", NULL,
>                                    cci_pmu_offline_cpu);
> -       put_cpu();
> -       g_cci_pmu = cci_pmu;
> +       cci_pmu->cpu = smp_processor_id();
> +
>          pr_info("ARM %s PMU driver probed", cci_pmu->model->name);
>          return 0;
>   }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Could you please help to have a look a bug trace in pmu arm-cci.c
  2019-01-30 19:09   ` Robin Murphy
  2019-02-01  3:23     ` Li, Meng
@ 2019-02-01 18:01     ` Will Deacon
  2019-02-01 18:42       ` Robin Murphy
  1 sibling, 1 reply; 6+ messages in thread
From: Will Deacon @ 2019-02-01 18:01 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Li, Meng, mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose

On Wed, Jan 30, 2019 at 07:09:42PM +0000, Robin Murphy wrote:
> On 2019-01-30 6:21 pm, Will Deacon wrote:
> > [+Suzuki and Robin]
> > 
> > On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
> > > When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is below trace
> > > during pmu arm cci driver probe phase.
> > > 
> > > [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
> > > [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> > > [ 1.983342] Preemption disabled at:
> > > [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> > > [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
> > > [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> > > [ 1.983364] Call trace:
> > > [ 1.983369] dump_backtrace+0x0/0x158
> > > [ 1.983372] show_stack+0x24/0x30
> > > [ 1.983378] dump_stack+0x80/0xa4
> > > [ 1.983383] ___might_sleep+0x138/0x160
> > > [ 1.983386] __might_sleep+0x58/0x90
> > > [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> > > [ 1.983395] _mutex_lock+0x24/0x30
> > > [ 1.983400] perf_pmu_register+0x2c/0x388
> > > [ 1.983404] cci_pmu_probe+0x2bc/0x488
> > > [ 1.983409] platform_drv_probe+0x58/0xa8
> > > 
> > > Because get_cpu() is invoked, preempt is disable, finally, trace occurs when
> > > call might_sleep()
> > 
> > Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
> > fact that it might sleep, but also the assignment to g_cci_pmu is done after
> > we've re-enabled preemption, so there's a race with CPU hotplug there too.
> 
> Hmm, looks like I failed to appreciate that particular race at the time -
> indeed the global should probably be assigned immediately after
> cci_pmu_init() has succeeded.
> 
> > I don't think we can simply register the hotplug notifier before registering
> > the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
> > has been registered. Perhaps we need to use the _cpuslocked() versions of
> > the hotplug notifier registration functions.
> > 
> > I tried looking at some other drivers, but they all look broken to me, so
> > there's a good chance I'm missing something. Anybody know how this is
> > supposed to work?
> 
> As I understand the general pattern, we register the notifier last to avoid
> taking a hotplug callback with a partly-initialised PMU state, however since
> the CPU we've picked is part of that PMU state, we also want to avoid
> getting migrated off that CPU before the notifier is in place lest things
> get out of sync, hence disabling preemption. As far as the correctness of
> implementing that logic, though, it was like that when I got here so I've
> always just assumed it was fine :)
> 
> I guess the question is whether we actually need to pick our nominal CPU
> before perf_pmu_register(), or if something like the below would suffice -
> what do you reckon?
> 
> Robin.
> 
> ----->8-----
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 1bfeb160c5b1..da9309ff80d7 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device
> *pdev)
>         raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
>         mutex_init(&cci_pmu->reserve_mutex);
>         atomic_set(&cci_pmu->active_events, 0);
> -       cci_pmu->cpu = get_cpu();
> +       cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */
> 
>         ret = cci_pmu_init(cci_pmu, pdev);

So at this point we've registered the PMU with perf, so I think we're open
to userspace. Given that things like pmu_cpumask_attr_show() call
cpumask_of(cci_pmu->cpu), having a cpu of -1 seems like a bad idea.

Why not just use the _cpuslocked() notifier registration functions so that
we don't need to disable preemption?

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Could you please help to have a look a bug trace in pmu arm-cci.c
  2019-02-01 18:01     ` Will Deacon
@ 2019-02-01 18:42       ` Robin Murphy
  2019-02-04  9:26         ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2019-02-01 18:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Li, Meng, mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose

On 01/02/2019 18:01, Will Deacon wrote:
> On Wed, Jan 30, 2019 at 07:09:42PM +0000, Robin Murphy wrote:
>> On 2019-01-30 6:21 pm, Will Deacon wrote:
>>> [+Suzuki and Robin]
>>>
>>> On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
>>>> When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is below trace
>>>> during pmu arm cci driver probe phase.
>>>>
>>>> [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
>>>> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
>>>> [ 1.983342] Preemption disabled at:
>>>> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
>>>> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
>>>> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
>>>> [ 1.983364] Call trace:
>>>> [ 1.983369] dump_backtrace+0x0/0x158
>>>> [ 1.983372] show_stack+0x24/0x30
>>>> [ 1.983378] dump_stack+0x80/0xa4
>>>> [ 1.983383] ___might_sleep+0x138/0x160
>>>> [ 1.983386] __might_sleep+0x58/0x90
>>>> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
>>>> [ 1.983395] _mutex_lock+0x24/0x30
>>>> [ 1.983400] perf_pmu_register+0x2c/0x388
>>>> [ 1.983404] cci_pmu_probe+0x2bc/0x488
>>>> [ 1.983409] platform_drv_probe+0x58/0xa8
>>>>
>>>> Because get_cpu() is invoked, preempt is disable, finally, trace occurs when
>>>> call might_sleep()
>>>
>>> Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
>>> fact that it might sleep, but also the assignment to g_cci_pmu is done after
>>> we've re-enabled preemption, so there's a race with CPU hotplug there too.
>>
>> Hmm, looks like I failed to appreciate that particular race at the time -
>> indeed the global should probably be assigned immediately after
>> cci_pmu_init() has succeeded.
>>
>>> I don't think we can simply register the hotplug notifier before registering
>>> the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
>>> has been registered. Perhaps we need to use the _cpuslocked() versions of
>>> the hotplug notifier registration functions.
>>>
>>> I tried looking at some other drivers, but they all look broken to me, so
>>> there's a good chance I'm missing something. Anybody know how this is
>>> supposed to work?
>>
>> As I understand the general pattern, we register the notifier last to avoid
>> taking a hotplug callback with a partly-initialised PMU state, however since
>> the CPU we've picked is part of that PMU state, we also want to avoid
>> getting migrated off that CPU before the notifier is in place lest things
>> get out of sync, hence disabling preemption. As far as the correctness of
>> implementing that logic, though, it was like that when I got here so I've
>> always just assumed it was fine :)
>>
>> I guess the question is whether we actually need to pick our nominal CPU
>> before perf_pmu_register(), or if something like the below would suffice -
>> what do you reckon?
>>
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
>> index 1bfeb160c5b1..da9309ff80d7 100644
>> --- a/drivers/perf/arm-cci.c
>> +++ b/drivers/perf/arm-cci.c
>> @@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device
>> *pdev)
>>          raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
>>          mutex_init(&cci_pmu->reserve_mutex);
>>          atomic_set(&cci_pmu->active_events, 0);
>> -       cci_pmu->cpu = get_cpu();
>> +       cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */
>>
>>          ret = cci_pmu_init(cci_pmu, pdev);
> 
> So at this point we've registered the PMU with perf, so I think we're open
> to userspace. Given that things like pmu_cpumask_attr_show() call
> cpumask_of(cci_pmu->cpu), having a cpu of -1 seems like a bad idea.
> 
> Why not just use the _cpuslocked() notifier registration functions so that
> we don't need to disable preemption?

Because that alone doesn't necessarily help, but what I failed to grasp 
is the implication that in order to do it you need to manually take the 
hotplug lock, and if you do *that* in the right places, it removes the 
race condition altogether. Now that I've made sense of it, I think 
that's actually the only valid way to solve the problem. Let me spin a 
proper patch...

Robin.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Could you please help to have a look a bug trace in pmu arm-cci.c
  2019-02-01 18:42       ` Robin Murphy
@ 2019-02-04  9:26         ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2019-02-04  9:26 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Li, Meng, mark.rutland, linux-arm-kernel, linux-kernel, suzuki.poulose

On Fri, Feb 01, 2019 at 06:42:46PM +0000, Robin Murphy wrote:
> On 01/02/2019 18:01, Will Deacon wrote:
> > On Wed, Jan 30, 2019 at 07:09:42PM +0000, Robin Murphy wrote:
> > > On 2019-01-30 6:21 pm, Will Deacon wrote:
> > > > [+Suzuki and Robin]
> > > > 
> > > > On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
> > > > > When enable kernel configure CONFIG_DEBUG_ATOMIC_SLEEP, there is below trace
> > > > > during pmu arm cci driver probe phase.
> > > > > 
> > > > > [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
> > > > > [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> > > > > [ 1.983342] Preemption disabled at:
> > > > > [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> > > > > [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
> > > > > [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> > > > > [ 1.983364] Call trace:
> > > > > [ 1.983369] dump_backtrace+0x0/0x158
> > > > > [ 1.983372] show_stack+0x24/0x30
> > > > > [ 1.983378] dump_stack+0x80/0xa4
> > > > > [ 1.983383] ___might_sleep+0x138/0x160
> > > > > [ 1.983386] __might_sleep+0x58/0x90
> > > > > [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> > > > > [ 1.983395] _mutex_lock+0x24/0x30
> > > > > [ 1.983400] perf_pmu_register+0x2c/0x388
> > > > > [ 1.983404] cci_pmu_probe+0x2bc/0x488
> > > > > [ 1.983409] platform_drv_probe+0x58/0xa8
> > > > > 
> > > > > Because get_cpu() is invoked, preempt is disable, finally, trace occurs when
> > > > > call might_sleep()
> > > > 
> > > > Hmm, the {get,put}_cpu() usage here looks very broken to me. There's the
> > > > fact that it might sleep, but also the assignment to g_cci_pmu is done after
> > > > we've re-enabled preemption, so there's a race with CPU hotplug there too.
> > > 
> > > Hmm, looks like I failed to appreciate that particular race at the time -
> > > indeed the global should probably be assigned immediately after
> > > cci_pmu_init() has succeeded.
> > > 
> > > > I don't think we can simply register the hotplug notifier before registering
> > > > the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
> > > > has been registered. Perhaps we need to use the _cpuslocked() versions of
> > > > the hotplug notifier registration functions.
> > > > 
> > > > I tried looking at some other drivers, but they all look broken to me, so
> > > > there's a good chance I'm missing something. Anybody know how this is
> > > > supposed to work?
> > > 
> > > As I understand the general pattern, we register the notifier last to avoid
> > > taking a hotplug callback with a partly-initialised PMU state, however since
> > > the CPU we've picked is part of that PMU state, we also want to avoid
> > > getting migrated off that CPU before the notifier is in place lest things
> > > get out of sync, hence disabling preemption. As far as the correctness of
> > > implementing that logic, though, it was like that when I got here so I've
> > > always just assumed it was fine :)
> > > 
> > > I guess the question is whether we actually need to pick our nominal CPU
> > > before perf_pmu_register(), or if something like the below would suffice -
> > > what do you reckon?
> > > 
> > > Robin.
> > > 
> > > ----->8-----
> > > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> > > index 1bfeb160c5b1..da9309ff80d7 100644
> > > --- a/drivers/perf/arm-cci.c
> > > +++ b/drivers/perf/arm-cci.c
> > > @@ -1692,19 +1692,18 @@ static int cci_pmu_probe(struct platform_device
> > > *pdev)
> > >          raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
> > >          mutex_init(&cci_pmu->reserve_mutex);
> > >          atomic_set(&cci_pmu->active_events, 0);
> > > -       cci_pmu->cpu = get_cpu();
> > > +       cci_pmu->cpu = -1; /* Avoid races until hotplug notifier is alive */
> > > 
> > >          ret = cci_pmu_init(cci_pmu, pdev);
> > 
> > So at this point we've registered the PMU with perf, so I think we're open
> > to userspace. Given that things like pmu_cpumask_attr_show() call
> > cpumask_of(cci_pmu->cpu), having a cpu of -1 seems like a bad idea.
> > 
> > Why not just use the _cpuslocked() notifier registration functions so that
> > we don't need to disable preemption?
> 
> Because that alone doesn't necessarily help, but what I failed to grasp is
> the implication that in order to do it you need to manually take the hotplug
> lock, and if you do *that* in the right places, it removes the race
> condition altogether. Now that I've made sense of it, I think that's
> actually the only valid way to solve the problem. Let me spin a proper
> patch...

Yeah, sorry for being unhelpfully vague there. I meant using the
_cpuslocked() calls in conjunction with cpus_read_lock().

I think at least the DSU PMU driver is also broken in this area.

Cheers,

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-02-04  9:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <529F9A9100AE8045A7A5B5A00A39FBB862099B8E@ALA-MBD.corp.ad.wrs.com>
2019-01-30 18:21 ` Could you please help to have a look a bug trace in pmu arm-cci.c Will Deacon
2019-01-30 19:09   ` Robin Murphy
2019-02-01  3:23     ` Li, Meng
2019-02-01 18:01     ` Will Deacon
2019-02-01 18:42       ` Robin Murphy
2019-02-04  9:26         ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).