* Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
@ 2017-07-26 11:57 Michael Ellerman
  2017-07-26 12:44 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2017-07-26 11:57 UTC (permalink / raw)
  To: tglx; +Cc: linuxppc-dev

Hi Thomas,

I'm seeing the lockdep barf below on some bare metal Power8 machines.

This seems to be caused by our smp_cpus_done(), which does:

  void __init smp_cpus_done(unsigned int max_cpus)
  {
  	/*
  	 * We want the setup_cpu() here to be called on the boot CPU, but
  	 * init might run on any CPU, so make sure it's invoked on the boot
  	 * CPU.
  	 */
  	if (smp_ops && smp_ops->setup_cpu)
  		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);


I don't think CPU hotplug can happen at this point, so I don't think
there's really a bug.

But it looks like the work_on_cpu_safe() call could just go away, since
you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
the boot CPU, initially"). Though I can't see where init is unpinned, so
maybe we do still need to do it?
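
For illustration, a rough sketch of what dropping the work_on_cpu_safe() call
could look like (untested; assumes setup_cpu() can simply be called directly
while init is still pinned to the boot CPU):

  void __init smp_cpus_done(unsigned int max_cpus)
  {
  	/*
  	 * init is still pinned to the boot CPU here (see 8fb12156b8db),
  	 * so setup_cpu() can be invoked directly rather than bouncing
  	 * through work_on_cpu_safe() and the hotplug lock.
  	 */
  	if (smp_ops && smp_ops->setup_cpu)
  		smp_ops->setup_cpu(boot_cpuid);

  	/* remainder of smp_cpus_done() unchanged */
  }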

cheers


[    1.468911] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[    1.469032] ipr 0001:08:00.0: Found IOA with IRQ: 0
[    1.469167] 
[    1.469193] ======================================================
[    1.469250] WARNING: possible circular locking dependency detected
[    1.469309] 4.12.0-gcc6-gb40b238 #1 Not tainted
[    1.469355] ------------------------------------------------------
[    1.469413] kworker/0:1/971 is trying to acquire lock:
[    1.469459]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0
[    1.469544] 
[    1.469544] but task is already holding lock:
[    1.469602]  ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
[    1.469674] 
[    1.469674] which lock already depends on the new lock.
[    1.469674] 
[    1.469744] 
[    1.469744] the existing dependency chain (in reverse order) is:
[    1.469813] 
[    1.469813] -> #1 ((&wfc.work)){+.+.+.}:
[    1.469862]        flush_work+0x74/0x340
[    1.469898]        work_on_cpu+0xb8/0xe0
[    1.469934]        work_on_cpu_safe+0x80/0xd0
[    1.469982]        smp_cpus_done+0x54/0xb0
[    1.470017]        smp_init+0x1a0/0x1cc
[    1.470053]        kernel_init_freeable+0x1f4/0x3b0
[    1.470100]        kernel_init+0x24/0x160
[    1.470137]        ret_from_kernel_thread+0x5c/0x74
[    1.470183] 
[    1.470183] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[    1.470244]        lock_acquire+0xec/0x2e0
[    1.470279]        cpus_read_lock+0x4c/0xd0
[    1.470315]        apply_workqueue_attrs+0x34/0xa0
[    1.470362]        __alloc_workqueue_key+0x1b4/0x5b4
[    1.470410]        scsi_host_alloc+0x328/0x4b0
[    1.470457]        ipr_probe+0xb4/0x1f00
[    1.470493]        local_pci_probe+0x6c/0x140
[    1.470541]        work_for_cpu_fn+0x38/0x60
[    1.470588]        process_one_work+0x310/0x800
[    1.470635]        worker_thread+0x2b0/0x520
[    1.470682]        kthread+0x164/0x1b0
[    1.470718]        ret_from_kernel_thread+0x5c/0x74
[    1.470764] 
[    1.470764] other info that might help us debug this:
[    1.470764] 
[    1.470834]  Possible unsafe locking scenario:
[    1.470834] 
[    1.470891]        CPU0                    CPU1
[    1.470937]        ----                    ----
[    1.470984]   lock((&wfc.work));
[    1.471020]                                lock(cpu_hotplug_lock.rw_sem);
[    1.471078]                                lock((&wfc.work));
[    1.471136]   lock(cpu_hotplug_lock.rw_sem);
[    1.471183] 
[    1.471183]  *** DEADLOCK ***
[    1.471183] 
[    1.471242] 2 locks held by kworker/0:1/971:
[    1.471289]  #0:  ("events"){.+.+.+}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
[    1.471360]  #1:  ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
[    1.471488] 
[    1.471488] stack backtrace:
[    1.471582] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 4.12.0-gcc6-gb40b238 #1
[    1.471721] Workqueue: events work_for_cpu_fn
[    1.471813] Call Trace:
[    1.471859] [c0000007f877f560] [c000000000b491b8] dump_stack+0xe8/0x160 (unreliable)
[    1.471996] [c0000007f877f5a0] [c000000000153e68] print_circular_bug+0x288/0x3d0
[    1.472133] [c0000007f877f640] [c000000000157ec8] __lock_acquire+0x1858/0x1a20
[    1.472271] [c0000007f877f7b0] [c000000000158bfc] lock_acquire+0xec/0x2e0
[    1.472386] [c0000007f877f880] [c0000000000d4f8c] cpus_read_lock+0x4c/0xd0
[    1.472501] [c0000007f877f8b0] [c000000000100974] apply_workqueue_attrs+0x34/0xa0
[    1.472639] [c0000007f877f8f0] [c000000000102ef4] __alloc_workqueue_key+0x1b4/0x5b4
[    1.472776] [c0000007f877f9b0] [c00000000078cd58] scsi_host_alloc+0x328/0x4b0
[    1.472913] [c0000007f877fa50] [c0000000007bd414] ipr_probe+0xb4/0x1f00
[    1.473028] [c0000007f877fbb0] [c0000000006157fc] local_pci_probe+0x6c/0x140
[    1.473166] [c0000007f877fc40] [c0000000000f8298] work_for_cpu_fn+0x38/0x60
[    1.473281] [c0000007f877fc70] [c0000000000fdbe0] process_one_work+0x310/0x800
[    1.473418] [c0000007f877fd30] [c0000000000fe380] worker_thread+0x2b0/0x520
[    1.473534] [c0000007f877fdc0] [c000000000107c84] kthread+0x164/0x1b0
[    1.473649] [c0000007f877fe30] [c00000000000b6e8] ret_from_kernel_thread+0x5c/0x74


* Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
  2017-07-26 11:57 Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work Michael Ellerman
@ 2017-07-26 12:44 ` Thomas Gleixner
  2017-07-27 13:22   ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2017-07-26 12:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Wed, 26 Jul 2017, Michael Ellerman wrote:

> Hi Thomas,
> 
> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> 
> This seems to be caused by our smp_cpus_done(), which does:
> 
>   void __init smp_cpus_done(unsigned int max_cpus)
>   {
>   	/*
>   	 * We want the setup_cpu() here to be called on the boot CPU, but
>   	 * init might run on any CPU, so make sure it's invoked on the boot
>   	 * CPU.
>   	 */
>   	if (smp_ops && smp_ops->setup_cpu)
>   		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> 
> 
> I don't think CPU hotplug can happen at this point, so I don't think
> there's really a bug.
> 
> But it looks like the work_on_cpu_safe() call could just go away, since
> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> the boot CPU, initially"). Though I can't see where init is unpinned, so
> maybe we do still need to do it?

It's undone in sched_init_smp(). So it looks safe. The call order is:

     smp_init()
	...
	smp_cpus_done()

     sched_init_smp()
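
For reference, a rough sketch of the two ends of that pinning, assuming the
4.12-era init/main.c and kernel/sched/core.c (exact lines and variable names
may differ):

  /* init/main.c: rest_init() - 8fb12156b8db pins init to the boot CPU,
   * because task migration is not properly working until sched_init_smp() */
  set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));

  /* kernel/sched/core.c: sched_init_smp() - runs after smp_init() and
   * moves init off the boot CPU again */
  if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
  	BUG();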

Thanks,

	tglx


* Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
  2017-07-26 12:44 ` Thomas Gleixner
@ 2017-07-27 13:22   ` Michael Ellerman
  2017-07-27 13:24     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2017-07-27 13:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linuxppc-dev

Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, 26 Jul 2017, Michael Ellerman wrote:
>
>> Hi Thomas,
>> 
>> I'm seeing the lockdep barf below on some bare metal Power8 machines.
>> 
>> This seems to be caused by our smp_cpus_done(), which does:
>> 
>>   void __init smp_cpus_done(unsigned int max_cpus)
>>   {
>>   	/*
>>   	 * We want the setup_cpu() here to be called on the boot CPU, but
>>   	 * init might run on any CPU, so make sure it's invoked on the boot
>>   	 * CPU.
>>   	 */
>>   	if (smp_ops && smp_ops->setup_cpu)
>>   		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
>> 
>> 
>> I don't think CPU hotplug can happen at this point, so I don't think
>> there's really a bug.
>> 
>> But it looks like the work_on_cpu_safe() call could just go away, since
>> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
>> the boot CPU, initially"). Though I can't see where init is unpinned, so
>> maybe we do still need to do it?
>
> It's undone in sched_init_smp(). So it looks safe. The call order is:
>
>      smp_init()
> 	...
> 	smp_cpus_done()
>
>      sched_init_smp()

Great thanks.

Patch on the way.

cheers


* Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
  2017-07-27 13:22   ` Michael Ellerman
@ 2017-07-27 13:24     ` Thomas Gleixner
  2017-07-27 13:25       ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2017-07-27 13:24 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Thu, 27 Jul 2017, Michael Ellerman wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> >
> >> Hi Thomas,
> >> 
> >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> >> 
> >> This seems to be caused by our smp_cpus_done(), which does:
> >> 
> >>   void __init smp_cpus_done(unsigned int max_cpus)
> >>   {
> >>   	/*
> >>   	 * We want the setup_cpu() here to be called on the boot CPU, but
> >>   	 * init might run on any CPU, so make sure it's invoked on the boot
> >>   	 * CPU.
> >>   	 */
> >>   	if (smp_ops && smp_ops->setup_cpu)
> >>   		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> >> 
> >> 
> >> I don't think CPU hotplug can happen at this point, so I don't think
> >> there's really a bug.
> >> 
> >> But it looks like the work_on_cpu_safe() call could just go away, since
> >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> >> maybe we do still need to do it?
> >
> > It's undone in sched_init_smp(). So it looks safe. The call order is:
> >
> >      smp_init()
> > 	...
> > 	smp_cpus_done()
> >
> >      sched_init_smp()
> 
> Great thanks.
> 
> Patch on the way.

Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
boot CPU can't go away at that point, but the debug stuff does not know
about it. Maybe I'm missing something ...
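
For context, roughly what work_on_cpu_safe() does in this kernel (a 4.12-era
sketch) - it takes the hotplug read lock around the flush, so lockdep records
the cpu_hotplug_lock -> wfc.work ordering whether or not a CPU can actually
go away this early in boot:

  long work_on_cpu_safe(int cpu, long (*fn)(void *), void *arg)
  {
  	long ret = -ENODEV;

  	/* lockdep records cpu_hotplug_lock being taken here ... */
  	get_online_cpus();
  	if (cpu_online(cpu))
  		/* ... and wfc.work being flushed while it is held */
  		ret = work_on_cpu(cpu, fn, arg);
  	put_online_cpus();
  	return ret;
  }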

Thanks,

	tglx


* Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work
  2017-07-27 13:24     ` Thomas Gleixner
@ 2017-07-27 13:25       ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2017-07-27 13:25 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Michael Ellerman wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> > 
> > > On Wed, 26 Jul 2017, Michael Ellerman wrote:
> > >
> > >> Hi Thomas,
> > >> 
> > >> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> > >> 
> > >> This seems to be caused by our smp_cpus_done(), which does:
> > >> 
> > >>   void __init smp_cpus_done(unsigned int max_cpus)
> > >>   {
> > >>   	/*
> > >>   	 * We want the setup_cpu() here to be called on the boot CPU, but
> > >>   	 * init might run on any CPU, so make sure it's invoked on the boot
> > >>   	 * CPU.
> > >>   	 */
> > >>   	if (smp_ops && smp_ops->setup_cpu)
> > >>   		work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> > >> 
> > >> 
> > >> I don't think CPU hotplug can happen at this point, so I don't think
> > >> there's really a bug.
> > >> 
> > >> But it looks like the work_on_cpu_safe() call could just go away, since
> > >> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> > >> the boot CPU, initially"). Though I can't see where init is unpinned, so
> > >> maybe we do still need to do it?
> > >
> > > It's undone in sched_init_smp(). So it looks safe. The call order is:
> > >
> > >      smp_init()
> > > 	...
> > > 	smp_cpus_done()
> > >
> > >      sched_init_smp()
> > 
> > Great thanks.
> > 
> > Patch on the way.
> 
> Hmm. Second thoughts. The issue is the stability of the CPUs. Surely the
> boot CPU can't go away at that point, but the debug stuff does not know
> about it. Maybe I'm missing something ...

Ok. Now seeing your patch I know what I was missing :)

