* workqueue, pci: INFO: possible recursive locking detected
@ 2013-07-16 14:41 Srivatsa S. Bhat
  2013-07-17 10:07 ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-16 14:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Rafael J. Wysocki, bhelgaas, Srivatsa S. Bhat

Hi,

I have been seeing this warning every time during boot. I haven't
spent time digging through it though... Please let me know if
any machine-specific info is needed.

Regards,
Srivatsa S. Bhat


----------------------------------------------------

=============================================
[ INFO: possible recursive locking detected ]
3.11.0-rc1-lockdep-fix-a #6 Not tainted
---------------------------------------------
kworker/0:1/142 is trying to acquire lock:
 ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0

but task is already holding lock:
 ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((&wfc.work));
  lock((&wfc.work));

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/142:
 #0:  (events){.+.+.+}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
 #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
 #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8139a3ba>] device_attach+0x2a/0xc0

stack backtrace:
CPU: 0 PID: 142 Comm: kworker/0:1 Not tainted 3.11.0-rc1-lockdep-fix-a #6
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: events work_for_cpu_fn
 ffff881036fecd88 ffff881036fef678 ffffffff8161a919 0000000000000003
 ffff881036fec400 ffff881036fef6a8 ffffffff810c2234 ffff881036fec400
 ffff881036fecd88 ffff881036fec400 0000000000000000 ffff881036fef708
Call Trace:
 [<ffffffff8161a919>] dump_stack+0x59/0x80
 [<ffffffff810c2234>] print_deadlock_bug+0xf4/0x100
 [<ffffffff810c3d14>] validate_chain+0x504/0x750
 [<ffffffff810c426d>] __lock_acquire+0x30d/0x580
 [<ffffffff810c4577>] lock_acquire+0x97/0x170
 [<ffffffff81077100>] ? start_flush_work+0x220/0x220
 [<ffffffff81077148>] flush_work+0x48/0xb0
 [<ffffffff81077100>] ? start_flush_work+0x220/0x220
 [<ffffffff810c2c10>] ? mark_held_locks+0x80/0x130
 [<ffffffff81074cfb>] ? queue_work_on+0x4b/0xa0
 [<ffffffff810c2f85>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81077320>] work_on_cpu+0x80/0x90
 [<ffffffff8106f950>] ? wqattrs_hash+0x190/0x190
 [<ffffffff812d37b0>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff812a0059>] ? cpumask_next_and+0x29/0x50
 [<ffffffff812d38da>] __pci_device_probe+0x9a/0xe0
 [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff812d4be2>] ? pci_dev_get+0x22/0x30
 [<ffffffff812d4c2a>] pci_device_probe+0x3a/0x60
 [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff8139a4bc>] really_probe+0x6c/0x320
 [<ffffffff8139a7b7>] driver_probe_device+0x47/0xa0
 [<ffffffff8139a8c0>] ? __driver_attach+0xb0/0xb0
 [<ffffffff8139a913>] __device_attach+0x53/0x60
 [<ffffffff81398404>] bus_for_each_drv+0x74/0xa0
 [<ffffffff8139a430>] device_attach+0xa0/0xc0
 [<ffffffff812cb2d9>] pci_bus_add_device+0x39/0x60
 [<ffffffff812eec21>] virtfn_add+0x251/0x3e0
 [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff812ef29f>] sriov_enable+0x22f/0x3d0
 [<ffffffff812ef48d>] pci_enable_sriov+0x4d/0x60
 [<ffffffffa0143045>] be_vf_setup+0x175/0x410 [be2net]
 [<ffffffffa01493ca>] be_setup+0x37a/0x4b0 [be2net]
 [<ffffffffa0149ac0>] be_probe+0x5c0/0x820 [be2net]
 [<ffffffff812d37fe>] local_pci_probe+0x4e/0x90
 [<ffffffff8106f968>] work_for_cpu_fn+0x18/0x30
 [<ffffffff81075e4a>] process_one_work+0x1da/0x610
 [<ffffffff81075dd9>] ? process_one_work+0x169/0x610
 [<ffffffff8107650c>] worker_thread+0x28c/0x3a0
 [<ffffffff81076280>] ? process_one_work+0x610/0x610
 [<ffffffff8107da3e>] kthread+0xee/0x100
 [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8162a71c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-16 14:41 workqueue, pci: INFO: possible recursive locking detected Srivatsa S. Bhat
@ 2013-07-17 10:07 ` Lai Jiangshan
  2013-07-18 20:23   ` Srivatsa S. Bhat
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2013-07-17 10:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas

On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
> Hi,
> 
> I have been seeing this warning every time during boot. I haven't
> spent time digging through it though... Please let me know if
> any machine-specific info is needed.
> 
> Regards,
> Srivatsa S. Bhat
> 
> 
> ----------------------------------------------------
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
> 
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));


Hi, Srivatsa

This is a false negative: the two "wfc"s are different, they are
both on stack. flush_work() cannot deadlock in such a case:

void foo(void *)
{
	...
	if (xxx)
		work_on_cpu(..., foo, ...);
	...
}

bar()
{
	work_on_cpu(..., foo, ...);
}

The complaint is caused by the fact that work_on_cpu() uses a static lock_class_key;
we should fix work_on_cpu().
(But the caller should also be careful: the foo()/local_pci_probe() path is re-entrant.)

But I can't find an elegant fix.

long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
	schedule_work_on(cpu, &wfc.work);
	flush_work(&wfc.work);
	return wfc.ret;
}


Any thoughts? Tejun?

thanks,
Lai

> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by kworker/0:1/142:
>  #0:  (events){.+.+.+}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>  #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8139a3ba>] device_attach+0x2a/0xc0
> 
> stack backtrace:
> CPU: 0 PID: 142 Comm: kworker/0:1 Not tainted 3.11.0-rc1-lockdep-fix-a #6
> Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
>  ffff881036fecd88 ffff881036fef678 ffffffff8161a919 0000000000000003
>  ffff881036fec400 ffff881036fef6a8 ffffffff810c2234 ffff881036fec400
>  ffff881036fecd88 ffff881036fec400 0000000000000000 ffff881036fef708
> Call Trace:
>  [<ffffffff8161a919>] dump_stack+0x59/0x80
>  [<ffffffff810c2234>] print_deadlock_bug+0xf4/0x100
>  [<ffffffff810c3d14>] validate_chain+0x504/0x750
>  [<ffffffff810c426d>] __lock_acquire+0x30d/0x580
>  [<ffffffff810c4577>] lock_acquire+0x97/0x170
>  [<ffffffff81077100>] ? start_flush_work+0x220/0x220
>  [<ffffffff81077148>] flush_work+0x48/0xb0
>  [<ffffffff81077100>] ? start_flush_work+0x220/0x220
>  [<ffffffff810c2c10>] ? mark_held_locks+0x80/0x130
>  [<ffffffff81074cfb>] ? queue_work_on+0x4b/0xa0
>  [<ffffffff810c2f85>] ? trace_hardirqs_on_caller+0x105/0x1d0
>  [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81077320>] work_on_cpu+0x80/0x90
>  [<ffffffff8106f950>] ? wqattrs_hash+0x190/0x190
>  [<ffffffff812d37b0>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff812a0059>] ? cpumask_next_and+0x29/0x50
>  [<ffffffff812d38da>] __pci_device_probe+0x9a/0xe0
>  [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff812d4be2>] ? pci_dev_get+0x22/0x30
>  [<ffffffff812d4c2a>] pci_device_probe+0x3a/0x60
>  [<ffffffff81620850>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff8139a4bc>] really_probe+0x6c/0x320
>  [<ffffffff8139a7b7>] driver_probe_device+0x47/0xa0
>  [<ffffffff8139a8c0>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff8139a913>] __device_attach+0x53/0x60
>  [<ffffffff81398404>] bus_for_each_drv+0x74/0xa0
>  [<ffffffff8139a430>] device_attach+0xa0/0xc0
>  [<ffffffff812cb2d9>] pci_bus_add_device+0x39/0x60
>  [<ffffffff812eec21>] virtfn_add+0x251/0x3e0
>  [<ffffffff810c305d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff812ef29f>] sriov_enable+0x22f/0x3d0
>  [<ffffffff812ef48d>] pci_enable_sriov+0x4d/0x60
>  [<ffffffffa0143045>] be_vf_setup+0x175/0x410 [be2net]
>  [<ffffffffa01493ca>] be_setup+0x37a/0x4b0 [be2net]
>  [<ffffffffa0149ac0>] be_probe+0x5c0/0x820 [be2net]
>  [<ffffffff812d37fe>] local_pci_probe+0x4e/0x90
>  [<ffffffff8106f968>] work_for_cpu_fn+0x18/0x30
>  [<ffffffff81075e4a>] process_one_work+0x1da/0x610
>  [<ffffffff81075dd9>] ? process_one_work+0x169/0x610
>  [<ffffffff8107650c>] worker_thread+0x28c/0x3a0
>  [<ffffffff81076280>] ? process_one_work+0x610/0x610
>  [<ffffffff8107da3e>] kthread+0xee/0x100
>  [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff8162a71c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8107d950>] ? __init_kthread_worker+0x70/0x70
> 
> 



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-17 10:07 ` Lai Jiangshan
@ 2013-07-18 20:23   ` Srivatsa S. Bhat
  2013-07-19  1:47     ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-18 20:23 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas


On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> I have been seeing this warning every time during boot. I haven't
>> spent time digging through it though... Please let me know if
>> any machine-specific info is needed.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>
>> ----------------------------------------------------
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock((&wfc.work));
>>   lock((&wfc.work));
> 
>


Hi Lai,

Thanks for taking a look into this!

> 
> This is false negative,

I believe you meant false-positive...

> the two "wfc"s are different, they are
> both on stack. flush_work() can't be deadlock in such case:
> 
> void foo(void *)
> {
> 	...
> 	if (xxx)
> 		work_on_cpu(..., foo, ...);
> 	...
> }
> 
> bar()
> {
> 	work_on_cpu(..., foo, ...);
> }
> 
> The complaint is caused by "work_on_cpu() uses a static lock_class_key".
> we should fix work_on_cpu().
> (but the caller should also be careful, the foo()/local_pci_probe() is re-entering)
> 
> But I can't find an elegant fix.
> 
> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
> {
> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
> 
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;
> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, &wfc.work, &__key, 0);
> +#else
> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
> 	schedule_work_on(cpu, &wfc.work);
> 	flush_work(&wfc.work);
> 	return wfc.ret;
> }
> 

Unfortunately that didn't seem to fix it... I applied the patch
shown below, and I got the same old warning.

---

 kernel/workqueue.c |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..07d9a67 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
 	schedule_work_on(cpu, &wfc.work);
 	flush_work(&wfc.work);
 	return wfc.ret;



Warning:
--------

wmi: Mapper loaded
be2net 0000:11:00.0: irq 102 for MSI/MSI-X
be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
be2net 0000:11:00.0: created 1 TX queue(s)
pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000

=============================================
[ INFO: possible recursive locking detected ]
3.11.0-rc1-wq-fix #10 Not tainted
---------------------------------------------
kworker/0:1/126 is trying to acquire lock:
 (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0

but task is already holding lock:
 (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&wfc.work);
  lock(&wfc.work);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/126:
 #0:  (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
 #1:  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
 #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0

stack backtrace:
CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: events work_for_cpu_fn
 ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
 ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
 ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
Call Trace:
 [<ffffffff81619059>] dump_stack+0x59/0x80
 [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
 [<ffffffff810c3104>] validate_chain+0x504/0x750
 [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
 [<ffffffff810c3967>] lock_acquire+0x97/0x170
 [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
 [<ffffffff81077138>] flush_work+0x48/0xb0
 [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
 [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
 [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
 [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
 [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
 [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
 [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
 [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
 [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff81398bdc>] really_probe+0x6c/0x320
 [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
 [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
 [<ffffffff81399033>] __device_attach+0x53/0x60
 [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
 [<ffffffff81398b50>] device_attach+0xa0/0xc0
 [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
 [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
 [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
 [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
 [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
 [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
 [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
 [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
 [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da5e>] kthread+0xee/0x100
 [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
be2net 0000:11:04.0: enabling device (0040 -> 0042)
be2net 0000:11:04.0: Could not use PCIe error reporting
be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1


Regards,
Srivatsa S. Bhat



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-18 20:23   ` Srivatsa S. Bhat
@ 2013-07-19  1:47     ` Lai Jiangshan
  2013-07-19  8:57       ` Srivatsa S. Bhat
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2013-07-19  1:47 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas

On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
> 
> On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
>> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>>> Hi,
>>>
>>> I have been seeing this warning every time during boot. I haven't
>>> spent time digging through it though... Please let me know if
>>> any machine-specific info is needed.
>>>
>>> Regards,
>>> Srivatsa S. Bhat
>>>
>>>
>>> ----------------------------------------------------
>>>
>>> =============================================
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0
>>>        ----
>>>   lock((&wfc.work));
>>>   lock((&wfc.work));
>>
>>
> 
> 
> Hi Lai,
> 
> Thanks for taking a look into this!
> 
>>
>> This is false negative,
> 
> I believe you meant false-positive...
> 
>> the two "wfc"s are different, they are
>> both on stack. flush_work() can't be deadlock in such case:
>>
>> void foo(void *)
>> {
>> 	...
>> 	if (xxx)
>> 		work_on_cpu(..., foo, ...);
>> 	...
>> }
>>
>> bar()
>> {
>> 	work_on_cpu(..., foo, ...);
>> }
>>
>> The complaint is caused by "work_on_cpu() uses a static lock_class_key".
>> we should fix work_on_cpu().
>> (but the caller should also be careful, the foo()/local_pci_probe() is re-entering)
>>
>> But I can't find an elegant fix.
>>
>> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>> {
>> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, &wfc.work, &__key, 0);
>> +#else
>> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>> 	schedule_work_on(cpu, &wfc.work);
>> 	flush_work(&wfc.work);
>> 	return wfc.ret;
>> }
>>
> 
> Unfortunately that didn't seem to fix it.. I applied the patch
> shown below, and I got the same old warning.
> 
> ---
> 
>  kernel/workqueue.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..07d9a67 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>  
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;

Sorry, this "static" should be removed.

Thanks,
Lai


> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
> 
> 
> 
> Warning:
> --------
> 
> wmi: Mapper loaded
> be2net 0000:11:00.0: irq 102 for MSI/MSI-X
> be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
> be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
> be2net 0000:11:00.0: created 1 TX queue(s)
> pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-wq-fix #10 Not tainted
> ---------------------------------------------
> kworker/0:1/126 is trying to acquire lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0
> 
> but task is already holding lock:
>  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&wfc.work);
>   lock(&wfc.work);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by kworker/0:1/126:
>  #0:  (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #1:  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
>  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0
> 
> stack backtrace:
> CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
> Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: events work_for_cpu_fn
>  ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
>  ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
>  ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
> Call Trace:
>  [<ffffffff81619059>] dump_stack+0x59/0x80
>  [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
>  [<ffffffff810c3104>] validate_chain+0x504/0x750
>  [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
>  [<ffffffff810c3967>] lock_acquire+0x97/0x170
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff81077138>] flush_work+0x48/0xb0
>  [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
>  [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
>  [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
>  [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
>  [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
>  [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
>  [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
>  [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
>  [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff81398bdc>] really_probe+0x6c/0x320
>  [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
>  [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
>  [<ffffffff81399033>] __device_attach+0x53/0x60
>  [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
>  [<ffffffff81398b50>] device_attach+0xa0/0xc0
>  [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
>  [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
>  [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
>  [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
>  [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
>  [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
>  [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
>  [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
>  [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
>  [<ffffffff81075e3a>] process_one_work+0x1da/0x610
>  [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
>  [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
>  [<ffffffff81076270>] ? process_one_work+0x610/0x610
>  [<ffffffff8107da5e>] kthread+0xee/0x100
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
>  [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
> be2net 0000:11:04.0: enabling device (0040 -> 0042)
> be2net 0000:11:04.0: Could not use PCIe error reporting
> be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
> be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1
> 
> 
> Regards,
> Srivatsa S. Bhat
> 
> 



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-19  1:47     ` Lai Jiangshan
@ 2013-07-19  8:57       ` Srivatsa S. Bhat
  2013-07-22 11:52         ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-19  8:57 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas

On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>
>> ---
>>
>>  kernel/workqueue.c |    6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..07d9a67 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  {
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>  
>> +#ifdef CONFIG_LOCKDEP
>> +	static struct lock_class_key __key;
> 
> Sorry, this "static" should be removed.
> 

That didn't help either :-( because it makes lockdep unhappy,
since the key isn't persistent.

This is the patch I used:

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..7967e3b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

+#ifdef CONFIG_LOCKDEP
+	struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
 	schedule_work_on(cpu, &wfc.work);
 	flush_work(&wfc.work);
 	return wfc.ret;


And here are the new warnings:


Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
BUG: key ffff881039557b98 not in .data!
------------[ cut here ]------------
WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in:
CPU: 8 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc1-wq-fix3-a #12
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
 0000000000000bab ffff881039557a58 ffffffff81619069 ffffffff819f6dce
 ffff881039557aa8 ffff881039557a98 ffffffff8104e6ac ffff881039557b98
 ffff881039557b58 0000000000000000 ffff881039557b58 0000000000000000
Call Trace:
 [<ffffffff81619069>] dump_stack+0x59/0x80
 [<ffffffff8104e6ac>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff8104e796>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff810bfbb8>] lockdep_init_map+0x168/0x170
 [<ffffffff81077326>] work_on_cpu+0x96/0xd0
 [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
 [<ffffffff812d1ee0>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff812d200a>] __pci_device_probe+0x9a/0xe0
 [<ffffffff8161efa0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff812d3312>] ? pci_dev_get+0x22/0x30
 [<ffffffff812d335a>] pci_device_probe+0x3a/0x60
 [<ffffffff8161efa0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff81398bec>] really_probe+0x6c/0x320
 [<ffffffff81398ee7>] driver_probe_device+0x47/0xa0
 [<ffffffff81398feb>] __driver_attach+0xab/0xb0
 [<ffffffff81398f40>] ? driver_probe_device+0xa0/0xa0
 [<ffffffff81398f40>] ? driver_probe_device+0xa0/0xa0
 [<ffffffff81396bf4>] bus_for_each_dev+0x94/0xb0
 [<ffffffff813988ae>] driver_attach+0x1e/0x20
 [<ffffffff81398258>] bus_add_driver+0x208/0x290
 [<ffffffff81399597>] driver_register+0x77/0x160
 [<ffffffff81d66b9e>] ? dmi_pcie_pme_disable_msi+0x21/0x21
 [<ffffffff812d3464>] __pci_register_driver+0x64/0x70
 [<ffffffff81d66c07>] pcie_portdrv_init+0x69/0x7a
 [<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0
 [<ffffffff8107b0d6>] ? parse_args+0x156/0x2f0
 [<ffffffff81d32b80>] ? kernel_init_freeable+0x28a/0x28a
 [<ffffffff81d328d8>] do_basic_setup+0x9d/0xbb
 [<ffffffff81d32b80>] ? kernel_init_freeable+0x28a/0x28a
 [<ffffffff81d32b00>] kernel_init_freeable+0x20a/0x28a
 [<ffffffff81611c40>] ? rest_init+0x180/0x180
 [<ffffffff81611c4e>] kernel_init+0xe/0xf0
 [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81611c40>] ? rest_init+0x180/0x180
---[ end trace 41047d25544a3721 ]---
pcieport 0000:00:01.0: irq 88 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:02.0: irq 89 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:02.2: irq 90 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:03.0: irq 91 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:11.0: irq 92 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:1c.0: irq 93 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:00:1c.4: irq 94 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:00.0: irq 95 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:01.0: irq 96 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:01.1: irq 97 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:02.0: irq 98 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:02.2: irq 99 for MSI/MSI-X
BUG: key ffff881039557b98 not in .data!
pcieport 0000:80:03.0: irq 100 for MSI/MSI-X
pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded

<many more instances of "BUG: key ... not in .data!">

Regards,
Srivatsa S. Bhat



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-19  8:57       ` Srivatsa S. Bhat
@ 2013-07-22 11:52         ` Lai Jiangshan
  2013-07-22 15:37           ` Srivatsa S. Bhat
  2013-07-22 21:32           ` Tejun Heo
  0 siblings, 2 replies; 16+ messages in thread
From: Lai Jiangshan @ 2013-07-22 11:52 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas

On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>
>>> ---
>>>
>>>  kernel/workqueue.c |    6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..07d9a67 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>  {
>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>  
>>> +#ifdef CONFIG_LOCKDEP
>>> +	static struct lock_class_key __key;
>>
>> Sorry, this "static" should be removed.
>>
> 
> That didn't help either :-( Because it makes lockdep unhappy,
> since the key isn't persistent.
> 
> This is the patch I used:
> 
> ---
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..7967e3b 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  {
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
> 
> +#ifdef CONFIG_LOCKDEP
> +	struct lock_class_key __key;
> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
> +#else
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
>  	schedule_work_on(cpu, &wfc.work);
>  	flush_work(&wfc.work);
>  	return wfc.ret;
> 
> 
> And here are the new warnings:
> 
> 
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
> io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> BUG: key ffff881039557b98 not in .data!
> ------------[ cut here ]------------
> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()

Sorry again.

From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Mon, 22 Jul 2013 19:08:51 +0800
Subject: [PATCH] workqueue: allow the function invoked via work_on_cpu() to
 call work_on_cpu() itself

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));
>
>  *** DEADLOCK ***

This is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on the stack, so flush_work() cannot deadlock.

To fix this, we need to avoid the lockdep check in this case.
Since we don't want to change flush_work() itself, we use a
completion instead of flush_work() in work_on_cpu().

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..b021a45 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4731,6 +4731,7 @@ struct work_for_cpu {
 	long (*fn)(void *);
 	void *arg;
 	long ret;
+	struct completion done;
 };
 
 static void work_for_cpu_fn(struct work_struct *work)
@@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
 	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
 
 	wfc->ret = wfc->fn(wfc->arg);
+	complete(&wfc->done);
 }
 
 /**
@@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	init_completion(&wfc.done);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+	wait_for_completion(&wfc.done);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
-- 
1.7.4.4


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 11:52         ` Lai Jiangshan
@ 2013-07-22 15:37           ` Srivatsa S. Bhat
  2013-07-22 21:38             ` Bjorn Helgaas
  2013-07-22 21:32           ` Tejun Heo
  1 sibling, 1 reply; 16+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-22 15:37 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Tejun Heo, Rafael J. Wysocki, bhelgaas

On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>
>>>> ---
>>>>
>>>>  kernel/workqueue.c |    6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index f02c4a4..07d9a67 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>  {
>>>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>  
>>>> +#ifdef CONFIG_LOCKDEP
>>>> +	static struct lock_class_key __key;
>>>
>>> Sorry, this "static" should be removed.
>>>
>>
>> That didn't help either :-( Because it makes lockdep unhappy,
>> since the key isn't persistent.
>>
>> This is the patch I used:
>>
>> ---
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..7967e3b 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  {
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>> +#ifdef CONFIG_LOCKDEP
>> +	struct lock_class_key __key;
>> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>> +#else
>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +#endif
>>  	schedule_work_on(cpu, &wfc.work);
>>  	flush_work(&wfc.work);
>>  	return wfc.ret;
>>
>>
>> And here are the new warnings:
>>
>>
>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>> io scheduler noop registered
>> io scheduler deadline registered
>> io scheduler cfq registered (default)
>> BUG: key ffff881039557b98 not in .data!
>> ------------[ cut here ]------------
>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
> 
> Sorry again.
> 
> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Mon, 22 Jul 2013 19:08:51 +0800
> Subject: [PATCH] workqueue: allow the function invoked via work_on_cpu() to
>  call work_on_cpu() itself
> 
> If @fn calls work_on_cpu() again, lockdep will complain:
> 
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock((&wfc.work));
>>   lock((&wfc.work));
>>
>>  *** DEADLOCK ***
> 
> This is a false-positive lockdep report. In this situation,
> the two "wfc"s of the two work_on_cpu() calls are different;
> they are both on the stack, so flush_work() cannot deadlock.
> 
> To fix this, we need to avoid the lockdep check in this case.
> Since we don't want to change flush_work() itself, we use a
> completion instead of flush_work() in work_on_cpu().
> 
> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---

That worked, thanks a lot!

Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

>  kernel/workqueue.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..b021a45 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>  	long (*fn)(void *);
>  	void *arg;
>  	long ret;
> +	struct completion done;
>  };
> 
>  static void work_for_cpu_fn(struct work_struct *work)
> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
> 
>  	wfc->ret = wfc->fn(wfc->arg);
> +	complete(&wfc->done);
>  }
> 
>  /**
> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
> 
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	init_completion(&wfc.done);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +	wait_for_completion(&wfc.done);
>  	return wfc.ret;
>  }
>  EXPORT_SYMBOL_GPL(work_on_cpu);
> 



* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 11:52         ` Lai Jiangshan
  2013-07-22 15:37           ` Srivatsa S. Bhat
@ 2013-07-22 21:32           ` Tejun Heo
  2013-07-23  1:23             ` Lai Jiangshan
  1 sibling, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-07-22 21:32 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Srivatsa S. Bhat, linux-kernel, Rafael J. Wysocki, bhelgaas

On Mon, Jul 22, 2013 at 07:52:34PM +0800, Lai Jiangshan wrote:
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..b021a45 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>  	long (*fn)(void *);
>  	void *arg;
>  	long ret;
> +	struct completion done;
>  };
>  
>  static void work_for_cpu_fn(struct work_struct *work)
> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>  
>  	wfc->ret = wfc->fn(wfc->arg);
> +	complete(&wfc->done);
>  }
>  
>  /**
> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>  
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	init_completion(&wfc.done);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +	wait_for_completion(&wfc.done);

Hmmm... it's kinda nasty.  Given how infrequently work_on_cpu() users
nest, I think it'd be cleaner to have work_on_cpu_nested() which takes
@subclass.  It requires extra work on the caller's part but I think
that actually is useful as nested work_on_cpu()s are pretty weird
things.
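For illustration only, here is a sketch of what such a work_on_cpu_nested() might look like. This is hypothetical code, not a tested patch: the function name, the @subclass plumbing, and the static key are all assumptions built on the 2013-era lockdep_init_map() signature.

```c
/* Hypothetical sketch, not mainline code.  The static key is persistent
 * (it lives in kernel image data), which avoids the "key not in .data!"
 * warning, and @subclass lets a nested caller use a distinct lockdep
 * nesting level, much like mutex_lock_nested() does for mutexes. */
static struct lock_class_key work_on_cpu_key;

long work_on_cpu_nested(int cpu, long (*fn)(void *), void *arg, int subclass)
{
	struct work_for_cpu wfc = { .fn = fn, .arg = arg };

	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work",
			 &work_on_cpu_key, subclass);
	schedule_work_on(cpu, &wfc.work);
	flush_work(&wfc.work);
	return wfc.ret;
}

long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	return work_on_cpu_nested(cpu, fn, arg, 0);
}
```

A re-entrant caller (e.g. the PCI probe path) would pass subclass 1 for the inner call, so lockdep sees two distinct nesting levels instead of recursion on one class.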

Thanks.

-- 
tejun


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 15:37           ` Srivatsa S. Bhat
@ 2013-07-22 21:38             ` Bjorn Helgaas
  2013-07-22 22:06               ` Yinghai Lu
  2013-07-22 22:33               ` Alexander Duyck
  0 siblings, 2 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2013-07-22 21:38 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, linux-kernel, Tejun Heo, Rafael J. Wysocki,
	Alexander Duyck, Yinghai Lu, linux-pci

[+cc Alex, Yinghai, linux-pci]

On Mon, Jul 22, 2013 at 9:37 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
>> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>>
>>>>> ---
>>>>>
>>>>>  kernel/workqueue.c |    6 ++++++
>>>>>  1 file changed, 6 insertions(+)
>>>>>
>>>>>
>>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>>> index f02c4a4..07d9a67 100644
>>>>> --- a/kernel/workqueue.c
>>>>> +++ b/kernel/workqueue.c
>>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>>  {
>>>>>    struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>>
>>>>> +#ifdef CONFIG_LOCKDEP
>>>>> +  static struct lock_class_key __key;
>>>>
>>>> Sorry, this "static" should be removed.
>>>>
>>>
>>> That didn't help either :-( Because it makes lockdep unhappy,
>>> since the key isn't persistent.
>>>
>>> This is the patch I used:
>>>
>>> ---
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..7967e3b 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>  {
>>>      struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>> +#ifdef CONFIG_LOCKDEP
>>> +    struct lock_class_key __key;
>>> +    INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +    lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>>> +#else
>>>      INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +#endif
>>>      schedule_work_on(cpu, &wfc.work);
>>>      flush_work(&wfc.work);
>>>      return wfc.ret;
>>>
>>>
>>> And here are the new warnings:
>>>
>>>
>>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>>> io scheduler noop registered
>>> io scheduler deadline registered
>>> io scheduler cfq registered (default)
>>> BUG: key ffff881039557b98 not in .data!
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>>
>> Sorry again.
>>
>> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date: Mon, 22 Jul 2013 19:08:51 +0800
>> Subject: [PATCH] workqueue: allow the function invoked via work_on_cpu() to
>>  call work_on_cpu() itself
>>
>> If @fn calls work_on_cpu() again, lockdep will complain:
>>
>>> [ INFO: possible recursive locking detected ]
>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>> ---------------------------------------------
>>> kworker/0:1/142 is trying to acquire lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>
>>> but task is already holding lock:
>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0
>>>        ----
>>>   lock((&wfc.work));
>>>   lock((&wfc.work));
>>>
>>>  *** DEADLOCK ***
>>
>> This is a false-positive lockdep report. In this situation,
>> the two "wfc"s of the two work_on_cpu() calls are different;
>> they are both on the stack, so flush_work() cannot deadlock.
>>
>> To fix this, we need to avoid the lockdep check in this case.
>> Since we don't want to change flush_work() itself, we use a
>> completion instead of flush_work() in work_on_cpu().
>>
>> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>
> That worked, thanks a lot!
>
> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> Regards,
> Srivatsa S. Bhat
>
>>  kernel/workqueue.c |    5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..b021a45 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>       long (*fn)(void *);
>>       void *arg;
>>       long ret;
>> +     struct completion done;
>>  };
>>
>>  static void work_for_cpu_fn(struct work_struct *work)
>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>       struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>
>>       wfc->ret = wfc->fn(wfc->arg);
>> +     complete(&wfc->done);
>>  }
>>
>>  /**
>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>       struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>
>>       INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +     init_completion(&wfc.done);
>>       schedule_work_on(cpu, &wfc.work);
>> -     flush_work(&wfc.work);
>> +     wait_for_completion(&wfc.done);
>>       return wfc.ret;
>>  }
>>  EXPORT_SYMBOL_GPL(work_on_cpu);
>>
>

Isn't this for the same issue Alex and others have been working on?

It doesn't feel like we have consensus on how this should be fixed.
You're proposing a change to work_on_cpu(), Alex proposed a change to
pci_call_probe() [1], Yinghai proposed some changes to the PCI core
SR-IOV code and several drivers [2].

[1] https://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com
[2] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 21:38             ` Bjorn Helgaas
@ 2013-07-22 22:06               ` Yinghai Lu
  2013-07-22 22:33               ` Alexander Duyck
  1 sibling, 0 replies; 16+ messages in thread
From: Yinghai Lu @ 2013-07-22 22:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-kernel, Tejun Heo,
	Rafael J. Wysocki, Alexander Duyck, linux-pci

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Mon, Jul 22, 2013 at 2:38 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Alex, Yinghai, linux-pci]

>
> Isn't this for the same issue Alex and others have been working on?

yes.

>
> It doesn't feel like we have consensus on how this should be fixed.
> You're proposing a change to work_on_cpu(), Alex proposed a change to
> pci_call_probe() [1], Yinghai proposed some changes to the PCI core
> SR-IOV code and several drivers [2].
>
> [1] https://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com
> [2] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org

Actually, I have been using Alex's patch together with the attached -v2
of the sriov_attach patch in my local testing for a while. -v2: removes
the device_schedule_callback hack.

Also, someone commented that numa_node_id() in Alex's patch should not
be used, since the caller could be preempted.

Thanks

Yinghai

[-- Attachment #2: fix_racing_removing_6_6.patch --]
[-- Type: application/octet-stream, Size: 14647 bytes --]

Subject: [PATCH v2 6/7] PCI: Make sure VF's driver get attached after PF's
From: Yinghai Lu <yinghai@kernel.org>

Found that the kernel tries to load the mlx4 driver for VFs before the
PF's driver has finished loading, when the drivers are built in and the
kernel command line includes probe_vfs=63, num_vfs=63.

[  169.581682] calling  mlx4_init+0x0/0x119 @ 1
[  169.595681] mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
[  169.600194] mlx4_core: Initializing 0000:02:00.0
[  169.616322] mlx4_core 0000:02:00.0: Enabling SR-IOV with 63 VFs
[  169.724084] pci 0000:02:00.1: [15b3:1002] type 00 class 0x0c0600
[  169.732442] mlx4_core: Initializing 0000:02:00.1
[  169.734345] mlx4_core 0000:02:00.1: enabling device (0000 -> 0002)
[  169.747060] mlx4_core 0000:02:00.1: enabling bus mastering
[  169.764283] mlx4_core 0000:02:00.1: Detected virtual function - running in slave mode
[  169.767409] mlx4_core 0000:02:00.1: with iommu 3 : domain 11
[  169.785589] mlx4_core 0000:02:00.1: Sending reset
[  179.790131] mlx4_core 0000:02:00.1: Got slave FLRed from Communication channel (ret:0x1)
[  181.798661] mlx4_core 0000:02:00.1: slave is currently in themiddle of FLR. retrying...(try num:1)
[  181.803336] mlx4_core 0000:02:00.1: Communication channel is not idle.my toggle is 1 (cmd:0x0)
...
[  182.078710] mlx4_core 0000:02:00.1: slave is currently in themiddle of FLR. retrying...(try num:10)
[  182.096657] mlx4_core 0000:02:00.1: Communication channel is not idle.my toggle is 1 (cmd:0x0)
[  182.104935] mlx4_core 0000:02:00.1: slave driver version is not supported by the master
[  182.118570] mlx4_core 0000:02:00.1: Communication channel is not idle.my toggle is 1 (cmd:0x0)
[  182.138190] mlx4_core 0000:02:00.1: Failed to initialize slave
[  182.141728] mlx4_core: probe of 0000:02:00.1 failed with error -5

It turns out that this also happens in the hot-add path, even when the
drivers are compiled as modules and already loaded, especially when the
VFs share the same driver with the PF.

calling path:
	device driver probe
		==> pci_enable_sriov
			==> virtfn_add
				==> pci_device_add
				==> pci_bus_add_device
When pci_bus_add_device() is called, the VF's driver gets attached,
but at that point the PF's driver has not finished probing yet.

We need to move pci_bus_add_device() out of virtfn_add() and call it
later.

bnx2x and qlcnic are OK because they do not use module command-line
parameters to enable SR-IOV; it must be enabled through sysfs for them.

be2net is OK; according to Sathya Perla, he fixed this issue in
be2net with the following patch (commit b4c1df93):
  http://marc.info/?l=linux-netdev&m=136801459808765&w=2

igb and ixgbe are OK, as Alex Duyck said:
| The VF driver should be able to be loaded when the PF driver is not
| present.  We handle it in igb and ixgbe last I checked, and I don't see
| any reason why it cannot be handled in all other VF drivers.  I'm not
| saying the VF has to be able to fully functional, but it should be able
| to detect the PF becoming enabled and then bring itself to a fully
| functional state.  To not handle that case is a bug.

Looks like the patch will help enic, mlx4, efx, vxge and lpfc now.

-v2: don't use schedule_callback; applies on top of Alex's patch:
	pci: Avoid reentrant calls to work_on_cpu

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Yan Burman <yanb@mellanox.com>
Cc: Sathya Perla <Sathya.Perla@Emulex.Com>
Cc: netdev@vger.kernel.org

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c    |    7 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c      |    4 +
 drivers/net/ethernet/cisco/enic/enic_main.c          |    2 
 drivers/net/ethernet/emulex/benet/be_main.c          |    4 +
 drivers/net/ethernet/intel/igb/igb_main.c            |   11 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c        |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c       |    9 ++-
 drivers/net/ethernet/mellanox/mlx4/main.c            |    3 +
 drivers/net/ethernet/neterion/vxge/vxge-main.c       |    3 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c |    5 +-
 drivers/net/ethernet/sfc/efx.c                       |    1 
 drivers/pci/iov.c                                    |   47 ++++++++++++++++---
 drivers/scsi/lpfc/lpfc_init.c                        |    2 
 include/linux/pci.h                                  |    4 +
 14 files changed, 90 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/net/ethernet/mellanox/mlx4/main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/mellanox/mlx4/main.c
+++ linux-2.6/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2308,6 +2308,9 @@ slave_start:
 	priv->pci_dev_data = pci_dev_data;
 	pci_set_drvdata(pdev, dev);
 
+	if (dev->flags & MLX4_FLAG_SRIOV)
+		pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 err_port:
Index: linux-2.6/drivers/pci/iov.c
===================================================================
--- linux-2.6.orig/drivers/pci/iov.c
+++ linux-2.6/drivers/pci/iov.c
@@ -66,7 +66,8 @@ static void virtfn_remove_bus(struct pci
 		pci_remove_bus(child);
 }
 
-static int virtfn_add(struct pci_dev *dev, int id, int reset)
+static int virtfn_add(struct pci_dev *dev, int id, int reset,
+			struct pci_dev **ret)
 {
 	int i;
 	int rc;
@@ -116,7 +117,6 @@ static int virtfn_add(struct pci_dev *de
 	pci_device_add(virtfn, virtfn->bus);
 	mutex_unlock(&iov->dev->sriov->lock);
 
-	rc = pci_bus_add_device(virtfn);
 	sprintf(buf, "virtfn%u", id);
 	rc = sysfs_create_link(&dev->dev.kobj, &virtfn->dev.kobj, buf);
 	if (rc)
@@ -127,6 +127,8 @@ static int virtfn_add(struct pci_dev *de
 
 	kobject_uevent(&virtfn->dev.kobj, KOBJ_CHANGE);
 
+	if (ret)
+		*ret = virtfn;
 	return 0;
 
 failed2:
@@ -141,6 +143,26 @@ failed1:
 	return rc;
 }
 
+void pci_bus_add_device_vfs(struct pci_dev *pdev)
+{
+	int rc;
+	struct pci_dev *dev;
+
+       /* only search if it is a PF */
+	if (!pdev->is_physfn)
+		return;
+
+	/* loop through all the VFs to see if we own and is not added yet*/
+	dev = pci_get_device(pdev->vendor, PCI_ANY_ID, NULL);
+	while (dev) {
+		if (dev->is_virtfn && dev->physfn == pdev && !dev->is_added)
+			rc = pci_bus_add_device(dev);
+
+		dev = pci_get_device(pdev->vendor, PCI_ANY_ID, dev);
+	}
+}
+EXPORT_SYMBOL_GPL(pci_bus_add_device_vfs);
+
 static void virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
 	char buf[VIRTFN_ID_LEN];
@@ -204,6 +226,7 @@ static int sriov_migration(struct pci_de
 static void sriov_migration_task(struct work_struct *work)
 {
 	int i;
+	int rc;
 	u8 state;
 	u16 status;
 	struct pci_sriov *iov = container_of(work, struct pci_sriov, mtask);
@@ -213,14 +236,24 @@ static void sriov_migration_task(struct
 		if (state == PCI_SRIOV_VFM_MI) {
 			writeb(PCI_SRIOV_VFM_AV, iov->mstate + i);
 			state = readb(iov->mstate + i);
-			if (state == PCI_SRIOV_VFM_AV)
-				virtfn_add(iov->self, i, 1);
+			if (state == PCI_SRIOV_VFM_AV) {
+				struct pci_dev *virtfn = NULL;
+
+				virtfn_add(iov->self, i, 1, &virtfn);
+				if (virtfn)
+					rc = pci_bus_add_device(virtfn);
+			}
 		} else if (state == PCI_SRIOV_VFM_MO) {
 			virtfn_remove(iov->self, i, 1);
 			writeb(PCI_SRIOV_VFM_UA, iov->mstate + i);
 			state = readb(iov->mstate + i);
-			if (state == PCI_SRIOV_VFM_AV)
-				virtfn_add(iov->self, i, 0);
+			if (state == PCI_SRIOV_VFM_AV) {
+				struct pci_dev *virtfn = NULL;
+
+				virtfn_add(iov->self, i, 0, &virtfn);
+				if (virtfn)
+					rc = pci_bus_add_device(virtfn);
+			}
 		}
 	}
 
@@ -356,7 +389,7 @@ static int sriov_enable(struct pci_dev *
 		initial = nr_virtfn;
 
 	for (i = 0; i < initial; i++) {
-		rc = virtfn_add(dev, i, 0);
+		rc = virtfn_add(dev, i, 0, NULL);
 		if (rc)
 			goto failed;
 	}
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1659,6 +1659,7 @@ void __iomem *pci_ioremap_bar(struct pci
 
 #ifdef CONFIG_PCI_IOV
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
+void pci_bus_add_device_vfs(struct pci_dev *dev);
 void pci_disable_sriov(struct pci_dev *dev);
 irqreturn_t pci_sriov_migration(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1670,6 +1671,9 @@ static inline int pci_enable_sriov(struc
 {
 	return -ENODEV;
 }
+static inline void pci_bus_add_device_vfs(struct pci_dev *dev)
+{
+}
 static inline void pci_disable_sriov(struct pci_dev *dev)
 {
 }
Index: linux-2.6/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-2.6/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2366,6 +2366,9 @@ static int igb_probe(struct pci_dev *pde
 	}
 
 	pm_runtime_put_noidle(&pdev->dev);
+
+	pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 err_register:
@@ -7278,8 +7281,12 @@ static int igb_pci_sriov_configure(struc
 #ifdef CONFIG_PCI_IOV
 	if (num_vfs == 0)
 		return igb_pci_disable_sriov(dev);
-	else
-		return igb_pci_enable_sriov(dev, num_vfs);
+	else {
+		int ret = igb_pci_enable_sriov(dev, num_vfs);
+		if (ret > 0)
+			pci_bus_add_device_vfs(dev);
+		return ret;
+	}
 #endif
 	return 0;
 }
Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -346,8 +346,13 @@ int ixgbe_pci_sriov_configure(struct pci
 {
 	if (num_vfs == 0)
 		return ixgbe_pci_sriov_disable(dev);
-	else
-		return ixgbe_pci_sriov_enable(dev, num_vfs);
+	else {
+		int ret = ixgbe_pci_sriov_enable(dev, num_vfs);
+
+		if (ret > 0)
+			pci_bus_add_device_vfs(dev);
+		return ret;
+	}
 }
 
 static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
Index: linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ linux-2.6/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7658,6 +7658,8 @@ skip_sriov:
 			IXGBE_LINK_SPEED_10GB_FULL | IXGBE_LINK_SPEED_1GB_FULL,
 			true);
 
+	pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 err_register:
Index: linux-2.6/drivers/scsi/lpfc/lpfc_init.c
===================================================================
--- linux-2.6.orig/drivers/scsi/lpfc/lpfc_init.c
+++ linux-2.6/drivers/scsi/lpfc/lpfc_init.c
@@ -10582,6 +10582,8 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
 	else
 		rc = lpfc_pci_probe_one_s3(pdev, pid);
 
+	pci_bus_add_device_vfs(pdev);
+
 	return rc;
 }
 
Index: linux-2.6/drivers/net/ethernet/emulex/benet/be_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/emulex/benet/be_main.c
+++ linux-2.6/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4111,6 +4111,7 @@ static int lancer_recover_func(struct be
 	if (status)
 		goto err;
 
+	pci_bus_add_device_vfs(adapter->pdev);
 	if (netif_running(adapter->netdev)) {
 		status = be_open(adapter->netdev);
 		if (status)
@@ -4327,6 +4328,8 @@ static int be_probe(struct pci_dev *pdev
 	dev_info(&pdev->dev, "%s: %s %s port %c\n", nic_name(pdev),
 		 func_name(adapter), mc_name(adapter), port_name);
 
+	pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 unsetup:
@@ -4398,6 +4401,7 @@ static int be_resume(struct pci_dev *pde
 		rtnl_unlock();
 	}
 
+	pci_bus_add_device_vfs(adapter->pdev);
 	schedule_delayed_work(&adapter->func_recovery_work,
 			      msecs_to_jiffies(1000));
 	netif_device_attach(netdev);
Index: linux-2.6/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
+++ linux-2.6/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
@@ -568,8 +568,11 @@ int qlcnic_pci_sriov_configure(struct pc
 
 	if (num_vfs == 0)
 		err = qlcnic_pci_sriov_disable(adapter);
-	else
+	else {
 		err = qlcnic_pci_sriov_enable(adapter, num_vfs);
+		if (err > 0)
+			pci_bus_add_device_vfs(dev);
+	}
 
 	clear_bit(__QLCNIC_RESETTING, &adapter->state);
 	return err;
Index: linux-2.6/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ linux-2.6/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -3048,7 +3048,12 @@ int bnx2x_sriov_configure(struct pci_dev
 		pci_disable_sriov(dev);
 		return 0;
 	} else {
-		return bnx2x_enable_sriov(bp);
+		int ret = bnx2x_enable_sriov(bp);
+
+		if (ret > 0)
+			pci_bus_add_device_vfs(dev);
+
+		return 0;
 	}
 }
 
Index: linux-2.6/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ linux-2.6/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5749,10 +5749,12 @@ static int init_one(struct pci_dev *pdev
 sriov:
 #ifdef CONFIG_PCI_IOV
 	if (func < ARRAY_SIZE(num_vf) && num_vf[func] > 0)
-		if (pci_enable_sriov(pdev, num_vf[func]) == 0)
+		if (pci_enable_sriov(pdev, num_vf[func]) == 0) {
 			dev_info(&pdev->dev,
 				 "instantiated %u virtual functions\n",
 				 num_vf[func]);
+			pci_bus_add_device_vfs(pdev);
+		}
 #endif
 	return 0;
 
Index: linux-2.6/drivers/net/ethernet/cisco/enic/enic_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/cisco/enic/enic_main.c
+++ linux-2.6/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2524,6 +2524,8 @@ static int enic_probe(struct pci_dev *pd
 		goto err_out_dev_deinit;
 	}
 
+	pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 err_out_dev_deinit:
Index: linux-2.6/drivers/net/ethernet/neterion/vxge/vxge-main.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ linux-2.6/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -4731,6 +4731,9 @@ vxge_probe(struct pci_dev *pdev, const s
 		vxge_hw_device_trace_level_get(hldev));
 
 	kfree(ll_config);
+
+	pci_bus_add_device_vfs(pdev);
+
 	return 0;
 
 _exit6:
Index: linux-2.6/drivers/net/ethernet/sfc/efx.c
===================================================================
--- linux-2.6.orig/drivers/net/ethernet/sfc/efx.c
+++ linux-2.6/drivers/net/ethernet/sfc/efx.c
@@ -2824,6 +2824,7 @@ static int efx_pci_probe(struct pci_dev
 		netif_warn(efx, probe, efx->net_dev,
 			   "pci_enable_pcie_error_reporting failed (%d)\n", rc);
 
+	pci_bus_add_device_vfs(pci_dev);
 	return 0;
 
  fail4:


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 21:38             ` Bjorn Helgaas
  2013-07-22 22:06               ` Yinghai Lu
@ 2013-07-22 22:33               ` Alexander Duyck
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Duyck @ 2013-07-22 22:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-kernel, Tejun Heo,
	Rafael J. Wysocki, Yinghai Lu, linux-pci

On 07/22/2013 02:38 PM, Bjorn Helgaas wrote:
> [+cc Alex, Yinghai, linux-pci]
>
> On Mon, Jul 22, 2013 at 9:37 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 07/22/2013 05:22 PM, Lai Jiangshan wrote:
>>> On 07/19/2013 04:57 PM, Srivatsa S. Bhat wrote:
>>>> On 07/19/2013 07:17 AM, Lai Jiangshan wrote:
>>>>> On 07/19/2013 04:23 AM, Srivatsa S. Bhat wrote:
>>>>>> ---
>>>>>>
>>>>>>  kernel/workqueue.c |    6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>>
>>>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>>>> index f02c4a4..07d9a67 100644
>>>>>> --- a/kernel/workqueue.c
>>>>>> +++ b/kernel/workqueue.c
>>>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>>>  {
>>>>>>    struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>>>
>>>>>> +#ifdef CONFIG_LOCKDEP
>>>>>> +  static struct lock_class_key __key;
>>>>> Sorry, this "static" should be removed.
>>>>>
>>>> That didn't help either :-( Because it makes lockdep unhappy,
>>>> since the key isn't persistent.
>>>>
>>>> This is the patch I used:
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index f02c4a4..7967e3b 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>>  {
>>>>      struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>>
>>>> +#ifdef CONFIG_LOCKDEP
>>>> +    struct lock_class_key __key;
>>>> +    INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +    lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
>>>> +#else
>>>>      INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>>> +#endif
>>>>      schedule_work_on(cpu, &wfc.work);
>>>>      flush_work(&wfc.work);
>>>>      return wfc.ret;
>>>>
>>>>
>>>> And here are the new warnings:
>>>>
>>>>
>>>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
>>>> io scheduler noop registered
>>>> io scheduler deadline registered
>>>> io scheduler cfq registered (default)
>>>> BUG: key ffff881039557b98 not in .data!
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 8 PID: 1 at kernel/lockdep.c:2987 lockdep_init_map+0x168/0x170()
>>> Sorry again.
>>>
>>> From 0096b9dac2282ec03d59a3f665b92977381a18ad Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Mon, 22 Jul 2013 19:08:51 +0800
>>> Subject: [PATCH] workqueue: allow the function called by work_on_cpu() to
>>>  call work_on_cpu()
>>>
>>> If @fn calls work_on_cpu() again, lockdep will complain:
>>>
>>>> [ INFO: possible recursive locking detected ]
>>>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>>>> ---------------------------------------------
>>>> kworker/0:1/142 is trying to acquire lock:
>>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>>>
>>>> but task is already holding lock:
>>>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>>>
>>>> other info that might help us debug this:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>        CPU0
>>>>        ----
>>>>   lock((&wfc.work));
>>>>   lock((&wfc.work));
>>>>
>>>>  *** DEADLOCK ***
>>> This is a false-positive lockdep report. In this situation,
>>> the two "wfc"s of the two work_on_cpu() calls are different;
>>> they are both on-stack. flush_work() can't deadlock.
>>>
>>> To fix this, we need to avoid the lockdep checking in this case,
>>> but we don't want to change flush_work(), so we use a
>>> completion instead of flush_work() in work_on_cpu().
>>>
>>> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>> That worked, thanks a lot!
>>
>> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>>  kernel/workqueue.c |    5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index f02c4a4..b021a45 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>>       long (*fn)(void *);
>>>       void *arg;
>>>       long ret;
>>> +     struct completion done;
>>>  };
>>>
>>>  static void work_for_cpu_fn(struct work_struct *work)
>>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>>       struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>>
>>>       wfc->ret = wfc->fn(wfc->arg);
>>> +     complete(&wfc->done);
>>>  }
>>>
>>>  /**
>>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>>       struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>>
>>>       INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>>> +     init_completion(&wfc.done);
>>>       schedule_work_on(cpu, &wfc.work);
>>> -     flush_work(&wfc.work);
>>> +     wait_for_completion(&wfc.done);
>>>       return wfc.ret;
>>>  }
>>>  EXPORT_SYMBOL_GPL(work_on_cpu);
>>>
> Isn't this for the same issue Alex and others have been working on?
>
> It doesn't feel like we have consensus on how this should be fixed.
> You're proposing a change to work_on_cpu(), Alex proposed a change to
> pci_call_probe() [1], Yinghai proposed some changes to the PCI core
> SR-IOV code and several drivers [2].
>
> [1] https://lkml.kernel.org/r/20130624195942.40795.27292.stgit@ahduyck-cp1.jf.intel.com
> [2] https://lkml.kernel.org/r/1368498506-25857-7-git-send-email-yinghai@kernel.org

The solution I proposed was flawed due to possible preemption issues. 
If this solution resolves the issue then I am fine with it as long as it
doesn't introduce any new issues.

Thanks,

Alex


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-22 21:32           ` Tejun Heo
@ 2013-07-23  1:23             ` Lai Jiangshan
  2013-07-23 14:38               ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2013-07-23  1:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Srivatsa S. Bhat, linux-kernel, Rafael J. Wysocki, bhelgaas,
	Yinghai Lu, Alex Duyck

On 07/23/2013 05:32 AM, Tejun Heo wrote:
> On Mon, Jul 22, 2013 at 07:52:34PM +0800, Lai Jiangshan wrote:
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index f02c4a4..b021a45 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4731,6 +4731,7 @@ struct work_for_cpu {
>>  	long (*fn)(void *);
>>  	void *arg;
>>  	long ret;
>> +	struct completion done;
>>  };
>>  
>>  static void work_for_cpu_fn(struct work_struct *work)
>> @@ -4738,6 +4739,7 @@ static void work_for_cpu_fn(struct work_struct *work)
>>  	struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
>>  
>>  	wfc->ret = wfc->fn(wfc->arg);
>> +	complete(&wfc->done);
>>  }
>>  
>>  /**
>> @@ -4755,8 +4757,9 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
>>  	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
>>  
>>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>> +	init_completion(&wfc.done);
>>  	schedule_work_on(cpu, &wfc.work);
>> -	flush_work(&wfc.work);
>> +	wait_for_completion(&wfc.done);
> 
> Hmmm... it's kinda nasty.  Given how infrequently work_on_cpu() users
> nest, I think it'd be cleaner to have work_on_cpu_nested() which takes
> @subclass.  It requires extra work on the caller's part but I think
> that actually is useful as nested work_on_cpu()s are pretty weird
> things.
> 

The problem is that the users may not know their work_on_cpu() calls nest,
especially when the work_on_cpu()s are in different subsystems, the call depth
is deep enough, and the nested work_on_cpu() depends on some conditions.

I would prefer to change the caller rather than introduce work_on_cpu_nested(),
and I can accept changing only the caller instead of work_on_cpu(), since only
one nested-call case has been found so far.

But I'm thinking: since nested work_on_cpu() calls don't actually have a problem,
why shouldn't workqueue.c offer a friendlier API/behavior?

Thanks,
Lai


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-23  1:23             ` Lai Jiangshan
@ 2013-07-23 14:38               ` Tejun Heo
  2013-07-24 10:31                 ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-07-23 14:38 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Srivatsa S. Bhat, linux-kernel, Rafael J. Wysocki, bhelgaas,
	Yinghai Lu, Alex Duyck

Hey, Lai.

On Tue, Jul 23, 2013 at 09:23:14AM +0800, Lai Jiangshan wrote:
> The problem is that the users may not know their work_on_cpu() calls nest,
> especially when the work_on_cpu()s are in different subsystems, the call depth
> is deep enough, and the nested work_on_cpu() depends on some conditions.

Yeah, that's a possibility.  Not sure how much it'd actually matter
tho given that this is the only instance we have and we've had the
lockdep annotation for years.

> I would prefer to change the caller rather than introduce work_on_cpu_nested(),
> and I can accept changing only the caller instead of work_on_cpu(), since only
> one nested-call case has been found so far.
> 
> But I'm thinking: since nested work_on_cpu() calls don't actually have a problem,
> why shouldn't workqueue.c offer a friendlier API/behavior?

If we wanna solve it from the workqueue side, let's please do it by
introducing an internal flush_work() variant which skips the lockdep
annotation.  I'd really like to avoid using completion here.  It's
nasty as it depends solely on the fact that completion doesn't have
lockdep annotation yet.  Let's do it explicitly.

Thanks.

-- 
tejun


* Re: workqueue, pci: INFO: possible recursive locking detected
  2013-07-23 14:38               ` Tejun Heo
@ 2013-07-24 10:31                 ` Lai Jiangshan
  2013-07-24 16:25                   ` [PATCH] workqueue: allow work_on_cpu() to be called recursively Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2013-07-24 10:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Srivatsa S. Bhat, linux-kernel, Rafael J. Wysocki, bhelgaas,
	Yinghai Lu, Alex Duyck

On 07/23/2013 10:38 PM, Tejun Heo wrote:
> Hey, Lai.
> 
> On Tue, Jul 23, 2013 at 09:23:14AM +0800, Lai Jiangshan wrote:
>> The problem is that the users may not know their work_on_cpu() calls nest,
>> especially when the work_on_cpu()s are in different subsystems, the call depth
>> is deep enough, and the nested work_on_cpu() depends on some conditions.
> 
> Yeah, that's a possibility.  Not sure how much it'd actually matter
> tho given that this is the only instance we have and we've had the
> lockdep annotation for years.
> 
>> I would prefer to change the caller rather than introduce work_on_cpu_nested(),
>> and I can accept changing only the caller instead of work_on_cpu(), since only
>> one nested-call case has been found so far.
>>
>> But I'm thinking: since nested work_on_cpu() calls don't actually have a problem,
>> why shouldn't workqueue.c offer a friendlier API/behavior?
> 
> If we wanna solve it from the workqueue side, let's please do it by
> introducing an internal flush_work() variant which skips the lockdep
> annotation.  I'd really like to avoid using completion here.  It's
> nasty as it depends solely on the fact that completion doesn't have
> lockdep annotation yet.  Let's do it explicitly.
> 
> Thanks.
> 

From 269bf1a2f47f04e0daf429c2cdf4052b4e8fb309 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Wed, 24 Jul 2013 18:21:50 +0800
Subject: [PATCH] workqueue: allow the function called by work_on_cpu() to
 call work_on_cpu()

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));
>
>  *** DEADLOCK ***

This is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on-stack. flush_work() can't deadlock.

To fix this, we need to avoid the lockdep checking in this case,
thus we introduce an internal __flush_work() which skips the lockdep annotation.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   29 +++++++++++++++++++----------
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..53df707 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2817,6 +2817,19 @@ already_gone:
 	return false;
 }
 
+static bool __flush_work(struct work_struct *work)
+{
+	struct wq_barrier barr;
+
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2830,18 +2843,10 @@ already_gone:
  */
 bool flush_work(struct work_struct *work)
 {
-	struct wq_barrier barr;
-
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
+	return __flush_work(work);
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4756,7 +4761,11 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+	/*
+	 * flushing the work can't lead to deadlock, using __flush_work()
+	 * to avoid the lockdep complaint for nested work_on_cpu()s.
+	 */
+	__flush_work(&wfc.work);
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
-- 
1.7.4.4



* [PATCH] workqueue: allow work_on_cpu() to be called recursively
  2013-07-24 10:31                 ` Lai Jiangshan
@ 2013-07-24 16:25                   ` Tejun Heo
  2013-07-27 17:11                     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-07-24 16:25 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Srivatsa S. Bhat, linux-kernel, Rafael J. Wysocki, bhelgaas,
	Yinghai Lu, Alex Duyck

Applied to wq/for-3.11-fixes with comment and subject tweaks.

Thanks!

---------- 8< ------------

From c2fda509667b0fda4372a237f5a59ea4570b1627 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Wed, 24 Jul 2013 18:31:42 +0800

If @fn calls work_on_cpu() again, lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));
>
>  *** DEADLOCK ***

This is a false-positive lockdep report. In this situation,
the two "wfc"s of the two work_on_cpu() calls are different;
they are both on-stack. flush_work() can't deadlock.

To fix this, we need to avoid the lockdep checking in this case,
thus we introduce an internal __flush_work() which skips the lockdep annotation.

tj: Minor comment adjustment.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..55f5f0a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2817,6 +2817,19 @@ already_gone:
 	return false;
 }
 
+static bool __flush_work(struct work_struct *work)
+{
+	struct wq_barrier barr;
+
+	if (start_flush_work(work, &barr)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2830,18 +2843,10 @@ already_gone:
  */
 bool flush_work(struct work_struct *work)
 {
-	struct wq_barrier barr;
-
 	lock_map_acquire(&work->lockdep_map);
 	lock_map_release(&work->lockdep_map);
 
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
+	return __flush_work(work);
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
@@ -4756,7 +4761,14 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
 	schedule_work_on(cpu, &wfc.work);
-	flush_work(&wfc.work);
+
+	/*
+	 * The work item is on-stack and can't lead to deadlock through
+	 * flushing.  Use __flush_work() to avoid spurious lockdep warnings
+	 * when work_on_cpu()s are nested.
+	 */
+	__flush_work(&wfc.work);
+
 	return wfc.ret;
 }
 EXPORT_SYMBOL_GPL(work_on_cpu);
-- 
1.8.3.1



* Re: [PATCH] workqueue: allow work_on_cpu() to be called recursively
  2013-07-24 16:25                   ` [PATCH] workqueue: allow work_on_cpu() to be called recursively Tejun Heo
@ 2013-07-27 17:11                     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 16+ messages in thread
From: Srivatsa S. Bhat @ 2013-07-27 17:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, linux-kernel, Rafael J. Wysocki, bhelgaas,
	Yinghai Lu, Alex Duyck

On 07/24/2013 09:55 PM, Tejun Heo wrote:
> Applied to wq/for-3.11-fixes with comment and subject tweaks.
> 
> Thanks!
> 
> ---------- 8< ------------
> 
> From c2fda509667b0fda4372a237f5a59ea4570b1627 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Wed, 24 Jul 2013 18:31:42 +0800
> 
> If @fn calls work_on_cpu() again, lockdep will complain:
> 
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock((&wfc.work));
>>   lock((&wfc.work));
>>
>>  *** DEADLOCK ***
> 
> This is a false-positive lockdep report. In this situation,
> the two "wfc"s of the two work_on_cpu() calls are different;
> they are both on-stack. flush_work() can't deadlock.
> 
> To fix this, we need to avoid the lockdep checking in this case,
> thus we introduce an internal __flush_work() which skips the lockdep annotation.
> 
> tj: Minor comment adjustment.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
> Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---

This version works as well, it fixes the issue I was facing.
Thank you!

FWIW:
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

>  kernel/workqueue.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..55f5f0a 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2817,6 +2817,19 @@ already_gone:
>  	return false;
>  }
> 
> +static bool __flush_work(struct work_struct *work)
> +{
> +	struct wq_barrier barr;
> +
> +	if (start_flush_work(work, &barr)) {
> +		wait_for_completion(&barr.done);
> +		destroy_work_on_stack(&barr.work);
> +		return true;
> +	} else {
> +		return false;
> +	}
> +}
> +
>  /**
>   * flush_work - wait for a work to finish executing the last queueing instance
>   * @work: the work to flush
> @@ -2830,18 +2843,10 @@ already_gone:
>   */
>  bool flush_work(struct work_struct *work)
>  {
> -	struct wq_barrier barr;
> -
>  	lock_map_acquire(&work->lockdep_map);
>  	lock_map_release(&work->lockdep_map);
> 
> -	if (start_flush_work(work, &barr)) {
> -		wait_for_completion(&barr.done);
> -		destroy_work_on_stack(&barr.work);
> -		return true;
> -	} else {
> -		return false;
> -	}
> +	return __flush_work(work);
>  }
>  EXPORT_SYMBOL_GPL(flush_work);
> 
> @@ -4756,7 +4761,14 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
> 
>  	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
>  	schedule_work_on(cpu, &wfc.work);
> -	flush_work(&wfc.work);
> +
> +	/*
> +	 * The work item is on-stack and can't lead to deadlock through
> +	 * flushing.  Use __flush_work() to avoid spurious lockdep warnings
> +	 * when work_on_cpu()s are nested.
> +	 */
> +	__flush_work(&wfc.work);
> +
>  	return wfc.ret;
>  }
>  EXPORT_SYMBOL_GPL(work_on_cpu);
> 



end of thread (newest: ~2013-07-27 17:15 UTC)

Thread overview: 16+ messages
2013-07-16 14:41 workqueue, pci: INFO: possible recursive locking detected Srivatsa S. Bhat
2013-07-17 10:07 ` Lai Jiangshan
2013-07-18 20:23   ` Srivatsa S. Bhat
2013-07-19  1:47     ` Lai Jiangshan
2013-07-19  8:57       ` Srivatsa S. Bhat
2013-07-22 11:52         ` Lai Jiangshan
2013-07-22 15:37           ` Srivatsa S. Bhat
2013-07-22 21:38             ` Bjorn Helgaas
2013-07-22 22:06               ` Yinghai Lu
2013-07-22 22:33               ` Alexander Duyck
2013-07-22 21:32           ` Tejun Heo
2013-07-23  1:23             ` Lai Jiangshan
2013-07-23 14:38               ` Tejun Heo
2013-07-24 10:31                 ` Lai Jiangshan
2013-07-24 16:25                   ` [PATCH] workqueue: allow work_on_cpu() to be called recursively Tejun Heo
2013-07-27 17:11                     ` Srivatsa S. Bhat
