From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, "Rafael J. Wysocki" <rjw@sisk.pl>,
	bhelgaas@google.com
Subject: Re: workqueue, pci: INFO: possible recursive locking detected
Date: Fri, 19 Jul 2013 01:53:56 +0530
Message-ID: <51E84EDC.5090502@linux.vnet.ibm.com>
In-Reply-To: <51E66CCC.9010600@cn.fujitsu.com>


On 07/17/2013 03:37 PM, Lai Jiangshan wrote:
> On 07/16/2013 10:41 PM, Srivatsa S. Bhat wrote:
>> Hi,
>>
>> I have been seeing this warning every time during boot. I haven't
>> spent time digging through it though... Please let me know if
>> any machine-specific info is needed.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>
>> ----------------------------------------------------
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
>> ---------------------------------------------
>> kworker/0:1/142 is trying to acquire lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>>
>> but task is already holding lock:
>>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock((&wfc.work));
>>   lock((&wfc.work));

Hi Lai,

Thanks for looking into this!

> 
> This is false negative,

I believe you meant false positive...

> the two "wfc"s are different, they are
> both on stack. flush_work() can't be deadlock in such case:
> 
> void foo(void *)
> {
> 	...
> 	if (xxx)
> 		work_on_cpu(..., foo, ...);
> 	...
> }
> 
> bar()
> {
> 	work_on_cpu(..., foo, ...);
> }
> 
> The complaint is caused by "work_on_cpu() uses a static lock_class_key".
> we should fix work_on_cpu().
> (but the caller should also be careful, the foo()/local_pci_probe() is re-entering)
> 
> But I can't find an elegant fix.
> 
> long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
> {
> 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
> 
> +#ifdef CONFIG_LOCKDEP
> +	static struct lock_class_key __key;
> +	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +	lockdep_init_map(&wfc.work.lockdep_map, &wfc.work, &__key, 0);
> +#else
> 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
> +#endif
> 	schedule_work_on(cpu, &wfc.work);
> 	flush_work(&wfc.work);
> 	return wfc.ret;
> }
> 

Unfortunately, that didn't fix it. I applied the patch shown below
and still got the same warning.

---

 kernel/workqueue.c |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02c4a4..07d9a67 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4754,7 +4754,13 @@ long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
 	struct work_for_cpu wfc = { .fn = fn, .arg = arg };
 
+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key __key;
+	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+	lockdep_init_map(&wfc.work.lockdep_map, "&wfc.work", &__key, 0);
+#else
 	INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+#endif
 	schedule_work_on(cpu, &wfc.work);
 	flush_work(&wfc.work);
 	return wfc.ret;



Warning:
--------

wmi: Mapper loaded
be2net 0000:11:00.0: irq 102 for MSI/MSI-X
be2net 0000:11:00.0: enabled 1 MSI-x vector(s)
be2net 0000:11:00.0: created 0 RSS queue(s) and 1 default RX queue
be2net 0000:11:00.0: created 1 TX queue(s)
pci 0000:11:04.0: [19a2:0710] type 00 class 0x020000

=============================================
[ INFO: possible recursive locking detected ]
3.11.0-rc1-wq-fix #10 Not tainted
---------------------------------------------
kworker/0:1/126 is trying to acquire lock:
 (&wfc.work){+.+.+.}, at: [<ffffffff810770f0>] flush_work+0x0/0xb0

but task is already holding lock:
 (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&wfc.work);
  lock(&wfc.work);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/126:
 #0:  (events){.+.+.+}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
 #1:  (&wfc.work){+.+.+.}, at: [<ffffffff81075dc9>] process_one_work+0x169/0x610
 #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff81398ada>] device_attach+0x2a/0xc0

stack backtrace:
CPU: 0 PID: 126 Comm: kworker/0:1 Not tainted 3.11.0-rc1-wq-fix #10
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: events work_for_cpu_fn
 ffff881036887408 ffff881036889668 ffffffff81619059 0000000000000003
 ffff881036886a80 ffff881036889698 ffffffff810c1624 ffff881036886a80
 ffff881036887408 ffff881036886a80 0000000000000000 ffff8810368896f8
Call Trace:
 [<ffffffff81619059>] dump_stack+0x59/0x80
 [<ffffffff810c1624>] print_deadlock_bug+0xf4/0x100
 [<ffffffff810c3104>] validate_chain+0x504/0x750
 [<ffffffff810c365d>] __lock_acquire+0x30d/0x580
 [<ffffffff810c3967>] lock_acquire+0x97/0x170
 [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
 [<ffffffff81077138>] flush_work+0x48/0xb0
 [<ffffffff810770f0>] ? start_flush_work+0x220/0x220
 [<ffffffff810c2000>] ? mark_held_locks+0x80/0x130
 [<ffffffff81074ceb>] ? queue_work_on+0x4b/0xa0
 [<ffffffff810c2375>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81077334>] work_on_cpu+0xa4/0xc0
 [<ffffffff8106f940>] ? wqattrs_hash+0x190/0x190
 [<ffffffff812d1ed0>] ? pci_pm_prepare+0x60/0x60
 [<ffffffff812d1ffa>] __pci_device_probe+0x9a/0xe0
 [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff812d3302>] ? pci_dev_get+0x22/0x30
 [<ffffffff812d334a>] pci_device_probe+0x3a/0x60
 [<ffffffff8161ef90>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff81398bdc>] really_probe+0x6c/0x320
 [<ffffffff81398ed7>] driver_probe_device+0x47/0xa0
 [<ffffffff81398fe0>] ? __driver_attach+0xb0/0xb0
 [<ffffffff81399033>] __device_attach+0x53/0x60
 [<ffffffff81396b24>] bus_for_each_drv+0x74/0xa0
 [<ffffffff81398b50>] device_attach+0xa0/0xc0
 [<ffffffff812c99f9>] pci_bus_add_device+0x39/0x60
 [<ffffffff812ed341>] virtfn_add+0x251/0x3e0
 [<ffffffff810c244d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff812ed9bf>] sriov_enable+0x22f/0x3d0
 [<ffffffff812edbad>] pci_enable_sriov+0x4d/0x60
 [<ffffffffa0127045>] be_vf_setup+0x175/0x410 [be2net]
 [<ffffffffa012d3ca>] be_setup+0x37a/0x4b0 [be2net]
 [<ffffffffa012dac0>] be_probe+0x5c0/0x820 [be2net]
 [<ffffffff812d1f1e>] local_pci_probe+0x4e/0x90
 [<ffffffff8106f958>] work_for_cpu_fn+0x18/0x30
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff810764fc>] worker_thread+0x28c/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da5e>] kthread+0xee/0x100
 [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff81628e5c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d970>] ? __init_kthread_worker+0x70/0x70
be2net 0000:11:04.0: enabling device (0040 -> 0042)
be2net 0000:11:04.0: Could not use PCIe error reporting
be2net 0000:11:04.0: VF is not privileged to issue opcode 89-1
be2net 0000:11:04.0: VF is not privileged to issue opcode 125-1
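
A possible reason the patch makes no difference: the key it adds is still
a single function-static object, so every on-stack wfc.work keeps landing
in the same lock class, and the nested work_on_cpu() call seen in the
trace (be_probe -> pci_enable_sriov -> ... -> __pci_device_probe) still
looks like recursion on one class. Below is a minimal userspace sketch of
that class-versus-instance behaviour. It is not kernel code: every toy_*
name is hypothetical, and the check is a deliberately simplified model of
what lockdep does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for lockdep's key and map; NOT the kernel definitions. */
struct toy_class_key { char unused; };
struct toy_map {
	const struct toy_class_key *key;	/* class this instance belongs to */
	const char *name;
};

#define TOY_MAX_HELD 8
static const struct toy_map *toy_held[TOY_MAX_HELD];
static int toy_nr_held;
static int toy_nr_reports;

static void toy_init_map(struct toy_map *map, const char *name,
			 const struct toy_class_key *key)
{
	map->key = key;
	map->name = name;
}

/* Report when a held map shares the new map's class, even if the instance differs. */
static void toy_acquire(const struct toy_map *map)
{
	int i;

	for (i = 0; i < toy_nr_held; i++) {
		if (toy_held[i]->key == map->key) {
			toy_nr_reports++;
			printf("possible recursive locking detected: %s\n",
			       map->name);
			break;
		}
	}
	assert(toy_nr_held < TOY_MAX_HELD);
	toy_held[toy_nr_held++] = map;
}

static void toy_release(void)
{
	assert(toy_nr_held > 0);
	toy_nr_held--;
}

/* Same shape as the patched work_on_cpu(): on-stack item, function-static key. */
static long toy_work_on_cpu(long (*fn)(void *), void *arg)
{
	static struct toy_class_key key;	/* one key shared by every call */
	struct toy_map work;			/* a fresh instance per call */
	long ret;

	toy_init_map(&work, "&wfc.work", &key);
	toy_acquire(&work);
	ret = fn(arg);
	toy_release();
	return ret;
}

static long toy_leaf(void *arg) { (void)arg; return 0; }
static long toy_reenter(void *arg) { return toy_work_on_cpu(toy_leaf, arg); }

/* Reports raised by one nested call, as in the probe path above. */
int toy_run_nested(void)
{
	toy_nr_held = 0;
	toy_nr_reports = 0;
	toy_work_on_cpu(toy_reenter, NULL);
	return toy_nr_reports;
}

/* Reports raised by two back-to-back, non-nested calls. */
int toy_run_sequential(void)
{
	toy_nr_held = 0;
	toy_nr_reports = 0;
	toy_work_on_cpu(toy_leaf, NULL);
	toy_work_on_cpu(toy_leaf, NULL);
	return toy_nr_reports;
}
```

Calling toy_run_nested() from a main() raises one report even though the
two toy_map instances are distinct stack objects, while
toy_run_sequential() raises none. In this model, only a different key (or
some nesting annotation) for the inner call would silence the report.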


Regards,
Srivatsa S. Bhat

