From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bjorn Helgaas Subject: Re: [PATCHv2 for-3.9] pci: avoid work_on_cpu for nested SRIOV probes Date: Thu, 18 Apr 2013 15:40:01 -0600 Message-ID: References: <51360D7C.3060209@mellanox.com> <20130418200855.GA24086@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: Or Gerlitz , Ming Lei , Greg Kroah-Hartman , David Miller , Roland Dreier , netdev , Yan Burman , Jack Morgenstein , Tejun Heo , "linux-pci@vger.kernel.org" To: "Michael S. Tsirkin" Return-path: In-Reply-To: <20130418200855.GA24086@redhat.com> Sender: linux-pci-owner@vger.kernel.org List-Id: netdev.vger.kernel.org On Thu, Apr 18, 2013 at 2:08 PM, Michael S. Tsirkin wrote: > The following lockdep report triggers since 3.9-rc1: > > ============================================= > [ INFO: possible recursive locking detected ] > 3.9.0-rc1 #96 Not tainted > --------------------------------------------- > kworker/0:1/734 is trying to acquire lock: > ((&wfc.work)){+.+.+.}, at: [] flush_work+0x0/0x250 > > but task is already holding lock: > ((&wfc.work)){+.+.+.}, at: [] > process_one_work+0x162/0x4c0 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock((&wfc.work)); > lock((&wfc.work)); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 3 locks held by kworker/0:1/734: > #0: (events){.+.+.+}, at: [] > process_one_work+0x162/0x4c0 > #1: ((&wfc.work)){+.+.+.}, at: [] > process_one_work+0x162/0x4c0 > #2: (&__lockdep_no_validate__){......}, at: [] > device_attach+0x25/0xb0 > > stack backtrace: > Pid: 734, comm: kworker/0:1 Not tainted 3.9.0-rc1 #96 > Call Trace: > [] validate_chain+0xdcc/0x11f0 > [] __lock_acquire+0x440/0xc70 > [] ? __lock_acquire+0x440/0xc70 > [] lock_acquire+0x5a/0x70 > [] ? wq_worker_waking_up+0x60/0x60 > [] flush_work+0x45/0x250 > [] ? wq_worker_waking_up+0x60/0x60 > [] ? mark_held_locks+0x9e/0x130 > [] ? queue_work_on+0x46/0x90 > [] ? trace_hardirqs_on_caller+0xfd/0x190 > [] ? trace_hardirqs_on+0xd/0x10 > [] work_on_cpu+0x74/0x90 > [] ? keventd_up+0x20/0x20 > [] ? pci_pm_prepare+0x60/0x60 > [] ? cpumask_next_and+0x23/0x40 > [] pci_device_probe+0xba/0x110 > [] ? driver_sysfs_add+0x7a/0xb0 > [] driver_probe_device+0x8f/0x230 > [] ? __driver_attach+0xb0/0xb0 > [] __device_attach+0x4b/0x60 > [] bus_for_each_drv+0x64/0x90 > [] device_attach+0x98/0xb0 > [] pci_bus_add_device+0x24/0x50 > [] virtfn_add+0x240/0x3e0 > [] ? _raw_spin_unlock_irqrestore+0x3d/0x80 > [] pci_enable_sriov+0x23e/0x500 > [] __mlx4_init_one+0x5da/0xce0 [mlx4_core] > [] mlx4_init_one+0x2d/0x60 [mlx4_core] > [] local_pci_probe+0x49/0x80 > [] work_for_cpu_fn+0x13/0x20 > [] process_one_work+0x1c8/0x4c0 > [] ? process_one_work+0x162/0x4c0 > [] worker_thread+0x30b/0x430 > [] ? manage_workers+0x340/0x340 > [] kthread+0xd6/0xe0 > [] ? __init_kthread_worker+0x70/0x70 > [] ret_from_fork+0x7c/0xb0 > [] ? __init_kthread_worker+0x70/0x70 > > The issue is that a driver, in it's probe function, calls > pci_sriov_enable so a PF device probe causes VF probe (AKA nested > probe). Each probe in pci_device_probe which is (normally) run through > work_on_cpu (this is to get the right numa node for memory allocated by > the driver). In turn work_on_cpu does this internally: > > schedule_work_on(cpu, &wfc.work); > flush_work(&wfc.work); > > So if you are running probe on CPU1, and cause another > probe on the same CPU, this will try to flush > workqueue from inside same workqueue which causes > a lockep warning. > > Nested probing might be tricky to get right generally. > > But for pci_sriov_enable, the situation is actually very simple: all VFs > naturally have same affinity as the PF, and cpumask_any_and is actually > same as cpumask_first_and, so it always gives us the same CPU. > So let's just detect that, and run the probing for VFs locally without a > workqueue. > > This is hardly elegant, but looks to me like an appropriate quick fix > for 3.9. > > Tested-by: Or Gerlitz > Signed-off-by: Michael S. Tsirkin > Acked-by: Tejun Heo Thanks, Michael. I put this in my for-linus branch: http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=for-linus I'll send a pull request to Linus today. Bjorn > --- > > Changes from v1: > - clarified commit log and added Ack by Tejun Heo > patch is unchanged. > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 1fa1e48..6eeb5ec 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -286,8 +286,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, > int cpu; > > get_online_cpus(); > cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); > - if (cpu < nr_cpu_ids) > + if (cpu != raw_smp_processor_id() && cpu < nr_cpu_ids) > error = work_on_cpu(cpu, local_pci_probe, &ddi); > else > error = local_pci_probe(&ddi);