From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753384AbbFXQPj (ORCPT );
	Wed, 24 Jun 2015 12:15:39 -0400
Received: from casper.infradead.org ([85.118.1.10]:38648 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752620AbbFXQPa (ORCPT );
	Wed, 24 Jun 2015 12:15:30 -0400
Date: Wed, 24 Jun 2015 18:15:25 +0200
From: Peter Zijlstra
To: Oleg Nesterov
Cc: paulmck@linux.vnet.ibm.com, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk,
	torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 09/13] hotplug: Replace hotplug lock with percpu-rwsem
Message-ID: <20150624161524.GO3644@twins.programming.kicks-ass.net>
References: <20150622121623.291363374@infradead.org>
	<20150622122256.480062572@infradead.org>
	<20150622225739.GA5582@redhat.com>
	<20150623071637.GA3644@twins.programming.kicks-ass.net>
	<20150623170122.GA26854@redhat.com>
	<20150623175318.GE3644@twins.programming.kicks-ass.net>
	<20150624135049.GA31992@redhat.com>
	<20150624141358.GQ19282@twins.programming.kicks-ass.net>
	<20150624151212.GA3766@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150624151212.GA3766@redhat.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 24, 2015 at 05:12:12PM +0200, Oleg Nesterov wrote:
> On 06/24, Peter Zijlstra wrote:
> > I'm confused.. why isn't the read-in-read recursion good enough?
>
> Because the code above can actually deadlock if 2 CPUs do this at
> the same time?

Hmm, yes.. this makes the hotplug locking worse than I feared, but alas.

FYI, the actual splat; a reduction of the cycle and a userspace sketch of
the read-in-read deadlock follow it.

---
[    7.399737] ======================================================
[    7.406640] [ INFO: possible circular locking dependency detected ]
[    7.413643] 4.1.0-02756-ge3d06bd-dirty #185 Not tainted
[    7.419481] -------------------------------------------------------
[    7.426483] kworker/0:1/215 is trying to acquire lock:
[    7.432221]  (&cpu_hotplug.rwsem){++++++}, at: [] apply_workqueue_attrs+0x183/0x4b0
[    7.442564]
[    7.442564] but task is already holding lock:
[    7.449079]  (&item->mutex){+.+.+.}, at: [] drm_global_item_ref+0x33/0xe0
[    7.458455]
[    7.458455] which lock already depends on the new lock.
[    7.458455]
[    7.467591]
[    7.467591] the existing dependency chain (in reverse order) is:
[    7.475949] -> #3 (&item->mutex){+.+.+.}:
[    7.480662]        [] lock_acquire+0xd1/0x290
[    7.487280]        [] mutex_lock_nested+0x47/0x3c0
[    7.494390]        [] drm_global_item_ref+0x33/0xe0
[    7.501596]        [] mgag200_mm_init+0x50/0x1c0
[    7.508514]        [] mgag200_driver_load+0x30f/0x500
[    7.515916]        [] drm_dev_register+0xb1/0x100
[    7.522922]        [] drm_get_pci_dev+0x8d/0x1e0
[    7.529840]        [] mga_pci_probe+0x9f/0xc0
[    7.536463]        [] local_pci_probe+0x42/0xa0
[    7.543283]        [] work_for_cpu_fn+0x18/0x30
[    7.550106]        [] process_one_work+0x1e7/0x7e0
[    7.557214]        [] worker_thread+0x1c8/0x460
[    7.564029]        [] kthread+0xf6/0x110
[    7.570166]        [] ret_from_fork+0x3f/0x70
[    7.576792] -> #2 (drm_global_mutex){+.+.+.}:
[    7.581891]        [] lock_acquire+0xd1/0x290
[    7.588514]        [] mutex_lock_nested+0x47/0x3c0
[    7.595622]        [] drm_dev_register+0x26/0x100
[    7.602632]        [] drm_get_pci_dev+0x8d/0x1e0
[    7.609547]        [] mga_pci_probe+0x9f/0xc0
[    7.616170]        [] local_pci_probe+0x42/0xa0
[    7.622987]        [] work_for_cpu_fn+0x18/0x30
[    7.629806]        [] process_one_work+0x1e7/0x7e0
[    7.636913]        [] worker_thread+0x1c8/0x460
[    7.643727]        [] kthread+0xf6/0x110
[    7.649866]        [] ret_from_fork+0x3f/0x70
[    7.656490] -> #1 ((&wfc.work)){+.+.+.}:
[    7.661104]        [] lock_acquire+0xd1/0x290
[    7.667727]        [] flush_work+0x3d/0x260
[    7.674155]        [] work_on_cpu+0x82/0x90
[    7.680584]        [] pci_device_probe+0x112/0x120
[    7.687692]        [] driver_probe_device+0x17f/0x2e0
[    7.695094]        [] __driver_attach+0x94/0xa0
[    7.701910]        [] bus_for_each_dev+0x66/0xa0
[    7.708824]        [] driver_attach+0x1e/0x20
[    7.715447]        [] bus_add_driver+0x168/0x210
[    7.722361]        [] driver_register+0x60/0xe0
[    7.729180]        [] __pci_register_driver+0x64/0x70
[    7.736580]        [] pcie_portdrv_init+0x66/0x79
[    7.743593]        [] do_one_initcall+0x88/0x1c0
[    7.750508]        [] kernel_init_freeable+0x1f5/0x282
[    7.758005]        [] kernel_init+0xe/0xe0
[    7.764338]        [] ret_from_fork+0x3f/0x70
[    7.770961] -> #0 (&cpu_hotplug.rwsem){++++++}:
[    7.776255]        [] __lock_acquire+0x2207/0x2240
[    7.783363]        [] lock_acquire+0xd1/0x290
[    7.789986]        [] get_online_cpus+0x62/0xb0
[    7.796805]        [] apply_workqueue_attrs+0x183/0x4b0
[    7.804398]        [] __alloc_workqueue_key+0x2ec/0x560
[    7.811992]        [] ttm_mem_global_init+0x5a/0x310
[    7.819295]        [] mgag200_ttm_mem_global_init+0x12/0x20
[    7.827277]        [] drm_global_item_ref+0x65/0xe0
[    7.834481]        [] mgag200_mm_init+0x50/0x1c0
[    7.841395]        [] mgag200_driver_load+0x30f/0x500
[    7.848793]        [] drm_dev_register+0xb1/0x100
[    7.855804]        [] drm_get_pci_dev+0x8d/0x1e0
[    7.862715]        [] mga_pci_probe+0x9f/0xc0
[    7.869338]        [] local_pci_probe+0x42/0xa0
[    7.876159]        [] work_for_cpu_fn+0x18/0x30
[    7.882979]        [] process_one_work+0x1e7/0x7e0
[    7.890087]        [] worker_thread+0x1c8/0x460
[    7.896907]        [] kthread+0xf6/0x110
[    7.903043]        [] ret_from_fork+0x3f/0x70
[    7.909673]
[    7.909673] other info that might help us debug this:
[    7.909673]
[    7.918616] Chain exists of: &cpu_hotplug.rwsem --> drm_global_mutex --> &item->mutex
[    7.927907]  Possible unsafe locking scenario:
[    7.927907]
[    7.934521]        CPU0                    CPU1
[    7.939580]        ----                    ----
[    7.944639]   lock(&item->mutex);
[    7.948359]                                lock(drm_global_mutex);
[    7.955292]                                lock(&item->mutex);
[    7.961855]   lock(&cpu_hotplug.rwsem);
[    7.966158]
[    7.966158]  *** DEADLOCK ***
[    7.966158]
[    7.972771] 4 locks held by kworker/0:1/215:
[    7.977539]  #0:  ("events"){.+.+.+}, at: [] process_one_work+0x156/0x7e0
[    7.986929]  #1:  ((&wfc.work)){+.+.+.}, at: [] process_one_work+0x156/0x7e0
[    7.996600]  #2:  (drm_global_mutex){+.+.+.}, at: [] drm_dev_register+0x26/0x100
[    8.006690]  #3:  (&item->mutex){+.+.+.}, at: [] drm_global_item_ref+0x33/0xe0
[    8.016559]
[    8.016559] stack backtrace:
[    8.021427] CPU: 0 PID: 215 Comm: kworker/0:1 Not tainted 4.1.0-02756-ge3d06bd-dirty #185
[    8.030565] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[    8.042034] Workqueue: events work_for_cpu_fn
[    8.046909]  ffffffff82857e30 ffff88042b3437c8 ffffffff818e5189 0000000000000011
[    8.055216]  ffffffff8282aa40 ffff88042b343818 ffffffff8111ee76 0000000000000004
[    8.063522]  ffff88042b343888 ffff88042b33f040 0000000000000004 ffff88042b33f040
[    8.071827] Call Trace:
[    8.074559]  [] dump_stack+0x4c/0x6e
[    8.080300]  [] print_circular_bug+0x1c6/0x220
[    8.087011]  [] __lock_acquire+0x2207/0x2240
[    8.093528]  [] lock_acquire+0xd1/0x290
[    8.099559]  [] ? apply_workqueue_attrs+0x183/0x4b0
[    8.106755]  [] get_online_cpus+0x62/0xb0
[    8.112981]  [] ? apply_workqueue_attrs+0x183/0x4b0
[    8.120176]  [] ? alloc_workqueue_attrs+0x27/0x80
[    8.127178]  [] apply_workqueue_attrs+0x183/0x4b0
[    8.134182]  [] ? debug_mutex_init+0x31/0x40
[    8.140690]  [] __alloc_workqueue_key+0x2ec/0x560
[    8.147691]  [] ttm_mem_global_init+0x5a/0x310
[    8.154405]  [] ? __kmalloc+0x5e0/0x630
[    8.160435]  [] ? drm_global_item_ref+0x52/0xe0
[    8.167243]  [] mgag200_ttm_mem_global_init+0x12/0x20
[    8.174631]  [] drm_global_item_ref+0x65/0xe0
[    8.181245]  [] mgag200_mm_init+0x50/0x1c0
[    8.187570]  [] mgag200_driver_load+0x30f/0x500
[    8.194383]  [] drm_dev_register+0xb1/0x100
[    8.200802]  [] drm_get_pci_dev+0x8d/0x1e0
[    8.207125]  [] ? mutex_unlock+0xe/0x10
[    8.213156]  [] mga_pci_probe+0x9f/0xc0
[    8.219187]  [] local_pci_probe+0x42/0xa0
[    8.225412]  [] ? __lock_is_held+0x51/0x80
[    8.231736]  [] work_for_cpu_fn+0x18/0x30
[    8.237962]  [] process_one_work+0x1e7/0x7e0
[    8.244477]  [] ? process_one_work+0x156/0x7e0
[    8.251187]  [] worker_thread+0x1c8/0x460
[    8.257410]  [] ? process_one_work+0x7e0/0x7e0
[    8.264120]  [] ? process_one_work+0x7e0/0x7e0
[    8.270829]  [] kthread+0xf6/0x110
[    8.276375]  [] ? _raw_spin_unlock_irq+0x30/0x60
[    8.283282]  [] ? kthread_create_on_node+0x220/0x220
[    8.290566]  [] ret_from_fork+0x3f/0x70
[    8.296597]  [] ? kthread_create_on_node+0x220/0x220
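
For the archive, the chain above squashed into the two call paths that make
the cycle. The get_online_cpus() that records the #1 edge is not visible in
that stack; presumably it is pci_call_probe() taking it around work_on_cpu()
for NUMA-local probing. The rest reads straight off the splat:

  probe path, recorded earlier (#1 -> #2 -> #3):

    pci_device_probe()
      get_online_cpus()               /* cpu_hotplug.rwsem, read side  */
      work_on_cpu()
        flush_work()                  /* waits on (&wfc.work)          */
          ... worker: work_for_cpu_fn()
            mga_pci_probe()
              drm_dev_register()      /* takes drm_global_mutex        */
                drm_global_item_ref() /* takes &item->mutex            */

  this task (#0), closing the cycle:

    drm_global_item_ref()             /* holds &item->mutex            */
      ttm_mem_global_init()
        __alloc_workqueue_key()
          apply_workqueue_attrs()
            get_online_cpus()         /* wants cpu_hotplug.rwsem: boom */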
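
And since Oleg's "code above" is only in the parent mail: the read-in-read
hazard itself is easy to reproduce in userspace with any writer-fair rwlock.
A minimal sketch, assuming glibc's PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP
(which makes a nested rdlock queue behind a waiting writer, as a writer-fair
percpu-rwsem would); build with -pthread, and note it hangs on the second
rdlock by design:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t rwl;

static void *writer(void *arg)
{
	sleep(1);			/* let main() take the outer read lock */
	pthread_rwlock_wrlock(&rwl);	/* queues behind main()'s reader */
	pthread_rwlock_unlock(&rwl);
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t w;

	pthread_rwlockattr_init(&attr);
	/* writer-fair: a queued writer blocks new, even nested, readers */
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&rwl, &attr);

	pthread_rwlock_rdlock(&rwl);	/* outer read, succeeds */
	pthread_create(&w, NULL, writer, NULL);
	sleep(2);			/* make sure the writer is queued */

	fprintf(stderr, "nested rdlock...\n");
	pthread_rwlock_rdlock(&rwl);	/* read-in-read: deadlocks here */
	fprintf(stderr, "unreachable\n");
	return 0;
}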