From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757491AbZHGOEi (ORCPT ); Fri, 7 Aug 2009 10:04:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1757399AbZHGOEi (ORCPT ); Fri, 7 Aug 2009 10:04:38 -0400 Received: from mail-bw0-f215.google.com ([209.85.218.215]:58988 "EHLO mail-bw0-f215.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757352AbZHGOEh convert rfc822-to-8bit (ORCPT ); Fri, 7 Aug 2009 10:04:37 -0400 X-Greylist: delayed 449 seconds by postgrey-1.27 at vger.kernel.org; Fri, 07 Aug 2009 10:04:36 EDT DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=TcUyD5JSi6lxjkwxXfccWCkZh8Y1nZs7rl216ah84r5zthswP0YA1/Ihf7a1o/cIlb CtVrGUwcXZy4dKI7YzpZHFZDo6BM/RMrm613bE0toKe6mTWGNKwl9Yuuy8FQZ+F9H8Ks qQKXIsm0RPPO5FTqmq6MEIVgH6aX/PyV/o7uc= MIME-Version: 1.0 In-Reply-To: <19066.19060.95471.616532@notabene.brown> References: <20090802215444.4198.83094.stgit@notabene.brown> <20090802215818.4198.77041.stgit@notabene.brown> <4A7A0287.5010505@zytor.com> <170fa0d20908051803ud6ef819xdad3cafe44cc1fb8@mail.gmail.com> <4A7A2EA6.9020603@zytor.com> <19066.19060.95471.616532@notabene.brown> Date: Fri, 7 Aug 2009 09:57:06 -0400 Message-ID: <170fa0d20908070657x1206f4ebp5f434ed63386b073@mail.gmail.com> Subject: Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device. From: Mike Snitzer To: Neil Brown Cc: "H. Peter Anvin" , linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 5, 2009 at 11:13 PM, Neil Brown wrote: > On Wednesday August 5, hpa@zytor.com wrote: >> On 08/05/2009 06:03 PM, Mike Snitzer wrote: >> > On Wed, Aug 5, 2009 at 6:07 PM, H. Peter Anvin wrote: >> >> On 08/02/2009 02:58 PM, NeilBrown wrote: >> >>> As revalidate_disk calls check_disk_size_change, it will cause >> >>> any capacity change of a gendisk to be propagated to the blockdev >> >>> inode.  So use that instead of mucking about with locks and >> >>> i_size_write. >> >>> >> >>> Also add a call to revalidate_disk in do_md_run and a few other places >> >>> where the gendisk capacity is changed. >> >>> >> >> This patch causes my Fedora 11 system with all filesystems on RAID-1 to >> >> not boot (it hangs in early userspace, Ctrl-Alt-Del reboots the system.) >> > >> > I reported similar findings, with some more detail, relative to >> > Fedora's rawhide here: >> > http://lkml.org/lkml/2009/8/5/275 >> >> Sounds to be the same, yes. > > Thanks for the reports guys. > > I managed to reproduce the lockup and I think this patch should fix > it. > If you could review/test I would appreciate it. > > Thanks, > NeilBrown > > > From cf90cb85596d05d9595bb8ee4cadb7d4091212b5 Mon Sep 17 00:00:00 2001 > From: NeilBrown > Date: Thu, 6 Aug 2009 13:10:43 +1000 > Subject: [PATCH] Remove deadlock potential in md_open > > A recent commit: >  commit 449aad3e25358812c43afc60918c5ad3819488e7 > > introduced the possibility of an A-B/B-A deadlock between > bd_mutex and reconfig_mutex. > > __blkdev_get holds bd_mutex while calling md_open which takes >   reconfig_mutex, > do_md_run is always called with reconfig_mutex held, and it now >   takes bd_mutex in the call the revalidate_disk. > > This potential deadlock was not caught by lockdep due to the > use of mutex_lock_interruptible_nexted which was introduced > by >   commit d63a5a74dee87883fda6b7d170244acaac5b05e8 > do avoid a warning of an impossible deadlock. > > It is quite possible to split reconfig_mutex in to two locks. > One protects the array data structures while it is being > reconfigured, the other ensures that an array is never even partially > open while it is being deactivated. > > So create a new lock, open_mutex, just to ensure exclusion between > 'open' and 'stop'. > > This avoids the deadlock and also avoid the lockdep warning mentioned > in commit d63a5a74d > > Reported-by: "Mike Snitzer" > Reported-by: "H. Peter Anvin" > Signed-off-by: NeilBrown Neil, I finally tested this with the LVM testsuite's t-pvcreate-operation-md.sh script that I mentioned here: http://lkml.org/lkml/2009/8/5/275 The test succeeds but unfortunately triggers the following lockdep warning: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.31-0.125.rc5.git2.snitm.x86_64 #1 ------------------------------------------------------- mdadm/1348 is trying to acquire lock: (&bdev->bd_mutex){+.+.+.}, at: [] bd_release_from_disk+0x49/0x113 but task is already holding lock: (&new->open_mutex){+.+...}, at: [] do_md_stop+0x3f/0x529 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&new->open_mutex){+.+...}: [] __lock_acquire+0x990/0xb29 [] lock_acquire+0xd5/0x115 [] __mutex_lock_common+0x5d/0x3b0 [] mutex_lock_interruptible_nested+0x4d/0x68 [] md_open+0x71/0xb3 [] __blkdev_get+0xe1/0x37e [] __blkdev_get+0x1c0/0x37e [] blkdev_get+0x23/0x39 [] blkdev_open+0x85/0xd1 [] __dentry_open+0x1af/0x303 [] nameidata_to_filp+0x55/0x7d [] do_filp_open+0x504/0x960 [] do_sys_open+0x70/0x12e [] sys_open+0x33/0x49 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff -> #1 (&bdev->bd_mutex/1){+.+...}: [] __lock_acquire+0x990/0xb29 [] lock_acquire+0xd5/0x115 [] __mutex_lock_common+0x5d/0x3b0 [] mutex_lock_nested+0x4d/0x6a [] __blkdev_get+0x8e/0x37e [] __blkdev_get+0x1c0/0x37e [] blkdev_get+0x23/0x39 [] blkdev_open+0x85/0xd1 [] __dentry_open+0x1af/0x303 [] nameidata_to_filp+0x55/0x7d [] do_filp_open+0x504/0x960 [] do_sys_open+0x70/0x12e [] sys_open+0x33/0x49 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff -> #0 (&bdev->bd_mutex){+.+.+.}: [] __lock_acquire+0x884/0xb29 [] lock_acquire+0xd5/0x115 [] __mutex_lock_common+0x5d/0x3b0 [] mutex_lock_nested+0x4d/0x6a [] bd_release_from_disk+0x49/0x113 [] unbind_rdev_from_array+0x5f/0x148 [] kick_rdev_from_array+0x25/0x48 [] export_array+0x58/0xc2 [] do_md_stop+0x2cf/0x529 [] md_ioctl+0xa5e/0xffe [] __blkdev_driver_ioctl+0x4b/0xa2 [] blkdev_ioctl+0x881/0x8ee [] block_ioctl+0x50/0x68 [] vfs_ioctl+0x3e/0xa2 [] do_vfs_ioctl+0x499/0x500 [] sys_ioctl+0x6b/0xa2 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff other info that might help us debug this: 2 locks held by mdadm/1348: #0: (&new->reconfig_mutex){+.+.+.}, at: [] mddev_lock+0x2a/0x40 #1: (&new->open_mutex){+.+...}, at: [] do_md_stop+0x3f/0x529 stack backtrace: Pid: 1348, comm: mdadm Tainted: G W 2.6.31-0.125.rc5.git2.snitm.x86_64 #1 Call Trace: [] print_circular_bug_tail+0x80/0x9f [] __lock_acquire+0x884/0xb29 [] ? bd_release_from_disk+0x49/0x113 [] ? restore_args+0x0/0x30 [] ? bd_release_from_disk+0x49/0x113 [] lock_acquire+0xd5/0x115 [] ? bd_release_from_disk+0x49/0x113 [] __mutex_lock_common+0x5d/0x3b0 [] ? bd_release_from_disk+0x49/0x113 [] ? bd_release_from_disk+0x49/0x113 [] ? need_resched+0x36/0x54 [] ? _spin_unlock_irq+0x46/0x61 [] mutex_lock_nested+0x4d/0x6a [] bd_release_from_disk+0x49/0x113 [] unbind_rdev_from_array+0x5f/0x148 [] kick_rdev_from_array+0x25/0x48 [] ? flush_workqueue+0x0/0xd1 [] export_array+0x58/0xc2 [] do_md_stop+0x2cf/0x529 [] ? mutex_lock_interruptible_nested+0x4d/0x68 [] md_ioctl+0xa5e/0xffe [] ? __lock_acquire+0x330/0xb29 [] ? pvclock_clocksource_read+0x56/0xa6 [] __blkdev_driver_ioctl+0x4b/0xa2 [] ? lock_release_holdtime+0x61/0x7c [] blkdev_ioctl+0x881/0x8ee [] ? __rcu_read_unlock+0x34/0x4a [] ? avc_has_perm_noaudit+0x358/0x37d [] ? mark_lock+0x31/0x221 [] ? avc_has_perm+0x60/0x86 [] ? pvclock_clocksource_read+0x56/0xa6 [] ? inode_has_perm+0x7d/0xa0 [] ? _spin_unlock_irqrestore+0x5e/0x83 [] block_ioctl+0x50/0x68 [] vfs_ioctl+0x3e/0xa2 [] do_vfs_ioctl+0x499/0x500 [] sys_ioctl+0x6b/0xa2 [] system_call_fastpath+0x16/0x1b