* Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device.
  [not found]         ` <4A7A2EA6.9020603@zytor.com>
@ 2009-08-06  3:13   ` Neil Brown
  2009-08-06  4:35     ` H. Peter Anvin
  2009-08-07 13:57     ` Mike Snitzer
  0 siblings, 2 replies; 4+ messages in thread
From: Neil Brown @ 2009-08-06  3:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Mike Snitzer, linux-raid, linux-kernel

On Wednesday August 5, hpa@zytor.com wrote:
> On 08/05/2009 06:03 PM, Mike Snitzer wrote:
> > On Wed, Aug 5, 2009 at 6:07 PM, H. Peter Anvin<hpa@zytor.com> wrote:
> >> On 08/02/2009 02:58 PM, NeilBrown wrote:
> >>> As revalidate_disk calls check_disk_size_change, it will cause
> >>> any capacity change of a gendisk to be propagated to the blockdev
> >>> inode.  So use that instead of mucking about with locks and
> >>> i_size_write.
> >>>
> >>> Also add a call to revalidate_disk in do_md_run and a few other places
> >>> where the gendisk capacity is changed.
> >>>
> >> This patch causes my Fedora 11 system with all filesystems on RAID-1 to
> >> not boot (it hangs in early userspace, Ctrl-Alt-Del reboots the system.)
> >
> > I reported similar findings, with some more detail, relative to
> > Fedora's rawhide here:
> > http://lkml.org/lkml/2009/8/5/275
>
> Sounds to be the same, yes.

Thanks for the reports guys.

I managed to reproduce the lockup and I think this patch should fix it.
If you could review/test I would appreciate it.

Thanks,
NeilBrown

From cf90cb85596d05d9595bb8ee4cadb7d4091212b5 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 6 Aug 2009 13:10:43 +1000
Subject: [PATCH] Remove deadlock potential in md_open

A recent commit:
  commit 449aad3e25358812c43afc60918c5ad3819488e7

introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex:
__blkdev_get holds bd_mutex while calling md_open, which takes
reconfig_mutex;
do_md_run is always called with reconfig_mutex held, and it now
takes bd_mutex in the call to revalidate_disk.

This potential deadlock was not caught by lockdep due to the
use of mutex_lock_interruptible_nested, which was introduced by
  commit d63a5a74dee87883fda6b7d170244acaac5b05e8
to avoid a warning of an impossible deadlock.

It is quite possible to split reconfig_mutex into two locks.
One protects the array data structures while it is being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.

So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.

This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d

Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c |   10 +++++++---
 drivers/md/md.h |   10 ++++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5b98bea..1ecb219 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,6 +359,7 @@ static mddev_t * mddev_find(dev_t unit)
 	else
 		new->md_minor = MINOR(unit) >> MdpMinorShift;
 
+	mutex_init(&new->open_mutex);
 	mutex_init(&new->reconfig_mutex);
 	INIT_LIST_HEAD(&new->disks);
 	INIT_LIST_HEAD(&new->all_mddevs);
@@ -4304,9 +4305,11 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 	struct gendisk *disk = mddev->gendisk;
 	mdk_rdev_t *rdev;
 
+	mutex_lock(&mddev->open_mutex);
 	if (atomic_read(&mddev->openers) > is_open) {
 		printk("md: %s still in use.\n",mdname(mddev));
-		return -EBUSY;
+		err = -EBUSY;
+		goto out;
 	}
 
 	if (mddev->pers) {
@@ -4434,6 +4437,7 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 	md_new_event(mddev);
 	sysfs_notify_dirent(mddev->sysfs_state);
 out:
+	mutex_unlock(&mddev->open_mutex);
 	return err;
 }
 
@@ -5518,12 +5522,12 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1)))
+	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
 		goto out;
 
 	err = 0;
 	atomic_inc(&mddev->openers);
-	mddev_unlock(mddev);
+	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
 out:
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 78f0316..f8fc188 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -223,6 +223,16 @@ struct mddev_s
 							 * so we don't loop trying */
 
 	int				in_sync;	/* know to not need resync */
+	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
+	 * that we are never stopping an array while it is open.
+	 * 'reconfig_mutex' protects all other reconfiguration.
+	 * These locks are separate due to conflicting interactions
+	 * with bdev->bd_mutex.
+	 * Lock ordering is:
+	 *  reconfig_mutex -> bd_mutex : e.g. do_md_run -> revalidate_disk
+	 *  bd_mutex -> open_mutex :    e.g. __blkdev_get -> md_open
+	 */
+	struct mutex			open_mutex;
 	struct mutex			reconfig_mutex;
 	atomic_t			active;		/* general refcount */
 	atomic_t			openers;	/* number of active opens */
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread
* Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device.
  2009-08-06  3:13 ` [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device Neil Brown
@ 2009-08-06  4:35   ` H. Peter Anvin
  2009-08-07 13:57   ` Mike Snitzer
  1 sibling, 0 replies; 4+ messages in thread
From: H. Peter Anvin @ 2009-08-06  4:35 UTC (permalink / raw)
  To: Neil Brown; +Cc: Mike Snitzer, linux-raid, linux-kernel

On 08/05/2009 08:13 PM, Neil Brown wrote:
>
> Thanks for the reports guys.
>
> I managed to reproduce the lockup and I think this patch should fix
> it.
> If you could review/test I would appreciate it.
>
> Thanks,
> NeilBrown
>

This patch makes my system boot, whereas previously it would deadlock
100% of the time.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device.
  2009-08-06  3:13 ` [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device Neil Brown
  2009-08-06  4:35   ` H. Peter Anvin
@ 2009-08-07 13:57   ` Mike Snitzer
  2009-08-10  1:26     ` Neil Brown
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2009-08-07 13:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: H. Peter Anvin, linux-raid, linux-kernel

On Wed, Aug 5, 2009 at 11:13 PM, Neil Brown<neilb@suse.de> wrote:
> On Wednesday August 5, hpa@zytor.com wrote:
>> On 08/05/2009 06:03 PM, Mike Snitzer wrote:
>> > On Wed, Aug 5, 2009 at 6:07 PM, H. Peter Anvin<hpa@zytor.com> wrote:
>> >> On 08/02/2009 02:58 PM, NeilBrown wrote:
>> >>> As revalidate_disk calls check_disk_size_change, it will cause
>> >>> any capacity change of a gendisk to be propagated to the blockdev
>> >>> inode.  So use that instead of mucking about with locks and
>> >>> i_size_write.
>> >>>
>> >>> Also add a call to revalidate_disk in do_md_run and a few other places
>> >>> where the gendisk capacity is changed.
>> >>>
>> >> This patch causes my Fedora 11 system with all filesystems on RAID-1 to
>> >> not boot (it hangs in early userspace, Ctrl-Alt-Del reboots the system.)
>> >
>> > I reported similar findings, with some more detail, relative to
>> > Fedora's rawhide here:
>> > http://lkml.org/lkml/2009/8/5/275
>>
>> Sounds to be the same, yes.
>
> Thanks for the reports guys.
>
> I managed to reproduce the lockup and I think this patch should fix
> it.
> If you could review/test I would appreciate it.
>
> Thanks,
> NeilBrown
>
>
> From cf90cb85596d05d9595bb8ee4cadb7d4091212b5 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Thu, 6 Aug 2009 13:10:43 +1000
> Subject: [PATCH] Remove deadlock potential in md_open
>
> A recent commit:
>   commit 449aad3e25358812c43afc60918c5ad3819488e7
>
> introduced the possibility of an A-B/B-A deadlock between
> bd_mutex and reconfig_mutex:
> __blkdev_get holds bd_mutex while calling md_open, which takes
> reconfig_mutex;
> do_md_run is always called with reconfig_mutex held, and it now
> takes bd_mutex in the call to revalidate_disk.
>
> This potential deadlock was not caught by lockdep due to the
> use of mutex_lock_interruptible_nested, which was introduced by
>   commit d63a5a74dee87883fda6b7d170244acaac5b05e8
> to avoid a warning of an impossible deadlock.
>
> It is quite possible to split reconfig_mutex into two locks.
> One protects the array data structures while it is being
> reconfigured, the other ensures that an array is never even partially
> open while it is being deactivated.
>
> So create a new lock, open_mutex, just to ensure exclusion between
> 'open' and 'stop'.
>
> This avoids the deadlock and also avoids the lockdep warning mentioned
> in commit d63a5a74d
>
> Reported-by: "Mike Snitzer" <snitzer@gmail.com>
> Reported-by: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: NeilBrown <neilb@suse.de>

Neil,

I finally tested this with the LVM testsuite's
t-pvcreate-operation-md.sh script that I mentioned here:
http://lkml.org/lkml/2009/8/5/275

The test succeeds but unfortunately triggers the following lockdep warning:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-0.125.rc5.git2.snitm.x86_64 #1
-------------------------------------------------------
mdadm/1348 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff811809dc>] bd_release_from_disk+0x49/0x113

but task is already holding lock:
 (&new->open_mutex){+.+...}, at: [<ffffffff81418ac7>] do_md_stop+0x3f/0x529

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&new->open_mutex){+.+...}:
       [<ffffffff8109ed70>] __lock_acquire+0x990/0xb29
       [<ffffffff810a0298>] lock_acquire+0xd5/0x115
       [<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
       [<ffffffff8152e6b8>] mutex_lock_interruptible_nested+0x4d/0x68
       [<ffffffff8141e683>] md_open+0x71/0xb3
       [<ffffffff81181821>] __blkdev_get+0xe1/0x37e
       [<ffffffff81181900>] __blkdev_get+0x1c0/0x37e
       [<ffffffff81181ae1>] blkdev_get+0x23/0x39
       [<ffffffff81181b7c>] blkdev_open+0x85/0xd1
       [<ffffffff8114f909>] __dentry_open+0x1af/0x303
       [<ffffffff81150da8>] nameidata_to_filp+0x55/0x7d
       [<ffffffff8115ecb1>] do_filp_open+0x504/0x960
       [<ffffffff8114f595>] do_sys_open+0x70/0x12e
       [<ffffffff8114f6c0>] sys_open+0x33/0x49
       [<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&bdev->bd_mutex/1){+.+...}:
       [<ffffffff8109ed70>] __lock_acquire+0x990/0xb29
       [<ffffffff810a0298>] lock_acquire+0xd5/0x115
       [<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
       [<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
       [<ffffffff811817ce>] __blkdev_get+0x8e/0x37e
       [<ffffffff81181900>] __blkdev_get+0x1c0/0x37e
       [<ffffffff81181ae1>] blkdev_get+0x23/0x39
       [<ffffffff81181b7c>] blkdev_open+0x85/0xd1
       [<ffffffff8114f909>] __dentry_open+0x1af/0x303
       [<ffffffff81150da8>] nameidata_to_filp+0x55/0x7d
       [<ffffffff8115ecb1>] do_filp_open+0x504/0x960
       [<ffffffff8114f595>] do_sys_open+0x70/0x12e
       [<ffffffff8114f6c0>] sys_open+0x33/0x49
       [<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&bdev->bd_mutex){+.+.+.}:
       [<ffffffff8109ec64>] __lock_acquire+0x884/0xb29
       [<ffffffff810a0298>] lock_acquire+0xd5/0x115
       [<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
       [<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
       [<ffffffff811809dc>] bd_release_from_disk+0x49/0x113
       [<ffffffff81415cd5>] unbind_rdev_from_array+0x5f/0x148
       [<ffffffff81416f96>] kick_rdev_from_array+0x25/0x48
       [<ffffffff81417011>] export_array+0x58/0xc2
       [<ffffffff81418d57>] do_md_stop+0x2cf/0x529
       [<ffffffff8141d7ab>] md_ioctl+0xa5e/0xffe
       [<ffffffff8128141f>] __blkdev_driver_ioctl+0x4b/0xa2
       [<ffffffff81281d9b>] blkdev_ioctl+0x881/0x8ee
       [<ffffffff81181f54>] block_ioctl+0x50/0x68
       [<ffffffff81161466>] vfs_ioctl+0x3e/0xa2
       [<ffffffff81161a03>] do_vfs_ioctl+0x499/0x500
       [<ffffffff81161ad5>] sys_ioctl+0x6b/0xa2
       [<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by mdadm/1348:
 #0:  (&new->reconfig_mutex){+.+.+.}, at: [<ffffffff81415e82>] mddev_lock+0x2a/0x40
 #1:  (&new->open_mutex){+.+...}, at: [<ffffffff81418ac7>] do_md_stop+0x3f/0x529

stack backtrace:
Pid: 1348, comm: mdadm Tainted: G        W  2.6.31-0.125.rc5.git2.snitm.x86_64 #1
Call Trace:
 [<ffffffff8109e3c1>] print_circular_bug_tail+0x80/0x9f
 [<ffffffff8109ec64>] __lock_acquire+0x884/0xb29
 [<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
 [<ffffffff81014d10>] ? restore_args+0x0/0x30
 [<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
 [<ffffffff810a0298>] lock_acquire+0xd5/0x115
 [<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
 [<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
 [<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
 [<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
 [<ffffffff810537e7>] ? need_resched+0x36/0x54
 [<ffffffff8152fbdf>] ? _spin_unlock_irq+0x46/0x61
 [<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
 [<ffffffff811809dc>] bd_release_from_disk+0x49/0x113
 [<ffffffff81415cd5>] unbind_rdev_from_array+0x5f/0x148
 [<ffffffff81416f96>] kick_rdev_from_array+0x25/0x48
 [<ffffffff81082dd6>] ? flush_workqueue+0x0/0xd1
 [<ffffffff81417011>] export_array+0x58/0xc2
 [<ffffffff81418d57>] do_md_stop+0x2cf/0x529
 [<ffffffff8152e6b8>] ? mutex_lock_interruptible_nested+0x4d/0x68
 [<ffffffff8141d7ab>] md_ioctl+0xa5e/0xffe
 [<ffffffff8109e710>] ? __lock_acquire+0x330/0xb29
 [<ffffffff8103bd8f>] ? pvclock_clocksource_read+0x56/0xa6
 [<ffffffff8128141f>] __blkdev_driver_ioctl+0x4b/0xa2
 [<ffffffff8109bc58>] ? lock_release_holdtime+0x61/0x7c
 [<ffffffff81281d9b>] blkdev_ioctl+0x881/0x8ee
 [<ffffffff81246cb1>] ? __rcu_read_unlock+0x34/0x4a
 [<ffffffff812472c9>] ? avc_has_perm_noaudit+0x358/0x37d
 [<ffffffff8109d37a>] ? mark_lock+0x31/0x221
 [<ffffffff81247bee>] ? avc_has_perm+0x60/0x86
 [<ffffffff8103bd8f>] ? pvclock_clocksource_read+0x56/0xa6
 [<ffffffff812492b3>] ? inode_has_perm+0x7d/0xa0
 [<ffffffff815305e6>] ? _spin_unlock_irqrestore+0x5e/0x83
 [<ffffffff81181f54>] block_ioctl+0x50/0x68
 [<ffffffff81161466>] vfs_ioctl+0x3e/0xa2
 [<ffffffff81161a03>] do_vfs_ioctl+0x499/0x500
 [<ffffffff81161ad5>] sys_ioctl+0x6b/0xa2
 [<ffffffff810141c2>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device.
  2009-08-07 13:57 ` Mike Snitzer
@ 2009-08-10  1:26   ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2009-08-10  1:26 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: H. Peter Anvin, linux-raid, linux-kernel

On Friday August 7, snitzer@gmail.com wrote:
>
> Neil,
>
> I finally tested this with the LVM testsuite's
> t-pvcreate-operation-md.sh script that I mentioned here:
> http://lkml.org/lkml/2009/8/5/275
>
> The test succeeds but unfortunately triggers the following lockdep warning:

Thanks!
It looks like I was protecting too much of do_md_stop with the
new lock.  This patch should do better.
I'll see if I can try out that testsuite...

Thanks,
NeilBrown

From 603c37e9533d7a9b329c79a9075a2d50955dbf2c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 10 Aug 2009 10:37:12 +1000
Subject: [PATCH] Remove deadlock potential in md_open

A recent commit:
  commit 449aad3e25358812c43afc60918c5ad3819488e7

introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex:
__blkdev_get holds bd_mutex while calling md_open, which takes
reconfig_mutex;
do_md_run is always called with reconfig_mutex held, and it now
takes bd_mutex in the call to revalidate_disk.

This potential deadlock was not caught by lockdep due to the
use of mutex_lock_interruptible_nested, which was introduced by
  commit d63a5a74dee87883fda6b7d170244acaac5b05e8
to avoid a warning of an impossible deadlock.

It is quite possible to split reconfig_mutex into two locks.
One protects the array data structures while it is being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.

So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.

This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d

Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c |   18 ++++++++++--------
 drivers/md/md.h |   10 ++++++++++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5b98bea..5614500 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,6 +359,7 @@ static mddev_t * mddev_find(dev_t unit)
 	else
 		new->md_minor = MINOR(unit) >> MdpMinorShift;
 
+	mutex_init(&new->open_mutex);
 	mutex_init(&new->reconfig_mutex);
 	INIT_LIST_HEAD(&new->disks);
 	INIT_LIST_HEAD(&new->all_mddevs);
@@ -4304,12 +4305,11 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 	struct gendisk *disk = mddev->gendisk;
 	mdk_rdev_t *rdev;
 
+	mutex_lock(&mddev->open_mutex);
 	if (atomic_read(&mddev->openers) > is_open) {
 		printk("md: %s still in use.\n",mdname(mddev));
-		return -EBUSY;
-	}
-
-	if (mddev->pers) {
+		err = -EBUSY;
+	} else if (mddev->pers) {
 
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -4367,7 +4367,10 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 			set_disk_ro(disk, 1);
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	}
-
+out:
+	mutex_unlock(&mddev->open_mutex);
+	if (err)
+		return err;
 	/*
 	 * Free resources if final stop
 	 */
@@ -4433,7 +4436,6 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 	blk_integrity_unregister(disk);
 	md_new_event(mddev);
 	sysfs_notify_dirent(mddev->sysfs_state);
-out:
 	return err;
 }
 
@@ -5518,12 +5520,12 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1)))
+	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
 		goto out;
 
 	err = 0;
 	atomic_inc(&mddev->openers);
-	mddev_unlock(mddev);
+	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
 out:
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 78f0316..f8fc188 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -223,6 +223,16 @@ struct mddev_s
 							 * so we don't loop trying */
 
 	int				in_sync;	/* know to not need resync */
+	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
+	 * that we are never stopping an array while it is open.
+	 * 'reconfig_mutex' protects all other reconfiguration.
+	 * These locks are separate due to conflicting interactions
+	 * with bdev->bd_mutex.
+	 * Lock ordering is:
+	 *  reconfig_mutex -> bd_mutex : e.g. do_md_run -> revalidate_disk
+	 *  bd_mutex -> open_mutex :    e.g. __blkdev_get -> md_open
+	 */
+	struct mutex			open_mutex;
 	struct mutex			reconfig_mutex;
 	atomic_t			active;		/* general refcount */
 	atomic_t			openers;	/* number of active opens */
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-08-10  1:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20090802215444.4198.83094.stgit@notabene.brown>
     [not found] ` <20090802215818.4198.77041.stgit@notabene.brown>
       [not found]   ` <4A7A0287.5010505@zytor.com>
         [not found]     ` <170fa0d20908051803ud6ef819xdad3cafe44cc1fb8@mail.gmail.com>
           [not found]       ` <4A7A2EA6.9020603@zytor.com>
2009-08-06  3:13             ` [md PATCH 8/8] md: Use revalidate_disk to effect changes in size of device Neil Brown
2009-08-06  4:35               ` H. Peter Anvin
2009-08-07 13:57               ` Mike Snitzer
2009-08-10  1:26                 ` Neil Brown