* [PATCH] dm: use revalidate_disk to update device size after set_capacity @ 2010-10-19 22:07 Mike Snitzer 2010-10-20 5:42 ` Jun'ichi Nomura 2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer 0 siblings, 2 replies; 11+ messages in thread From: Mike Snitzer @ 2010-10-19 22:07 UTC (permalink / raw) To: dm-devel Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's size. Doing so eliminates the potential for deadlock if an fsync is racing with a device resize. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm.c | 5 +---- 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index f934e98..fd315a7 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1995,10 +1995,7 @@ static void event_callback(void *context) static void __set_size(struct mapped_device *md, sector_t size) { set_capacity(md->disk, size); - - mutex_lock(&md->bdev->bd_inode->i_mutex); - i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); - mutex_unlock(&md->bdev->bd_inode->i_mutex); + revalidate_disk(md->disk); } /* ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] dm: use revalidate_disk to update device size after set_capacity 2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer @ 2010-10-20 5:42 ` Jun'ichi Nomura 2010-10-28 1:16 ` Mike Snitzer 2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer 1 sibling, 1 reply; 11+ messages in thread From: Jun'ichi Nomura @ 2010-10-20 5:42 UTC (permalink / raw) To: device-mapper development, Mike Snitzer Hi Mike, (10/20/10 07:07), Mike Snitzer wrote: > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's > size. Doing so eliminates the potential for deadlock if an fsync is > racing with a device resize. > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > --- > drivers/md/dm.c | 5 +---- > 1 files changed, 1 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index f934e98..fd315a7 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1995,10 +1995,7 @@ static void event_callback(void *context) > static void __set_size(struct mapped_device *md, sector_t size) > { > set_capacity(md->disk, size); > - > - mutex_lock(&md->bdev->bd_inode->i_mutex); > - i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); > - mutex_unlock(&md->bdev->bd_inode->i_mutex); > + revalidate_disk(md->disk); > } Some concerns/questions: - revalidate_disk() calls bdget() inside it. Does bdget() no longer block on I/O? Sometime ago, bdget() has been blocked by I_LOCK, where process holding I_LOCK blocked by resize. Since I_LOCK has disappeared, I suspect this might not be a valid concern anymore. FYI, past discussion on this topic: http://www.redhat.com/archives/dm-devel/2007-October/msg00134.html (it's not my intention to push the patch in the above URL) - revalidate_disk() ends up with get_super(): revalidate_disk check_disk_size_change flush_disk __invalidate_device get_super and get_super() takes sb->s_umount. OTOH, there are codes which wait on I/O holding s_umount exclusively. E.g. freeze_super() calls sync_filesystem(). So it seems there is a possible deadlock if you use revalidate_disk from dm_swap_table. I've been away from that part of the code for a while. So it's nice if the above is just a false alarm... Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: dm: use revalidate_disk to update device size after set_capacity 2010-10-20 5:42 ` Jun'ichi Nomura @ 2010-10-28 1:16 ` Mike Snitzer 2010-10-28 12:15 ` Jun'ichi Nomura 0 siblings, 1 reply; 11+ messages in thread From: Mike Snitzer @ 2010-10-28 1:16 UTC (permalink / raw) To: Jun'ichi Nomura; +Cc: device-mapper development Hi Jun'ichi, Thanks for your note. It took me a while to set aside some time to look at this closer. Please see my response inlined below. On Wed, Oct 20 2010 at 1:42am -0400, Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > Hi Mike, > > (10/20/10 07:07), Mike Snitzer wrote: > > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's > > size. Doing so eliminates the potential for deadlock if an fsync is > > racing with a device resize. > > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > > --- > > drivers/md/dm.c | 5 +---- > > 1 files changed, 1 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > > index f934e98..fd315a7 100644 > > --- a/drivers/md/dm.c > > +++ b/drivers/md/dm.c > > @@ -1995,10 +1995,7 @@ static void event_callback(void *context) > > static void __set_size(struct mapped_device *md, sector_t size) > > { > > set_capacity(md->disk, size); > > - > > - mutex_lock(&md->bdev->bd_inode->i_mutex); > > - i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); > > - mutex_unlock(&md->bdev->bd_inode->i_mutex); > > + revalidate_disk(md->disk); > > } > > Some concerns/questions: > > - revalidate_disk() calls bdget() inside it. > Does bdget() no longer block on I/O? > Sometime ago, bdget() has been blocked by I_LOCK, > where process holding I_LOCK blocked by resize. > Since I_LOCK has disappeared, I suspect this might not > be a valid concern anymore. > FYI, past discussion on this topic: > http://www.redhat.com/archives/dm-devel/2007-October/msg00134.html > (it's not my intention to push the patch in the above URL) OK, thanks for the pointer. Yes, I agree with you, seems there is no longer any obvious potential for bdget to block on IO. > - revalidate_disk() ends up with get_super(): > revalidate_disk > check_disk_size_change > flush_disk > __invalidate_device > get_super > and get_super() takes sb->s_umount. > OTOH, there are codes which wait on I/O holding s_umount > exclusively. E.g. freeze_super() calls sync_filesystem(). > So it seems there is a possible deadlock if you use > revalidate_disk from dm_swap_table. Doesn't seem like a freeze_super of a particular DM device would conflict with revalidate_disk relative to sb->s_umount. But it is concerning that revalidate_disk is calling flush_disk while the DM device is suspended (though in practice this doesn't cause a problem): dm_suspend() - flushes any outstanding IO. lock_fs freeze_bdev freeze_super down_write(&sb->s_umount); sync_filesystem up_write(&sb->s_umount); dm_swap_table() __bind __set_size revalidate_disk check_disk_size_change flush_disk __invalidate_device get_super down_read(&sb->s_umount); up_read(&sb->s_umount); dm_resume() unlock_fs thaw_bdev thaw_super down_write(&sb->s_umount); deactivate_locked_super up_write(&s->s_umount); > I've been away from that part of the code for a while. > So it's nice if the above is just a false alarm... Clearly warrants further analysis. check_disk_size_change() takes bdev->bd_mutex while changing bdev->bd_inode->i_size (rather than bdev->bd_inode->i_mutex). This is what attracted me to revalidate_disk (in addition to the rest of the block drivers using it for resize -- thanks to Neil Brown for pointing it out). But in my limited testing of the proposed patch (above), using linear DM target over DM mpath, I haven't seen any problems. I was doing IO in parallel to the resize. Notice with the patch we now see the following messages (dm-0 is the mpath device, dm-1 is the linear): dm-0: detected capacity change from 0 to 10737418240 dm-1: detected capacity change from 0 to 8589934592 ... dm-1: detected capacity change from 8589934592 to 9663676416 dm-1: detected capacity change from 9663676416 to 9667870720 But I haven't yet fully understood why check_disk_size_change's use of bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size (unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM being an exception?). Given how naive I am on these core block paths there is more analysis needed to verify/determine the proper fix for DM device resize (while the device is suspended). Could be the following patch be sufficient? (avoids potential for IO while device is suspended -- final patch would need comments explaining why revalidate_disk was avoided) Mike --- drivers/md/dm.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7cb1352..248794a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1996,9 +1996,9 @@ static void __set_size(struct mapped_device *md, sector_t size) { set_capacity(md->disk, size); - mutex_lock(&md->bdev->bd_inode->i_mutex); + mutex_lock(&md->bdev->bd_mutex); i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); - mutex_unlock(&md->bdev->bd_inode->i_mutex); + mutex_unlock(&md->bdev->bd_mutex); } /* ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: dm: use revalidate_disk to update device size after set_capacity 2010-10-28 1:16 ` Mike Snitzer @ 2010-10-28 12:15 ` Jun'ichi Nomura 2010-10-28 19:54 ` Mike Snitzer 0 siblings, 1 reply; 11+ messages in thread From: Jun'ichi Nomura @ 2010-10-28 12:15 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development Hi Mike, (10/28/10 10:16), Mike Snitzer wrote: > But in my limited testing of the proposed patch (above), using linear DM > target over DM mpath, I haven't seen any problems. I was doing IO in > parallel to the resize. Notice with the patch we now see the following > messages (dm-0 is the mpath device, dm-1 is the linear): There is FIFREEZE ioctl, which calls freeze_super. So if you mix a process doing FIFREEZE (xfs_freeze?) in your test, I think you hit the deadlock like this: process A process B ----------------------------------------------- suspend dm dev ioctl(FIFREEZE) freeze_super() hold s_umount sync_filesystems() wait for I/O flowing.. resume dm dev __set_size revalidate_disk() hold bd_mutex flush_disk() wait for s_umount > But I haven't yet fully understood why check_disk_size_change's use of > bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size > (unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM > being an exception?). i_size_read/write uses seqcount to protect the reads from accessing incomplete write. But the seqcount itself needs protection. Otherwise concurrent writes will break the seqcount scheme. So i_size_write()s need mutual exclusion, but not all accesses do. That's my understanding from the comments in include/linux/fs.h. > Given how naive I am on these core block paths there is more analysis > needed to verify/determine the proper fix for DM device resize (while > the device is suspended). > > Could be the following patch be sufficient? (avoids potential for IO > while device is suspended -- final patch would need comments explaining > why revalidate_disk was avoided) Though I can't point out actual problem, I think it's deadlock-prone to take bd_mutex in dm_swap_table. There are already codes which do I/O while holding bd_mutex, e.g. block/ioctl.c, though the code is not called for dm, so we can' just set a general rule "Don't do I/O while holding bd_mutex". Also, even if I/O is not done under bd_mutex, it might be blocked by other. For example, though currently nobody can call revalidate_disk for dm, process A process B process C ---------------------------------------------------------- suspend dm dev freeze_super() hold s_umount sync_filesystems() wait for I/O flowing.. revalidate_disk() hold bd_mutex flush_disk() wait for s_umount resume dm dev __set_size wait for bd_mutex If __set_size() could be done in later stage of do_resume(), we can use revalidate_disk() for dm, too. What do you think? Thanks, -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: dm: use revalidate_disk to update device size after set_capacity 2010-10-28 12:15 ` Jun'ichi Nomura @ 2010-10-28 19:54 ` Mike Snitzer 2010-10-28 22:18 ` Mike Snitzer 2010-10-29 3:00 ` Jun'ichi Nomura 0 siblings, 2 replies; 11+ messages in thread From: Mike Snitzer @ 2010-10-28 19:54 UTC (permalink / raw) To: Jun'ichi Nomura; +Cc: device-mapper development On Thu, Oct 28 2010 at 8:15am -0400, Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > Hi Mike, > > (10/28/10 10:16), Mike Snitzer wrote: > > But in my limited testing of the proposed patch (above), using linear DM > > target over DM mpath, I haven't seen any problems. I was doing IO in > > parallel to the resize. Notice with the patch we now see the following > > messages (dm-0 is the mpath device, dm-1 is the linear): > > There is FIFREEZE ioctl, which calls freeze_super. > So if you mix a process doing FIFREEZE (xfs_freeze?) in your test, > I think you hit the deadlock like this: > > process A process B > ----------------------------------------------- > suspend dm dev > ioctl(FIFREEZE) > freeze_super() > hold s_umount > sync_filesystems() > wait for I/O flowing.. > > resume dm dev > __set_size > revalidate_disk() > hold bd_mutex > flush_disk() > wait for s_umount OK, I didn't consider FIFREEZE ioctl.. But regardless, I think it is now clear that we must avoid using revalidate_disk() while a DM device is suspended. > > But I haven't yet fully understood why check_disk_size_change's use of > > bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size > > (unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM > > being an exception?). > > i_size_read/write uses seqcount to protect the reads from > accessing incomplete write. > But the seqcount itself needs protection. Otherwise concurrent > writes will break the seqcount scheme. > So i_size_write()s need mutual exclusion, but not all accesses do. > That's my understanding from the comments in include/linux/fs.h. Correct, but my point was that revalidate_disk protects the bdev->bd_inode->i_size i_size_write() with bdev->bd_mutex. Whereas quite a few non-block driver i_size_write() callers use the inode's i_mutex; as DM currently does with md->bdev->bd_inode->i_mutex. As we now know, using md->bdev->bd_inode->i_mutex is prone to deadlock against fsync. > > Given how naive I am on these core block paths there is more analysis > > needed to verify/determine the proper fix for DM device resize (while > > the device is suspended). > > > > Could be the following patch be sufficient? (avoids potential for IO > > while device is suspended -- final patch would need comments explaining > > why revalidate_disk was avoided) > > Though I can't point out actual problem, > I think it's deadlock-prone to take bd_mutex in dm_swap_table. > > There are already codes which do I/O while holding bd_mutex, > e.g. block/ioctl.c, though the code is not called for dm, > so we can' just set a general rule "Don't do I/O while holding bd_mutex". Right, it would work provided we had that understanding within DM. > Also, even if I/O is not done under bd_mutex, it might be blocked by > other. For example, though currently nobody can call revalidate_disk for dm, Hmm, that was a strange example, as you pointed out, considering the fictional revalidate_disk caller. I was proposing using bd_mutex directly -- independent of revalidate_disk and its associated flushing. > If __set_size() could be done in later stage of do_resume(), > we can use revalidate_disk() for dm, too. > What do you think? I think it would be incorrect to make a new DM table live without first adjusting the size of the DM device to reflect the new table. Could result in racing IO to the end of a device which would cause IO errors like "access beyond end of device". So our only option would be to change the device's size _before_ do_resume's dm_suspend(). But then if dm_swap_table() fails we'd need to revert back to the old size -- which is clearly awkward. I think updating the device's size within dm_swap_table() really is the most logical place. I'll cleanup the patch from my previous mail to include some in-code comments and re-post it. Please ack it if you agree that direct use of bd_mutex in __set_size() is a reasonable fix given the current code. Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: dm: use revalidate_disk to update device size after set_capacity 2010-10-28 19:54 ` Mike Snitzer @ 2010-10-28 22:18 ` Mike Snitzer 2010-10-29 3:00 ` Jun'ichi Nomura 1 sibling, 0 replies; 11+ messages in thread From: Mike Snitzer @ 2010-10-28 22:18 UTC (permalink / raw) To: Jun'ichi Nomura; +Cc: device-mapper development On Thu, Oct 28 2010 at 3:54pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Thu, Oct 28 2010 at 8:15am -0400, > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > > Though I can't point out actual problem, > > I think it's deadlock-prone to take bd_mutex in dm_swap_table. > > > > There are already codes which do I/O while holding bd_mutex, > > e.g. block/ioctl.c, though the code is not called for dm, > > so we can' just set a general rule "Don't do I/O while holding bd_mutex". > > Right, it would work provided we had that understanding within DM. I think I misunderstood what you were saying as: no code does I/O to DM while holding bd_mutex so we _can_ just set the general rule.... But re-reading your mail made me realize you were likely saying the opposite? e.g. your can' was meant to be can't. So even though we currently can't see a real deadlock associated with using ->bd_mutex directly you're contending there is potential for one (may already be one in hiding or one could be introduced in the future). That is fine, best to be conservative. > I'll cleanup the patch from my previous mail to include some in-code > comments and re-post it. Please ack it if you agree that direct use of > bd_mutex in __set_size() is a reasonable fix given the current code. Taking a step back, don't we already have adequate locking for __set_size's i_size_write() via dm_swap_table's md->suspend_lock? So something like the following? Mike --- drivers/md/dm.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7cb1352..e71b8a1 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1996,9 +1996,8 @@ static void __set_size(struct mapped_device *md, sector_t size) { set_capacity(md->disk, size); - mutex_lock(&md->bdev->bd_inode->i_mutex); + /* i_size_write() is protected by md->suspend_lock */ i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); - mutex_unlock(&md->bdev->bd_inode->i_mutex); } /* ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: dm: use revalidate_disk to update device size after set_capacity 2010-10-28 19:54 ` Mike Snitzer 2010-10-28 22:18 ` Mike Snitzer @ 2010-10-29 3:00 ` Jun'ichi Nomura 1 sibling, 0 replies; 11+ messages in thread From: Jun'ichi Nomura @ 2010-10-29 3:00 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development Hi Mike, (10/29/10 04:54), Mike Snitzer wrote: >> If __set_size() could be done in later stage of do_resume(), >> we can use revalidate_disk() for dm, too. >> What do you think? > > I think it would be incorrect to make a new DM table live without first > adjusting the size of the DM device to reflect the new table. Could > result in racing IO to the end of a device which would cause IO errors > like "access beyond end of device". Yes, but at that point, resume ioctl is still in progress. Doesn't that mean "the resize is still in progress"? If so, who cares that I/O while the resize is not yet completed? > I'll cleanup the patch from my previous mail to include some in-code > comments and re-post it. Please ack it if you agree that direct use of > bd_mutex in __set_size() is a reasonable fix given the current code. My point is that it's better to avoid having a code with special assumption on generic code, either "any codes don't do I/O while holding bd_mutex" or "dm_resume is the only code which calls i_size_write() for dm device". I think it's prone for someone who don't care dm specifics to introduce a new code that breaks such an assumption. Anyway, I think your bd_mutex patch should be fine for now and is better than the current code with i_mutex, which has a real deadlock issue. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 11+ messages in thread
* dm: lock bd_mutex when setting device size 2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer 2010-10-20 5:42 ` Jun'ichi Nomura @ 2010-10-29 21:50 ` Mike Snitzer 2010-11-01 7:19 ` Jun'ichi Nomura 1 sibling, 1 reply; 11+ messages in thread From: Mike Snitzer @ 2010-10-29 21:50 UTC (permalink / raw) To: dm-devel; +Cc: Jun'ichi Nomura Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's size. Using md->bdev->bd_mutex eliminates the potential for deadlock if an fsync is racing with a device resize. revalidate_disk() was avoided because it would flush_disk() while the DM device is suspended. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> --- drivers/md/dm.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Jun'ichi, was the following your implict Acked-by? Care to make it explicit? "Anyway, I think your bd_mutex patch should be fine for now and is better than the current code with i_mutex, which has a real deadlock issue." diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7cb1352..248794a 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1996,9 +1996,9 @@ static void __set_size(struct mapped_device *md, sector_t size) { set_capacity(md->disk, size); - mutex_lock(&md->bdev->bd_inode->i_mutex); + mutex_lock(&md->bdev->bd_mutex); i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); - mutex_unlock(&md->bdev->bd_inode->i_mutex); + mutex_unlock(&md->bdev->bd_mutex); } /* ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: dm: lock bd_mutex when setting device size 2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer @ 2010-11-01 7:19 ` Jun'ichi Nomura 2010-11-01 13:14 ` Mike Snitzer 0 siblings, 1 reply; 11+ messages in thread From: Jun'ichi Nomura @ 2010-11-01 7:19 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel Hi Mike, (10/30/10 06:50), Mike Snitzer wrote: > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's > size. Using md->bdev->bd_mutex eliminates the potential for deadlock if > an fsync is racing with a device resize. > > revalidate_disk() was avoided because it would flush_disk() while the DM > device is suspended. > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> > --- > drivers/md/dm.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > Jun'ichi, was the following your implict Acked-by? Care to make it > explicit? > "Anyway, I think your bd_mutex patch should be fine for now and is > better than the current code with i_mutex, which has a real deadlock > issue." No, it was not an ACK. (This is not multipath. So I think you don't need my ack.) I'm reluctant to ack this because, as I wrote, it's prone to cause deadlock in future. But I couldn't find a real problem with the patch, so I'm not NACK-ing either. -- Jun'ichi Nomura, NEC Corporation ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: dm: lock bd_mutex when setting device size 2010-11-01 7:19 ` Jun'ichi Nomura @ 2010-11-01 13:14 ` Mike Snitzer 2010-11-03 18:06 ` [PATCH] dm: remove extra locking when changing " Mike Snitzer 0 siblings, 1 reply; 11+ messages in thread From: Mike Snitzer @ 2010-11-01 13:14 UTC (permalink / raw) To: Jun'ichi Nomura; +Cc: dm-devel On Mon, Nov 01 2010 at 3:19am -0400, Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > Hi Mike, > > (10/30/10 06:50), Mike Snitzer wrote: > > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's > > size. Using md->bdev->bd_mutex eliminates the potential for deadlock if > > an fsync is racing with a device resize. > > > > revalidate_disk() was avoided because it would flush_disk() while the DM > > device is suspended. > > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > > Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> > > --- > > drivers/md/dm.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > Jun'ichi, was the following your implict Acked-by? Care to make it > > explicit? > > "Anyway, I think your bd_mutex patch should be fine for now and is > > better than the current code with i_mutex, which has a real deadlock > > issue." > > No, it was not an ACK. > (This is not multipath. So I think you don't need my ack.) Your ack is always meaningful but that is fine too. > I'm reluctant to ack this because, as I wrote, it's prone to > cause deadlock in future. > But I couldn't find a real problem with the patch, > so I'm not NACK-ing either. dm_swap_table's md->suspend_mutex already provides adequate protection of __set_size's i_size_write. You didn't like this because it implied: "dm_resume is the only code which calls i_size_write() for dm device". I don't see that as a problem at all; its a more meaningful/enforcable rule to have in DM than something tethered to the generic bd_mutex. I'll discuss this further with Alasdair and/or others. Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] dm: remove extra locking when changing device size 2010-11-01 13:14 ` Mike Snitzer @ 2010-11-03 18:06 ` Mike Snitzer 0 siblings, 0 replies; 11+ messages in thread From: Mike Snitzer @ 2010-11-03 18:06 UTC (permalink / raw) To: dm-devel DM no longer needlessly holds md->bdev->bd_inode->i_mutex when changing the size of a DM device. This additional locking is unnecessary because i_size_write() is already protected by the existing critical section in dm_swap_table(). DM already has a reference on md->bdev so the associated bd_inode may be changed without lifetime concerns. A negative side-effect of having held md->bdev->bd_inode->i_mutex was that a concurrent DM device resize and flush (via fsync) would deadlock. Dropping md->bdev->bd_inode->i_mutex eliminates this potential for deadlock. The following reproducer no longer deadlocks: https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 7cb1352..1794dee 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1996,9 +1996,12 @@ static void __set_size(struct mapped_device *md, sector_t size) { set_capacity(md->disk, size); - mutex_lock(&md->bdev->bd_inode->i_mutex); + /* + * Only DM is allowed to change the size of a DM device. + * i_size_write() is protected by the dm_swap_table() critical + * section that uses md->suspend_lock. + */ i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT); - mutex_unlock(&md->bdev->bd_inode->i_mutex); } /* ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-11-03 18:06 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer 2010-10-20 5:42 ` Jun'ichi Nomura 2010-10-28 1:16 ` Mike Snitzer 2010-10-28 12:15 ` Jun'ichi Nomura 2010-10-28 19:54 ` Mike Snitzer 2010-10-28 22:18 ` Mike Snitzer 2010-10-29 3:00 ` Jun'ichi Nomura 2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer 2010-11-01 7:19 ` Jun'ichi Nomura 2010-11-01 13:14 ` Mike Snitzer 2010-11-03 18:06 ` [PATCH] dm: remove extra locking when changing " Mike Snitzer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.