All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm: use revalidate_disk to update device size after set_capacity
@ 2010-10-19 22:07 Mike Snitzer
  2010-10-20  5:42 ` Jun'ichi Nomura
  2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-10-19 22:07 UTC (permalink / raw)
  To: dm-devel

Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
size.  Doing so eliminates the potential for deadlock if an fsync is
racing with a device resize.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f934e98..fd315a7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1995,10 +1995,7 @@ static void event_callback(void *context)
 static void __set_size(struct mapped_device *md, sector_t size)
 {
 	set_capacity(md->disk, size);
-
-	mutex_lock(&md->bdev->bd_inode->i_mutex);
-	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
-	mutex_unlock(&md->bdev->bd_inode->i_mutex);
+	revalidate_disk(md->disk);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] dm: use revalidate_disk to update device size after set_capacity
  2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer
@ 2010-10-20  5:42 ` Jun'ichi Nomura
  2010-10-28  1:16   ` Mike Snitzer
  2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer
  1 sibling, 1 reply; 11+ messages in thread
From: Jun'ichi Nomura @ 2010-10-20  5:42 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer

Hi Mike,

(10/20/10 07:07), Mike Snitzer wrote:
> Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
> size.  Doing so eliminates the potential for deadlock if an fsync is
> racing with a device resize.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/md/dm.c |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index f934e98..fd315a7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1995,10 +1995,7 @@ static void event_callback(void *context)
>  static void __set_size(struct mapped_device *md, sector_t size)
>  {
>  	set_capacity(md->disk, size);
> -
> -	mutex_lock(&md->bdev->bd_inode->i_mutex);
> -	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
> -	mutex_unlock(&md->bdev->bd_inode->i_mutex);
> +	revalidate_disk(md->disk);
>  }

Some concerns/questions:

- revalidate_disk() calls bdget() inside it.
  Does bdget() no longer block on I/O?
  Sometime ago, bdget() has been blocked by I_LOCK,
  where process holding I_LOCK blocked by resize.
  Since I_LOCK has disappeared, I suspect this might not
  be a valid concern anymore.
  FYI, past discussion on this topic:
  http://www.redhat.com/archives/dm-devel/2007-October/msg00134.html
  (it's not my intention to push the patch in the above URL)

- revalidate_disk() ends up with get_super():
    revalidate_disk
      check_disk_size_change
        flush_disk
          __invalidate_device
            get_super
  and get_super() takes sb->s_umount.
  OTOH, there are codes which wait on I/O holding s_umount
  exclusively. E.g. freeze_super() calls sync_filesystem().
  So it seems there is a possible deadlock if you use
  revalidate_disk from dm_swap_table.

I've been away from that part of the code for a while.
So it's nice if the above is just a false alarm...

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm: use revalidate_disk to update device size after set_capacity
  2010-10-20  5:42 ` Jun'ichi Nomura
@ 2010-10-28  1:16   ` Mike Snitzer
  2010-10-28 12:15     ` Jun'ichi Nomura
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2010-10-28  1:16 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: device-mapper development

Hi Jun'ichi,

Thanks for your note.  It took me a while to set aside some time to look
at this closer.  Please see my response inlined below.

On Wed, Oct 20 2010 at  1:42am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
> 
> (10/20/10 07:07), Mike Snitzer wrote:
> > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
> > size.  Doing so eliminates the potential for deadlock if an fsync is
> > racing with a device resize.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  drivers/md/dm.c |    5 +----
> >  1 files changed, 1 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index f934e98..fd315a7 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1995,10 +1995,7 @@ static void event_callback(void *context)
> >  static void __set_size(struct mapped_device *md, sector_t size)
> >  {
> >  	set_capacity(md->disk, size);
> > -
> > -	mutex_lock(&md->bdev->bd_inode->i_mutex);
> > -	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
> > -	mutex_unlock(&md->bdev->bd_inode->i_mutex);
> > +	revalidate_disk(md->disk);
> >  }
> 
> Some concerns/questions:
> 
> - revalidate_disk() calls bdget() inside it.
>   Does bdget() no longer block on I/O?
>   Sometime ago, bdget() has been blocked by I_LOCK,
>   where process holding I_LOCK blocked by resize.
>   Since I_LOCK has disappeared, I suspect this might not
>   be a valid concern anymore.
>   FYI, past discussion on this topic:
>   http://www.redhat.com/archives/dm-devel/2007-October/msg00134.html
>   (it's not my intention to push the patch in the above URL)

OK, thanks for the pointer.

Yes, I agree with you, seems there is no longer any obvious potential
for bdget to block on IO.

> - revalidate_disk() ends up with get_super():
>     revalidate_disk
>       check_disk_size_change
>         flush_disk
>           __invalidate_device
>             get_super
>   and get_super() takes sb->s_umount.
>   OTOH, there are codes which wait on I/O holding s_umount
>   exclusively. E.g. freeze_super() calls sync_filesystem().
>   So it seems there is a possible deadlock if you use
>   revalidate_disk from dm_swap_table.

Doesn't seem like a freeze_super of a particular DM device would
conflict with revalidate_disk relative to sb->s_umount.  But it is
concerning that revalidate_disk is calling flush_disk while the DM
device is suspended (though in practice this doesn't cause a problem):

dm_suspend() - flushes any outstanding IO.
  lock_fs
    freeze_bdev
      freeze_super
	down_write(&sb->s_umount);
        sync_filesystem
	up_write(&sb->s_umount);

dm_swap_table()
  __bind
    __set_size
      revalidate_disk
        check_disk_size_change
          flush_disk
            __invalidate_device
              get_super
	        down_read(&sb->s_umount);
		up_read(&sb->s_umount);

dm_resume()
  unlock_fs
    thaw_bdev
      thaw_super
        down_write(&sb->s_umount);
	deactivate_locked_super
	  up_write(&s->s_umount);

> I've been away from that part of the code for a while.
> So it's nice if the above is just a false alarm...

Clearly warrants further analysis.

check_disk_size_change() takes bdev->bd_mutex while changing
bdev->bd_inode->i_size (rather than bdev->bd_inode->i_mutex). This is
what attracted me to revalidate_disk (in addition to the rest of the
block drivers using it for resize -- thanks to Neil Brown for pointing
it out).

But in my limited testing of the proposed patch (above), using linear DM
target over DM mpath, I haven't seen any problems.  I was doing IO in
parallel to the resize.  Notice with the patch we now see the following
messages (dm-0 is the mpath device, dm-1 is the linear):

dm-0: detected capacity change from 0 to 10737418240
dm-1: detected capacity change from 0 to 8589934592
...
dm-1: detected capacity change from 8589934592 to 9663676416
dm-1: detected capacity change from 9663676416 to 9667870720

But I haven't yet fully understood why check_disk_size_change's use of
bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size
(unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM
being an exception?).

Given how naive I am on these core block paths there is more analysis
needed to verify/determine the proper fix for DM device resize (while
the device is suspended).

Could be the following patch be sufficient? (avoids potential for IO
while device is suspended -- final patch would need comments explaining
why revalidate_disk was avoided)

Mike

---
 drivers/md/dm.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..248794a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1996,9 +1996,9 @@ static void __set_size(struct mapped_device *md, sector_t size)
 {
 	set_capacity(md->disk, size);
 
-	mutex_lock(&md->bdev->bd_inode->i_mutex);
+	mutex_lock(&md->bdev->bd_mutex);
 	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
-	mutex_unlock(&md->bdev->bd_inode->i_mutex);
+	mutex_unlock(&md->bdev->bd_mutex);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: dm: use revalidate_disk to update device size after set_capacity
  2010-10-28  1:16   ` Mike Snitzer
@ 2010-10-28 12:15     ` Jun'ichi Nomura
  2010-10-28 19:54       ` Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: Jun'ichi Nomura @ 2010-10-28 12:15 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hi Mike,

(10/28/10 10:16), Mike Snitzer wrote:
> But in my limited testing of the proposed patch (above), using linear DM
> target over DM mpath, I haven't seen any problems.  I was doing IO in
> parallel to the resize.  Notice with the patch we now see the following
> messages (dm-0 is the mpath device, dm-1 is the linear):

There is FIFREEZE ioctl, which calls freeze_super.
So if you mix a process doing FIFREEZE (xfs_freeze?) in your test,
I think you hit the deadlock like this:

  process A              process B
  -----------------------------------------------
                         suspend dm dev
  ioctl(FIFREEZE)
    freeze_super()
      hold s_umount
      sync_filesystems()
        wait for I/O flowing..

                         resume dm dev
                           __set_size
                             revalidate_disk()
                               hold bd_mutex
                               flush_disk()
                                 wait for s_umount

> But I haven't yet fully understood why check_disk_size_change's use of
> bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size
> (unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM
> being an exception?).

i_size_read/write uses seqcount to protect the reads from
accessing incomplete write.
But the seqcount itself needs protection. Otherwise concurrent
writes will break the seqcount scheme.
So i_size_write()s need mutual exclusion, but not all accesses do.
That's my understanding from the comments in include/linux/fs.h.

> Given how naive I am on these core block paths there is more analysis
> needed to verify/determine the proper fix for DM device resize (while
> the device is suspended).
> 
> Could be the following patch be sufficient? (avoids potential for IO
> while device is suspended -- final patch would need comments explaining
> why revalidate_disk was avoided)

Though I can't point out actual problem,
I think it's deadlock-prone to take bd_mutex in dm_swap_table.

There are already codes which do I/O while holding bd_mutex,
e.g. block/ioctl.c, though the code is not called for dm,
so we can' just set a general rule "Don't do I/O while holding bd_mutex".

Also, even if I/O is not done under bd_mutex, it might be blocked by
other. For example, though currently nobody can call revalidate_disk for dm,

  process A              process B             process C
  ----------------------------------------------------------
                                               suspend dm dev
  freeze_super()
    hold s_umount
    sync_filesystems()
      wait for I/O flowing..

                         revalidate_disk()
                           hold bd_mutex
                           flush_disk()
                             wait for s_umount

                                               resume dm dev
                                                 __set_size
                                                   wait for bd_mutex

If __set_size() could be done in later stage of do_resume(),
we can use revalidate_disk() for dm, too.
What do you think?

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm: use revalidate_disk to update device size after set_capacity
  2010-10-28 12:15     ` Jun'ichi Nomura
@ 2010-10-28 19:54       ` Mike Snitzer
  2010-10-28 22:18         ` Mike Snitzer
  2010-10-29  3:00         ` Jun'ichi Nomura
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-10-28 19:54 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: device-mapper development

On Thu, Oct 28 2010 at  8:15am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
> 
> (10/28/10 10:16), Mike Snitzer wrote:
> > But in my limited testing of the proposed patch (above), using linear DM
> > target over DM mpath, I haven't seen any problems.  I was doing IO in
> > parallel to the resize.  Notice with the patch we now see the following
> > messages (dm-0 is the mpath device, dm-1 is the linear):
> 
> There is FIFREEZE ioctl, which calls freeze_super.
> So if you mix a process doing FIFREEZE (xfs_freeze?) in your test,
> I think you hit the deadlock like this:
> 
>   process A              process B
>   -----------------------------------------------
>                          suspend dm dev
>   ioctl(FIFREEZE)
>     freeze_super()
>       hold s_umount
>       sync_filesystems()
>         wait for I/O flowing..
> 
>                          resume dm dev
>                            __set_size
>                              revalidate_disk()
>                                hold bd_mutex
>                                flush_disk()
>                                  wait for s_umount

OK, I didn't consider FIFREEZE ioctl..

But regardless, I think it is now clear that we must avoid using
revalidate_disk() while a DM device is suspended.

> > But I haven't yet fully understood why check_disk_size_change's use of
> > bdev->bd_mutex sufficiently protects access to bdev->bd_inode->i_size
> > (unless all access to bdev->bd_inode->i_size takes bdev->bd_mutex; DM
> > being an exception?).
> 
> i_size_read/write uses seqcount to protect the reads from
> accessing incomplete write.
> But the seqcount itself needs protection. Otherwise concurrent
> writes will break the seqcount scheme.
> So i_size_write()s need mutual exclusion, but not all accesses do.
> That's my understanding from the comments in include/linux/fs.h.

Correct, but my point was that revalidate_disk protects the
bdev->bd_inode->i_size i_size_write() with bdev->bd_mutex.

Whereas quite a few non-block driver i_size_write() callers use the
inode's i_mutex; as DM currently does with md->bdev->bd_inode->i_mutex.

As we now know, using md->bdev->bd_inode->i_mutex is prone to deadlock
against fsync.

> > Given how naive I am on these core block paths there is more analysis
> > needed to verify/determine the proper fix for DM device resize (while
> > the device is suspended).
> > 
> > Could be the following patch be sufficient? (avoids potential for IO
> > while device is suspended -- final patch would need comments explaining
> > why revalidate_disk was avoided)
> 
> Though I can't point out actual problem,
> I think it's deadlock-prone to take bd_mutex in dm_swap_table.
> 
> There are already codes which do I/O while holding bd_mutex,
> e.g. block/ioctl.c, though the code is not called for dm,
> so we can' just set a general rule "Don't do I/O while holding bd_mutex".

Right, it would work provided we had that understanding within DM.

> Also, even if I/O is not done under bd_mutex, it might be blocked by
> other. For example, though currently nobody can call revalidate_disk for dm,

Hmm, that was a strange example, as you pointed out, considering the
fictional revalidate_disk caller.  I was proposing using bd_mutex
directly -- independent of revalidate_disk and its associated flushing.

> If __set_size() could be done in later stage of do_resume(),
> we can use revalidate_disk() for dm, too.
> What do you think?

I think it would be incorrect to make a new DM table live without first
adjusting the size of the DM device to reflect the new table.  Could
result in racing IO to the end of a device which would cause IO errors
like "access beyond end of device".

So our only option would be to change the device's size _before_
do_resume's dm_suspend().  But then if dm_swap_table() fails we'd need
to revert back to the old size -- which is clearly awkward.

I think updating the device's size within dm_swap_table() really is the
most logical place.

I'll cleanup the patch from my previous mail to include some in-code
comments and re-post it.  Please ack it if you agree that direct use of
bd_mutex in __set_size() is a reasonable fix given the current code.

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm: use revalidate_disk to update device size after set_capacity
  2010-10-28 19:54       ` Mike Snitzer
@ 2010-10-28 22:18         ` Mike Snitzer
  2010-10-29  3:00         ` Jun'ichi Nomura
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-10-28 22:18 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: device-mapper development

On Thu, Oct 28 2010 at  3:54pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Oct 28 2010 at  8:15am -0400,
> Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> > Though I can't point out actual problem,
> > I think it's deadlock-prone to take bd_mutex in dm_swap_table.
> > 
> > There are already codes which do I/O while holding bd_mutex,
> > e.g. block/ioctl.c, though the code is not called for dm,
> > so we can' just set a general rule "Don't do I/O while holding bd_mutex".
> 
> Right, it would work provided we had that understanding within DM.

I think I misunderstood what you were saying as: no code does I/O to DM
while holding bd_mutex so we _can_ just set the general rule....

But re-reading your mail made me realize you were likely saying the
opposite?  e.g. your can' was meant to be can't.

So even though we currently can't see a real deadlock associated with
using ->bd_mutex directly you're contending there is potential for one
(may already be one in hiding or one could be introduced in the future).

That is fine, best to be conservative.

> I'll cleanup the patch from my previous mail to include some in-code
> comments and re-post it.  Please ack it if you agree that direct use of
> bd_mutex in __set_size() is a reasonable fix given the current code.

Taking a step back, don't we already have adequate locking for
__set_size's i_size_write() via dm_swap_table's md->suspend_lock?

So something like the following?

Mike
---
 drivers/md/dm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..e71b8a1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1996,9 +1996,8 @@ static void __set_size(struct mapped_device *md, sector_t size)
 {
 	set_capacity(md->disk, size);
 
-	mutex_lock(&md->bdev->bd_inode->i_mutex);
+	/* i_size_write() is protected by md->suspend_lock */
 	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
-	mutex_unlock(&md->bdev->bd_inode->i_mutex);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: dm: use revalidate_disk to update device size after set_capacity
  2010-10-28 19:54       ` Mike Snitzer
  2010-10-28 22:18         ` Mike Snitzer
@ 2010-10-29  3:00         ` Jun'ichi Nomura
  1 sibling, 0 replies; 11+ messages in thread
From: Jun'ichi Nomura @ 2010-10-29  3:00 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hi Mike,

(10/29/10 04:54), Mike Snitzer wrote:
>> If __set_size() could be done in later stage of do_resume(),
>> we can use revalidate_disk() for dm, too.
>> What do you think?
> 
> I think it would be incorrect to make a new DM table live without first
> adjusting the size of the DM device to reflect the new table.  Could
> result in racing IO to the end of a device which would cause IO errors
> like "access beyond end of device".

Yes, but at that point, resume ioctl is still in progress.
Doesn't that mean "the resize is still in progress"?
If so, who cares that I/O while the resize is not yet completed?

> I'll cleanup the patch from my previous mail to include some in-code
> comments and re-post it.  Please ack it if you agree that direct use of
> bd_mutex in __set_size() is a reasonable fix given the current code.

My point is that it's better to avoid having a code with
special assumption on generic code, either "any codes don't do
I/O while holding bd_mutex" or "dm_resume is the only code
which calls i_size_write() for dm device".
I think it's prone for someone who don't care dm specifics
to introduce a new code that breaks such an assumption.

Anyway, I think your bd_mutex patch should be fine for now and
is better than the current code with i_mutex, which has a real deadlock issue.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* dm: lock bd_mutex when setting device size
  2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer
  2010-10-20  5:42 ` Jun'ichi Nomura
@ 2010-10-29 21:50 ` Mike Snitzer
  2010-11-01  7:19   ` Jun'ichi Nomura
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2010-10-29 21:50 UTC (permalink / raw)
  To: dm-devel; +Cc: Jun'ichi Nomura

Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
size.  Using md->bdev->bd_mutex eliminates the potential for deadlock if
an fsync is racing with a device resize.

revalidate_disk() was avoided because it would flush_disk() while the DM
device is suspended.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
---
 drivers/md/dm.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Jun'ichi, was the following your implict Acked-by?  Care to make it
explicit?
"Anyway, I think your bd_mutex patch should be fine for now and is
better than the current code with i_mutex, which has a real deadlock
issue."

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..248794a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1996,9 +1996,9 @@ static void __set_size(struct mapped_device *md, sector_t size)
 {
 	set_capacity(md->disk, size);
 
-	mutex_lock(&md->bdev->bd_inode->i_mutex);
+	mutex_lock(&md->bdev->bd_mutex);
 	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
-	mutex_unlock(&md->bdev->bd_inode->i_mutex);
+	mutex_unlock(&md->bdev->bd_mutex);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: dm: lock bd_mutex when setting device size
  2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer
@ 2010-11-01  7:19   ` Jun'ichi Nomura
  2010-11-01 13:14     ` Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: Jun'ichi Nomura @ 2010-11-01  7:19 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

Hi Mike,

(10/30/10 06:50), Mike Snitzer wrote:
> Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
> size.  Using md->bdev->bd_mutex eliminates the potential for deadlock if
> an fsync is racing with a device resize.
> 
> revalidate_disk() was avoided because it would flush_disk() while the DM
> device is suspended.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> ---
>  drivers/md/dm.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> Jun'ichi, was the following your implict Acked-by?  Care to make it
> explicit?
> "Anyway, I think your bd_mutex patch should be fine for now and is
> better than the current code with i_mutex, which has a real deadlock
> issue."

No, it was not an ACK.
(This is not multipath. So I think you don't need my ack.)

I'm reluctant to ack this because, as I wrote, it's prone to
cause deadlock in future.
But I couldn't find a real problem with the patch,
so I'm not NACK-ing either.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: dm: lock bd_mutex when setting device size
  2010-11-01  7:19   ` Jun'ichi Nomura
@ 2010-11-01 13:14     ` Mike Snitzer
  2010-11-03 18:06       ` [PATCH] dm: remove extra locking when changing " Mike Snitzer
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2010-11-01 13:14 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: dm-devel

On Mon, Nov 01 2010 at  3:19am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
> 
> (10/30/10 06:50), Mike Snitzer wrote:
> > Avoid taking md->bdev->bd_inode->i_mutex to update the DM block device's
> > size.  Using md->bdev->bd_mutex eliminates the potential for deadlock if
> > an fsync is racing with a device resize.
> > 
> > revalidate_disk() was avoided because it would flush_disk() while the DM
> > device is suspended.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> > ---
> >  drivers/md/dm.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > Jun'ichi, was the following your implict Acked-by?  Care to make it
> > explicit?
> > "Anyway, I think your bd_mutex patch should be fine for now and is
> > better than the current code with i_mutex, which has a real deadlock
> > issue."
> 
> No, it was not an ACK.
> (This is not multipath. So I think you don't need my ack.)

Your ack is always meaningful but that is fine too.

> I'm reluctant to ack this because, as I wrote, it's prone to
> cause deadlock in future.
> But I couldn't find a real problem with the patch,
> so I'm not NACK-ing either.

dm_swap_table's md->suspend_mutex already provides adequate protection
of __set_size's i_size_write.  You didn't like this because it implied:
"dm_resume is the only code which calls i_size_write() for dm device".

I don't see that as a problem at all; its a more meaningful/enforcable
rule to have in DM than something tethered to the generic bd_mutex.

I'll discuss this further with Alasdair and/or others.

Mike

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] dm: remove extra locking when changing device size
  2010-11-01 13:14     ` Mike Snitzer
@ 2010-11-03 18:06       ` Mike Snitzer
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-11-03 18:06 UTC (permalink / raw)
  To: dm-devel

DM no longer needlessly holds md->bdev->bd_inode->i_mutex when changing
the size of a DM device.  This additional locking is unnecessary because
i_size_write() is already protected by the existing critical section in
dm_swap_table().  DM already has a reference on md->bdev so the
associated bd_inode may be changed without lifetime concerns.

A negative side-effect of having held md->bdev->bd_inode->i_mutex was
that a concurrent DM device resize and flush (via fsync) would
deadlock.  Dropping md->bdev->bd_inode->i_mutex eliminates this
potential for deadlock.  The following reproducer no longer deadlocks:
https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
 drivers/md/dm.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7cb1352..1794dee 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1996,9 +1996,12 @@ static void __set_size(struct mapped_device *md, sector_t size)
 {
 	set_capacity(md->disk, size);
 
-	mutex_lock(&md->bdev->bd_inode->i_mutex);
+	/*
+	 * Only DM is allowed to change the size of a DM device.
+	 * i_size_write() is protected by the dm_swap_table() critical
+	 * section that uses md->suspend_lock.
+	 */
 	i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
-	mutex_unlock(&md->bdev->bd_inode->i_mutex);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-11-03 18:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-19 22:07 [PATCH] dm: use revalidate_disk to update device size after set_capacity Mike Snitzer
2010-10-20  5:42 ` Jun'ichi Nomura
2010-10-28  1:16   ` Mike Snitzer
2010-10-28 12:15     ` Jun'ichi Nomura
2010-10-28 19:54       ` Mike Snitzer
2010-10-28 22:18         ` Mike Snitzer
2010-10-29  3:00         ` Jun'ichi Nomura
2010-10-29 21:50 ` dm: lock bd_mutex when setting device size Mike Snitzer
2010-11-01  7:19   ` Jun'ichi Nomura
2010-11-01 13:14     ` Mike Snitzer
2010-11-03 18:06       ` [PATCH] dm: remove extra locking when changing " Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.