linux-kernel.vger.kernel.org archive mirror
* [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
@ 2010-01-24 11:41 Dmitry Monakhov
  2010-01-24 11:50 ` Dmitry Monakhov
  2010-01-24 19:53 ` Al Viro
  0 siblings, 2 replies; 8+ messages in thread
From: Dmitry Monakhov @ 2010-01-24 11:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1738 bytes --]

Currently, on an rw=>ro remount we have the following race:
| mount /mnt -oremount,ro | write-task |
|-------------------------+------------|
|                         | open(RDWR) |
| shrink_dcache_sb(sb);   |            |
| sync_filesystem(sb);    |            |
|                         | write()    |
|                         | close()    |
| fs_may_remount_ro(sb)   |            |
| sb->s_flags = new_flags |            |
Later writeback or sync() will then fail because of the MS_RDONLY flag.
In the case of ext4 this results in a jbd2_start failure on writeback:
ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30
In fact all other filesystems are affected by this error, but it is not
visible because they skip the s_flags check on writeback. For example, ext3
checks (s_flags & MS_RDONLY) only if the page has no buffers during journal start.

In order to prevent the race we have to block new writers before
fs_may_remount_ro() and sync_filesystem(). Let's introduce a new
sb->s_flags flag, MS_RO_REMOUNT, for this purpose. Unfortunately there is
no free space left in the MS_XXX bits, so let's share this bit with MS_REMOUNT.
This is possible because MS_REMOUNT is used only for passing arguments
from flags to sys_mount() and is never stored in sb->s_flags.
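
As a rough userspace illustration of the flag-sharing idea (not kernel
code): the sketch below models sb->s_flags with a hypothetical
struct fake_sb. MS_RDONLY and MS_REMOUNT carry their include/linux/fs.h
values; every fake_* name is an assumption made only for this example.

```c
/* Values as defined in include/linux/fs.h. */
#define MS_RDONLY      1
#define MS_REMOUNT     32
/* Proposed alias: reuses the MS_REMOUNT bit inside sb->s_flags. */
#define MS_RO_REMOUNT  MS_REMOUNT

/* Hypothetical stand-in for struct super_block. */
struct fake_sb {
	unsigned long s_flags;
};

/* Mirrors the superblock test in the patched __mnt_is_readonly(). */
static int fake_sb_is_readonly(const struct fake_sb *sb)
{
	return (sb->s_flags & (MS_RDONLY | MS_RO_REMOUNT)) != 0;
}

/* Start of an rw=>ro remount attempt: new writers are turned away. */
static void fake_remount_begin(struct fake_sb *sb)
{
	sb->s_flags |= MS_RO_REMOUNT;
}

/* End of the attempt (successful or not): the transient bit is dropped. */
static void fake_remount_end(struct fake_sb *sb)
{
	sb->s_flags &= ~(unsigned long)MS_RO_REMOUNT;
}
```

Between fake_remount_begin() and fake_remount_end() the superblock looks
read-only to new writers even though MS_RDONLY itself is not yet set.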

##TESTCASE_BEGIN:
#! /bin/bash -x 
DEV=/dev/sdb5
FSTYPE=ext4
BINDIR=/home/dmon
MNTOPT="data=ordered"
umount /mnt
mkfs.${FSTYPE}  ${DEV} || exit 1
mount  ${DEV} /mnt -o${MNTOPT} || exit 1
${BINDIR}/fsstress -p1 -l999999999 -n9999999999 -d /mnt/test &
sleep 15
mount /mnt -oremount,ro,${MNTOPT}
sleep 1
killall -9 fsstress
sync
# after this you may get following message in dmesg
# "ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30"
##TESTCASE_END

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
--

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2529 bytes --]

diff --git a/fs/namespace.c b/fs/namespace.c
index c768f73..a216fb3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -194,7 +194,7 @@ int __mnt_is_readonly(struct vfsmount *mnt)
 {
 	if (mnt->mnt_flags & MNT_READONLY)
 		return 1;
-	if (mnt->mnt_sb->s_flags & MS_RDONLY)
+	if (mnt->mnt_sb->s_flags & (MS_RDONLY| MS_RO_REMOUNT))
 		return 1;
 	return 0;
 }
diff --git a/fs/super.c b/fs/super.c
index aff046b..756fe88 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -569,42 +569,51 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
 	int retval;
 	int remount_rw;
+	int remount_ro;
 
 	if (sb->s_frozen != SB_UNFROZEN)
 		return -EBUSY;
-
+	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
 #ifdef CONFIG_BLOCK
 	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
 		return -EACCES;
 #endif
-
 	if (flags & MS_RDONLY)
 		acct_auto_close(sb);
-	shrink_dcache_sb(sb);
-	sync_filesystem(sb);
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
-	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
+	retval = -EBUSY;
+	if (remount_ro) {
+		/* Prevent new writers before check */
+		sb->s_flags |= MS_RO_REMOUNT;
 		if (force)
 			mark_files_ro(sb);
 		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+			goto out;
+	}
+	shrink_dcache_sb(sb);
+	sync_filesystem(sb);
+
+	if (remount_ro) {
 		retval = vfs_dq_off(sb, 1);
 		if (retval < 0 && retval != -ENOSYS)
-			return -EBUSY;
+			goto out;
 	}
 	remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY);
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval)
-			return retval;
+			goto out;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 	if (remount_rw)
 		vfs_dq_quota_on_remount(sb);
-	return 0;
+out:
+	if (remount_ro)
+		sb->s_flags = (sb->s_flags & ~MS_RO_REMOUNT);
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b1bcb27..a613875 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -208,6 +208,9 @@ struct inodes_stat_t {
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
+#define MS_RO_REMOUNT	MS_REMOUNT /* Alter flags from rw=>ro of mounted FS.
+				      Not conflicting with MS_REMOUNT because
+				      it never stored in sb->s_flags */
 
 /*
  * Superblock flags that can be altered by MS_REMOUNT


* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 11:41 [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount Dmitry Monakhov
@ 2010-01-24 11:50 ` Dmitry Monakhov
  2010-01-24 19:53 ` Al Viro
  1 sibling, 0 replies; 8+ messages in thread
From: Dmitry Monakhov @ 2010-01-24 11:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: linux-kernel, Greg KH

Dmitry Monakhov <dmonakhov@openvz.org> writes:

As far as I understand, all kernel versions are affected; at least,
I am able to reproduce the bug on 2.6.29..2.6.33-rc4.
> Currently on rw=>ro remount we have following race
> | mount /mnt -oremount,ro | write-task |
> |-------------------------+------------|
> |                         | open(RDWR) |
> | shrink_dcache_sb(sb);   |            |
> | sync_filesystem(sb);    |            |
> |                         | write()    |
> |                         | close()    |
> | fs_may_remount_ro(sb)   |            |
> | sb->s_flags = new_flags |            |
> Later writeback or sync() will result in error due to MS_RDONLY flag
> In case of ext4 this result in jbd2_start failure on writeback
> ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30 
> In fact all others are affected by this error but it is not visible
> because the skip s_flags check on writeback. For example ext3 check
> (s_flags & MS_RDONLY) only if page has no buffers during journal start.
>
> In order to prevent the race we have to block new writers before
> fs_may_remount_ro() and sync_filesystem(). Let's introduce new
> sb->s_flags MS_RO_REMOUNT flag for this purpose. But suddenly we have
> no available space in MS_XXX bits, let's share this bit with MS_REMOUNT.
> This is possible because MS_REMOUNT used only for passing arguments
> from flags to sys_mount() and never used in sb->s_flags.
>
> ##TESTCASE_BEGIN:
> #! /bin/bash -x 
> DEV=/dev/sdb5
> FSTYPE=ext4
> BINDIR=/home/dmon
> MNTOPT="data=ordered"
> umount /mnt
> mkfs.${FSTYPE}  ${DEV} || exit 1
> mount  ${DEV} /mnt -o${MNTOPT} || exit 1
> ${BINDIR}/fsstress -p1 -l999999999 -n9999999999 -d /mnt/test &
> sleep 15
> mount /mnt -oremount,ro,${MNTOPT}
> sleep 1
> killall -9 fsstress
> sync
> # after this you may get following message in dmesg
> # "ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30"
> ##TESTCASE_END
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> --
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c768f73..a216fb3 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -194,7 +194,7 @@ int __mnt_is_readonly(struct vfsmount *mnt)
>  {
>  	if (mnt->mnt_flags & MNT_READONLY)
>  		return 1;
> -	if (mnt->mnt_sb->s_flags & MS_RDONLY)
> +	if (mnt->mnt_sb->s_flags & (MS_RDONLY| MS_RO_REMOUNT))
>  		return 1;
>  	return 0;
>  }
> diff --git a/fs/super.c b/fs/super.c
> index aff046b..756fe88 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -569,42 +569,51 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>  {
>  	int retval;
>  	int remount_rw;
> +	int remount_ro;
>  
>  	if (sb->s_frozen != SB_UNFROZEN)
>  		return -EBUSY;
> -
> +	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
>  #ifdef CONFIG_BLOCK
>  	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
>  		return -EACCES;
>  #endif
> -
>  	if (flags & MS_RDONLY)
>  		acct_auto_close(sb);
> -	shrink_dcache_sb(sb);
> -	sync_filesystem(sb);
>  
>  	/* If we are remounting RDONLY and current sb is read/write,
>  	   make sure there are no rw files opened */
> -	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
> +	retval = -EBUSY;
> +	if (remount_ro) {
> +		/* Prevent new writers before check */
> +		sb->s_flags |= MS_RO_REMOUNT;
>  		if (force)
>  			mark_files_ro(sb);
>  		else if (!fs_may_remount_ro(sb))
> -			return -EBUSY;
> +			goto out;
> +	}
> +	shrink_dcache_sb(sb);
> +	sync_filesystem(sb);
> +
> +	if (remount_ro) {
>  		retval = vfs_dq_off(sb, 1);
>  		if (retval < 0 && retval != -ENOSYS)
> -			return -EBUSY;
> +			goto out;
>  	}
>  	remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY);
>  
>  	if (sb->s_op->remount_fs) {
>  		retval = sb->s_op->remount_fs(sb, &flags, data);
>  		if (retval)
> -			return retval;
> +			goto out;
>  	}
>  	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
>  	if (remount_rw)
>  		vfs_dq_quota_on_remount(sb);
> -	return 0;
> +out:
> +	if (remount_ro)
> +		sb->s_flags = (sb->s_flags & ~MS_RO_REMOUNT);
> +	return retval;
>  }
>  
>  static void do_emergency_remount(struct work_struct *work)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b1bcb27..a613875 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -208,6 +208,9 @@ struct inodes_stat_t {
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
>  #define MS_ACTIVE	(1<<30)
>  #define MS_NOUSER	(1<<31)
> +#define MS_RO_REMOUNT	MS_REMOUNT /* Alter flags from rw=>ro of mounted FS.
> +				      Not conflicting with MS_REMOUNT because
> +				      it never stored in sb->s_flags */
>  
>  /*
>   * Superblock flags that can be altered by MS_REMOUNT


* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 11:41 [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount Dmitry Monakhov
  2010-01-24 11:50 ` Dmitry Monakhov
@ 2010-01-24 19:53 ` Al Viro
  2010-01-24 21:15   ` Dmitry Monakhov
  1 sibling, 1 reply; 8+ messages in thread
From: Al Viro @ 2010-01-24 19:53 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, linux-kernel

On Sun, Jan 24, 2010 at 02:41:15PM +0300, Dmitry Monakhov wrote:
> Currently on rw=>ro remount we have following race
> | mount /mnt -oremount,ro | write-task |
> |-------------------------+------------|
> |                         | open(RDWR) |
> | shrink_dcache_sb(sb);   |            |
> | sync_filesystem(sb);    |            |
> |                         | write()    |
> |                         | close()    |
> | fs_may_remount_ro(sb)   |            |
> | sb->s_flags = new_flags |            |
> Later writeback or sync() will result in error due to MS_RDONLY flag
> In case of ext4 this result in jbd2_start failure on writeback
> ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30 
> In fact all others are affected by this error but it is not visible
> because the skip s_flags check on writeback. For example ext3 check
> (s_flags & MS_RDONLY) only if page has no buffers during journal start.
> 
> In order to prevent the race we have to block new writers before
> fs_may_remount_ro() and sync_filesystem(). Let's introduce new
> sb->s_flags MS_RO_REMOUNT flag for this purpose. But suddenly we have
> no available space in MS_XXX bits, let's share this bit with MS_REMOUNT.
> This is possible because MS_REMOUNT used only for passing arguments
> from flags to sys_mount() and never used in sb->s_flags.

It's not a solution.  You get an _attempted_ remount ro making writes
fail, even if it's going to be unsuccessful.  No go...
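
The objection can be reproduced in a small userspace model (a sketch
only; struct fake_sb, open_rw_files and the fake_* helpers are
hypothetical and merely imitate the control flow of the patched
do_remount_sb()). The writer probe inside the remount stands in for a
task that calls into the read-only check while the transient flag is set.

```c
#include <errno.h>

#define MS_RDONLY      1
#define MS_RO_REMOUNT  32   /* same bit as MS_REMOUNT, as in the patch */

/* Hypothetical stand-in for struct super_block. */
struct fake_sb {
	unsigned long s_flags;
	int open_rw_files;   /* assumed: files currently open for write */
};

/* What a writer gets from the patched read-only check. */
static int fake_want_write(const struct fake_sb *sb)
{
	return (sb->s_flags & (MS_RDONLY | MS_RO_REMOUNT)) ? -EROFS : 0;
}

/*
 * Simplified rw=>ro remount attempt. *seen_by_writer records the result
 * a concurrent writer would get inside the window where the flag is set.
 */
static int fake_remount_ro(struct fake_sb *sb, int *seen_by_writer)
{
	int retval = -EBUSY;

	sb->s_flags |= MS_RO_REMOUNT;
	*seen_by_writer = fake_want_write(sb);   /* concurrent writer */
	if (sb->open_rw_files == 0) {
		sb->s_flags |= MS_RDONLY;
		retval = 0;
	}
	sb->s_flags &= ~(unsigned long)MS_RO_REMOUNT;
	return retval;
}
```

With one file open for write the remount returns -EBUSY, yet the probe
has already seen -EROFS on a filesystem that stays read-write.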


* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 19:53 ` Al Viro
@ 2010-01-24 21:15   ` Dmitry Monakhov
  2010-01-24 21:37     ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Monakhov @ 2010-01-24 21:15 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-ext4, linux-kernel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Sun, Jan 24, 2010 at 02:41:15PM +0300, Dmitry Monakhov wrote:
>> Currently on rw=>ro remount we have following race
>> | mount /mnt -oremount,ro | write-task |
>> |-------------------------+------------|
>> |                         | open(RDWR) |
>> | shrink_dcache_sb(sb);   |            |
>> | sync_filesystem(sb);    |            |
>> |                         | write()    |
>> |                         | close()    |
>> | fs_may_remount_ro(sb)   |            |
>> | sb->s_flags = new_flags |            |
>> Later writeback or sync() will result in error due to MS_RDONLY flag
>> In case of ext4 this result in jbd2_start failure on writeback
>> ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30 
>> In fact all others are affected by this error but it is not visible
>> because the skip s_flags check on writeback. For example ext3 check
>> (s_flags & MS_RDONLY) only if page has no buffers during journal start.
>> 
>> In order to prevent the race we have to block new writers before
>> fs_may_remount_ro() and sync_filesystem(). Let's introduce new
>> sb->s_flags MS_RO_REMOUNT flag for this purpose. But suddenly we have
>> no available space in MS_XXX bits, let's share this bit with MS_REMOUNT.
>> This is possible because MS_REMOUNT used only for passing arguments
>> from flags to sys_mount() and never used in sb->s_flags.
>
> It's not a solution.  You get an _attempted_ remount ro making writes
> fail, even if it's going to be unsuccessful.  No go...
We have two options for new writers:
1) Fail them with -EROFS.
   Yes, the remount may fail, but that is really unlikely.
2) Defer (block) new writers until the remount completes or fails,
   for example as follows. Do you like the second solution?
diff --git a/fs/namespace.c b/fs/namespace.c
index c768f73..daf3c5a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -196,6 +196,15 @@ int __mnt_is_readonly(struct vfsmount *mnt)
 		return 1;
 	if (mnt->mnt_sb->s_flags & MS_RDONLY)
 		return 1;
+	if (mnt->mnt_sb->s_flags & MS_RO_REMOUNT) {
+		int ret = 0;
+		/* Serialize against remount */
+		down_read(&mnt->mnt_sb->s_umount);
+		if (mnt->mnt_sb->s_flags & MS_RDONLY)
+			ret = 1;
+		up_read(&mnt->mnt_sb->s_umount);
+		return ret;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__mnt_is_readonly);


* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 21:15   ` Dmitry Monakhov
@ 2010-01-24 21:37     ` Al Viro
  2010-01-24 22:40       ` Dave Chinner
  2010-01-24 23:01       ` Dmitry Monakhov
  0 siblings, 2 replies; 8+ messages in thread
From: Al Viro @ 2010-01-24 21:37 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, linux-kernel

On Mon, Jan 25, 2010 at 12:15:51AM +0300, Dmitry Monakhov wrote:

> > It's not a solution.  You get an _attempted_ remount ro making writes
> > fail, even if it's going to be unsuccessful.  No go...
> We have two options for new writers:
> 1) Fail it via -EROFS
>    Yes, remount may fail, but it is really unlikely.
> 2) Defer(block) new writers on until we complete or fail remount
>    for example like follows. Do you like second solution ?

Umm...  I wonder what the locking implications would be...  Frankly,
I suspect that what we really want is this:
	* per-superblock write count of some kind, bumped when we decide
that writeback is inevitable and dropped when we are done with it (the
same thing goes for async part of unlink(), etc.)
	* fs_may_remount_ro() checking that write count
So basically we try to push those short-term writers to completion and
if new ones had come while we'd been doing that (or some are really
stuck) we fail remount with -EBUSY.

As a short-term solution the second patch would do probably (-stable and .33),
but in the next cycle I'd rather see something addressing the real problem.
fs_may_remount_ro() in its current form is really broken by design - it
should not scan any lists (which is where your race comes from, BTW)
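
One possible userspace reading of the write-count proposal, using C11
atomics. This is only a sketch under assumptions: the single s_writers
field that encodes both the count and the read-only state, and all
fake_* names, are inventions for the example, not an existing kernel
interface.

```c
#include <errno.h>
#include <stdatomic.h>

#define FAKE_SB_RDONLY (-1)

/* Hypothetical stand-in for struct super_block. */
struct fake_sb {
	atomic_int s_writers;   /* >= 0: writes in flight, -1: read-only */
};

/* Bumped when a writer decides that a write is inevitable. */
static int fake_write_begin(struct fake_sb *sb)
{
	int v = atomic_load(&sb->s_writers);

	do {
		if (v == FAKE_SB_RDONLY)
			return -EROFS;
		/* on failure the CAS reloads v and the loop re-checks it */
	} while (!atomic_compare_exchange_weak(&sb->s_writers, &v, v + 1));
	return 0;
}

/* Dropped when the writer is done. */
static void fake_write_end(struct fake_sb *sb)
{
	atomic_fetch_sub(&sb->s_writers, 1);
}

/* The "may remount ro" check reduced to one counter test, no list scan. */
static int fake_remount_ro(struct fake_sb *sb)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sb->s_writers, &expected,
					    FAKE_SB_RDONLY))
		return -EBUSY;   /* a writer is still in flight */
	return 0;
}
```

In this model an unsuccessful remount never makes a writer fail: writers
see -EROFS only after the 0 => -1 transition has actually happened.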



* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 21:37     ` Al Viro
@ 2010-01-24 22:40       ` Dave Chinner
  2010-02-09 15:28         ` Jan Kara
  2010-01-24 23:01       ` Dmitry Monakhov
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-01-24 22:40 UTC (permalink / raw)
  To: Al Viro; +Cc: Dmitry Monakhov, linux-ext4, linux-kernel

On Sun, Jan 24, 2010 at 09:37:07PM +0000, Al Viro wrote:
> On Mon, Jan 25, 2010 at 12:15:51AM +0300, Dmitry Monakhov wrote:
> 
> > > It's not a solution.  You get an _attempted_ remount ro making writes
> > > fail, even if it's going to be unsuccessful.  No go...
> > We have two options for new writers:
> > 1) Fail it via -EROFS
> >    Yes, remount may fail, but it is really unlikely.
> > 2) Defer(block) new writers on until we complete or fail remount
> >    for example like follows. Do you like second solution ?
> 
> Umm...  I wonder what the locking implications would be...  Frankly,
> I suspect that what we really want is this:
> 	* per-superblock write count of some kind, bumped when we decide
> that writeback is inevitable and dropped when we are done with it (the
> same thing goes for async part of unlink(), etc.)
> 	* fs_may_remount_ro() checking that write count
> So basically we try to push those short-term writers to completion and
> if new ones had come while we'd been doing that (or some are really
> stuck) we fail remount with -EBUSY.

Perhaps we could utilise the filesystem freeze infrastructure - it
already has hooks for intercepting new writers and modifications,
and filesystems have to flush any current modifications before the freeze
completes. It sounds very similar to the requirements needed here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 21:37     ` Al Viro
  2010-01-24 22:40       ` Dave Chinner
@ 2010-01-24 23:01       ` Dmitry Monakhov
  1 sibling, 0 replies; 8+ messages in thread
From: Dmitry Monakhov @ 2010-01-24 23:01 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-ext4, linux-kernel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Mon, Jan 25, 2010 at 12:15:51AM +0300, Dmitry Monakhov wrote:
>
>> > It's not a solution.  You get an _attempted_ remount ro making writes
>> > fail, even if it's going to be unsuccessful.  No go...
>> We have two options for new writers:
>> 1) Fail it via -EROFS
>>    Yes, remount may fail, but it is really unlikely.
>> 2) Defer(block) new writers on until we complete or fail remount
>>    for example like follows. Do you like second solution ?
>
> Umm...  I wonder what the locking implications would be...  Frankly,
> I suspect that what we really want is this:
> 	* per-superblock write count of some kind, bumped when we decide
> that writeback is inevitable and dropped when we are done with it (the
> same thing goes for async part of unlink(), etc.)
> 	* fs_may_remount_ro() checking that write count
> So basically we try to push those short-term writers to completion and
> if new ones had come while we'd been doing that (or some are really
> stuck) we fail remount with -EBUSY.
>
> As a short-term solution the second patch would do probably (-stable and .33),
> but in the next cycle I'd rather see something addressing the real problem.
> fs_may_remount_ro() in its current form is really broken by design - it
> should not scan any lists (which is where your race comes from, BTW)
This is not actually true. The race happens not only because
fs_may_remount_ro() is not atomic, but also because we have two stages:
1) fs_may_remount_ro()
2) sync_filesystem()
Even if we make the first stage atomic, we still have a race between
the second stage and new writers.
BTW: your idea of a per-sb counter may be useful here, but it should
not be a reference count; instead it could be used like i_version.
For example:
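mnt_want_write()
{
   /* every write begin bumps the per-sb version counter */
   mnt->mnt_sb->s_wr_count++;
}
mnt_drop_write()
{
   /* not a refcount: write end bumps it as well */
   mnt->mnt_sb->s_wr_count++;
}
do_remount_sb {
    cur = mnt->mnt_sb->s_wr_count;
    /* fs_may_remount_ro() returns non-zero when remount is allowed */
    if (!fs_may_remount_ro())
         return -EBUSY;
    sync_filesystem()
    /* a writer came or went during the sync */
    if (cur != mnt->mnt_sb->s_wr_count)
         return -EBUSY;
}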
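
The version-counter idea above can be tried out in a single-threaded
userspace model. It is a sketch only: struct fake_sb, open_rw_files,
the readonly field and the sync callbacks are hypothetical, and real
code would additionally need atomic counter updates and a way to close
the window between the final comparison and setting the flag.

```c
#include <errno.h>

/* Hypothetical stand-in for struct super_block. */
struct fake_sb {
	unsigned long s_wr_count;   /* bumped on every write begin and end */
	int open_rw_files;          /* assumed: files open for write */
	int readonly;
};

static void fake_want_write(struct fake_sb *sb) { sb->s_wr_count++; }
static void fake_drop_write(struct fake_sb *sb) { sb->s_wr_count++; }

/* sync_fs is injected so a writer racing with the sync can be simulated. */
static int fake_remount_ro(struct fake_sb *sb,
			   void (*sync_fs)(struct fake_sb *))
{
	unsigned long cur = sb->s_wr_count;

	if (sb->open_rw_files != 0)   /* stage 1: the "may remount" check */
		return -EBUSY;
	sync_fs(sb);                  /* stage 2: flush dirty data */
	if (cur != sb->s_wr_count)    /* someone wrote during the sync */
		return -EBUSY;
	sb->readonly = 1;
	return 0;
}

/* A sync during which nothing else happens. */
static void fake_sync_quiet(struct fake_sb *sb) { (void)sb; }

/* A sync during which a short-lived writer starts and finishes. */
static void fake_sync_raced(struct fake_sb *sb)
{
	fake_want_write(sb);
	fake_drop_write(sb);
}
```

Passing fake_sync_raced makes the remount fail with -EBUSY even though
no file is open for write when either check runs, which is the case a
plain reference count would miss.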




* Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
  2010-01-24 22:40       ` Dave Chinner
@ 2010-02-09 15:28         ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2010-02-09 15:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Al Viro, Dmitry Monakhov, linux-ext4, linux-kernel

> On Sun, Jan 24, 2010 at 09:37:07PM +0000, Al Viro wrote:
> > On Mon, Jan 25, 2010 at 12:15:51AM +0300, Dmitry Monakhov wrote:
> > 
> > > > It's not a solution.  You get an _attempted_ remount ro making writes
> > > > fail, even if it's going to be unsuccessful.  No go...
> > > We have two options for new writers:
> > > 1) Fail it via -EROFS
> > >    Yes, remount may fail, but it is really unlikely.
> > > 2) Defer(block) new writers on until we complete or fail remount
> > >    for example like follows. Do you like second solution ?
> > 
> > Umm...  I wonder what the locking implications would be...  Frankly,
> > I suspect that what we really want is this:
> > 	* per-superblock write count of some kind, bumped when we decide
> > that writeback is inevitable and dropped when we are done with it (the
> > same thing goes for async part of unlink(), etc.)
> > 	* fs_may_remount_ro() checking that write count
> > So basically we try to push those short-term writers to completion and
> > if new ones had come while we'd been doing that (or some are really
> > stuck) we fail remount with -EBUSY.
> 
> Perhaps we could utilise the filesystem freeze infrastructure - it
> already has hooks for intercepting new writers and modifcations,
> and filesystems have to flush any current modifications before the freeze
> completes. It sounds very similar to the requirements needed here...
  There are filesystems (e.g. ext2 or UDF) which don't support freezing so it's not
an option at least short term...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

