All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] read-only remount fixes
@ 2010-10-05 10:31 Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

Al,

This series fixes some of the races and other bugs in the read-only
remount code.  Please review.

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git read-only-remount-fixes

Thanks,
Miklos
---

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:49   ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-clone-flags.patch --]
[-- Type: text/plain, Size: 979 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

If clone_mnt() happens while mnt_make_readonly() is running, the
cloned mount might have MNT_WRITE_HOLD flag set, which results in
mnt_want_write() spinning forever on this mount.

Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
accidentally.  But if it does happen it can hang the machine.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-01 21:54:17.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-01 21:54:30.000000000 +0200
@@ -595,7 +595,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags;
+		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/9] vfs: ignore error on forced remount
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-force-remount-readonly.patch --]
[-- Type: text/plain, Size: 843 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

On emergency remount we want to force MS_RDONLY on the super block
even if ->remount_fs() failed for some reason.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/super.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-01 20:40:42.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-01 21:55:01.000000000 +0200
@@ -579,7 +579,8 @@ int do_remount_sb(struct super_block *sb
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
-		if (retval)
+		/* If forced remount, go ahead despite any errors */
+		if (retval && !force)
 			return retval;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/9] vfs: fix per mount read-write
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-fix-bind-remount-read-write.patch --]
[-- Type: text/plain, Size: 1332 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Don't allow remounting the vfsmount read-write if the superblock is
read-only.  Return -EROFS in this case.

Rename __mnt_unmake_readonly() to mnt_make_writable() to match
mnt_make_readonly().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-09-30 16:53:30.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-09-30 16:57:47.000000000 +0200
@@ -399,11 +399,18 @@ static int mnt_make_readonly(struct vfsm
 	return ret;
 }
 
-static void __mnt_unmake_readonly(struct vfsmount *mnt)
+static int mnt_make_writable(struct vfsmount *mnt)
 {
+	int err = 0;
+
 	br_write_lock(vfsmount_lock);
-	mnt->mnt_flags &= ~MNT_READONLY;
+	if (mnt->mnt_sb->s_flags & MS_RDONLY)
+		err = -EROFS;
+	else
+		mnt->mnt_flags &= ~MNT_READONLY;
 	br_write_unlock(vfsmount_lock);
+
+	return err;
 }
 
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
@@ -1600,7 +1607,7 @@ static int change_mount_flags(struct vfs
 	if (readonly_request)
 		error = mnt_make_readonly(mnt);
 	else
-		__mnt_unmake_readonly(mnt);
+		error = mnt_make_writable(mnt);
 	return error;
 }
 

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/9] vfs: add sb_force_remount_readonly() helper
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (2 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-force_remount_readonly-helper.patch --]
[-- Type: text/plain, Size: 10917 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Add helper for filesystems to call when forcefully remounting
read-only (such as on filesystem errors).

Functionally this doesn't change anything.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 drivers/staging/pohmelfs/net.c |    2 +-
 fs/affs/amigaffs.c             |    2 +-
 fs/ext2/super.c                |    2 +-
 fs/ext3/super.c                |    4 ++--
 fs/ext4/super.c                |    4 ++--
 fs/fat/misc.c                  |    2 +-
 fs/hpfs/super.c                |    2 +-
 fs/jfs/super.c                 |    2 +-
 fs/namespace.c                 |    6 ++++++
 fs/nilfs2/super.c              |    2 +-
 fs/ocfs2/super.c               |    2 +-
 fs/reiserfs/journal.c          |    2 +-
 fs/reiserfs/prints.c           |    4 ++--
 fs/ubifs/io.c                  |    2 +-
 fs/ufs/super.c                 |    4 ++--
 include/linux/fs.h             |    1 +
 16 files changed, 25 insertions(+), 18 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:12:10.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 12:15:17.000000000 +0200
@@ -413,6 +413,12 @@ static int mnt_make_writable(struct vfsm
 	return err;
 }
 
+void sb_force_remount_readonly(struct super_block *sb)
+{
+	sb->s_flags |= MS_RDONLY;
+}
+EXPORT_SYMBOL(sb_force_remount_readonly);
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/drivers/staging/pohmelfs/net.c
===================================================================
--- linux-2.6.orig/drivers/staging/pohmelfs/net.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/drivers/staging/pohmelfs/net.c	2010-10-04 12:13:00.000000000 +0200
@@ -673,7 +673,7 @@ static int pohmelfs_root_cap_response(st
 	psb->state_flags = cap->flags;
 
 	if (psb->state_flags & POHMELFS_FLAGS_RO) {
-		psb->sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(psb->sb);
 		printk(KERN_INFO "Mounting POHMELFS (%d) read-only.\n", psb->idx);
 	}
 
Index: linux-2.6/fs/affs/amigaffs.c
===================================================================
--- linux-2.6.orig/fs/affs/amigaffs.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/affs/amigaffs.c	2010-10-04 12:13:00.000000000 +0200
@@ -458,7 +458,7 @@ affs_error(struct super_block *sb, const
 		function,ErrorBuffer);
 	if (!(sb->s_flags & MS_RDONLY))
 		printk(KERN_WARNING "AFFS: Remounting filesystem read-only\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 }
 
 void
Index: linux-2.6/fs/ext2/super.c
===================================================================
--- linux-2.6.orig/fs/ext2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -69,7 +69,7 @@ void ext2_error (struct super_block * sb
 	if (test_opt(sb, ERRORS_RO)) {
 		ext2_msg(sb, KERN_CRIT,
 			     "error: remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 }
 
Index: linux-2.6/fs/ext3/super.c
===================================================================
--- linux-2.6.orig/fs/ext3/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext3/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -188,7 +188,7 @@ static void ext3_handle_error(struct sup
 	if (test_opt (sb, ERRORS_RO)) {
 		ext3_msg(sb, KERN_CRIT,
 			"error: remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	ext3_commit_super(sb, es, 1);
 	if (test_opt(sb, ERRORS_PANIC))
@@ -295,7 +295,7 @@ void ext3_abort (struct super_block * sb
 	ext3_msg(sb, KERN_CRIT,
 		"error: remounting filesystem read-only");
 	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
 	if (EXT3_SB(sb)->s_journal)
 		journal_abort(EXT3_SB(sb)->s_journal, -EIO);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 11:57:35.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 12:13:00.000000000 +0200
@@ -2057,6 +2057,7 @@ extern const struct file_operations writ
 extern const struct file_operations rdwr_pipefifo_fops;
 
 extern int fs_may_remount_ro(struct super_block *);
+extern void sb_force_remount_readonly(struct super_block *);
 
 #ifdef CONFIG_BLOCK
 /*
Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ext4/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -371,7 +371,7 @@ static void ext4_handle_error(struct sup
 	}
 	if (test_opt(sb, ERRORS_RO)) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	if (test_opt(sb, ERRORS_PANIC))
 		panic("EXT4-fs (device %s): panic forced after error\n",
@@ -526,7 +526,7 @@ void __ext4_abort(struct super_block *sb
 
 	if ((sb->s_flags & MS_RDONLY) == 0) {
 		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 		EXT4_SB(sb)->s_mount_flags |= EXT4_MF_FS_ABORTED;
 		if (EXT4_SB(sb)->s_journal)
 			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
Index: linux-2.6/fs/fat/misc.c
===================================================================
--- linux-2.6.orig/fs/fat/misc.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/fat/misc.c	2010-10-04 12:13:00.000000000 +0200
@@ -38,7 +38,7 @@ void __fat_fs_error(struct super_block *
 	if (opts->errors == FAT_ERRORS_PANIC)
 		panic("FAT: fs panic from previous error\n");
 	else if (opts->errors == FAT_ERRORS_RO && !(s->s_flags & MS_RDONLY)) {
-		s->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(s);
 		printk(KERN_ERR "FAT: Filesystem has been set read-only\n");
 	}
 }
Index: linux-2.6/fs/hpfs/super.c
===================================================================
--- linux-2.6.orig/fs/hpfs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/hpfs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -71,7 +71,7 @@ void hpfs_error(struct super_block *s, c
 			else {
 				printk("; remounting read-only\n");
 				mark_dirty(s);
-				s->s_flags |= MS_RDONLY;
+				sb_force_remount_readonly(s);
 			}
 		} else if (s->s_flags & MS_RDONLY) printk("; going on - but anything won't be destroyed because it's read-only\n");
 		else printk("; corrupted filesystem mounted read/write - your computer will explode within 20 seconds ... but you wanted it so!\n");
Index: linux-2.6/fs/jfs/super.c
===================================================================
--- linux-2.6.orig/fs/jfs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/jfs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -86,7 +86,7 @@ static void jfs_handle_error(struct supe
 		jfs_err("ERROR: (device %s): remounting filesystem "
 			"as read-only\n",
 			sb->s_id);
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 
 	/* nothing is done for continue beyond marking the superblock dirty */
Index: linux-2.6/fs/nilfs2/super.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/nilfs2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -124,7 +124,7 @@ void nilfs_error(struct super_block *sb,
 
 		if (nilfs_test_opt(sbi, ERRORS_RO)) {
 			printk(KERN_CRIT "Remounting filesystem read-only\n");
-			sb->s_flags |= MS_RDONLY;
+			sb_force_remount_readonly(sb);
 		}
 	}
 
Index: linux-2.6/fs/ocfs2/super.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ocfs2/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -2504,7 +2504,7 @@ static void ocfs2_handle_error(struct su
 	printk(KERN_CRIT "File system is now read-only due to the potential "
 	       "of on-disk corruption. Please run fsck.ocfs2 once the file "
 	       "system is unmounted.\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	ocfs2_set_ro_flag(osb, 0);
 }
 
Index: linux-2.6/fs/reiserfs/journal.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/journal.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/reiserfs/journal.c	2010-10-04 12:13:00.000000000 +0200
@@ -4383,7 +4383,7 @@ void reiserfs_abort_journal(struct super
 	if (!journal->j_errno)
 		journal->j_errno = errno;
 
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	set_bit(J_ABORTED, &journal->j_state);
 
 #ifdef CONFIG_REISERFS_CHECK
Index: linux-2.6/fs/reiserfs/prints.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/prints.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/reiserfs/prints.c	2010-10-04 12:13:00.000000000 +0200
@@ -387,7 +387,7 @@ void __reiserfs_error(struct super_block
 		return;
 
 	reiserfs_info(sb, "Remounting filesystem read-only\n");
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	reiserfs_abort_journal(sb, -EIO);
 }
 
@@ -406,7 +406,7 @@ void reiserfs_abort(struct super_block *
 	printk(KERN_CRIT "REISERFS abort (device %s): %s\n", sb->s_id,
 	       error_buf);
 
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	reiserfs_abort_journal(sb, errno);
 }
 
Index: linux-2.6/fs/ubifs/io.c
===================================================================
--- linux-2.6.orig/fs/ubifs/io.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ubifs/io.c	2010-10-04 12:13:00.000000000 +0200
@@ -64,7 +64,7 @@ void ubifs_ro_mode(struct ubifs_info *c,
 	if (!c->ro_media) {
 		c->ro_media = 1;
 		c->no_chk_data_crc = 0;
-		c->vfs_sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(c->vfs_sb);
 		ubifs_warn("switched to read-only mode, error %d", err);
 		dbg_dump_stack();
 	}
Index: linux-2.6/fs/ufs/super.c
===================================================================
--- linux-2.6.orig/fs/ufs/super.c	2010-10-04 09:38:21.000000000 +0200
+++ linux-2.6/fs/ufs/super.c	2010-10-04 12:13:00.000000000 +0200
@@ -288,7 +288,7 @@ void ufs_error (struct super_block * sb,
 		usb1->fs_clean = UFS_FSBAD;
 		ubh_mark_buffer_dirty(USPI_UBH(uspi));
 		sb->s_dirt = 1;
-		sb->s_flags |= MS_RDONLY;
+		sb_force_remount_readonly(sb);
 	}
 	va_start (args, fmt);
 	vsnprintf (error_buf, sizeof(error_buf), fmt, args);
@@ -325,7 +325,7 @@ void ufs_panic (struct super_block * sb,
 	va_start (args, fmt);
 	vsnprintf (error_buf, sizeof(error_buf), fmt, args);
 	va_end (args);
-	sb->s_flags |= MS_RDONLY;
+	sb_force_remount_readonly(sb);
 	printk (KERN_CRIT "UFS-fs panic (device %s): %s: %s\n",
 		sb->s_id, function, error_buf);
 }

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/9] vfs: allow mnt_want_write() to sleep
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (3 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-sleeping-mnt_want_write.patch --]
[-- Type: text/plain, Size: 2706 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Allow mnt_want_write() to sleep.

This is necessary to enable holding off write requests for the
duration of ->remount_fs(), which may sleep.

A quick audit didn't turn up any callers from atomic context.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c     |   11 +++++++----
 fs/super.c         |    1 +
 include/linux/fs.h |    5 +++++
 3 files changed, 13 insertions(+), 4 deletions(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:15:17.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 16:50:44.000000000 +0200
@@ -281,8 +281,12 @@ int mnt_want_write(struct vfsmount *mnt)
 	 * incremented count after it has set MNT_WRITE_HOLD.
 	 */
 	smp_mb();
-	while (mnt->mnt_flags & MNT_WRITE_HOLD)
-		cpu_relax();
+	if (unlikely(mnt->mnt_flags & MNT_WRITE_HOLD)) {
+		preempt_enable();
+		wait_event(mnt->mnt_sb->s_wait_remount_readonly,
+			   !(mnt->mnt_flags & MNT_WRITE_HOLD));
+		preempt_disable();
+	}
 	/*
 	 * After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will
 	 * be set to match its requirements. So we must not load that until
@@ -292,9 +296,7 @@ int mnt_want_write(struct vfsmount *mnt)
 	if (__mnt_is_readonly(mnt)) {
 		dec_mnt_writers(mnt);
 		ret = -EROFS;
-		goto out;
 	}
-out:
 	preempt_enable();
 	return ret;
 }
@@ -395,6 +397,7 @@ static int mnt_make_readonly(struct vfsm
 	 */
 	smp_wmb();
 	mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	wake_up_all(&mnt->mnt_sb->s_wait_remount_readonly);
 	br_write_unlock(vfsmount_lock);
 	return ret;
 }
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:12:01.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 16:50:25.000000000 +0200
@@ -107,6 +107,7 @@ static struct super_block *alloc_super(s
 		mutex_init(&s->s_dquot.dqonoff_mutex);
 		init_rwsem(&s->s_dquot.dqptr_sem);
 		init_waitqueue_head(&s->s_wait_unfrozen);
+		init_waitqueue_head(&s->s_wait_remount_readonly);
 		s->s_maxbytes = MAX_NON_LFS;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 12:13:00.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 16:50:25.000000000 +0200
@@ -1385,6 +1385,11 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/*
+	 * Wait queue for remouting read-only
+	 */
+	wait_queue_head_t s_wait_remount_readonly;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/9] vfs: keep list of mounts for each superblock
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (4 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-mount-list.patch --]
[-- Type: text/plain, Size: 3370 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Keep track of vfsmounts belonging to a superblock.  List is protected
by vfsmount_lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namespace.c        |    5 +++++
 fs/super.c            |    7 +++++++
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 4 files changed, 14 insertions(+)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-04 12:38:35.000000000 +0200
@@ -1346,6 +1346,7 @@ struct super_block {
 #else
 	struct list_head	s_files;
 #endif
+	struct list_head	s_mounts;	/* list of mounts */
 	/* s_dentry_lru and s_nr_dentry_unused are protected by dcache_lock */
 	struct list_head	s_dentry_lru;	/* unused dentry lru */
 	int			s_nr_dentry_unused;	/* # of dentry on lru */
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-01 20:40:41.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-04 12:38:35.000000000 +0200
@@ -54,6 +54,7 @@ struct vfsmount {
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
+	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
 	int mnt_flags;
 	/* 4 bytes hole on 64bits arches without fsnotify */
 #ifdef CONFIG_FSNOTIFY
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 12:38:35.000000000 +0200
@@ -638,6 +638,10 @@ static struct vfsmount *clone_mnt(struct
 			if (!list_empty(&old->mnt_expire))
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
+
+		br_write_lock(vfsmount_lock);
+		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
+		br_write_unlock(vfsmount_lock);
 	}
 	return mnt;
 
@@ -677,6 +681,7 @@ repeat:
 		return;
 	}
 	if (likely(!mnt->mnt_pinned)) {
+		list_del(&mnt->mnt_instance);
 		br_write_unlock(vfsmount_lock);
 		__mntput(mnt);
 		return;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:15:30.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 12:38:35.000000000 +0200
@@ -74,6 +74,7 @@ static struct super_block *alloc_super(s
 		INIT_HLIST_HEAD(&s->s_anon);
 		INIT_LIST_HEAD(&s->s_inodes);
 		INIT_LIST_HEAD(&s->s_dentry_lru);
+		INIT_LIST_HEAD(&s->s_mounts);
 		init_rwsem(&s->s_umount);
 		mutex_init(&s->s_lock);
 		lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -128,6 +129,7 @@ static inline void destroy_super(struct
 	free_percpu(s->s_files);
 #endif
 	security_sb_free(s);
+	WARN_ON(!list_empty(&s->s_mounts));
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	kfree(s);
@@ -969,6 +971,11 @@ vfs_kern_mount(struct file_system_type *
 	mnt->mnt_parent = mnt;
 	up_write(&mnt->mnt_sb->s_umount);
 	free_secdata(secdata);
+
+	br_write_lock(vfsmount_lock);
+	list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
+	br_write_unlock(vfsmount_lock);
+
 	return mnt;
 out_sb:
 	dput(mnt->mnt_root);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (5 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-22  6:46   ` Al Viro
  2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi
  8 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-protect-sb-remount.patch --]
[-- Type: text/plain, Size: 6118 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, set MNT_WRITE_HOLD on all
mounts so that writes are held off until the remount either succeeds
or fails.

After this proceed with the remount and if successful mark all mounts
MNT_READONLY and release MNT_WRITE_HOLD.  If unsuccessful, just
release MNT_WRITE_HOLD so that write operations can proceed normally.

Careful thought needs to be given to places where mnt_flags are set
such as do_add_mount() and clone_mnt() to ensure that a read-write
mount is not added to a read-only superblock.

Small races still remain because delayed writes due to nlink going to
zero but inode still having a reference are not dealt with by this
patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h  |    3 ++
 fs/namespace.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/super.c     |   25 +++++++++++++++++---
 3 files changed, 93 insertions(+), 5 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-01 20:40:41.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-04 13:02:24.000000000 +0200
@@ -69,6 +69,9 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
+extern void sb_cancel_remount_readonly(struct super_block *);
+extern void sb_finish_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 12:38:35.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 13:02:24.000000000 +0200
@@ -422,6 +422,61 @@ void sb_force_remount_readonly(struct su
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	br_write_unlock(vfsmount_lock);
+
+	if (err)
+		sb_cancel_remount_readonly(sb);
+
+	return err;
+}
+
+void sb_cancel_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD)
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	}
+
+	br_write_unlock(vfsmount_lock);
+	wake_up_all(&sb->s_wait_remount_readonly);
+}
+
+void sb_finish_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_READONLY;
+			smp_wmb();
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
+	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
+	wake_up_all(&sb->s_wait_remount_readonly);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -611,7 +666,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
+		mnt->mnt_flags = old->mnt_flags & MNT_PROPAGATION_MASK;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);
@@ -639,7 +694,13 @@ static struct vfsmount *clone_mnt(struct
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
 
+		/*
+		 * Add new mount to s_mounts.  Do this after fiddling
+		 * with propagation flags to prevent races with
+		 * remount ro/rw.
+		 */
 		br_write_lock(vfsmount_lock);
+		mnt->mnt_flags |= old->mnt_flags & ~MNT_PROPAGATION_MASK;
 		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
 		br_write_unlock(vfsmount_lock);
 	}
@@ -1804,7 +1865,14 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
+	/* Locking is necessary to prevent racing with remount r/o */
+	br_read_lock(vfsmount_lock);
+	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
+
 	newmnt->mnt_flags = mnt_flags;
+	br_read_unlock(vfsmount_lock);
+
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 12:38:35.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 13:02:24.000000000 +0200
@@ -574,18 +574,31 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
-		if (retval && !force)
+		if (retval && !force) {
+			if (remount_ro)
+				goto cancel_readonly;
 			return retval;
+		}
 	}
+	if (remount_ro && !force)
+		sb_finish_remount_readonly(sb);
+
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*
@@ -599,6 +612,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb_cancel_remount_readonly(sb);
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (6 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-remove-unnecessary-remount-check.patch --]
[-- Type: text/plain, Size: 1017 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Now a successful sb_prepare_remount_readonly() should ensure that no
writable files remain for this superblock.  So turn this check into a
WARN_ON.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/file_table.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/file_table.c
===================================================================
--- linux-2.6.orig/fs/file_table.c	2010-10-04 10:44:23.000000000 +0200
+++ linux-2.6/fs/file_table.c	2010-10-04 15:07:29.000000000 +0200
@@ -437,8 +437,11 @@ int fs_may_remount_ro(struct super_block
 		if (inode->i_nlink == 0)
 			goto too_bad;
 
-		/* Writeable file? */
-		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
+		/*
+		 * Writable file?
+		 * Should be caught by sb_prepare_remount_readonly().
+		 */
+		if (WARN_ON(S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE)))
 			goto too_bad;
 	} while_file_list_for_each_entry;
 	lg_global_unlock(files_lglock);

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 9/9] vfs: mark mounts read-only on forced remount
  2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
                   ` (7 preceding siblings ...)
  2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
@ 2010-10-05 10:31 ` Miklos Szeredi
  8 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:31 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: vfs-force-mounts-readonly.patch --]
[-- Type: text/plain, Size: 3483 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

When the superblock is forcefully remounted read-only, as in case of
an emergency remount or filesystem errors, make sure that the mounts
belonging to the superblock are also marked read-only.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h  |    1 +
 fs/namespace.c |   16 ++++++++++++++++
 fs/super.c     |   16 ++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-04 15:09:49.000000000 +0200
@@ -72,6 +72,7 @@ extern struct vfsmount *copy_tree(struct
 extern int sb_prepare_remount_readonly(struct super_block *);
 extern void sb_cancel_remount_readonly(struct super_block *);
 extern void sb_finish_remount_readonly(struct super_block *);
+extern void mark_mounts_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-04 15:09:49.000000000 +0200
@@ -418,7 +418,13 @@ static int mnt_make_writable(struct vfsm
 
 void sb_force_remount_readonly(struct super_block *sb)
 {
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance)
+		mnt->mnt_flags |= MNT_READONLY;
 	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
@@ -477,6 +483,16 @@ void sb_finish_remount_readonly(struct s
 	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
+void mark_mounts_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance)
+		mnt->mnt_flags |= MNT_READONLY;
+	br_write_unlock(vfsmount_lock);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-04 13:02:24.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-04 15:09:49.000000000 +0200
@@ -555,6 +555,7 @@ int do_remount_sb(struct super_block *sb
 {
 	int retval;
 	int remount_ro;
+	bool old_ro;
 
 	if (sb->s_frozen != SB_UNFROZEN)
 		return -EBUSY;
@@ -569,12 +570,14 @@ int do_remount_sb(struct super_block *sb
 	shrink_dcache_sb(sb);
 	sync_filesystem(sb);
 
-	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
+	old_ro = (sb->s_flags & MS_RDONLY) != 0;
+	remount_ro = (flags & MS_RDONLY) && !old_ro;
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
 		if (force) {
+			mark_mounts_readonly(sb);
 			mark_files_ro(sb);
 		} else {
 			retval = sb_prepare_remount_readonly(sb);
@@ -596,9 +599,14 @@ int do_remount_sb(struct super_block *sb
 			return retval;
 		}
 	}
-	if (remount_ro && !force)
-		sb_finish_remount_readonly(sb);
-
+	if (remount_ro) {
+		WARN_ON(!(flags & MS_RDONLY));
+		if (!force)
+			sb_finish_remount_readonly(sb);
+	} else if ((flags & MS_RDONLY) && !old_ro) {
+		/* filesystem remounted r/o without being asked to */
+		sb_force_remount_readonly(sb);
+	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race
  2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
@ 2010-10-05 10:49   ` Miklos Szeredi
  0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-05 10:49 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Tue, 05 Oct 2010, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> If clone_mnt() happens while mnt_make_readonly() is running, the
> cloned mount might have MNT_WRITE_HOLD flag set, which results in
> mnt_want_write() spinning forever on this mount.
> 
> Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
> accidentally.  But if it does happen it can hang the machine.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

Forgot to add

CC: stable@kernel.org

> ---
>  fs/namespace.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/namespace.c
> ===================================================================
> --- linux-2.6.orig/fs/namespace.c	2010-10-01 21:54:17.000000000 +0200
> +++ linux-2.6/fs/namespace.c	2010-10-01 21:54:30.000000000 +0200
> @@ -595,7 +595,7 @@ static struct vfsmount *clone_mnt(struct
>  				goto out_free;
>  		}
>  
> -		mnt->mnt_flags = old->mnt_flags;
> +		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
>  		atomic_inc(&sb->s_active);
>  		mnt->mnt_sb = sb;
>  		mnt->mnt_root = dget(root);
> 
> -- 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
@ 2010-10-22  6:46   ` Al Viro
  2010-10-22 12:25     ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2010-10-22  6:46 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Tue, Oct 05, 2010 at 12:31:15PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Currently remouting superblock read-only is racy in a major way.
> 
> With the per mount read-only infrastructure it is now possible to
> prevent most races, which this patch attempts.
> 
> Before starting the remount read-only, set MNT_WRITE_HOLD on all
> mounts so that writes are held off until the remount either succeeds
> or fails.

Umm...  What'll happen if your remount will say mnt_want_write() on
e.g. internal vfsmount over that sb?  Or, more subtle, tries to
update atime on some opened struct file on that sb.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-22  6:46   ` Al Viro
@ 2010-10-22 12:25     ` Miklos Szeredi
  2010-10-22 16:10       ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 12:25 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Fri, 22 Oct 2010, Al Viro wrote:
> On Tue, Oct 05, 2010 at 12:31:15PM +0200, Miklos Szeredi wrote:
> > From: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > Currently remouting superblock read-only is racy in a major way.
> > 
> > With the per mount read-only infrastructure it is now possible to
> > prevent most races, which this patch attempts.
> > 
> > Before starting the remount read-only, set MNT_WRITE_HOLD on all
> > mounts so that writes are held off until the remount either succeeds
> > or fails.
> 
> Umm...  What'll happen if your remount will say mnt_want_write() on
> e.g. internal vfsmount over that sb?

Quickly looking through filesystems I didn't find this sort of usage.

>   Or, more subtle, tries to
> update atime on some opened struct file on that sb.

Hmm, load_nls() will apparently do that.  Drat.

Plan B: remount all vfsmounts read-only before ->remount_fs() but
remember which ones were read-write and restore on failure.  Will
result in ugly transient write failures if remount_fs() fails, but I
don't think anybody would care in practice.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] vfs: protect remounting superblock read-only
  2010-10-22 12:25     ` Miklos Szeredi
@ 2010-10-22 16:10       ` Miklos Szeredi
  2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 16:10 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Fri, 22 Oct 2010, Miklos Szeredi wrote:
> >   Or, more subtle, tries to
> > update atime on some opened struct file on that sb.
> 
> Hmm, load_nls() will apparently do that.  Drat.
> 
> Plan B: remount all vfsmounts read-only before ->remount_fs() but
> remember which ones were read-write and restore on failure.  Will
> result in ugly transient write failures if remount_fs() fails, but I
> don't think anybody would care in practice.

And it's actually a pretty simple change.  Here's the incremental
patch to "vfs: protect remounting superblock read-only", with "vfs:
allow mnt_want_write() to sleep" reverted.

The full, updated patch with follow.

Thanks,
Miklos



Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-22 16:24:41.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-22 16:17:59.000000000 +0200
@@ -435,11 +435,17 @@ int sb_prepare_remount_readonly(struct s
 			}
 		}
 	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD) {
+			if (!err) {
+				mnt->mnt_flags |= MNT_READONLY | MNT_WAS_WRITE;
+				smp_wmb();
+			}
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
 	br_write_unlock(vfsmount_lock);
 
-	if (err)
-		sb_cancel_remount_readonly(sb);
-
 	return err;
 }
 
@@ -449,12 +455,11 @@ void sb_cancel_remount_readonly(struct s
 
 	br_write_lock(vfsmount_lock);
 	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~(MNT_WAS_WRITE | MNT_READONLY);
 	}
 
 	br_write_unlock(vfsmount_lock);
-	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
 void sb_finish_remount_readonly(struct super_block *sb)
@@ -463,15 +468,11 @@ void sb_finish_remount_readonly(struct s
 
 	br_write_lock(vfsmount_lock);
 	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt_flags & MNT_READONLY)) {
-			mnt->mnt_flags |= MNT_READONLY;
-			smp_wmb();
-			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
-		}
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~MNT_WAS_WRITE;
 	}
 	sb->s_flags |= MS_RDONLY;
 	br_write_unlock(vfsmount_lock);
-	wake_up_all(&sb->s_wait_remount_readonly);
 }
 
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-22 16:20:26.000000000 +0200
@@ -30,6 +30,7 @@ struct mnt_namespace;
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
+#define MNT_WAS_WRITE	0x400   /* used by remount to remember write mounts */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-22 16:10       ` Miklos Szeredi
@ 2010-10-22 16:14         ` Miklos Szeredi
  2010-10-23 16:19           ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-22 16:14 UTC (permalink / raw)
  To: viro; +Cc: dave, akpm, linux-fsdevel, linux-kernel

From: Miklos Szeredi <mszeredi@suse.cz>

Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, set MNT_READONLY on all mounts
so that writes are prevented during the remount.  Also set
MNT_WAS_WRITE on mounts that were not read-only.

If the remounting is unsuccessful, restore the mounts to read-write.
This can result in transient EROFS errors.  Unfortunately hodling off
writes is difficult as remount itself may touch the filesystem
(e.g. through load_nls()) which would then deadlock.

Careful thought needs to be given to places where mnt_flags are set
such as do_add_mount() and clone_mnt() to ensure that a read-write
mount is not added to a read-only superblock.

Small races still remain because delayed writes due to nlink going to
zero but inode still having a reference are not dealt with by this
patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/internal.h         |    3 ++
 fs/namespace.c        |   73 ++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/super.c            |   25 ++++++++++++++---
 include/linux/mount.h |    1 
 4 files changed, 96 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-21 19:38:36.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-22 17:48:16.000000000 +0200
@@ -69,6 +69,9 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
+extern void sb_cancel_remount_readonly(struct super_block *);
+extern void sb_finish_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-22 17:59:30.000000000 +0200
@@ -419,6 +419,62 @@ void sb_force_remount_readonly(struct su
 }
 EXPORT_SYMBOL(sb_force_remount_readonly);
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD) {
+			if (!err) {
+				mnt->mnt_flags |= MNT_READONLY | MNT_WAS_WRITE;
+				smp_wmb();
+			}
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+		}
+	}
+	br_write_unlock(vfsmount_lock);
+
+	return err;
+}
+
+void sb_cancel_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~(MNT_WAS_WRITE | MNT_READONLY);
+	}
+
+	br_write_unlock(vfsmount_lock);
+}
+
+void sb_finish_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WAS_WRITE)
+			mnt->mnt_flags &= ~MNT_WAS_WRITE;
+	}
+	sb->s_flags |= MS_RDONLY;
+	br_write_unlock(vfsmount_lock);
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -608,7 +664,7 @@ static struct vfsmount *clone_mnt(struct
 				goto out_free;
 		}
 
-		mnt->mnt_flags = old->mnt_flags & ~MNT_WRITE_HOLD;
+		mnt->mnt_flags = old->mnt_flags & MNT_PROPAGATION_MASK;
 		atomic_inc(&sb->s_active);
 		mnt->mnt_sb = sb;
 		mnt->mnt_root = dget(root);
@@ -636,7 +692,13 @@ static struct vfsmount *clone_mnt(struct
 				list_add(&mnt->mnt_expire, &old->mnt_expire);
 		}
 
+		/*
+		 * Add new mount to s_mounts.  Do this after fiddling
+		 * with propagation flags to prevent races with
+		 * remount ro/rw.
+		 */
 		br_write_lock(vfsmount_lock);
+		mnt->mnt_flags |= old->mnt_flags & ~MNT_PROPAGATION_MASK;
 		list_add_tail(&mnt->mnt_instance, &mnt->mnt_sb->s_mounts);
 		br_write_unlock(vfsmount_lock);
 	}
@@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
 
 	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);
 
+	/* Locking is necessary to prevent racing with remount r/o */
+	down_read(&newmnt->mnt_sb->s_umount);
+	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
+
+	newmnt->mnt_flags = mnt_flags;
+	up_read(&newmnt->mnt_sb->s_umount);
+
 	down_write(&namespace_sem);
 	/* Something was mounted here while we slept */
 	while (d_mountpoint(path->dentry) &&
@@ -1801,7 +1871,6 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	newmnt->mnt_flags = mnt_flags;
 	if ((err = graft_tree(newmnt, path)))
 		goto unlock;
 
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-22 17:48:16.000000000 +0200
@@ -573,18 +573,31 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
-		if (retval && !force)
+		if (retval && !force) {
+			if (remount_ro)
+				goto cancel_readonly;
 			return retval;
+		}
 	}
+	if (remount_ro && !force)
+		sb_finish_remount_readonly(sb);
+
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 
 	/*
@@ -598,6 +611,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb_cancel_remount_readonly(sb);
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)
Index: linux-2.6/include/linux/mount.h
===================================================================
--- linux-2.6.orig/include/linux/mount.h	2010-10-22 16:02:14.000000000 +0200
+++ linux-2.6/include/linux/mount.h	2010-10-22 17:59:30.000000000 +0200
@@ -30,6 +30,7 @@ struct mnt_namespace;
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
+#define MNT_WAS_WRITE	0x400   /* used by remount to remember write mounts */
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
@ 2010-10-23 16:19           ` Al Viro
  2010-10-23 19:35             ` Miklos Szeredi
  2010-10-25 12:36             ` Miklos Szeredi
  0 siblings, 2 replies; 19+ messages in thread
From: Al Viro @ 2010-10-23 16:19 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Fri, Oct 22, 2010 at 06:14:01PM +0200, Miklos Szeredi wrote:

> @@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
>  
>  	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);

Obviously not enough - you've just added a new flag that needs to be
trimmed from mnt_flags.

> +	/* Locking is necessary to prevent racing with remount r/o */
> +	down_read(&newmnt->mnt_sb->s_umount);
> +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> +		mnt_flags |= MNT_READONLY;
> +
> +	newmnt->mnt_flags = mnt_flags;
> +	up_read(&newmnt->mnt_sb->s_umount);

FWIW, I really don't like the way you are doing that; what we really need
there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
mnt_want_write() bumping per-sb write count, which would solve all these
problems quite nicely.

NOTE: vfsmount being ro and sb being ro are *independent* things; either
is enough to deny writes.  Having remount ro + remount rw lose the state
of other vfsmounts is a Bad Thing(tm).

Another thing:
	"If clone_mnt() happens while mnt_make_readonly() is running, the
	cloned mount might have MNT_WRITE_HOLD flag set, which results in
	mnt_want_write() spinning forever on this mount."
actually means
	"neither clone_mnt() nor remounts should ever be done without
namespace_sem held exclusive; if that ever happens, we have a serious
bug that can lead to any number of bad things happening".

Do you actually see such places?  If so, that's what needs fixing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 16:19           ` Al Viro
@ 2010-10-23 19:35             ` Miklos Szeredi
  2010-10-23 21:42               ` Al Viro
  2010-10-25 12:36             ` Miklos Szeredi
  1 sibling, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-23 19:35 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Sat, 23 Oct 2010, Al Viro wrote:
> On Fri, Oct 22, 2010 at 06:14:01PM +0200, Miklos Szeredi wrote:
> 
> > @@ -1782,6 +1844,14 @@ int do_add_mount(struct vfsmount *newmnt
> >  
> >  	mnt_flags &= ~(MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL);
> 
> Obviously not enough - you've just added a new flag that needs to be
> trimmed from mnt_flags.
> 
> > +	/* Locking is necessary to prevent racing with remount r/o */
> > +	down_read(&newmnt->mnt_sb->s_umount);
> > +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> > +		mnt_flags |= MNT_READONLY;
> > +
> > +	newmnt->mnt_flags = mnt_flags;
> > +	up_read(&newmnt->mnt_sb->s_umount);
> 
> FWIW, I really don't like the way you are doing that; what we really need
> there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
> mnt_want_write() bumping per-sb write count, which would solve all these
> problems quite nicely.
> 
> NOTE: vfsmount being ro and sb being ro are *independent* things;

Yes, except the mount(2) API which doens't quite let them be changed
independently.

>  either
> is enough to deny writes.  Having remount ro + remount rw lose the state
> of other vfsmounts is a Bad Thing(tm).

Hmm.

> 
> Another thing:
> 	"If clone_mnt() happens while mnt_make_readonly() is running, the
> 	cloned mount might have MNT_WRITE_HOLD flag set, which results in
> 	mnt_want_write() spinning forever on this mount."
> actually means
> 	"neither clone_mnt() nor remounts should ever be done without
> namespace_sem held exclusive; if that ever happens, we have a serious
> bug that can lead to any number of bad things happening".
> 
> Do you actually see such places?  If so, that's what needs fixing.

do_remount() takes s_umount, but not namespace_sem.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 19:35             ` Miklos Szeredi
@ 2010-10-23 21:42               ` Al Viro
  0 siblings, 0 replies; 19+ messages in thread
From: Al Viro @ 2010-10-23 21:42 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dave, akpm, linux-fsdevel, linux-kernel

On Sat, Oct 23, 2010 at 09:35:16PM +0200, Miklos Szeredi wrote:

> > Another thing:
> > 	"If clone_mnt() happens while mnt_make_readonly() is running, the
> > 	cloned mount might have MNT_WRITE_HOLD flag set, which results in
> > 	mnt_want_write() spinning forever on this mount."
> > actually means
> > 	"neither clone_mnt() nor remounts should ever be done without
> > namespace_sem held exclusive; if that ever happens, we have a serious
> > bug that can lead to any number of bad things happening".
> > 
> > Do you actually see such places?  If so, that's what needs fixing.
> 
> do_remount() takes s_umount, but not namespace_sem.

Duh...  Right, ignore that part; we really don't want to do anything
blocking beyond simple allocations under namespace_sem (e.g. everything
that gets unmounted is collected to be dropped after namespace_sem
is released).

Applied.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9 updated] vfs: protect remounting superblock read-only
  2010-10-23 16:19           ` Al Viro
  2010-10-23 19:35             ` Miklos Szeredi
@ 2010-10-25 12:36             ` Miklos Szeredi
  1 sibling, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2010-10-25 12:36 UTC (permalink / raw)
  To: Al Viro; +Cc: miklos, dave, akpm, linux-fsdevel, linux-kernel

On Sat, 23 Oct 2010, Al Viro wrote:
> > +	/* Locking is necessary to prevent racing with remount r/o */
> > +	down_read(&newmnt->mnt_sb->s_umount);
> > +	if (newmnt->mnt_sb->s_flags & MS_RDONLY)
> > +		mnt_flags |= MNT_READONLY;
> > +
> > +	newmnt->mnt_flags = mnt_flags;
> > +	up_read(&newmnt->mnt_sb->s_umount);
> 
> FWIW, I really don't like the way you are doing that; what we really need
> there is a per-sb analog of mnt_want_write()/mnt_drop_write().  With
> mnt_want_write() bumping per-sb write count, which would solve all these
> problems quite nicely.

And add a nice bit of overhead to a hot path...  Making it per-cpu
would help, but then we'd end up with the same ifdef mess as
mnt_writers and still a non-zero overhead.

But what's the point anyway, the per-sb write count always equals
SUM(mnt->mnt_writers).

How about this variant?  It addresses the vfsmount vs. sb read only
status independence:


Index: linux-2.6/fs/internal.h
===================================================================
--- linux-2.6.orig/fs/internal.h	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/internal.h	2010-10-25 12:48:08.000000000 +0200
@@ -69,6 +69,7 @@ extern void mnt_set_mountpoint(struct vf
 extern void release_mounts(struct list_head *);
 extern void umount_tree(struct vfsmount *, int, struct list_head *);
 extern struct vfsmount *copy_tree(struct vfsmount *, struct dentry *, int);
+extern int sb_prepare_remount_readonly(struct super_block *);
 
 extern void __init mnt_init(void);
 
Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/namespace.c	2010-10-25 14:14:35.000000000 +0200
@@ -251,6 +251,15 @@ static unsigned int count_mnt_writers(st
 #endif
 }
 
+static int mnt_is_readonly(struct vfsmount *mnt)
+{
+	if (mnt->mnt_sb->s_readonly_remount)
+		return 1;
+	/* Order wrt setting s_flags/s_readonly_remount in do_remount() */
+	smp_rmb();
+	return __mnt_is_readonly(mnt);
+}
+
 /*
  * Most r/o checks on a fs are for operations that take
  * discrete amounts of time, like a write() or unlink().
@@ -289,7 +298,7 @@ int mnt_want_write(struct vfsmount *mnt)
 	 * MNT_WRITE_HOLD is cleared.
 	 */
 	smp_rmb();
-	if (__mnt_is_readonly(mnt)) {
+	if (mnt_is_readonly(mnt)) {
 		dec_mnt_writers(mnt);
 		ret = -EROFS;
 		goto out;
@@ -406,6 +415,35 @@ static void __mnt_unmake_readonly(struct
 	br_write_unlock(vfsmount_lock);
 }
 
+int sb_prepare_remount_readonly(struct super_block *sb)
+{
+	struct vfsmount *mnt;
+	int err = 0;
+
+	br_write_lock(vfsmount_lock);
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (!(mnt->mnt_flags & MNT_READONLY)) {
+			mnt->mnt_flags |= MNT_WRITE_HOLD;
+			smp_mb();
+			if (count_mnt_writers(mnt) > 0) {
+				err = -EBUSY;
+				break;
+			}
+		}
+	}
+	if (!err) {
+		sb->s_readonly_remount = 1;
+		smp_wmb();
+	}
+	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
+		if (mnt->mnt_flags & MNT_WRITE_HOLD)
+			mnt->mnt_flags &= ~MNT_WRITE_HOLD;
+	}
+	br_write_unlock(vfsmount_lock);
+
+	return err;
+}
+
 void simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2010-10-25 12:45:22.000000000 +0200
+++ linux-2.6/fs/super.c	2010-10-25 14:08:58.000000000 +0200
@@ -573,19 +573,29 @@ int do_remount_sb(struct super_block *sb
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
-		if (force)
+		if (force) {
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		} else {
+			retval = sb_prepare_remount_readonly(sb);
+			if (retval)
+				return retval;
+
+			retval = -EBUSY;
+			if (!fs_may_remount_ro(sb))
+				goto cancel_readonly;
+		}
 	}
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		/* If forced remount, go ahead despite any errors */
 		if (retval && !force)
-			return retval;
+			goto cancel_readonly;
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
+	/* Needs to be ordered wrt mnt_is_readonly() */
+	smp_wmb();
+	sb->s_readonly_remount = 0;
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
@@ -598,6 +608,10 @@ int do_remount_sb(struct super_block *sb
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
+
+cancel_readonly:
+	sb->s_readonly_remount = 0;
+	return retval;
 }
 
 static void do_emergency_remount(struct work_struct *work)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-10-25 11:43:25.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2010-10-25 13:38:26.000000000 +0200
@@ -1386,6 +1386,9 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	/* Being remounted read-only */
+	int s_readonly_remount;
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-10-25 12:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-05 10:31 [PATCH 0/9] read-only remount fixes Miklos Szeredi
2010-10-05 10:31 ` [PATCH 1/9] vfs: fix infinite loop caused by clone_mnt race Miklos Szeredi
2010-10-05 10:49   ` Miklos Szeredi
2010-10-05 10:31 ` [PATCH 2/9] vfs: ignore error on forced remount Miklos Szeredi
2010-10-05 10:31 ` [PATCH 3/9] vfs: fix per mount read-write Miklos Szeredi
2010-10-05 10:31 ` [PATCH 4/9] vfs: add sb_force_remount_readonly() helper Miklos Szeredi
2010-10-05 10:31 ` [PATCH 5/9] vfs: allow mnt_want_write() to sleep Miklos Szeredi
2010-10-05 10:31 ` [PATCH 6/9] vfs: keep list of mounts for each superblock Miklos Szeredi
2010-10-05 10:31 ` [PATCH 7/9] vfs: protect remounting superblock read-only Miklos Szeredi
2010-10-22  6:46   ` Al Viro
2010-10-22 12:25     ` Miklos Szeredi
2010-10-22 16:10       ` Miklos Szeredi
2010-10-22 16:14         ` [PATCH 7/9 updated] " Miklos Szeredi
2010-10-23 16:19           ` Al Viro
2010-10-23 19:35             ` Miklos Szeredi
2010-10-23 21:42               ` Al Viro
2010-10-25 12:36             ` Miklos Szeredi
2010-10-05 10:31 ` [PATCH 8/9] vfs: fs_may_remount_ro: turn unnecessary check into a WARN_ON Miklos Szeredi
2010-10-05 10:31 ` [PATCH 9/9] vfs: mark mounts read-only on forced remount Miklos Szeredi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.