All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups
@ 2012-10-05  5:24 Fernando Luis Vázquez Cao
  2012-10-05  5:31 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:24 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

This patch set is to address long standing issues in the filesytem freeze code
and to fill some functionality gaps in the API. Some minor code rearrangements
are included too.

The following patches are included:

---
[1/9] vfs: add __iterate_supers() helper
[2/9] fsfreeze: add unlocked version of thaw_super

Preparatory patches to fix s_umount lockup of emergency thaw code.

[3/9] fsfreeze: Prevent emergency thaw from looping infinitely

Fix thaw_bdev so that it propagates the error code properly to the caller.
This bug caused emergency thaw to loop infinitely. This is a forward port of
a previous patch by Dave Chinner.

[4/9] fsfreeze: emergency thaw will deadlock on s_umount

Avoid emergency thaw deadlock on s_umount by using unlocked version of
thaw_super() and __iterate_supers()i (introduced in patches 2 and 1
respectively).

[5/9] xfs: switch to using super methods for fsfreeze
[6/9] fsfreeze: move emergency thaw code to fs/super.c

Two cleanups (based on previous patches by David Chinner): convert xfs to using
superblock methods for fsfreeze and move emergency thaw code to where it
belongs.

[7/9] fsfreeze: freeze_super and thaw_bdev don't play well together

These patches address the bdev/sb mismatch and the fact that sb level freezing
does not nest correctly. For all the places that the block device interfaces
are used, we need a superblock anyway so we may as well make freeze/thaw work
only at the superblock level. There is one caveat (pointed out by Christoph in
the discussion linked above) though: we still need to support the "feature"
that we can freeze a block device that does not have a filesystem mounted yet
(this can be used to prevent a filesystem mount during a dm snapshot for
example).

This series moves part of the nesting code to the superblock from the block
device level but retains the freeze_bdev/thaw_bdev API so that we can keep the
"feature" mentioned above for those who may need it, namely dm and external
users (yes, block device level API is exported for external use). All the other
users of the block device API are converted to work at the superblock level.

[8/9] fsfreeze: add vfs ioctl to check freeze state
[9/9] fsfreeze: add block device ioctl to check freeze state

These ioctls can be used to by HA and monitoring software to check the freeze
state of block devices and filesystems.
---

In a follow-up patch set I will tackle a big outstanding problem: we allow
users to umount a frozen filesystem but we have no bdev level fsfreeze API to
thaw it after that. My plan is to add the bdev level API but this requires
some modifications to btrfs and vfs layer that I am still working on. By
the way, the approach of returning EBUSY on umount was discarded because it
would break lazy umount.

Changes since v4:
  - Make implementation of iterate_supers_read and iterate_supers_write
    symmetric and update locking of emergency thaw code accordingly.
  - Add Jan Kara's Reviewed-by to patch 2.
  - Add explanation about the change from a block device based emergency thaw
    to a superblock based one to patch 4.
  - Modify patch 7 so that the new superblock based emergency does not leave
    the block device level fsfreeze counter in an inconsistent state.
  - Update and comment handling of fsfreeze during mount.

Changes since v3:
  - Include right version of the emergency thaw fix (thanks go to Josef for the
    heads up).
  - Add iterate_supers_(read/write) helpers as suggested by Eric Sandeen.
  - Fix typos.
  - Add Eric Sandeen's Reviewed-by to patches 2 and 3.

Changes since v2:
  - Rebase on top of Jan Kara's patch set which made it upstream.
  - Do without the horrible "emergency" parameter as suggested by Jan Kara.
  - New approach to fix emergency thaw deadlock on s_umount which does locking
    properly.

Changes since v1:
  - Do not break unfreezing of block devices without a mounted filesystem.
  - Avoid conditional locking in thaw_bdev.
  - Update filesystem locking documentation.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-10-05  5:31 ` Fernando Luis Vázquez Cao
  2012-10-08 13:48   ` Jan Kara
  2012-10-05  5:33 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

iterate_supers() calls a function provided by the caller with the s_umount
semaphore taken in read mode. However, there may be cases where write mode
is preferable, so we add __iterate_supers(), which lets one
specify the mode of the lock, and replace iterate_supers with two helpers
around __iterate_supers(), iterate_supers_read() and iterate_supers_write().

This will be used to fix the emergency thaw (filesystem unfreeze) code, which
iterates over the list of superblocks but needs to hold the s_umount semaphore
in _write_ mode bebore carrying out the actual thaw operation.

This patch introduces no semantic changes since current iterate_supers()
callers are just updated to use iterate_supers_read() instead.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
--- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-24 13:45:05.569520902 +0900
+++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 13:06:06.578342989 +0900
@@ -521,7 +521,7 @@ static void do_thaw_one(struct super_blo
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers(do_thaw_one, NULL);
+	iterate_supers_read(do_thaw_one, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }
diff -urNp linux-3.6.0-rc7-orig/fs/drop_caches.c linux-3.6.0-rc7/fs/drop_caches.c
--- linux-3.6.0-rc7-orig/fs/drop_caches.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6.0-rc7/fs/drop_caches.c	2012-09-26 13:06:06.582342990 +0900
@@ -59,7 +59,7 @@ int drop_caches_sysctl_handler(ctl_table
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
-			iterate_supers(drop_pagecache_sb, NULL);
+			iterate_supers_read(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
diff -urNp linux-3.6.0-rc7-orig/fs/quota/quota.c linux-3.6.0-rc7/fs/quota/quota.c
--- linux-3.6.0-rc7-orig/fs/quota/quota.c	2012-09-24 13:45:06.105520990 +0900
+++ linux-3.6.0-rc7/fs/quota/quota.c	2012-09-26 13:06:06.594342992 +0900
@@ -58,7 +58,7 @@ static int quota_sync_all(int type)
 		return -EINVAL;
 	ret = security_quotactl(Q_SYNC, type, 0, NULL);
 	if (!ret)
-		iterate_supers(quota_sync_one, &type);
+		iterate_supers_read(quota_sync_one, &type);
 	return ret;
 }
 
diff -urNp linux-3.6.0-rc7-orig/fs/super.c linux-3.6.0-rc7/fs/super.c
--- linux-3.6.0-rc7-orig/fs/super.c	2012-09-24 13:45:06.129520994 +0900
+++ linux-3.6.0-rc7/fs/super.c	2012-09-26 13:14:39.462346665 +0900
@@ -537,14 +537,16 @@ void drop_super(struct super_block *sb)
 EXPORT_SYMBOL(drop_super);
 
 /**
- *	iterate_supers - call function for all active superblocks
+ *	__iterate_supers - call function for all active superblocks
  *	@f: function to call
  *	@arg: argument to pass to it
+ *	@wlock: mode of superblock lock (false->read lock, true->write lock)
  *
  *	Scans the superblock list and calls given function, passing it
  *	locked superblock and given argument.
  */
-void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
+static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
+			     bool wlock)
 {
 	struct super_block *sb, *p = NULL;
 
@@ -555,10 +557,18 @@ void iterate_supers(void (*f)(struct sup
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		down_read(&sb->s_umount);
+		if (wlock)
+			down_write(&sb->s_umount);
+		else
+			down_read(&sb->s_umount);
+
 		if (sb->s_root && (sb->s_flags & MS_BORN))
 			f(sb, arg);
-		up_read(&sb->s_umount);
+
+		if (wlock)
+			up_write(&sb->s_umount);
+		else
+			up_read(&sb->s_umount);
 
 		spin_lock(&sb_lock);
 		if (p)
@@ -571,6 +581,34 @@ void iterate_supers(void (*f)(struct sup
 }
 
 /**
+ *	iterate_supers_read - call function for all active superblocks
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	a superblock locked in _read_ mode and given argument. The lock
+ *	is automatically relased after the function returns.
+ */
+void iterate_supers_read(void (*f)(struct super_block *, void *), void *arg)
+{
+	__iterate_supers(f, arg , false);
+}
+
+/**
+ *	iterate_supers_write - call function for all active superblocks
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	a superblock locked in _write_ mode and given argument. The lock
+ *	is automatically relased after the function returns.
+ */
+void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+{
+	__iterate_supers(f, arg , true);
+}
+
+/**
  *	iterate_supers_type - call function for superblocks of given type
  *	@type: fs type
  *	@f: function to call
diff -urNp linux-3.6.0-rc7-orig/fs/sync.c linux-3.6.0-rc7/fs/sync.c
--- linux-3.6.0-rc7-orig/fs/sync.c	2012-09-24 13:45:06.129520994 +0900
+++ linux-3.6.0-rc7/fs/sync.c	2012-09-26 13:06:06.594342992 +0900
@@ -104,9 +104,9 @@ SYSCALL_DEFINE0(sync)
 	int nowait = 0, wait = 1;
 
 	wakeup_flusher_threads(0, WB_REASON_SYNC);
-	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &wait);
+	iterate_supers_read(sync_inodes_one_sb, NULL);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &wait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	iterate_bdevs(fdatawait_one_bdev, NULL);
 	if (unlikely(laptop_mode))
@@ -122,11 +122,11 @@ static void do_sync_work(struct work_str
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
 	 * because they were temporarily locked
 	 */
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_inodes_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
-	iterate_supers(sync_inodes_one_sb, &nowait);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers_read(sync_inodes_one_sb, &nowait);
+	iterate_supers_read(sync_fs_one_sb, &nowait);
 	iterate_bdevs(fdatawrite_one_bdev, NULL);
 	printk("Emergency Sync complete\n");
 	kfree(work);
diff -urNp linux-3.6.0-rc7-orig/include/linux/fs.h linux-3.6.0-rc7/include/linux/fs.h
--- linux-3.6.0-rc7-orig/include/linux/fs.h	2012-09-24 13:45:06.301521033 +0900
+++ linux-3.6.0-rc7/include/linux/fs.h	2012-09-26 13:06:06.598342993 +0900
@@ -2684,7 +2684,8 @@ extern struct super_block *get_super(str
 extern struct super_block *get_super_thawed(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
-extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
+extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
 
diff -urNp linux-3.6.0-rc7-orig/security/selinux/hooks.c linux-3.6.0-rc7/security/selinux/hooks.c
--- linux-3.6.0-rc7-orig/security/selinux/hooks.c	2012-09-24 13:45:08.121521275 +0900
+++ linux-3.6.0-rc7/security/selinux/hooks.c	2012-09-26 13:06:06.610342995 +0900
@@ -5755,7 +5755,7 @@ void selinux_complete_init(void)
 
 	/* Set up any superblocks initialized prior to the policy load. */
 	printk(KERN_DEBUG "SELinux:  Setting up existing superblocks.\n");
-	iterate_supers(delayed_superblock_init, NULL);
+	iterate_supers_read(delayed_superblock_init, NULL);
 }
 
 /* SELinux requires early initialization in order to label



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/9] fsfreeze: add unlocked version of thaw_super
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2012-10-05  5:31 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
@ 2012-10-05  5:33 ` Fernando Luis Vázquez Cao
  2012-10-05  5:34 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:33 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

thaw_super may be called with superblock lock already taken (fsfreeze's
emergency thaw being one example), so we need an unlocked version to avoid
lockups.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6.0-rc7-orig/fs/super.c linux-3.6.0-rc7/fs/super.c
--- linux-3.6.0-rc7-orig/fs/super.c	2012-09-26 13:20:14.854365058 +0900
+++ linux-3.6.0-rc7/fs/super.c	2012-09-26 13:20:36.342365207 +0900
@@ -1428,40 +1428,59 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
  *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Unlocks the filesystem and marks it writeable again.
+ *
+ * This is the unlocked version of thaw_super, so it is the caller's job to
+ * to protect the superblock by grabbing the s_umount semaphore in write mode
+ * and release it again on return. See thaw_super() for an example.
  */
-int thaw_super(struct super_block *sb)
+int __thaw_super(struct super_block *sb)
 {
-	int error;
+	int error = 0;
 
-	down_write(&sb->s_umount);
 	if (sb->s_writers.frozen == SB_UNFROZEN) {
-		up_write(&sb->s_umount);
-		return -EINVAL;
+		error = -EINVAL;
+		goto out;
 	}
 
 	if (sb->s_flags & MS_RDONLY)
-		goto out;
+		goto out_thaw;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
-			up_write(&sb->s_umount);
-			return error;
+			goto out;
 		}
 	}
 
-out:
+out_thaw:
 	sb->s_writers.frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_writers.wait_unfrozen);
-	deactivate_locked_super(sb);
+out:
+	return error;
+}
 
-	return 0;
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+	int res;
+	down_write(&sb->s_umount);
+	res = __thaw_super(sb);
+	if (!res)
+		deactivate_locked_super(sb);
+	else
+		up_write(&sb->s_umount);
+	return res;
 }
 EXPORT_SYMBOL(thaw_super);
diff -urNp linux-3.6.0-rc7-orig/include/linux/fs.h linux-3.6.0-rc7/include/linux/fs.h
--- linux-3.6.0-rc7-orig/include/linux/fs.h	2012-09-26 13:20:14.858365059 +0900
+++ linux-3.6.0-rc7/include/linux/fs.h	2012-09-26 13:20:36.358365209 +0900
@@ -2082,6 +2082,7 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
+extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2012-10-05  5:31 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
  2012-10-05  5:33 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
@ 2012-10-05  5:34 ` Fernando Luis Vázquez Cao
  2012-10-05  5:35 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:34 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.

Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/block_dev.c	2012-07-12 13:22:38.124815295 +0900
@@ -319,22 +319,17 @@ int thaw_bdev(struct block_device *bdev,
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	error = 0;
-	if (--bdev->bd_fsfreeze_count > 0)
-		goto out;
-
-	if (!sb)
+	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+		error = 0;
 		goto out;
+	}
 
 	error = thaw_super(sb);
-	if (error) {
+	if (error)
 		bdev->bd_fsfreeze_count++;
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return error;
-	}
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return 0;
+	return error;
 }
 EXPORT_SYMBOL(thaw_bdev);
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (2 preceding siblings ...)
  2012-10-05  5:34 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
@ 2012-10-05  5:35 ` Fernando Luis Vázquez Cao
  2012-10-08 13:57   ` Jan Kara
  2012-10-05  5:38 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.
 
Use the unlocked version of thaw_super() to do the thawing and replace
iterate_supers() with __iterate_supers() so that the unfreeze operation can
be performed with s_umount held as the locking rules for fsfreeze indicate.

As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
when can get rid of the ugly while loop.

Jan Kara pointed out that with this approach we will leave the block devices
frozen, but this is a problem we have had since the introduction of the
superblock level API: if we thaw the filesystem using the superblock level API
(be it through the thaw ioctl or emergency thaw) the bdev level freeze
reference counter (bd_fsfreeze_count) will not be updated and even though
subsequent calls to thaw_bdev() will decrease it it will never get back to 0
(if thaw_super() returns an error, and it will when the superblock is unfrozen,
thaw_bdev() will return without decreasing the counter). The solution I propose
(and will be implementing in the followup patch "fsfreeze: freeze_super and
thaw_bdev don't play well together") is letting bd_fsfreeze_count
become zero when the superblock sitting on top of it is unfrozen, so that
future calls to freeze_bdev() actually try to freeze the superblock.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
--- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 13:20:14.842365056 +0900
+++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 15:02:22.630595704 +0900
@@ -513,15 +513,28 @@ repeat:
 
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
-	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
-		printk(KERN_WARNING "Emergency Thaw on %s\n",
+	int res;
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
 		       bdevname(sb->s_bdev, b));
+	}
+
+	/* We got here from __iterate_supers with the superblock lock taken
+	 * so we can call the lockless version of thaw_super() safely. */
+	res = __thaw_super(sb);
+	/* If we are going to drop the final active reference call
+	 * deactivate_locked_super to clean things up. In the general case
+	 * we avoid calling deactivate_locked_super() because it would relase
+	 * the superblock lock, which is __iterate_supers()'s job. */
+	if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
+		deactivate_locked_super(sb);
 }
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers_read(do_thaw_one, NULL);
+	iterate_supers_write(do_thaw_one, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 5/9] xfs: switch to using super methods for fsfreeze
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (3 preceding siblings ...)
  2012-10-05  5:35 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-10-05  5:38 ` Fernando Luis Vázquez Cao
  2012-10-05  5:39 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

xfs uses the bdev interfaces despite the fact that the kernel is operating
at the superblock level here. Convert xfs to use the superblock interfaces
instead.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-rc5-orig/fs/xfs/xfs_fsops.c linux-3.6-rc5/fs/xfs/xfs_fsops.c
--- linux-3.6-rc5-orig/fs/xfs/xfs_fsops.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/xfs/xfs_fsops.c	2012-09-14 13:20:44.525043726 +0900
@@ -664,16 +664,12 @@ xfs_fs_goingdown(
 	__uint32_t	inflags)
 {
 	switch (inflags) {
-	case XFS_FSOP_GOING_FLAGS_DEFAULT: {
-		struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
-		if (sb && !IS_ERR(sb)) {
+	case XFS_FSOP_GOING_FLAGS_DEFAULT:
+		if (!freeze_super(mp->m_super)) {
 			xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
-			thaw_bdev(sb->s_bdev, sb);
+			thaw_super(mp->m_super);
 		}
-
 		break;
-	}
 	case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
 		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
 		break;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (4 preceding siblings ...)
  2012-10-05  5:38 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
@ 2012-10-05  5:39 ` Fernando Luis Vázquez Cao
  2012-10-05  5:42 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:39 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
--- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 16:18:39.010705981 +0900
+++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 16:30:27.050760551 +0900
@@ -511,50 +511,6 @@ repeat:
 	return err;
 }
 
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
-	int res;
-
-	if (sb->s_bdev) {
-		char b[BDEVNAME_SIZE];
-		printk(KERN_WARNING "Emergency Thaw on %s.\n",
-		       bdevname(sb->s_bdev, b));
-	}
-
-	/* We got here from __iterate_supers with the superblock lock taken
-	 * so we can call the lockless version of thaw_super() safely. */
-	res = __thaw_super(sb);
-	/* If we are going to drop the final active reference call
-	 * deactivate_locked_super to clean things up. In the general case
-	 * we avoid calling deactivate_locked_super() because it would relase
-	 * the superblock lock, which is __iterate_supers()'s job. */
-	if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
-		deactivate_locked_super(sb);
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
-	iterate_supers_write(do_thaw_one, NULL);
-	kfree(work);
-	printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
-	struct work_struct *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		INIT_WORK(work, do_thaw_all);
-		schedule_work(work);
-	}
-}
-
 /**
  * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
  * @mapping: the mapping which wants those buffers written
diff -urNp linux-3.6.0-rc7-orig/fs/super.c linux-3.6.0-rc7/fs/super.c
--- linux-3.6.0-rc7-orig/fs/super.c	2012-09-26 13:25:58.422382440 +0900
+++ linux-3.6.0-rc7/fs/super.c	2012-09-26 16:32:00.650761566 +0900
@@ -603,7 +603,7 @@ void iterate_supers_read(void (*f)(struc
  *	a superblock locked in _write_ mode and given argument. The lock
  *	is automatically relased after the function returns.
  */
-void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
+static void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
 {
 	__iterate_supers(f, arg , true);
 }
@@ -1437,7 +1437,7 @@ EXPORT_SYMBOL(freeze_super);
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
  */
-int __thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb)
 {
 	int error = 0;
 
@@ -1484,3 +1484,47 @@ int thaw_super(struct super_block *sb)
 	return res;
 }
 EXPORT_SYMBOL(thaw_super);
+
+static void do_thaw_one(struct super_block *sb, void *unused)
+{
+	int res;
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
+		       bdevname(sb->s_bdev, b));
+	}
+
+	/* We got here from __iterate_supers with the superblock lock taken
+	 * so we can call the lockless version of thaw_super() safely. */
+	res = __thaw_super(sb);
+	/* If we are going to drop the final active reference call
+	 * deactivate_locked_super to clean things up. In the general case
+	 * we avoid calling deactivate_locked_super() because it would relase
+	 * the superblock lock, which is __iterate_supers()'s job. */
+	if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
+		deactivate_locked_super(sb);
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+	iterate_supers_write(do_thaw_one, NULL);
+	kfree(work);
+	printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
+ */
+void emergency_thaw_all(void)
+{
+	struct work_struct *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(work, do_thaw_all);
+		schedule_work(work);
+	}
+}
diff -urNp linux-3.6.0-rc7-orig/include/linux/fs.h linux-3.6.0-rc7/include/linux/fs.h
--- linux-3.6.0-rc7-orig/include/linux/fs.h	2012-09-26 13:25:58.430382441 +0900
+++ linux-3.6.0-rc7/include/linux/fs.h	2012-09-26 16:33:34.934762455 +0900
@@ -2082,7 +2082,6 @@ extern int user_statfs(const char __user
 extern int fd_statfs(int, struct kstatfs *);
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
-extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
 
@@ -2686,7 +2685,6 @@ extern struct super_block *get_super_tha
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
-extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (5 preceding siblings ...)
  2012-10-05  5:39 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
@ 2012-10-05  5:42 ` Fernando Luis Vázquez Cao
  2012-10-08 14:17   ` Jan Kara
  2012-10-05  5:43 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
  2012-10-05  5:44 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
  8 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:42 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel, Konishi Ryusuke, Steven Whitehouse

thaw_bdev() has re-entrancy guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).

Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.

Move the re-enterency guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again. Caveat: Things work as expected as long as direct calls to
thaw_super are always in response to a previous sb level freeze. In
other words an unpaired call to thaw_super can still thaw a
filesystem frozen using freeze_bdev (this issue could be addressed
in a follow-up patch if deemed necessary).

This patch retains the bdev level mutex and counter to keep the
"feature" that we can freeze a block device that does not have a
filesystem mounted yet.

IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
argued that it is too late to fix the userland ABI breakage caused by that
patch and that the current ABI is the one that should be kept. If this is the
respective maintainer(s) opinion this could be modified to not allow fsfreeze
ioctl nesting.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Konishi Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-orig/fs/block_dev.c linux-3.6/fs/block_dev.c
--- linux-3.6-orig/fs/block_dev.c	2012-10-04 15:05:42.168084928 +0900
+++ linux-3.6/fs/block_dev.c	2012-10-04 15:08:40.228086607 +0900
@@ -257,16 +257,18 @@ int fsync_bdev(struct block_device *bdev
 EXPORT_SYMBOL(fsync_bdev);
 
 /**
- * freeze_bdev  --  lock a filesystem and force it into a consistent state
+ * freeze_bdev  --  lock a block device
  * @bdev:	blockdevice to lock
  *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
+ * Locks the block device and, if present, the associated filesystem too.
+ *
+ * The reference counter (bd_fsfreeze_count) is used to implement the feature
+ * that allows one to freeze a block device that does not have a filesystem
+ * mounted yet. For filesystems using mount_bdev the kernel takes care of
+ * things by preventing the mount operation from succeeding if the underlying
+ * block device is frozen. Other filesystems should check this counter or risk
+ * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race,
+ * which may lead to inconsistencies.
  */
 struct super_block *freeze_bdev(struct block_device *bdev)
 {
@@ -274,17 +276,7 @@ struct super_block *freeze_bdev(struct b
 	int error = 0;
 
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (++bdev->bd_fsfreeze_count > 1) {
-		/*
-		 * We don't even need to grab a reference - the first call
-		 * to freeze_bdev grab an active reference and only the last
-		 * thaw_bdev drops it.
-		 */
-		sb = get_super(bdev);
-		drop_super(sb);
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return sb;
-	}
+	bdev->bd_fsfreeze_count++;
 
 	sb = get_active_super(bdev);
 	if (!sb)
@@ -297,36 +289,48 @@ struct super_block *freeze_bdev(struct b
 		return ERR_PTR(error);
 	}
 	deactivate_super(sb);
- out:
+out:
 	sync_blockdev(bdev);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return sb;	/* thaw_bdev releases s->s_umount */
+	return sb;
 }
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * thaw_bdev  -- unlock filesystem
+ * thaw_bdev  -- unlock a block device
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
  *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ * Unlocks the block device and, if present, the associated filesystem too.
  */
 int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	int error = -EINVAL;
 
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
+
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+	bdev->bd_fsfreeze_count--;
+
+	if (!sb) {
 		error = 0;
 		goto out;
 	}
 
 	error = thaw_super(sb);
-	if (error)
+	/* If the superblock is already unfrozen, i.e. thaw_super() returned
+	 * -EINVAL, we consider the block device level thaw successful. This
+	 * behavior is important in a scenario where a filesystem frozen using
+	 * freeze_bdev() is thawed through the superblock level API; if we
+	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
+	 * would not go back to 0 which means that future calls to freeze_bdev()
+	 * would not freeze the superblock, just increase the counter. */
+	if (error && error != -EINVAL)
 		bdev->bd_fsfreeze_count++;
+	else
+		error = 0;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
diff -urNp linux-3.6-orig/fs/gfs2/ops_fstype.c linux-3.6/fs/gfs2/ops_fstype.c
--- linux-3.6-orig/fs/gfs2/ops_fstype.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/fs/gfs2/ops_fstype.c	2012-10-04 15:08:40.228086607 +0900
@@ -1289,9 +1289,13 @@ static struct dentry *gfs2_mount(struct
 		return ERR_CAST(bdev);
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
diff -urNp linux-3.6-orig/fs/nilfs2/super.c linux-3.6/fs/nilfs2/super.c
--- linux-3.6-orig/fs/nilfs2/super.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/fs/nilfs2/super.c	2012-10-04 15:08:40.228086607 +0900
@@ -1274,9 +1274,13 @@ nilfs_mount(struct file_system_type *fs_
 	}
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&sd.bdev->bd_fsfreeze_mutex);
 	if (sd.bdev->bd_fsfreeze_count > 0) {
diff -urNp linux-3.6-orig/fs/super.c linux-3.6/fs/super.c
--- linux-3.6-orig/fs/super.c	2012-10-04 15:06:00.264085275 +0900
+++ linux-3.6/fs/super.c	2012-10-04 15:09:21.164086840 +0900
@@ -1033,9 +1033,13 @@ struct dentry *mount_bdev(struct file_sy
 		return ERR_CAST(bdev);
 
 	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
+	 * Once the super is inserted into the list by sget, s_umount will
+	 * protect the block device level fsfreeze code from trying to start a
+	 * snapshot while we are mounting. There is one caveat though: the
+	 * mount can still fail after sget(), in which case we would release
+	 * s_umount before the superblock is fully initialized. The fsfreeze
+	 * code takes care of the latter case by canceling the freeze and
+	 * bailing out if the MS_BORN flag is not set in the superblock.
 	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
@@ -1330,8 +1334,11 @@ static void sb_wait_write(struct super_b
  * @sb: the super to lock
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs. The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and counts down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
  *
  * During this function, sb->s_writers.frozen goes through these values:
  *
@@ -1356,29 +1363,29 @@ static void sb_wait_write(struct super_b
  * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
  * mostly auxiliary for filesystems to verify they do not modify frozen fs.
  *
- * sb->s_writers.frozen is protected by sb->s_umount.
+ * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
  */
 int freeze_super(struct super_block *sb)
 {
-	int ret;
+	int ret = 0;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
-	if (sb->s_writers.frozen != SB_UNFROZEN) {
-		deactivate_locked_super(sb);
-		return -EBUSY;
-	}
 
-	if (!(sb->s_flags & MS_BORN)) {
-		up_write(&sb->s_umount);
-		return 0;	/* sic - it's "nothing to do" */
-	}
+	if (++sb->s_freeze_count > 1)
+		goto out_deactivate;
+
+	/* If MS_BORN is not set it means we have a failing mount (this is
+	 * possible if we got here from freeze_bdev()). Keep an active
+	 * reference so that the superblock is not killed until it is thawed
+	 * via thaw_bdev(). */
+	if (!(sb->s_flags & MS_BORN))
+		goto out_unlock;
 
 	if (sb->s_flags & MS_RDONLY) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-		up_write(&sb->s_umount);
-		return 0;
+		goto out_unlock;
 	}
 
 	/* From now on, no new normal writers can start */
@@ -1410,11 +1417,11 @@ int freeze_super(struct super_block *sb)
 		if (ret) {
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
+			sb->s_freeze_count--;
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
 			wake_up(&sb->s_writers.wait_unfrozen);
-			deactivate_locked_super(sb);
-			return ret;
+			goto out_deactivate;
 		}
 	}
 	/*
@@ -1422,8 +1429,12 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+out_unlock:
 	up_write(&sb->s_umount);
-	return 0;
+	return ret;
+out_deactivate:
+	deactivate_locked_super(sb);
+	return ret;
 }
 EXPORT_SYMBOL(freeze_super);
 
@@ -1433,6 +1444,10 @@ EXPORT_SYMBOL(freeze_super);
  *
  * Unlocks the filesystem and marks it writeable again.
  *
+ * Returns -EINVAL if @sb is not frozen, the number of nested freezes remaining
+ * after this thaw if it succeeded or the corresponding error code otherwise.
+ * If the unfreeze fails, @sb is left in the frozen state.
+ *
  * This is the unlocked version of thaw_super, so it is the caller's job to
  * to protect the superblock by grabbing the s_umount semaphore in write mode
  * and release it again on return. See thaw_super() for an example.
@@ -1441,11 +1456,16 @@ static int __thaw_super(struct super_blo
 {
 	int error = 0;
 
-	if (sb->s_writers.frozen == SB_UNFROZEN) {
+	if (!sb->s_freeze_count) {
 		error = -EINVAL;
 		goto out;
 	}
 
+	if (--sb->s_freeze_count > 0) {
+		error = sb->s_freeze_count;
+		goto out;
+	}
+
 	if (sb->s_flags & MS_RDONLY)
 		goto out_thaw;
 
@@ -1454,6 +1474,7 @@ static int __thaw_super(struct super_blo
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
+			sb->s_freeze_count++;
 			goto out;
 		}
 	}
@@ -1477,11 +1498,11 @@ int thaw_super(struct super_block *sb)
 	int res;
 	down_write(&sb->s_umount);
 	res = __thaw_super(sb);
-	if (!res)
+	if (!res) /* Active reference released after last thaw. */
 		deactivate_locked_super(sb);
 	else
 		up_write(&sb->s_umount);
-	return res;
+	return res > 0 ? 0 : res;
 }
 EXPORT_SYMBOL(thaw_super);
 
@@ -1489,6 +1510,9 @@ static void do_thaw_one(struct super_blo
 {
 	int res;
 
+	if (sb->s_writers.frozen == SB_UNFROZEN)
+		return;
+
 	if (sb->s_bdev) {
 		char b[BDEVNAME_SIZE];
 		printk(KERN_WARNING "Emergency Thaw on %s.\n",
@@ -1497,6 +1521,7 @@ static void do_thaw_one(struct super_blo
 
 	/* We got here from __iterate_supers with the superblock lock taken
 	 * so we can call the lockless version of thaw_super() safely. */
+	sb->s_freeze_count = 1; /* avoid multiple calls to __thaw_super */
 	res = __thaw_super(sb);
 	/* If we are going to drop the final active reference call
 	 * deactivate_locked_super to clean things up. In the general case
diff -urNp linux-3.6-orig/include/linux/fs.h linux-3.6/include/linux/fs.h
--- linux-3.6-orig/include/linux/fs.h	2012-10-04 15:06:00.264085275 +0900
+++ linux-3.6/include/linux/fs.h	2012-10-04 15:08:40.232086608 +0900
@@ -1578,6 +1578,9 @@ struct super_block {
 
 	/* Being remounted read-only */
 	int s_readonly_remount;
+
+	/* Number of nested freezes */
+	int s_freeze_count;
 };
 
 /* superblock cache pruning functions */



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (6 preceding siblings ...)
  2012-10-05  5:42 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-10-05  5:43 ` Fernando Luis Vázquez Cao
  2012-10-08 15:05   ` Jan Kara
  2012-10-05  5:44 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
  8 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

The FIISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of a mounted filesystem.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-orig/fs/compat_ioctl.c linux-3.6/fs/compat_ioctl.c
--- linux-3.6-orig/fs/compat_ioctl.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/fs/compat_ioctl.c	2012-10-04 17:29:50.060102922 +0900
@@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
 /* 'X' - originally XFS but some now in the VFS */
 COMPATIBLE_IOCTL(FIFREEZE)
 COMPATIBLE_IOCTL(FITHAW)
+COMPATIBLE_IOCTL(FIISFROZEN)
 COMPATIBLE_IOCTL(KDGETKEYCODE)
 COMPATIBLE_IOCTL(KDSETKEYCODE)
 COMPATIBLE_IOCTL(KDGKBTYPE)
diff -urNp linux-3.6-orig/fs/ioctl.c linux-3.6/fs/ioctl.c
--- linux-3.6-orig/fs/ioctl.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/fs/ioctl.c	2012-10-04 17:29:50.060102922 +0900
@@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
 	return thaw_super(sb);
 }
 
+static int ioctl_fs_isfrozen(struct file *filp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return isfrozen_super(sb);
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FIISFROZEN:
+		error = ioctl_fs_isfrozen(filp);
+		if (error >= 0)
+			return put_user(error, (int __user *)arg);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff -urNp linux-3.6-orig/fs/super.c linux-3.6/fs/super.c
--- linux-3.6-orig/fs/super.c	2012-10-04 17:28:45.440102318 +0900
+++ linux-3.6/fs/super.c	2012-10-04 17:29:50.060102922 +0900
@@ -1553,3 +1553,8 @@ void emergency_thaw_all(void)
 		schedule_work(work);
 	}
 }
+
+int isfrozen_super(struct super_block *sb)
+{
+	return sb->s_writers.frozen > SB_UNFROZEN ? 1 : 0;
+}
diff -urNp linux-3.6-orig/include/linux/fs.h linux-3.6/include/linux/fs.h
--- linux-3.6-orig/include/linux/fs.h	2012-10-04 17:28:45.440102318 +0900
+++ linux-3.6/include/linux/fs.h	2012-10-04 17:29:50.060102922 +0900
@@ -342,6 +342,7 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FIISFROZEN	_IOR('X', 122, int)	/* get sb freeze state */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -2086,6 +2087,7 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int isfrozen_super(struct super_block *sb);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 9/9] fsfreeze: add block device ioctl to check freeze state
  2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (7 preceding siblings ...)
  2012-10-05  5:43 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
@ 2012-10-05  5:44 ` Fernando Luis Vázquez Cao
  8 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-10-05  5:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

The BLKISFROZEN ioctl can be used to check the freeze state of a block device
(it is possible to freeze a block device that does not have a filesystem
sitting on top of it).

If allowing the umounting of frozen filesystems is deemed acceptable this ioctl
will be extended to check the state of the associated superblock if any.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-orig/block/compat_ioctl.c linux-3.6/block/compat_ioctl.c
--- linux-3.6-orig/block/compat_ioctl.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/block/compat_ioctl.c	2012-10-04 12:38:51.860275389 +0900
@@ -746,6 +746,13 @@ long compat_blkdev_ioctl(struct file *fi
 	case BLKTRACETEARDOWN: /* compatible */
 		ret = blk_trace_ioctl(bdev, cmd, compat_ptr(arg));
 		return ret;
+	case BLKISFROZEN:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		ret = isfrozen_bdev(bdev);
+		if (ret >= 0)
+			return compat_put_int(arg, ret);
+		return ret;
 	default:
 		if (disk->fops->compat_ioctl)
 			ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
diff -urNp linux-3.6-orig/block/ioctl.c linux-3.6/block/ioctl.c
--- linux-3.6-orig/block/ioctl.c	2012-10-01 08:47:46.000000000 +0900
+++ linux-3.6/block/ioctl.c	2012-10-04 12:38:51.860275389 +0900
@@ -396,6 +396,13 @@ int blkdev_ioctl(struct block_device *bd
 	case BLKTRACETEARDOWN:
 		ret = blk_trace_ioctl(bdev, cmd, (char __user *) arg);
 		break;
+	case BLKISFROZEN:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		ret = isfrozen_bdev(bdev);
+		if (ret >= 0)
+			return put_int(arg, ret);
+		return ret;
 	default:
 		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
 	}
diff -urNp linux-3.6-orig/fs/block_dev.c linux-3.6/fs/block_dev.c
--- linux-3.6-orig/fs/block_dev.c	2012-10-04 12:35:51.828269380 +0900
+++ linux-3.6/fs/block_dev.c	2012-10-04 12:38:51.860275389 +0900
@@ -337,6 +337,11 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+int isfrozen_bdev(struct block_device *bdev)
+{
+	return bdev->bd_fsfreeze_count > 0;
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp linux-3.6-orig/include/linux/fs.h linux-3.6/include/linux/fs.h
--- linux-3.6-orig/include/linux/fs.h	2012-10-04 12:38:27.056275050 +0900
+++ linux-3.6/include/linux/fs.h	2012-10-04 12:38:51.860275389 +0900
@@ -335,6 +335,7 @@ struct inodes_stat_t {
 #define BLKDISCARDZEROES _IO(0x12,124)
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
+#define BLKISFROZEN _IOR(0x12,127, int)	/* get file system freeze state */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
@@ -2251,6 +2252,7 @@ extern void kill_bdev(struct block_devic
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int isfrozen_bdev(struct block_device *bdev);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -2271,6 +2273,11 @@ static inline int thaw_bdev(struct block
 static inline void iterate_bdevs(void (*f)(struct block_device *, void *), void *arg)
 {
 }
+
+static int isfrozen_bdev(struct block_device *bdev)
+{
+	return 0;
+}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] vfs: add __iterate_supers() and helpers around it
  2012-10-05  5:31 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
@ 2012-10-08 13:48   ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2012-10-08 13:48 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, linux-fsdevel

On Fri 05-10-12 14:31:46, Fernando Luis Vázquez Cao wrote:
> iterate_supers() calls a function provided by the caller with the s_umount
> semaphore taken in read mode. However, there may be cases where write mode
> is preferable, so we add __iterate_supers(), which lets one
> specify the mode of the lock, and replace iterate_supers with two helpers
> around __iterate_supers(), iterate_supers_read() and iterate_supers_write().
> 
> This will be used to fix the emergency thaw (filesystem unfreeze) code, which
> iterates over the list of superblocks but needs to hold the s_umount semaphore
> in _write_ mode bebore carrying out the actual thaw operation.
> 
> This patch introduces no semantic changes since current iterate_supers()
> callers are just updated to use iterate_supers_read() instead.
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

									Honza
> 
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
> --- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-24 13:45:05.569520902 +0900
> +++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 13:06:06.578342989 +0900
> @@ -521,7 +521,7 @@ static void do_thaw_one(struct super_blo
>  
>  static void do_thaw_all(struct work_struct *work)
>  {
> -	iterate_supers(do_thaw_one, NULL);
> +	iterate_supers_read(do_thaw_one, NULL);
>  	kfree(work);
>  	printk(KERN_WARNING "Emergency Thaw complete\n");
>  }
> diff -urNp linux-3.6.0-rc7-orig/fs/drop_caches.c linux-3.6.0-rc7/fs/drop_caches.c
> --- linux-3.6.0-rc7-orig/fs/drop_caches.c	2012-07-22 05:58:29.000000000 +0900
> +++ linux-3.6.0-rc7/fs/drop_caches.c	2012-09-26 13:06:06.582342990 +0900
> @@ -59,7 +59,7 @@ int drop_caches_sysctl_handler(ctl_table
>  		return ret;
>  	if (write) {
>  		if (sysctl_drop_caches & 1)
> -			iterate_supers(drop_pagecache_sb, NULL);
> +			iterate_supers_read(drop_pagecache_sb, NULL);
>  		if (sysctl_drop_caches & 2)
>  			drop_slab();
>  	}
> diff -urNp linux-3.6.0-rc7-orig/fs/quota/quota.c linux-3.6.0-rc7/fs/quota/quota.c
> --- linux-3.6.0-rc7-orig/fs/quota/quota.c	2012-09-24 13:45:06.105520990 +0900
> +++ linux-3.6.0-rc7/fs/quota/quota.c	2012-09-26 13:06:06.594342992 +0900
> @@ -58,7 +58,7 @@ static int quota_sync_all(int type)
>  		return -EINVAL;
>  	ret = security_quotactl(Q_SYNC, type, 0, NULL);
>  	if (!ret)
> -		iterate_supers(quota_sync_one, &type);
> +		iterate_supers_read(quota_sync_one, &type);
>  	return ret;
>  }
>  
> diff -urNp linux-3.6.0-rc7-orig/fs/super.c linux-3.6.0-rc7/fs/super.c
> --- linux-3.6.0-rc7-orig/fs/super.c	2012-09-24 13:45:06.129520994 +0900
> +++ linux-3.6.0-rc7/fs/super.c	2012-09-26 13:14:39.462346665 +0900
> @@ -537,14 +537,16 @@ void drop_super(struct super_block *sb)
>  EXPORT_SYMBOL(drop_super);
>  
>  /**
> - *	iterate_supers - call function for all active superblocks
> + *	__iterate_supers - call function for all active superblocks
>   *	@f: function to call
>   *	@arg: argument to pass to it
> + *	@wlock: mode of superblock lock (false->read lock, true->write lock)
>   *
>   *	Scans the superblock list and calls given function, passing it
>   *	locked superblock and given argument.
>   */
> -void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
> +static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
> +			     bool wlock)
>  {
>  	struct super_block *sb, *p = NULL;
>  
> @@ -555,10 +557,18 @@ void iterate_supers(void (*f)(struct sup
>  		sb->s_count++;
>  		spin_unlock(&sb_lock);
>  
> -		down_read(&sb->s_umount);
> +		if (wlock)
> +			down_write(&sb->s_umount);
> +		else
> +			down_read(&sb->s_umount);
> +
>  		if (sb->s_root && (sb->s_flags & MS_BORN))
>  			f(sb, arg);
> -		up_read(&sb->s_umount);
> +
> +		if (wlock)
> +			up_write(&sb->s_umount);
> +		else
> +			up_read(&sb->s_umount);
>  
>  		spin_lock(&sb_lock);
>  		if (p)
> @@ -571,6 +581,34 @@ void iterate_supers(void (*f)(struct sup
>  }
>  
>  /**
> + *	iterate_supers_read - call function for all active superblocks
> + *	@f: function to call
> + *	@arg: argument to pass to it
> + *
> + *	Scans the superblock list and calls given function, passing it
> + *	a superblock locked in _read_ mode and given argument. The lock
> + *	is automatically relased after the function returns.
> + */
> +void iterate_supers_read(void (*f)(struct super_block *, void *), void *arg)
> +{
> +	__iterate_supers(f, arg , false);
> +}
> +
> +/**
> + *	iterate_supers_write - call function for all active superblocks
> + *	@f: function to call
> + *	@arg: argument to pass to it
> + *
> + *	Scans the superblock list and calls given function, passing it
> + *	a superblock locked in _write_ mode and given argument. The lock
> + *	is automatically relased after the function returns.
> + */
> +void iterate_supers_write(void (*f)(struct super_block *, void *), void *arg)
> +{
> +	__iterate_supers(f, arg , true);
> +}
> +
> +/**
>   *	iterate_supers_type - call function for superblocks of given type
>   *	@type: fs type
>   *	@f: function to call
> diff -urNp linux-3.6.0-rc7-orig/fs/sync.c linux-3.6.0-rc7/fs/sync.c
> --- linux-3.6.0-rc7-orig/fs/sync.c	2012-09-24 13:45:06.129520994 +0900
> +++ linux-3.6.0-rc7/fs/sync.c	2012-09-26 13:06:06.594342992 +0900
> @@ -104,9 +104,9 @@ SYSCALL_DEFINE0(sync)
>  	int nowait = 0, wait = 1;
>  
>  	wakeup_flusher_threads(0, WB_REASON_SYNC);
> -	iterate_supers(sync_inodes_one_sb, NULL);
> -	iterate_supers(sync_fs_one_sb, &nowait);
> -	iterate_supers(sync_fs_one_sb, &wait);
> +	iterate_supers_read(sync_inodes_one_sb, NULL);
> +	iterate_supers_read(sync_fs_one_sb, &nowait);
> +	iterate_supers_read(sync_fs_one_sb, &wait);
>  	iterate_bdevs(fdatawrite_one_bdev, NULL);
>  	iterate_bdevs(fdatawait_one_bdev, NULL);
>  	if (unlikely(laptop_mode))
> @@ -122,11 +122,11 @@ static void do_sync_work(struct work_str
>  	 * Sync twice to reduce the possibility we skipped some inodes / pages
>  	 * because they were temporarily locked
>  	 */
> -	iterate_supers(sync_inodes_one_sb, &nowait);
> -	iterate_supers(sync_fs_one_sb, &nowait);
> +	iterate_supers_read(sync_inodes_one_sb, &nowait);
> +	iterate_supers_read(sync_fs_one_sb, &nowait);
>  	iterate_bdevs(fdatawrite_one_bdev, NULL);
> -	iterate_supers(sync_inodes_one_sb, &nowait);
> -	iterate_supers(sync_fs_one_sb, &nowait);
> +	iterate_supers_read(sync_inodes_one_sb, &nowait);
> +	iterate_supers_read(sync_fs_one_sb, &nowait);
>  	iterate_bdevs(fdatawrite_one_bdev, NULL);
>  	printk("Emergency Sync complete\n");
>  	kfree(work);
> diff -urNp linux-3.6.0-rc7-orig/include/linux/fs.h linux-3.6.0-rc7/include/linux/fs.h
> --- linux-3.6.0-rc7-orig/include/linux/fs.h	2012-09-24 13:45:06.301521033 +0900
> +++ linux-3.6.0-rc7/include/linux/fs.h	2012-09-26 13:06:06.598342993 +0900
> @@ -2684,7 +2684,8 @@ extern struct super_block *get_super(str
>  extern struct super_block *get_super_thawed(struct block_device *);
>  extern struct super_block *get_active_super(struct block_device *bdev);
>  extern void drop_super(struct super_block *sb);
> -extern void iterate_supers(void (*)(struct super_block *, void *), void *);
> +extern void iterate_supers_read(void (*)(struct super_block *, void *), void *);
> +extern void iterate_supers_write(void (*)(struct super_block *, void *), void *);
>  extern void iterate_supers_type(struct file_system_type *,
>  			        void (*)(struct super_block *, void *), void *);
>  
> diff -urNp linux-3.6.0-rc7-orig/security/selinux/hooks.c linux-3.6.0-rc7/security/selinux/hooks.c
> --- linux-3.6.0-rc7-orig/security/selinux/hooks.c	2012-09-24 13:45:08.121521275 +0900
> +++ linux-3.6.0-rc7/security/selinux/hooks.c	2012-09-26 13:06:06.610342995 +0900
> @@ -5755,7 +5755,7 @@ void selinux_complete_init(void)
>  
>  	/* Set up any superblocks initialized prior to the policy load. */
>  	printk(KERN_DEBUG "SELinux:  Setting up existing superblocks.\n");
> -	iterate_supers(delayed_superblock_init, NULL);
> +	iterate_supers_read(delayed_superblock_init, NULL);
>  }
>  
>  /* SELinux requires early initialization in order to label
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
  2012-10-05  5:35 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-10-08 13:57   ` Jan Kara
  2012-10-09  5:07     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2012-10-08 13:57 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, linux-fsdevel

On Fri 05-10-12 14:35:53, Fernando Luis Vázquez Cao wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
>  
> Use the unlocked version of thaw_super() to do the thawing and replace
> iterate_supers() with __iterate_supers() so that the unfreeze operation can
                        ^^ iterate_supers_write()

> be performed with s_umount held as the locking rules for fsfreeze indicate.
> 
> As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
> when can get rid of the ugly while loop.
> 
> Jan Kara pointed out that with this approach we will leave the block devices
> frozen, but this is a problem we have had since the introduction of the
> superblock level API: if we thaw the filesystem using the superblock level API
> (be it through the thaw ioctl or emergency thaw) the bdev level freeze
> reference counter (bd_fsfreeze_count) will not be updated and even though
> subsequent calls to thaw_bdev() will decrease it it will never get back to 0
> (if thaw_super() returns an error, and it will when the superblock is unfrozen,
> thaw_bdev() will return without decreasing the counter). The solution I propose
> (and will be implementing in the followup patch "fsfreeze: freeze_super and
> thaw_bdev don't play well together") is letting bd_fsfreeze_count
> become zero when the superblock sitting on top of it is unfrozen, so that
> future calls to freeze_bdev() actually try to freeze the superblock.
> 
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
> --- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 13:20:14.842365056 +0900
> +++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 15:02:22.630595704 +0900
> @@ -513,15 +513,28 @@ repeat:
>  
>  static void do_thaw_one(struct super_block *sb, void *unused)
>  {
> -	char b[BDEVNAME_SIZE];
> -	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
> -		printk(KERN_WARNING "Emergency Thaw on %s\n",
> +	int res;
> +
> +	if (sb->s_bdev) {
> +		char b[BDEVNAME_SIZE];
> +		printk(KERN_WARNING "Emergency Thaw on %s.\n",
>  		       bdevname(sb->s_bdev, b));
> +	}
> +
> +	/* We got here from __iterate_supers with the superblock lock taken
> +	 * so we can call the lockless version of thaw_super() safely. */
> +	res = __thaw_super(sb);
> +	/* If we are going to drop the final active reference call
> +	 * deactivate_locked_super to clean things up. In the general case
> +	 * we avoid calling deactivate_locked_super() because it would relase
> +	 * the superblock lock, which is __iterate_supers()'s job. */
> +	if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
> +		deactivate_locked_super(sb);
  This just looks wrong. When we *do* end up calling
deactivate_locked_super() we will return with sb unlocked which makes
iterate_supers_write() unlock already unlocked lock. What I would put here
is:
	if (!res) {
		deactivate_locked_super(sb);
		/*
		 * We have to re-acquire s_umount because
		 * iterate_supers_write() will unlock it. It still holds
		 * passive reference so sb cannot be freed under us.
		 */
		down_write(&sb->s_umount);
	}
	
Is there any problem with this I miss?

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together
  2012-10-05  5:42 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-10-08 14:17   ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2012-10-08 14:17 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, linux-fsdevel, Konishi Ryusuke,
	Steven Whitehouse

On Fri 05-10-12 14:42:40, Fernando Luis Vázquez Cao wrote:
> thaw_bdev() has re-entrancy guards to allow freezes to nest
> together. That is, it ensures that the filesystem is not thawed
> until the last thaw command is issued. This is needed to prevent the
> filesystem from being unfrozen while an existing freezer is still
> operating on the filesystem in a frozen state (e.g. dm-snapshot).
> 
> Currently, freeze_super() and thaw_super() bypasses these guards,
> and as a result manual freezing and unfreezing via the ioctl methods
> do not nest correctly. hence mixing userspace directed freezes with
> block device level freezes result in inconsistency due to premature
> thawing of the filesystem.
> 
> Move the re-enterency guards to the superblock and into freeze_super
> and thaw_super() so that userspace directed freezes nest correctly
> again. Caveat: Things work as expected as long as direct calls to
> thaw_super are always in response to a previous sb level freeze. In
> other words an unpaired call to thaw_super can still thaw a
> filesystem frozen using freeze_bdev (this issue could be addressed
> in a follow-up patch if deemed necessary).
> 
> This patch retains the bdev level mutex and counter to keep the
> "feature" that we can freeze a block device that does not have a
> filesystem mounted yet.
> 
> IMPORTANT CAVEAT: This patch restores the nesting feature of the fsfreeze ioctl
> which got removed by commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
> ("Introduce freeze_super and thaw_super for the fsfreeze ioctl"). It could be
> argued that it is too late to fix the userland ABI breakage caused by that
> patch and that the current ABI is the one that should be kept. If this is the
> respective maintainer(s) opinion this could be modified to not allow fsfreeze
> ioctl nesting.
> 
> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Cc: Konishi Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
> Cc: Steven Whitehouse <swhiteho@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
  Just some style nits below. So after fixing them you can add:
Reviewed-by: Jan Kara <jack@suse.cz>

									Honza
> ---
> 
> diff -urNp linux-3.6-orig/fs/block_dev.c linux-3.6/fs/block_dev.c
> --- linux-3.6-orig/fs/block_dev.c	2012-10-04 15:05:42.168084928 +0900
> +++ linux-3.6/fs/block_dev.c	2012-10-04 15:08:40.228086607 +0900
...
>  int thaw_bdev(struct block_device *bdev, struct super_block *sb)
>  {
>  	int error = -EINVAL;
>  
>  	mutex_lock(&bdev->bd_fsfreeze_mutex);
> +
>  	if (!bdev->bd_fsfreeze_count)
>  		goto out;
>  
> -	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
> +	bdev->bd_fsfreeze_count--;
> +
> +	if (!sb) {
>  		error = 0;
>  		goto out;
>  	}
>  
>  	error = thaw_super(sb);
> -	if (error)
> +	/* If the superblock is already unfrozen, i.e. thaw_super() returned
> +	 * -EINVAL, we consider the block device level thaw successful. This
> +	 * behavior is important in a scenario where a filesystem frozen using
> +	 * freeze_bdev() is thawed through the superblock level API; if we
> +	 * caused the subsequent thaw_bdev() to fail bdev->bd_fsfreeze_count
> +	 * would not go back to 0 which means that future calls to freeze_bdev()
> +	 * would not freeze the superblock, just increase the counter. */
  ^^ The correct formatting of long comments is:
  /*
   * Text
   * Text
   */

> +	if (error && error != -EINVAL)
>  		bdev->bd_fsfreeze_count++;
> +	else
> +		error = 0;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
...
> diff -urNp linux-3.6-orig/fs/super.c linux-3.6/fs/super.c
> --- linux-3.6-orig/fs/super.c	2012-10-04 15:06:00.264085275 +0900
> +++ linux-3.6/fs/super.c	2012-10-04 15:09:21.164086840 +0900
...
> @@ -1356,29 +1363,29 @@ static void sb_wait_write(struct super_b
>   * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
>   * mostly auxiliary for filesystems to verify they do not modify frozen fs.
>   *
> - * sb->s_writers.frozen is protected by sb->s_umount.
> + * sb->s_writers.frozen and sb->s_freeze_count are protected by sb->s_umount.
>   */
>  int freeze_super(struct super_block *sb)
>  {
> -	int ret;
> +	int ret = 0;
>  
>  	atomic_inc(&sb->s_active);
>  	down_write(&sb->s_umount);
> -	if (sb->s_writers.frozen != SB_UNFROZEN) {
> -		deactivate_locked_super(sb);
> -		return -EBUSY;
> -	}
>  
> -	if (!(sb->s_flags & MS_BORN)) {
> -		up_write(&sb->s_umount);
> -		return 0;	/* sic - it's "nothing to do" */
> -	}
> +	if (++sb->s_freeze_count > 1)
> +		goto out_deactivate;
> +
> +	/* If MS_BORN is not set it means we have a failing mount (this is
> +	 * possible if we got here from freeze_bdev()). Keep an active
> +	 * reference so that the superblock is not killed until it is thawed
> +	 * via thaw_bdev(). */
  Again, comment formatting is wrong.

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-05  5:43 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
@ 2012-10-08 15:05   ` Jan Kara
  2012-10-09  9:46     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2012-10-08 15:05 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, Jan Kara, linux-fsdevel

On Fri 05-10-12 14:43:29, Fernando Luis Vázquez Cao wrote:
> The FIISFROZEN ioctl can be use by HA and monitoring software to check
> the freeze state of a mounted filesystem.
  I was thinking about this and your use case and I thing this is just a
wrong way to fix a problem with your HA application. E.g. in case you would
"umount -l" your filesystem, you would hit the same problem as in presence
of freezing and the ioctl won't help you. Now I understand that in your
specific use case you likely don't need to deal with lazy umounts but we
shouldn't add an interface just to accomodate one use case and later find
out we need another interface for slightly different one.

So what you rather seem to need is some interface which allows you to
investigate filesystems that are mounted on a block device but not attached
anywhere in the namespace. Would that be enough for you? If yes, some
extension to /proc/self/mountinfo to do this should be possible...

								Honza

> Cc: Josef Bacik <jbacik@fusionio.com>
> Cc: Eric Sandeen <sandeen@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp linux-3.6-orig/fs/compat_ioctl.c linux-3.6/fs/compat_ioctl.c
> --- linux-3.6-orig/fs/compat_ioctl.c	2012-10-01 08:47:46.000000000 +0900
> +++ linux-3.6/fs/compat_ioctl.c	2012-10-04 17:29:50.060102922 +0900
> @@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
>  /* 'X' - originally XFS but some now in the VFS */
>  COMPATIBLE_IOCTL(FIFREEZE)
>  COMPATIBLE_IOCTL(FITHAW)
> +COMPATIBLE_IOCTL(FIISFROZEN)
>  COMPATIBLE_IOCTL(KDGETKEYCODE)
>  COMPATIBLE_IOCTL(KDSETKEYCODE)
>  COMPATIBLE_IOCTL(KDGKBTYPE)
> diff -urNp linux-3.6-orig/fs/ioctl.c linux-3.6/fs/ioctl.c
> --- linux-3.6-orig/fs/ioctl.c	2012-10-01 08:47:46.000000000 +0900
> +++ linux-3.6/fs/ioctl.c	2012-10-04 17:29:50.060102922 +0900
> @@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
>  	return thaw_super(sb);
>  }
>  
> +static int ioctl_fs_isfrozen(struct file *filp)
> +{
> +	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	return isfrozen_super(sb);
> +}
> +
>  /*
>   * When you add any new common ioctls to the switches above and below
>   * please update compat_sys_ioctl() too.
> @@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
>  		error = ioctl_fsthaw(filp);
>  		break;
>  
> +	case FIISFROZEN:
> +		error = ioctl_fs_isfrozen(filp);
> +		if (error >= 0)
> +			return put_user(error, (int __user *)arg);
> +		break;
> +
>  	case FS_IOC_FIEMAP:
>  		return ioctl_fiemap(filp, arg);
>  
> diff -urNp linux-3.6-orig/fs/super.c linux-3.6/fs/super.c
> --- linux-3.6-orig/fs/super.c	2012-10-04 17:28:45.440102318 +0900
> +++ linux-3.6/fs/super.c	2012-10-04 17:29:50.060102922 +0900
> @@ -1553,3 +1553,8 @@ void emergency_thaw_all(void)
>  		schedule_work(work);
>  	}
>  }
> +
> +int isfrozen_super(struct super_block *sb)
> +{
> +	return sb->s_writers.frozen > SB_UNFROZEN ? 1 : 0;
> +}
> diff -urNp linux-3.6-orig/include/linux/fs.h linux-3.6/include/linux/fs.h
> --- linux-3.6-orig/include/linux/fs.h	2012-10-04 17:28:45.440102318 +0900
> +++ linux-3.6/include/linux/fs.h	2012-10-04 17:29:50.060102922 +0900
> @@ -342,6 +342,7 @@ struct inodes_stat_t {
>  #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
>  #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
>  #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
> +#define FIISFROZEN	_IOR('X', 122, int)	/* get sb freeze state */
>  
>  #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
>  #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
> @@ -2086,6 +2087,7 @@ extern int fd_statfs(int, struct kstatfs
>  extern int vfs_ustat(dev_t, struct kstatfs *);
>  extern int freeze_super(struct super_block *super);
>  extern int thaw_super(struct super_block *super);
> +extern int isfrozen_super(struct super_block *sb);
>  extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
  2012-10-08 13:57   ` Jan Kara
@ 2012-10-09  5:07     ` Fernando Luis Vazquez Cao
  2012-10-09  8:20       ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-10-09  5:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On 2012/10/08 22:57, Jan Kara wrote:
> On Fri 05-10-12 14:35:53, Fernando Luis Vázquez Cao wrote:
>> The emergency thaw process uses iterate_super() which holds the
>> sb->s_umount lock in read mode. The current thaw_super() code takes
>> the sb->s_umount lock in write mode, hence leading to an instant
>> deadlock.
>>   
>> Use the unlocked version of thaw_super() to do the thawing and replace
>> iterate_supers() with __iterate_supers() so that the unfreeze operation can
>                          ^^ iterate_supers_write()

Good catch.


>> be performed with s_umount held as the locking rules for fsfreeze indicate.
>>
>> As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
>> when can get rid of the ugly while loop.
>>
>> Jan Kara pointed out that with this approach we will leave the block devices
>> frozen, but this is a problem we have had since the introduction of the
>> superblock level API: if we thaw the filesystem using the superblock level API
>> (be it through the thaw ioctl or emergency thaw) the bdev level freeze
>> reference counter (bd_fsfreeze_count) will not be updated and even though
>> subsequent calls to thaw_bdev() will decrease it it will never get back to 0
>> (if thaw_super() returns an error, and it will when the superblock is unfrozen,
>> thaw_bdev() will return without decreasing the counter). The solution I propose
>> (and will be implementing in the followup patch "fsfreeze: freeze_super and
>> thaw_bdev don't play well together") is letting bd_fsfreeze_count
>> become zero when the superblock sitting on top of it is unfrozen, so that
>> future calls to freeze_bdev() actually try to freeze the superblock.
>>
>> Cc: Josef Bacik <jbacik@fusionio.com>
>> Cc: Eric Sandeen <sandeen@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> ---
>>
>> diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
>> --- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 13:20:14.842365056 +0900
>> +++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 15:02:22.630595704 +0900
>> @@ -513,15 +513,28 @@ repeat:
>>   
>>   static void do_thaw_one(struct super_block *sb, void *unused)
>>   {
>> -	char b[BDEVNAME_SIZE];
>> -	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
>> -		printk(KERN_WARNING "Emergency Thaw on %s\n",
>> +	int res;
>> +
>> +	if (sb->s_bdev) {
>> +		char b[BDEVNAME_SIZE];
>> +		printk(KERN_WARNING "Emergency Thaw on %s.\n",
>>   		       bdevname(sb->s_bdev, b));
>> +	}
>> +
>> +	/* We got here from __iterate_supers with the superblock lock taken
>> +	 * so we can call the lockless version of thaw_super() safely. */
>> +	res = __thaw_super(sb);
>> +	/* If we are going to drop the final active reference call
>> +	 * deactivate_locked_super to clean things up. In the general case
>> +	 * we avoid calling deactivate_locked_super() because it would relase
>> +	 * the superblock lock, which is __iterate_supers()'s job. */
>> +	if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
>> +		deactivate_locked_super(sb);
>    This just looks wrong. When we *do* end up calling
> deactivate_locked_super() we will return with sb unlocked which makes
> iterate_supers_write() unlock already unlocked lock.

Thank you for the heads-up.

I missed the fact that ->kill_sb() which gets called in 
deactivate_locked_super()
will unlock the superblock indirectly via generic_shutdown_super() or one of
the wrappers around it (kill_block_super(), kill_anon_super(), 
kill_litter_super()).


> What I would put here is:
> 	if (!res) {
> 		deactivate_locked_super(sb);
> 		/*
> 		 * We have to re-acquire s_umount because
> 		 * iterate_supers_write() will unlock it. It still holds
> 		 * passive reference so sb cannot be freed under us.
> 		 */
> 		down_write(&sb->s_umount);
> 	}
> 	
> Is there any problem with this I miss?

The reason  I wrote the code as I did is that I did not want to re-acquire
s_umount in the normal case (s_active >= 2 entering the if statement).

What about combining our approaches and doing something like this?:

if (!res && !atomic_add_unless(&sb->s_active, -1, 1)) {
      deactivate_locked_super(sb);
      down_write(&sb->s_umount);
}

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
  2012-10-09  5:07     ` Fernando Luis Vazquez Cao
@ 2012-10-09  8:20       ` Jan Kara
  2012-10-09  9:52         ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2012-10-09  8:20 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On Tue 09-10-12 14:07:52, Fernando Luis Vazquez Cao wrote:
> >>diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
> >>--- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 13:20:14.842365056 +0900
> >>+++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 15:02:22.630595704 +0900
...
> >What I would put here is:
> >	if (!res) {
> >		deactivate_locked_super(sb);
> >		/*
> >		 * We have to re-acquire s_umount because
> >		 * iterate_supers_write() will unlock it. It still holds
> >		 * passive reference so sb cannot be freed under us.
> >		 */
> >		down_write(&sb->s_umount);
> >	}
> >	
> >Is there any problem with this I miss?
> 
> The reason  I wrote the code as I did is that I did not want to re-acquire
> s_umount in the normal case (s_active >= 2 entering the if statement).
> 
> What about combining our approaches and doing something like this?:
> 
> if (!res && !atomic_add_unless(&sb->s_active, -1, 1)) {
>      deactivate_locked_super(sb);
>      down_write(&sb->s_umount);
> }
  Well, we are speaking about emergency thaw code. That's no performance
critical path by any means so I'd trade code simplicity for speed as much
as possible. atomic_add_unless() makes you think twice what the hell we are
doing here so I'd avoid it...

								Honza

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-08 15:05   ` Jan Kara
@ 2012-10-09  9:46     ` Fernando Luis Vazquez Cao
  2012-10-09 14:55       ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-10-09  9:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel, Fernando Luis Vázquez Cao

On 2012/10/09 00:05, Jan Kara wrote:
 > On Fri 05-10-12 14:43:29, Fernando Luis Vázquez Cao wrote:
 >> The FIISFROZEN ioctl can be use by HA and monitoring software to check
 >> the freeze state of a mounted filesystem.
 >
 >   I was thinking about this and your use case and I thing this is just a
 > wrong way to fix a problem with your HA application. E.g. in case you 
would
 > "umount -l" your filesystem, you would hit the same problem as in 
presence
 > of freezing and the ioctl won't help you. Now I understand that in your
 > specific use case you likely don't need to deal with lazy umounts but we
 > shouldn't add an interface just to accomodate one use case and later find
 > out we need another interface for slightly different one.

By the way, we can end up with a detached and active superblock even
without using lazy umount; it is possible to do a regular umount of a
frozen filesystem.


 > So what you rather seem to need is some interface which allows you to
 > investigate filesystems that are mounted on a block device but not 
attached
 > anywhere in the namespace. Would that be enough for you? If yes, some
 > extension to /proc/self/mountinfo to do this should be possible...

Well, neither /proc/self/mount* nor /proc/mounts show superblocks not
attached anywhere in the namespace and changing that behavior could
wreck havoc in userspace scripts and management software. I would
rather not change that de-facto ABI unless it is strictly needed. On
the other hand, the check ioctls add a new userspace API that would
not break anything.

Regarding your concern about the ioctl approach, when a frozen
filesystem is detached from the namespace it can still be reached
through the block device it is sitting on (well... with the exception
of btrfs which has some issues that I am working on) and this is the
reason I added a block device level check ioctl too. That said, if one
day we have a filesystem which is not block device based and supports
fsfreeze (ioctl_fsfreeze() returns -EOPNOTSUPP if the superblock has
no ->freeze_fs operation, which is the case for all virtual
filesystems and NAS drivers that we have) the two check ioctls would
not cover that case.

I think that to cover all cases without adding a completely new API we
need to do the following:

1) Filesystems which are not tied to a block device (virtual
   filesystems, NAS, etc):

   As soon as the filesystem is removed from the namespace the
   superblock based fsfreeze ioctls become useless; if we let a umount
   of a frozen filesystem succeed we would not be able to thaw it (well
   we could use emergency thaw but it would be overkill). Since we do
   not want to break lazy umounts the only viable solution is thawing
   the superblock automatically on umount (releasing the active
   reference taken in freeze_super() to be more precise).

2) Block device based filesystems:

   These can be reached through the block device it is sitting on even
   if the filesystem was detached from the namespace and have the
   particularity that they can be frozen using two different APIs, a
   block device level one and the ioctls. When a filesytem was frozen
   using the former, which only has in-kernel users such as dm,
   automatically thawing the filesystem on umount is arguably too rude
   (we can end up breaking the filesystem level consistency of a
   storage snapshot). It we care about this, we could modify
   sys_umount() so that filesystem is automatically thawed if and only
   if there are no block device level freezes active. This behavior
   would be consistent with case 1) above (the premise here is that
   both fsfreeze and umount are userspace controlled operations and the
   administrator should know what it is doing) and is the less likely
   to cause surprises to freeze_bdev() users.

   It would also be nice to have a block device level thaw ioctl for
   emergency cases (for example, a scenario where thaw_bdev() was not
   called and the freeze counter was left in a inconsistent state;
   freeze_bdev() and thaw_bdev() are exported symbols and in many cases
   we cannot control what external modules do).

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount
  2012-10-09  8:20       ` Jan Kara
@ 2012-10-09  9:52         ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-10-09  9:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel, Fernando Luis Vázquez Cao

On 2012/10/09 17:20, Jan Kara wrote:
> On Tue 09-10-12 14:07:52, Fernando Luis Vazquez Cao wrote:
>>>> diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
>>>> --- linux-3.6.0-rc7-orig/fs/buffer.c	2012-09-26 13:20:14.842365056 +0900
>>>> +++ linux-3.6.0-rc7/fs/buffer.c	2012-09-26 15:02:22.630595704 +0900
> ...
>>> What I would put here is:
>>> 	if (!res) {
>>> 		deactivate_locked_super(sb);
>>> 		/*
>>> 		 * We have to re-acquire s_umount because
>>> 		 * iterate_supers_write() will unlock it. It still holds
>>> 		 * passive reference so sb cannot be freed under us.
>>> 		 */
>>> 		down_write(&sb->s_umount);
>>> 	}
>>> 	
>>> Is there any problem with this I miss?
>> The reason  I wrote the code as I did is that I did not want to re-acquire
>> s_umount in the normal case (s_active >= 2 entering the if statement).
>>
>> What about combining our approaches and doing something like this?:
>>
>> if (!res && !atomic_add_unless(&sb->s_active, -1, 1)) {
>>       deactivate_locked_super(sb);
>>       down_write(&sb->s_umount);
>> }
>    Well, we are speaking about emergency thaw code. That's no performance
> critical path by any means so I'd trade code simplicity for speed as much
> as possible. atomic_add_unless() makes you think twice what the hell we are
> doing here so I'd avoid it...

I left (and documented) atomic_add_unless() in the new iteration I 
cooked locally,
but I can get rid of it if you feel strongly about it.

I really appreciate you in-depth comments!

Thanks,
Fernando

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-09  9:46     ` Fernando Luis Vazquez Cao
@ 2012-10-09 14:55       ` Jan Kara
  2012-10-10  2:17         ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2012-10-09 14:55 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel, Fernando Luis Vázquez Cao

On Tue 09-10-12 18:46:26, Fernando Luis Vazquez Cao wrote:
> On 2012/10/09 00:05, Jan Kara wrote:
> > On Fri 05-10-12 14:43:29, Fernando Luis Vázquez Cao wrote:
> >> The FIISFROZEN ioctl can be use by HA and monitoring software to check
> >> the freeze state of a mounted filesystem.
> >
> >   I was thinking about this and your use case and I thing this is just a
> > wrong way to fix a problem with your HA application. E.g. in case
> you would
> > "umount -l" your filesystem, you would hit the same problem as in
> presence
> > of freezing and the ioctl won't help you. Now I understand that in your
> > specific use case you likely don't need to deal with lazy umounts but we
> > shouldn't add an interface just to accomodate one use case and later find
> > out we need another interface for slightly different one.
> 
> By the way, we can end up with a detached and active superblock even
> without using lazy umount; it is possible to do a regular umount of a
> frozen filesystem.
  Yes, I'm aware of this.

> > So what you rather seem to need is some interface which allows you to
> > investigate filesystems that are mounted on a block device but not
> attached
> > anywhere in the namespace. Would that be enough for you? If yes, some
> > extension to /proc/self/mountinfo to do this should be possible...
> 
> Well, neither /proc/self/mount* nor /proc/mounts show superblocks not
> attached anywhere in the namespace and changing that behavior could
> wreck havoc in userspace scripts and management software. I would
> rather not change that de-facto ABI unless it is strictly needed. On
> the other hand, the check ioctls add a new userspace API that would
> not break anything.
  I agree we must be careful not to break the interface. I believe
there are ways to extend mountinfo in a compatible way though. But it's not
completely trivial since we have to forge the mount path (or better put
there something like 'none') and vfsmount id.

> Regarding your concern about the ioctl approach, when a frozen
> filesystem is detached from the namespace it can still be reached
> through the block device it is sitting on (well... with the exception
> of btrfs which has some issues that I am working on) and this is the
> reason I added a block device level check ioctl too. That said, if one
> day we have a filesystem which is not block device based and supports
> fsfreeze (ioctl_fsfreeze() returns -EOPNOTSUPP if the superblock has
> no ->freeze_fs operation, which is the case for all virtual
> filesystems and NAS drivers that we have) the two check ioctls would
> not cover that case.
  In principle, there are filesystems which operate e.g. on MTD and thus do
not have a block device. So far none of these seem to support freezing but
in principle there's no reason they couldn't. And for these filesystems your
ioctls won't help.

> I think that to cover all cases without adding a completely new API we
> need to do the following:
> 
> 1) Filesystems which are not tied to a block device (virtual
>   filesystems, NAS, etc):
> 
>   As soon as the filesystem is removed from the namespace the
>   superblock based fsfreeze ioctls become useless; if we let a umount
>   of a frozen filesystem succeed we would not be able to thaw it (well
>   we could use emergency thaw but it would be overkill). Since we do
  Actually, you can always mount the filesystem again (you will essentially
just attach the superblock to the namespace again) and thaw the filesystem.
So this is not a big issue.

>   not want to break lazy umounts the only viable solution is thawing
>   the superblock automatically on umount (releasing the active
>   reference taken in freeze_super() to be more precise).
  I'm not against this. As you write below, you cannot really thaw
freeze coming via block device so you end up with somewhat inconsistent
behavior (thaw only freezes by ioctl) but after all freeze of a filesystem
and freeze of a block device *are* somewhat different requests so the
inconsistency can be justified.

Do I get right that when we do this, you won't need ioctls for querying the
freeze state?

> 2) Block device based filesystems:
> 
>   These can be reached through the block device it is sitting on even
>   if the filesystem was detached from the namespace and have the
>   particularity that they can be frozen using two different APIs, a
>   block device level one and the ioctls. When a filesytem was frozen
>   using the former, which only has in-kernel users such as dm,
>   automatically thawing the filesystem on umount is arguably too rude
>   (we can end up breaking the filesystem level consistency of a
>   storage snapshot). It we care about this, we could modify
>   sys_umount() so that filesystem is automatically thawed if and only
>   if there are no block device level freezes active. This behavior
>   would be consistent with case 1) above (the premise here is that
>   both fsfreeze and umount are userspace controlled operations and the
>   administrator should know what it is doing) and is the less likely
>   to cause surprises to freeze_bdev() users.
> 
>   It would also be nice to have a block device level thaw ioctl for
>   emergency cases (for example, a scenario where thaw_bdev() was not
>   called and the freeze counter was left in a inconsistent state;
>   freeze_bdev() and thaw_bdev() are exported symbols and in many cases
>   we cannot control what external modules do).
  Umm, I don't know. I'd rather forbid thawing via ioctl when the device is
frozen via block device so that should solve possible issues caused by
buggy userspace and the rest is a kernel bug - emergency thaw is for
that...

								Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-09 14:55       ` Jan Kara
@ 2012-10-10  2:17         ` Fernando Luis Vazquez Cao
  2012-10-11 16:26           ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-10-10  2:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On 2012/10/09 23:55, Jan Kara wrote:
> On Tue 09-10-12 18:46:26, Fernando Luis Vazquez Cao wrote:
>> Regarding your concern about the ioctl approach, when a frozen
>> filesystem is detached from the namespace it can still be reached
>> through the block device it is sitting on (well... with the exception
>> of btrfs which has some issues that I am working on) and this is the
>> reason I added a block device level check ioctl too. That said, if one
>> day we have a filesystem which is not block device based and supports
>> fsfreeze (ioctl_fsfreeze() returns -EOPNOTSUPP if the superblock has
>> no ->freeze_fs operation, which is the case for all virtual
>> filesystems and NAS drivers that we have) the two check ioctls would
>> not cover that case.
>    In principle, there are filesystems which operate e.g. on MTD and thus do
> not have a block device. So far none of these seem to support freezing but
> in principle there's no reason they couldn't. And for these filesystems your
> ioctls won't help.

Such devices fall in category 1) below, which means they would
be automatically thawed on umount if frozen. As long as the
filesystem remains mounted the check ioctl can be used.


>> I think that to cover all cases without adding a completely new API we
>> need to do the following:
>>
>> 1) Filesystems which are not tied to a block device (virtual
>>    filesystems, NAS, etc):
>>
>>    As soon as the filesystem is removed from the namespace the
>>    superblock based fsfreeze ioctls become useless; if we let a umount
>>    of a frozen filesystem succeed we would not be able to thaw it (well
>>    we could use emergency thaw but it would be overkill). Since we do
>    Actually, you can always mount the filesystem again (you will essentially
> just attach the superblock to the namespace again) and thaw the filesystem.
> So this is not a big issue.

The problem is that we may generate write I/O during the second
mount. We would need to audit all filesystems (which I am fine
with if there is a sensible use case).


>>    not want to break lazy umounts the only viable solution is thawing
>>    the superblock automatically on umount (releasing the active
>>    reference taken in freeze_super() to be more precise).
>    I'm not against this. As you write below, you cannot really thaw
> freeze coming via block device so you end up with somewhat inconsistent
> behavior (thaw only freezes by ioctl) but after all freeze of a filesystem
> and freeze of a block device *are* somewhat different requests so the
> inconsistency can be justified.
>
> Do I get right that when we do this, you won't need ioctls for querying the
> freeze state?

I would still want the check ioctls. For example, in some cases the
freeze/unfreeze process is controlled by a daemon which can die
and with the current API there is no way to check what state
filesystems where left in (well, we have emergency thaw but thaw
unfreezes all filesystems which may not be what we want, i.e. overkill).
I have heard a lot of complaints about this from users.

Virtualization is a special case of this where the freeze of a guest
filesystem can be initiated from the hypervisor and carried out by
a guest agent behind the guest's administrator's back.


>> 2) Block device based filesystems:
>>
>>    These can be reached through the block device it is sitting on even
>>    if the filesystem was detached from the namespace and have the
>>    particularity that they can be frozen using two different APIs, a
>>    block device level one and the ioctls. When a filesytem was frozen
>>    using the former, which only has in-kernel users such as dm,
>>    automatically thawing the filesystem on umount is arguably too rude
>>    (we can end up breaking the filesystem level consistency of a
>>    storage snapshot). It we care about this, we could modify
>>    sys_umount() so that filesystem is automatically thawed if and only
>>    if there are no block device level freezes active. This behavior
>>    would be consistent with case 1) above (the premise here is that
>>    both fsfreeze and umount are userspace controlled operations and the
>>    administrator should know what it is doing) and is the less likely
>>    to cause surprises to freeze_bdev() users.
>>
>>    It would also be nice to have a block device level thaw ioctl for
>>    emergency cases (for example, a scenario where thaw_bdev() was not
>>    called and the freeze counter was left in a inconsistent state;
>>    freeze_bdev() and thaw_bdev() are exported symbols and in many cases
>>    we cannot control what external modules do).
>    Umm, I don't know. I'd rather forbid thawing via ioctl when the device is
> frozen via block device so that should solve possible issues caused by
> buggy userspace and the rest is a kernel bug - emergency thaw is for
> that...

That is an approach I myself considered and that I would be ok
with. I guess I will implement both and let Al decide.

Thanks,
Fernando

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-10-10  2:17         ` Fernando Luis Vazquez Cao
@ 2012-10-11 16:26           ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2012-10-11 16:26 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Josef Bacik, Eric Sandeen, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On Wed 10-10-12 11:17:25, Fernando Luis Vazquez Cao wrote:
> On 2012/10/09 23:55, Jan Kara wrote:
> >On Tue 09-10-12 18:46:26, Fernando Luis Vazquez Cao wrote:
> >>I think that to cover all cases without adding a completely new API we
> >>need to do the following:
> >>
> >>1) Filesystems which are not tied to a block device (virtual
> >>   filesystems, NAS, etc):
> >>
> >>   As soon as the filesystem is removed from the namespace the
> >>   superblock based fsfreeze ioctls become useless; if we let a umount
> >>   of a frozen filesystem succeed we would not be able to thaw it (well
> >>   we could use emergency thaw but it would be overkill). Since we do
> >   Actually, you can always mount the filesystem again (you will essentially
> >just attach the superblock to the namespace again) and thaw the filesystem.
> >So this is not a big issue.
> 
> The problem is that we may generate write I/O during the second
> mount. We would need to audit all filesystems (which I am fine
> with if there is a sensible use case).
  Most filesystems should be fine as they use mount_bdev() so
foo_fill_super() isn't called in that case. But yes, some filesystems could
in theory do something weird.

> >>   not want to break lazy umounts the only viable solution is thawing
> >>   the superblock automatically on umount (releasing the active
> >>   reference taken in freeze_super() to be more precise).
> >   I'm not against this. As you write below, you cannot really thaw
> >freeze coming via block device so you end up with somewhat inconsistent
> >behavior (thaw only freezes by ioctl) but after all freeze of a filesystem
> >and freeze of a block device *are* somewhat different requests so the
> >inconsistency can be justified.
> >
> >Do I get right that when we do this, you won't need ioctls for querying the
> >freeze state?
> 
> I would still want the check ioctls. For example, in some cases the
> freeze/unfreeze process is controlled by a daemon which can die
> and with the current API there is no way to check what state
> filesystems where left in (well, we have emergency thaw but thaw
> unfreezes all filesystems which may not be what we want, i.e. overkill).
> I have heard a lot of complaints about this from users.
  Well, you're going to find out pretty quickly in what state a filesystem
is :) But I understand that with ioctl() you can produce a sensible
output...

> Virtualization is a special case of this where the freeze of a guest
> filesystem can be initiated from the hypervisor and carried out by
> a guest agent behind the guest's administrator's back.
  OK.

> >>2) Block device based filesystems:
> >>
> >>   These can be reached through the block device it is sitting on even
> >>   if the filesystem was detached from the namespace and have the
> >>   particularity that they can be frozen using two different APIs, a
> >>   block device level one and the ioctls. When a filesytem was frozen
> >>   using the former, which only has in-kernel users such as dm,
> >>   automatically thawing the filesystem on umount is arguably too rude
> >>   (we can end up breaking the filesystem level consistency of a
> >>   storage snapshot). It we care about this, we could modify
> >>   sys_umount() so that filesystem is automatically thawed if and only
> >>   if there are no block device level freezes active. This behavior
> >>   would be consistent with case 1) above (the premise here is that
> >>   both fsfreeze and umount are userspace controlled operations and the
> >>   administrator should know what it is doing) and is the less likely
> >>   to cause surprises to freeze_bdev() users.
> >>
> >>   It would also be nice to have a block device level thaw ioctl for
> >>   emergency cases (for example, a scenario where thaw_bdev() was not
> >>   called and the freeze counter was left in a inconsistent state;
> >>   freeze_bdev() and thaw_bdev() are exported symbols and in many cases
> >>   we cannot control what external modules do).
> >   Umm, I don't know. I'd rather forbid thawing via ioctl when the device is
> >frozen via block device so that should solve possible issues caused by
> >buggy userspace and the rest is a kernel bug - emergency thaw is for
> >that...
> 
> That is an approach I myself considered and that I would be ok
> with. I guess I will implement both and let Al decide.
  OK.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-09-14  6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-09-14  6:54 ` Fernando Luis Vázquez Cao
  0 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-14  6:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel, fernando

The FIISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of a mounted filesystem.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-rc5-orig/fs/compat_ioctl.c linux-3.6-rc5/fs/compat_ioctl.c
--- linux-3.6-rc5-orig/fs/compat_ioctl.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/compat_ioctl.c	2012-09-14 15:03:25.481355872 +0900
@@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
 /* 'X' - originally XFS but some now in the VFS */
 COMPATIBLE_IOCTL(FIFREEZE)
 COMPATIBLE_IOCTL(FITHAW)
+COMPATIBLE_IOCTL(FIISFROZEN)
 COMPATIBLE_IOCTL(KDGETKEYCODE)
 COMPATIBLE_IOCTL(KDSETKEYCODE)
 COMPATIBLE_IOCTL(KDGKBTYPE)
diff -urNp linux-3.6-rc5-orig/fs/ioctl.c linux-3.6-rc5/fs/ioctl.c
--- linux-3.6-rc5-orig/fs/ioctl.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/ioctl.c	2012-09-14 15:03:25.481355872 +0900
@@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
 	return thaw_super(sb);
 }
 
+static int ioctl_fs_isfrozen(struct file *filp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return isfrozen_super(sb);
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FIISFROZEN:
+		error = ioctl_fs_isfrozen(filp);
+		if (error >= 0)
+			return put_user(error, (int __user *)arg);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c	2012-09-14 14:03:18.653312262 +0900
+++ linux-3.6-rc5/fs/super.c	2012-09-14 15:03:25.485355873 +0900
@@ -1545,3 +1545,8 @@ void emergency_thaw_all(void)
 		schedule_work(work);
 	}
 }
+
+int isfrozen_super(struct super_block *sb)
+{
+	return sb->s_writers.frozen > SB_UNFROZEN ? 1 : 0;
+}
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h	2012-09-14 14:03:18.993312323 +0900
+++ linux-3.6-rc5/include/linux/fs.h	2012-09-14 15:03:25.485355873 +0900
@@ -342,6 +342,7 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FIISFROZEN	_IOR('X', 122, int)	/* get sb freeze state */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -2085,6 +2086,7 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int isfrozen_super(struct super_block *sb);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
  2012-09-13 10:57 [RFC 0/9 v3] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-09-13 11:11 ` Fernando Luis Vázquez Cao
  0 siblings, 0 replies; 23+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-09-13 11:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, Eric Sandeen, Dave Chinner, Christoph Hellwig,
	Jan Kara, linux-fsdevel

The FIISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of a mounted filesystem.

Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp linux-3.6-rc5-orig/fs/compat_ioctl.c linux-3.6-rc5/fs/compat_ioctl.c
--- linux-3.6-rc5-orig/fs/compat_ioctl.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/compat_ioctl.c	2012-09-13 15:40:54.540150687 +0900
@@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
 /* 'X' - originally XFS but some now in the VFS */
 COMPATIBLE_IOCTL(FIFREEZE)
 COMPATIBLE_IOCTL(FITHAW)
+COMPATIBLE_IOCTL(FIISFROZEN)
 COMPATIBLE_IOCTL(KDGETKEYCODE)
 COMPATIBLE_IOCTL(KDSETKEYCODE)
 COMPATIBLE_IOCTL(KDGKBTYPE)
diff -urNp linux-3.6-rc5-orig/fs/ioctl.c linux-3.6-rc5/fs/ioctl.c
--- linux-3.6-rc5-orig/fs/ioctl.c	2012-07-22 05:58:29.000000000 +0900
+++ linux-3.6-rc5/fs/ioctl.c	2012-09-13 15:40:54.540150687 +0900
@@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
 	return thaw_super(sb);
 }
 
+static int ioctl_fs_isfrozen(struct file *filp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return isfrozen_super(sb);
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FIISFROZEN:
+		error = ioctl_fs_isfrozen(filp);
+		if (error >= 0)
+			return put_user(error, (int __user *)arg);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff -urNp linux-3.6-rc5-orig/fs/super.c linux-3.6-rc5/fs/super.c
--- linux-3.6-rc5-orig/fs/super.c	2012-09-13 15:38:07.356137398 +0900
+++ linux-3.6-rc5/fs/super.c	2012-09-13 15:58:31.516184164 +0900
@@ -1533,3 +1533,8 @@ void emergency_thaw_all(void)
 		schedule_work(work);
 	}
 }
+
+int isfrozen_super(struct super_block *sb)
+{
+	return sb->s_writers.frozen > SB_UNFROZEN ? 1 : 0;
+}
diff -urNp linux-3.6-rc5-orig/include/linux/fs.h linux-3.6-rc5/include/linux/fs.h
--- linux-3.6-rc5-orig/include/linux/fs.h	2012-09-13 15:38:07.356137398 +0900
+++ linux-3.6-rc5/include/linux/fs.h	2012-09-13 15:40:54.540150687 +0900
@@ -342,6 +342,7 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FIISFROZEN	_IOR('X', 122, int)	/* get sb freeze state */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -2086,6 +2087,7 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int isfrozen_super(struct super_block *sb);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-10-11 16:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-05  5:24 [PATCH 0/9 v5] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-10-05  5:31 ` [PATCH 1/9] vfs: add __iterate_supers() and helpers around it Fernando Luis Vázquez Cao
2012-10-08 13:48   ` Jan Kara
2012-10-05  5:33 ` [PATCH 2/9] fsfreeze: add unlocked version of thaw_super Fernando Luis Vázquez Cao
2012-10-05  5:34 ` [PATCH 3/9] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
2012-10-05  5:35 ` [PATCH 4/9] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
2012-10-08 13:57   ` Jan Kara
2012-10-09  5:07     ` Fernando Luis Vazquez Cao
2012-10-09  8:20       ` Jan Kara
2012-10-09  9:52         ` Fernando Luis Vazquez Cao
2012-10-05  5:38 ` [PATCH 5/9] xfs: switch to using super methods for fsfreeze Fernando Luis Vázquez Cao
2012-10-05  5:39 ` [PATCH 6/9] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
2012-10-05  5:42 ` [PATCH 7/9] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2012-10-08 14:17   ` Jan Kara
2012-10-05  5:43 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-10-08 15:05   ` Jan Kara
2012-10-09  9:46     ` Fernando Luis Vazquez Cao
2012-10-09 14:55       ` Jan Kara
2012-10-10  2:17         ` Fernando Luis Vazquez Cao
2012-10-11 16:26           ` Jan Kara
2012-10-05  5:44 ` [PATCH 9/9] fsfreeze: add block device " Fernando Luis Vázquez Cao
  -- strict thread matches above, loose matches on Subject: below --
2012-09-14  6:43 [RFC, PATCH 0/9 v4] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-09-14  6:54 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-09-13 10:57 [RFC 0/9 v3] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-09-13 11:11 ` [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.