All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups
@ 2012-07-12  8:57 Fernando Luis Vázquez Cao
  2012-07-12  9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  8:57 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

This patch set (probable the first of a series) is to address long standing
issues in the filesytem freeze code and to fill some functionality gaps in the
API. Some minor code rearrangements are included too.

The first 5 patches are a rehash of a previous patch set from Dave Chinner which
I ported to the latest -rc kernel (3.5.0-rc5 at the time of writing) and
updated to fix bugs and reflect previous feedback from Christoph Hellwig and
Al Viro.

Dave, I took the liberty of adding your Signed-off-by as the author of the
original patches. If you do not agree with my modifications or want me to
acknowledge you differently please let me know.

You can find the original discussion here:
http://www.spinics.net/lists/linux-fsdevel/msg32794.html

The last 2 patches add a block device based check ioctl and superblock based
check ioctl.

More details follow below:

---
[1/8] fsfreeze: Prevent emergency thaw from looping infinitely
[2/8] fsfreeze: emergency thaw will deadlock on s_umount

This two patches fix the emergency thaw infinite loop reported by Jeff
Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
be stable kernel candidates.

[3/8] fsfreeze: freeze_super and thaw_bdev don't play well together
[4/8] fsfreeze: switch to using super methods where possible
[5/8] fsfreeze: move emergency thaw code to fs/super.c

These patches address the bdev/sb mismatch and the fact that sb level freezing
does not nest correctly. For all the places that the block device interfaces
are used, we need a superblock anyway so we may as well make freeze/thaw work
only at the superblock level. There is one caveat (pointed out by Christoph in
the discussion linked above) though: we still need to support the "feature"
that we can freeze a block device that does not have a filesystem mounted yet
(this can be used to prevent a filesystem mount during a dm snapshot for
example).

This series moves part of the nesting code to the superblock from the block
device level but retains the freeze_bdev/thaw_bdev API so that we can keep the
"feature" mentioned above for those who may need it, namely dm and external
users (yes, block device level API is exported for external use). All the other
users of the block device API are converted to work at the superblock level.

[6/8] fsfreeze: add vfs ioctl to check freeze state
[7/8] fsfreeze: add block device ioctl to check freeze state

These ioctls can be used to by HA and monitoring software to check the freeze
state of block device and filesystems.

[8/8] fsfreeze: update Documentation/filesystems/Locking

Documentation and implementation got out of sync.
---

In a follow-up patch I would like to tackle (second try) a big outstanding
problem: a frozen filesystem can be unmounted but with the current API one
needs to open a file in the target superblock, which is no longer accessible
because it has been unmounted.

Possible approaches to fix this issue include:

1- Add a block device level API to unfreeze the filesystem.

The problem with this approach is that it does not play well with filesystems
sb/bdev association is 1:n (i.e. btrfs and that ilk).

2- Disallow umount on frozen filesystems.

I think this is the more reasonable solution but it may require a refactoring
of the lazy unmount code; a lazy unmount should only be allowed to succeed if
both the target filesystem and all its child mounts are unfrozen (a failure to
guarantee this can lead to a situation where the user cannot unfreeze a
filesystem in the hierarchy because it is not visible).

3- Automatically thaw the filesystem on unmount.

Probably not a good idea.

Changes since v1:
  - Do not break unfreezing of block devices without a mounted filesystem.
  - Avoid conditional locking in thaw_bdev.
  - Update filesystem locking documentation.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
@ 2012-07-12  9:02 ` Fernando Luis Vázquez Cao
  2012-07-13 12:59   ` Jan Kara
  2012-07-12  9:04 ` [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.

Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/block_dev.c	2012-07-12 13:22:38.124815295 +0900
@@ -319,22 +319,17 @@ int thaw_bdev(struct block_device *bdev,
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	error = 0;
-	if (--bdev->bd_fsfreeze_count > 0)
-		goto out;
-
-	if (!sb)
+	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+		error = 0;
 		goto out;
+	}
 
 	error = thaw_super(sb);
-	if (error) {
+	if (error)
 		bdev->bd_fsfreeze_count++;
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return error;
-	}
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return 0;
+	return error;
 }
 EXPORT_SYMBOL(thaw_bdev);
 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2012-07-12  9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
@ 2012-07-12  9:04 ` Fernando Luis Vázquez Cao
  2012-07-13 13:17   ` Jan Kara
  2012-07-12  9:05 ` [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:04 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.

Pass the emergency state into the thaw_bdev/thaw_super code to avoid
taking the s_umount lock in this case. We are running under the bdev
freeze mutex, so this is still serialised against freeze despite
only having a read lock on the sb->s_umount. Hence it should be safe
to execute in this manner, especially given that emergency thaw is a
rarely executed "get-out-of-jail" feature.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-12 13:26:42.108635927 +0900
+++ vfs/fs/block_dev.c	2012-07-12 13:26:56.964628082 +0900
@@ -305,13 +305,14 @@ struct super_block *freeze_bdev(struct b
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * thaw_bdev  -- unlock filesystem
+ * __thaw_bdev  -- unlock filesystem
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
+ * @emergency:	emergency thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_bdev().
  */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
 {
 	int error = -EINVAL;
 
@@ -324,15 +325,35 @@ int thaw_bdev(struct block_device *bdev,
 		goto out;
 	}
 
-	error = thaw_super(sb);
+	if (emergency)
+		error = thaw_super_emergency(sb);
+	else
+		error = thaw_super(sb);
 	if (error)
 		bdev->bd_fsfreeze_count++;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
 }
+
+/**
+ * thaw_bdev  -- unlock filesystem
+ * @bdev:	blockdevice to unlock
+ * @sb:		associated superblock
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ */
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+{
+	return __thaw_bdev(bdev, sb, 0);
+}
 EXPORT_SYMBOL(thaw_bdev);
 
+int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
+{
+	return __thaw_bdev(bdev, sb, 1);
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp vfs-orig/fs/buffer.c vfs/fs/buffer.c
--- vfs-orig/fs/buffer.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/buffer.c	2012-07-12 13:26:56.964628082 +0900
@@ -514,7 +514,7 @@ repeat:
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
 	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
+	while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
 		printk(KERN_WARNING "Emergency Thaw on %s\n",
 		       bdevname(sb->s_bdev, b));
 }
diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
--- vfs-orig/fs/super.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/super.c	2012-07-12 14:21:09.736566768 +0900
@@ -1221,23 +1221,25 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
+ * @emergency:	emergency thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * This is the unlocked version of thaw_super and has to be called with the
+ * sb->s_umount lock held in the non-emergency thaw case.
  */
-int thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb, int emergency)
 {
-	int error;
+	int error = 0;
 
-	down_write(&sb->s_umount);
 	if (sb->s_frozen == SB_UNFROZEN) {
-		up_write(&sb->s_umount);
-		return -EINVAL;
+		error = -EINVAL;
+		goto out;
 	}
 
 	if (sb->s_flags & MS_RDONLY)
-		goto out;
+		goto out_thaw;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
@@ -1245,17 +1247,52 @@ int thaw_super(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
 			sb->s_frozen = SB_FREEZE_TRANS;
-			up_write(&sb->s_umount);
-			return error;
+			goto out;
 		}
 	}
 
-out:
+out_thaw:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
-	deactivate_locked_super(sb);
 
-	return 0;
+	/*
+	 * When called from emergency scope, we cannot grab the s_umount lock
+	 * so we cannot deactivate the superblock. This may leave unbalanced
+	 * superblock references which could prevent unmount, but given this is
+	 * an emergency operation....
+	 */
+	if (!emergency)
+		deactivate_locked_super(sb);
+out:
+	return error;
+}
+
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+	int res;
+	down_write(&sb->s_umount);
+	res = __thaw_super(sb, 0);
+	up_write(&sb->s_umount);
+	return res;
 }
 EXPORT_SYMBOL(thaw_super);
+
+/**
+ * thaw_super_emergency  -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * The kernel gets here holding the sb->s_umount lock in read mode so we cannot
+ * grab it again in write mode.
+ */
+int thaw_super_emergency(struct super_block *sb)
+{
+	return __thaw_super(sb, 1);
+}
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-04 18:57:54.000000000 +0900
+++ vfs/include/linux/fs.h	2012-07-12 13:26:56.968627781 +0900
@@ -1944,6 +1944,7 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int thaw_super_emergency(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);
@@ -2107,6 +2108,8 @@ extern void kill_bdev(struct block_devic
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int thaw_bdev_emergency(struct block_device *bdev,
+			       struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -2123,6 +2126,12 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline int thaw_bdev_emergency(struct block_device *bdev,
+				      struct super_block *sb)
+{
+	return 0;
+}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
  2012-07-12  9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
  2012-07-12  9:04 ` [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-07-12  9:05 ` Fernando Luis Vázquez Cao
  2012-07-13 13:45   ` Jan Kara
  2012-07-12  9:07 ` [PATCH 4/8] fsfreeze: switch to using super methods where possible Fernando Luis Vázquez Cao
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:05 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

Changes from Dave Chinner's version:
 - Remove s_frozen check in freeze_super which is not needed now that it is
   re-entrant.
 - Decrement freeze counter if the freeze_fs callback fails.

---

thaw_bdev() has re-entrancy guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).

Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.

Move the re-enterency guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again. Caveat: Things work as expected as long as direct calls to
thaw_super are always in response to a previous sb level freeze. In
other words an unpaired call to thaw_super can still thaw a
filesystem frozen using freeze_bdev (this issue could be addressed
in a follow-up patch if deemed necessary).

This patch retains the bdev level mutex and counter to keep the
"feature" that we can freeze a block device that does not have a
filesystem mounted yet.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-12 14:31:38.936631141 +0900
+++ vfs/fs/block_dev.c	2012-07-12 15:03:57.032627014 +0900
@@ -257,16 +257,18 @@ int fsync_bdev(struct block_device *bdev
 EXPORT_SYMBOL(fsync_bdev);
 
 /**
- * freeze_bdev  --  lock a filesystem and force it into a consistent state
+ * freeze_bdev  --  lock a block device
  * @bdev:	blockdevice to lock
  *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
+ * Locks the block device and, if present, the associated filesystem too.
+ *
+ * The reference counter (bd_fsfreeze_count) is used to implement the feature
+ * that allows one to freeze a block device that does not have a filesystem
+ * mounted yet. For filesystems using mount_bdev the kernel takes care of
+ * things by preventing the mount operation from succeeding if the underlying
+ * block device is frozen. Other filesystems should check this counter or risk
+ * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race,
+ * which may lead to inconsistencies.
  */
 struct super_block *freeze_bdev(struct block_device *bdev)
 {
@@ -274,17 +276,7 @@ struct super_block *freeze_bdev(struct b
 	int error = 0;
 
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (++bdev->bd_fsfreeze_count > 1) {
-		/*
-		 * We don't even need to grab a reference - the first call
-		 * to freeze_bdev grab an active reference and only the last
-		 * thaw_bdev drops it.
-		 */
-		sb = get_super(bdev);
-		drop_super(sb);
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return sb;
-	}
+	bdev->bd_fsfreeze_count++;
 
 	sb = get_active_super(bdev);
 	if (!sb)
@@ -297,30 +289,33 @@ struct super_block *freeze_bdev(struct b
 		return ERR_PTR(error);
 	}
 	deactivate_super(sb);
- out:
+out:
 	sync_blockdev(bdev);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return sb;	/* thaw_bdev releases s->s_umount */
+	return sb;
 }
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * __thaw_bdev  -- unlock filesystem
+ * __thaw_bdev  -- unlock a block device
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
  * @emergency:	emergency thaw
  *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ * Unlocks the block device and, if present, the associated filesystem too.
  */
 static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
 {
 	int error = -EINVAL;
 
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
+
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
+	bdev->bd_fsfreeze_count--;
+
+	if (!sb) {
 		error = 0;
 		goto out;
 	}
@@ -336,13 +331,6 @@ out:
 	return error;
 }
 
-/**
- * thaw_bdev  -- unlock filesystem
- * @bdev:	blockdevice to unlock
- * @sb:		associated superblock
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
 int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	return __thaw_bdev(bdev, sb, 0);
diff -urNp vfs-orig/fs/gfs2/ops_fstype.c vfs/fs/gfs2/ops_fstype.c
--- vfs-orig/fs/gfs2/ops_fstype.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/gfs2/ops_fstype.c	2012-07-12 15:04:37.924628170 +0900
@@ -1275,11 +1275,6 @@ static struct dentry *gfs2_mount(struct
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
-	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
-	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
diff -urNp vfs-orig/fs/nilfs2/super.c vfs/fs/nilfs2/super.c
--- vfs-orig/fs/nilfs2/super.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/nilfs2/super.c	2012-07-12 15:04:55.112642078 +0900
@@ -1277,11 +1277,6 @@ nilfs_mount(struct file_system_type *fs_
 		goto failed;
 	}
 
-	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
-	 */
 	mutex_lock(&sd.bdev->bd_fsfreeze_mutex);
 	if (sd.bdev->bd_fsfreeze_count > 0) {
 		mutex_unlock(&sd.bdev->bd_fsfreeze_mutex);
diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
--- vfs-orig/fs/super.c	2012-07-12 14:31:38.964628483 +0900
+++ vfs/fs/super.c	2012-07-12 16:50:27.172627639 +0900
@@ -187,6 +187,8 @@ static struct super_block *alloc_super(s
 		s->s_shrink.seeks = DEFAULT_SEEKS;
 		s->s_shrink.shrink = prune_super;
 		s->s_shrink.batch = 1024;
+
+		mutex_init(&s->s_freeze_mutex);
 	}
 out:
 	return s;
@@ -984,11 +986,6 @@ struct dentry *mount_bdev(struct file_sy
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
-	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
-	 */
 	mutex_lock(&bdev->bd_fsfreeze_mutex);
 	if (bdev->bd_fsfreeze_count > 0) {
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
@@ -1168,30 +1165,29 @@ out:
  * @sb: the super to lock
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs.  The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and counts down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
  */
 int freeze_super(struct super_block *sb)
 {
-	int ret;
+	int ret = 0;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
-	if (sb->s_frozen) {
-		deactivate_locked_super(sb);
-		return -EBUSY;
-	}
+	mutex_lock(&sb->s_freeze_mutex);
+	if (++sb->s_freeze_count > 1)
+		goto out_deactivate;
 
-	if (!(sb->s_flags & MS_BORN)) {
-		up_write(&sb->s_umount);
-		return 0;	/* sic - it's "nothing to do" */
-	}
+	if (!(sb->s_flags & MS_BORN))
+		goto out_active;	/* sic - it's "nothing to do" */
 
 	if (sb->s_flags & MS_RDONLY) {
 		sb->s_frozen = SB_FREEZE_TRANS;
 		smp_wmb();
-		up_write(&sb->s_umount);
-		return 0;
+		goto out_active;
 	}
 
 	sb->s_frozen = SB_FREEZE_WRITE;
@@ -1206,17 +1202,24 @@ int freeze_super(struct super_block *sb)
 	if (sb->s_op->freeze_fs) {
 		ret = sb->s_op->freeze_fs(sb);
 		if (ret) {
+			sb->s_freeze_count--;
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
 			sb->s_frozen = SB_UNFROZEN;
 			smp_wmb();
 			wake_up(&sb->s_wait_unfrozen);
-			deactivate_locked_super(sb);
-			return ret;
+			goto out_deactivate;
 		}
 	}
+
+out_active:
 	up_write(&sb->s_umount);
-	return 0;
+out_unlock:
+	mutex_unlock(&sb->s_freeze_mutex);
+	return ret;
+out_deactivate:
+	deactivate_locked_super(sb);
+	goto out_unlock;
 }
 EXPORT_SYMBOL(freeze_super);
 
@@ -1226,6 +1229,10 @@ EXPORT_SYMBOL(freeze_super);
  * @emergency:	emergency thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Returns -EINVAL if @sb is not frozen, 0 if the filesystem specific unfreeze
+ * function was executed and succeeded or the corresponding error code
+ * otherwise. if the unfreeze fails, @sb is left in the frozen state.
+ *
  * This is the unlocked version of thaw_super and has to be called with the
  * sb->s_umount lock held in the non-emergency thaw case.
  */
@@ -1233,29 +1240,33 @@ static int __thaw_super(struct super_blo
 {
 	int error = 0;
 
-	if (sb->s_frozen == SB_UNFROZEN) {
+	mutex_lock(&sb->s_freeze_mutex);
+	if (!sb->s_freeze_count) {
 		error = -EINVAL;
-		goto out;
+		goto out_unlock;
 	}
+	sb->s_freeze_count = emergency ? 1 : sb->s_freeze_count;
+
+	if (--sb->s_freeze_count > 0)
+		goto out_unlock;
 
 	if (sb->s_flags & MS_RDONLY)
-		goto out_thaw;
+		goto out_unfreeze;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
+			sb->s_freeze_count++;
 			sb->s_frozen = SB_FREEZE_TRANS;
-			goto out;
+			goto out_unlock;
 		}
 	}
-
-out_thaw:
+out_unfreeze:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
-
 	/*
 	 * When called from emergency scope, we cannot grab the s_umount lock
 	 * so we cannot deactivate the superblock. This may leave unbalanced
@@ -1264,7 +1275,8 @@ out_thaw:
 	 */
 	if (!emergency)
 		deactivate_locked_super(sb);
-out:
+out_unlock:
+	mutex_unlock(&sb->s_freeze_mutex);
 	return error;
 }
 
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-12 14:31:39.008626843 +0900
+++ vfs/include/linux/fs.h	2012-07-12 15:58:45.692627465 +0900
@@ -1542,6 +1542,9 @@ struct super_block {
 
 	/* Being remounted read-only */
 	int s_readonly_remount;
+
+	int			s_freeze_count; /* nr of nested freezes */
+	struct mutex		s_freeze_mutex; /* nesting lock */
 };
 
 /* superblock cache pruning functions */



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/8] fsfreeze: switch to using super methods where possible
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (2 preceding siblings ...)
  2012-07-12  9:05 ` [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-07-12  9:07 ` Fernando Luis Vázquez Cao
  2012-07-13 13:50   ` Jan Kara
  2012-07-12  9:09 ` [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:07 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

Changes from Dave Chinner's version:
- Retain the freeze_bdev/thaw_bdev API so that we can keep the feature that
  we can freeze a block device that does not have a filesystem mounted yet.

---

In some places we are using freeze/thaw_bdev despite the fact the kernel
is operating at the sb level there. Convert these users of the bdev
interfaces to use the superblock interfaces instead.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-12 17:03:18.516624899 +0900
+++ vfs/fs/block_dev.c	2012-07-12 17:04:35.244784436 +0900
@@ -297,14 +297,14 @@ out:
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * __thaw_bdev  -- unlock a block device
+ * thaw_bdev  -- unlock a block device
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
  * @emergency:	emergency thaw
  *
  * Unlocks the block device and, if present, the associated filesystem too.
  */
-static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	int error = -EINVAL;
 
@@ -320,28 +320,15 @@ static int __thaw_bdev(struct block_devi
 		goto out;
 	}
 
-	if (emergency)
-		error = thaw_super_emergency(sb);
-	else
-		error = thaw_super(sb);
+	error = thaw_super(sb);
 	if (error)
 		bdev->bd_fsfreeze_count++;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
 }
-
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
-	return __thaw_bdev(bdev, sb, 0);
-}
 EXPORT_SYMBOL(thaw_bdev);
 
-int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
-{
-	return __thaw_bdev(bdev, sb, 1);
-}
-
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp vfs-orig/fs/buffer.c vfs/fs/buffer.c
--- vfs-orig/fs/buffer.c	2012-07-12 14:31:38.944630298 +0900
+++ vfs/fs/buffer.c	2012-07-12 17:04:35.244784436 +0900
@@ -514,7 +514,7 @@ repeat:
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
 	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
+	while (sb->s_bdev && !thaw_super_emergency(sb))
 		printk(KERN_WARNING "Emergency Thaw on %s\n",
 		       bdevname(sb->s_bdev, b));
 }
diff -urNp vfs-orig/fs/xfs/xfs_fsops.c vfs/fs/xfs/xfs_fsops.c
--- vfs-orig/fs/xfs/xfs_fsops.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/xfs/xfs_fsops.c	2012-07-12 17:04:35.252780168 +0900
@@ -664,16 +664,12 @@ xfs_fs_goingdown(
 	__uint32_t	inflags)
 {
 	switch (inflags) {
-	case XFS_FSOP_GOING_FLAGS_DEFAULT: {
-		struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
-		if (sb && !IS_ERR(sb)) {
+	case XFS_FSOP_GOING_FLAGS_DEFAULT:
+		if (!freeze_super(mp->m_super)) {
 			xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
-			thaw_bdev(sb->s_bdev, sb);
+			thaw_super(mp->m_super);
 		}
-
 		break;
-	}
 	case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
 		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
 		break;
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-12 17:03:18.596626998 +0900
+++ vfs/include/linux/fs.h	2012-07-12 17:04:35.252780168 +0900
@@ -2111,8 +2111,6 @@ extern void kill_bdev(struct block_devic
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
-extern int thaw_bdev_emergency(struct block_device *bdev,
-			       struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -2129,12 +2127,6 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
-
-static inline int thaw_bdev_emergency(struct block_device *bdev,
-				      struct super_block *sb)
-{
-	return 0;
-}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (3 preceding siblings ...)
  2012-07-12  9:07 ` [PATCH 4/8] fsfreeze: switch to using super methods where possible Fernando Luis Vázquez Cao
@ 2012-07-12  9:09 ` Fernando Luis Vázquez Cao
  2012-07-13 13:53   ` Jan Kara
  2012-07-12  9:10 ` [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:09 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/buffer.c vfs/fs/buffer.c
--- vfs-orig/fs/buffer.c	2012-07-12 17:11:42.480606884 +0900
+++ vfs/fs/buffer.c	2012-07-12 17:14:25.652627936 +0900
@@ -511,37 +511,6 @@ repeat:
 	return err;
 }
 
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
-	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_super_emergency(sb))
-		printk(KERN_WARNING "Emergency Thaw on %s\n",
-		       bdevname(sb->s_bdev, b));
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
-	iterate_supers(do_thaw_one, NULL);
-	kfree(work);
-	printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
-	struct work_struct *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		INIT_WORK(work, do_thaw_all);
-		schedule_work(work);
-	}
-}
-
 /**
  * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
  * @mapping: the mapping which wants those buffers written
diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
--- vfs-orig/fs/super.c	2012-07-12 17:03:18.568626969 +0900
+++ vfs/fs/super.c	2012-07-12 17:21:40.768626516 +0900
@@ -1304,7 +1304,35 @@ EXPORT_SYMBOL(thaw_super);
  * The kernel gets here holding the sb->s_umount lock in read mode so we cannot
  * grab it again in write mode.
  */
-int thaw_super_emergency(struct super_block *sb)
+static void thaw_super_emergency(struct super_block *sb, void *unused)
 {
-	return __thaw_super(sb, 1);
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
+				    bdevname(sb->s_bdev, b));
+	}
+	while (!__thaw_super(sb, 1));
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+	iterate_supers(thaw_super_emergency, NULL);
+	kfree(work);
+	printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
+ */
+void emergency_thaw_all(void)
+{
+	struct work_struct *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(work, do_thaw_all);
+		schedule_work(work);
+	}
 }
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-12 17:11:42.496609385 +0900
+++ vfs/include/linux/fs.h	2012-07-12 17:22:14.640642715 +0900
@@ -1947,7 +1947,6 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
-extern int thaw_super_emergency(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (4 preceding siblings ...)
  2012-07-12  9:09 ` [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
@ 2012-07-12  9:10 ` Fernando Luis Vázquez Cao
  2012-07-13 13:54   ` Jan Kara
  2012-07-12  9:11 ` [PATCH 7/8] fsfreeze: add block device " Fernando Luis Vázquez Cao
  2012-07-12  9:12 ` [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking Fernando Luis Vázquez Cao
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:10 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

The FIISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of a mounted filesystem.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/fs/compat_ioctl.c vfs/fs/compat_ioctl.c
--- vfs-orig/fs/compat_ioctl.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/compat_ioctl.c	2012-07-12 17:25:37.592626439 +0900
@@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
 /* 'X' - originally XFS but some now in the VFS */
 COMPATIBLE_IOCTL(FIFREEZE)
 COMPATIBLE_IOCTL(FITHAW)
+COMPATIBLE_IOCTL(FIISFROZEN)
 COMPATIBLE_IOCTL(KDGETKEYCODE)
 COMPATIBLE_IOCTL(KDSETKEYCODE)
 COMPATIBLE_IOCTL(KDGKBTYPE)
diff -urNp vfs-orig/fs/ioctl.c vfs/fs/ioctl.c
--- vfs-orig/fs/ioctl.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/fs/ioctl.c	2012-07-12 17:25:37.592626439 +0900
@@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
 	return thaw_super(sb);
 }
 
+static int ioctl_fs_isfrozen(struct file *filp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return isfrozen_super(sb);
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -585,6 +595,12 @@ int do_vfs_ioctl(struct file *filp, unsi
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FIISFROZEN:
+		error = ioctl_fs_isfrozen(filp);
+		if (error >= 0)
+			return put_user(error, (int __user *)arg);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
--- vfs-orig/fs/super.c	2012-07-12 17:24:40.096620156 +0900
+++ vfs/fs/super.c	2012-07-12 17:25:37.600627115 +0900
@@ -1336,3 +1336,8 @@ void emergency_thaw_all(void)
 		schedule_work(work);
 	}
 }
+
+int isfrozen_super(struct super_block *sb)
+{
+	return sb->s_frozen > SB_UNFROZEN ? 1 : 0;
+}
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-12 17:24:40.116622271 +0900
+++ vfs/include/linux/fs.h	2012-07-12 17:25:37.600627115 +0900
@@ -340,6 +340,7 @@ struct inodes_stat_t {
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
 #define FITRIM		_IOWR('X', 121, struct fstrim_range)	/* Trim */
+#define FIISFROZEN	_IOR('X', 122, int)	/* get sb freeze state */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -1947,6 +1948,7 @@ extern int fd_statfs(int, struct kstatfs
 extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int isfrozen_super(struct super_block *sb);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 7/8] fsfreeze: add block device ioctl to check freeze state
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (5 preceding siblings ...)
  2012-07-12  9:10 ` [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
@ 2012-07-12  9:11 ` Fernando Luis Vázquez Cao
  2012-07-12  9:12 ` [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking Fernando Luis Vázquez Cao
  7 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

The BLKISFROZEN ioctl can be use by HA and monitoring software to check
the freeze state of the filesystem sitting on top of a block device.
This will not work for filesystems that can handle multiple devices,
i.e. btrfs.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/block/compat_ioctl.c vfs/block/compat_ioctl.c
--- vfs-orig/block/compat_ioctl.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/block/compat_ioctl.c	2012-07-12 17:29:01.608639153 +0900
@@ -746,6 +746,13 @@ long compat_blkdev_ioctl(struct file *fi
 	case BLKTRACETEARDOWN: /* compatible */
 		ret = blk_trace_ioctl(bdev, cmd, compat_ptr(arg));
 		return ret;
+	case BLKISFROZEN:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		ret = isfrozen_bdev(bdev);
+		if (ret >= 0)
+			return compat_put_int(arg, ret);
+		return ret;
 	default:
 		if (disk->fops->compat_ioctl)
 			ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
diff -urNp vfs-orig/block/ioctl.c vfs/block/ioctl.c
--- vfs-orig/block/ioctl.c	2012-07-04 18:57:54.000000000 +0900
+++ vfs/block/ioctl.c	2012-07-12 17:29:01.608639153 +0900
@@ -343,6 +343,13 @@ int blkdev_ioctl(struct block_device *bd
 	case BLKTRACETEARDOWN:
 		ret = blk_trace_ioctl(bdev, cmd, (char __user *) arg);
 		break;
+	case BLKISFROZEN:
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		ret = isfrozen_bdev(bdev);
+		if (ret >= 0)
+			return put_int(arg, ret);
+		return ret;
 	default:
 		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
 	}
diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
--- vfs-orig/fs/block_dev.c	2012-07-12 17:11:42.480606884 +0900
+++ vfs/fs/block_dev.c	2012-07-12 17:29:01.616637572 +0900
@@ -329,6 +329,11 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+int isfrozen_bdev(struct block_device *bdev)
+{
+	return bdev->bd_fsfreeze_count > 0;
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
--- vfs-orig/include/linux/fs.h	2012-07-12 17:28:15.556654469 +0900
+++ vfs/include/linux/fs.h	2012-07-12 17:29:01.616637572 +0900
@@ -333,6 +333,7 @@ struct inodes_stat_t {
 #define BLKDISCARDZEROES _IO(0x12,124)
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
+#define BLKISFROZEN _IOR(0x12,127, int)	/* get file system freeze state */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
@@ -2112,6 +2113,7 @@ extern void kill_bdev(struct block_devic
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int isfrozen_bdev(struct block_device *bdev);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -2128,6 +2130,11 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static int isfrozen_bdev(struct block_device *bdev)
+{
+	return 0;
+}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking
  2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
                   ` (6 preceding siblings ...)
  2012-07-12  9:11 ` [PATCH 7/8] fsfreeze: add block device " Fernando Luis Vázquez Cao
@ 2012-07-12  9:12 ` Fernando Luis Vázquez Cao
  2012-07-13 14:11   ` Jan Kara
  7 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vázquez Cao @ 2012-07-12  9:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

Documentation and implementation got out of sync. This patch updates the
documentation to reflect the current fsfreeze locking rules.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---

diff -urNp vfs-orig/Documentation/filesystems/Locking vfs/Documentation/filesystems/Locking
--- vfs-orig/Documentation/filesystems/Locking	2012-07-04 18:57:54.000000000 +0900
+++ vfs/Documentation/filesystems/Locking	2012-07-12 17:33:44.792662348 +0900
@@ -138,8 +138,8 @@ evict_inode:
 put_super:		write
 write_super:		read
 sync_fs:		read
-freeze_fs:		read
-unfreeze_fs:		read
+freeze_fs:		write
+unfreeze_fs:		write
 statfs:			maybe(read)	(see below)
 remount_fs:		write
 umount_begin:		no



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely
  2012-07-12  9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
@ 2012-07-13 12:59   ` Jan Kara
  2012-07-17  5:13     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-07-13 12:59 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:02:07, Fernando Luis Vázquez Cao wrote:
> The thawing of a filesystem through sysrq-j loops infinitely as it
> incorrectly detects a thawed filesytsem as frozen and tries to
> unfreeze repeatedly. This is a regression caused by
> 4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
> reference to frozen superblocks") in that it no longer returned
> -EINVAL for superblocks that were not frozen.
> 
> Return -EINVAL when the filesystem is already unfrozen to avoid this
> problem.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> 
> diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
> --- vfs-orig/fs/block_dev.c	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/fs/block_dev.c	2012-07-12 13:22:38.124815295 +0900
> @@ -319,22 +319,17 @@ int thaw_bdev(struct block_device *bdev,
>  	if (!bdev->bd_fsfreeze_count)
>  		goto out;
>  
> -	error = 0;
> -	if (--bdev->bd_fsfreeze_count > 0)
> -		goto out;
> -
> -	if (!sb)
> +	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
> +		error = 0;
>  		goto out;
> +	}
>  
>  	error = thaw_super(sb);
> -	if (error) {
> +	if (error)
>  		bdev->bd_fsfreeze_count++;
> -		mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -		return error;
> -	}
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -	return 0;
> +	return error;
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount
  2012-07-12  9:04 ` [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
@ 2012-07-13 13:17   ` Jan Kara
  2012-08-09  6:00     ` Fernando Luis Vazquez Cao
  2012-09-13  6:23     ` Fernando Luis Vazquez Cao
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2012-07-13 13:17 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:04:09, Fernando Luis Vázquez Cao wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
> 
> Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> taking the s_umount lock in this case. We are running under the bdev
> freeze mutex, so this is still serialised against freeze despite
> only having a read lock on the sb->s_umount. Hence it should be safe
> to execute in this manner, especially given that emergency thaw is a
> rarely executed "get-out-of-jail" feature.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
> --- vfs-orig/fs/block_dev.c	2012-07-12 13:26:42.108635927 +0900
> +++ vfs/fs/block_dev.c	2012-07-12 13:26:56.964628082 +0900
> @@ -305,13 +305,14 @@ struct super_block *freeze_bdev(struct b
>  EXPORT_SYMBOL(freeze_bdev);
>  
>  /**
> - * thaw_bdev  -- unlock filesystem
> + * __thaw_bdev  -- unlock filesystem
>   * @bdev:	blockdevice to unlock
>   * @sb:		associated superblock
> + * @emergency:	emergency thaw
>   *
>   * Unlocks the filesystem and marks it writeable again after freeze_bdev().
>   */
> -int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> +static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
>  {
>  	int error = -EINVAL;
>  
> @@ -324,15 +325,35 @@ int thaw_bdev(struct block_device *bdev,
>  		goto out;
>  	}
>  
> -	error = thaw_super(sb);
> +	if (emergency)
> +		error = thaw_super_emergency(sb);
> +	else
> +		error = thaw_super(sb);
>  	if (error)
>  		bdev->bd_fsfreeze_count++;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
>  }
> +
> +/**
> + * thaw_bdev  -- unlock filesystem
> + * @bdev:	blockdevice to unlock
> + * @sb:		associated superblock
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_bdev().
> + */
> +int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> +{
> +	return __thaw_bdev(bdev, sb, 0);
> +}
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
> +{
> +	return __thaw_bdev(bdev, sb, 1);
> +}
> +
  OK, this still isn't very nice but it goes away in subsequent patches so
I won't bother you...

> diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
> --- vfs-orig/fs/super.c	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/fs/super.c	2012-07-12 14:21:09.736566768 +0900
> @@ -1221,23 +1221,25 @@ int freeze_super(struct super_block *sb)
>  EXPORT_SYMBOL(freeze_super);
>  
>  /**
> - * thaw_super -- unlock filesystem
> + * __thaw_super -- unlock filesystem
>   * @sb: the super to thaw
> + * @emergency:	emergency thaw
>   *
>   * Unlocks the filesystem and marks it writeable again after freeze_super().
> + * This is the unlocked version of thaw_super and has to be called with the
> + * sb->s_umount lock held in the non-emergency thaw case.
>   */
> -int thaw_super(struct super_block *sb)
> +static int __thaw_super(struct super_block *sb, int emergency)
>  {
> -	int error;
> +	int error = 0;
>  
> -	down_write(&sb->s_umount);
>  	if (sb->s_frozen == SB_UNFROZEN) {
> -		up_write(&sb->s_umount);
> -		return -EINVAL;
> +		error = -EINVAL;
> +		goto out;
>  	}
>  
>  	if (sb->s_flags & MS_RDONLY)
> -		goto out;
> +		goto out_thaw;
>  
>  	if (sb->s_op->unfreeze_fs) {
>  		error = sb->s_op->unfreeze_fs(sb);
> @@ -1245,17 +1247,52 @@ int thaw_super(struct super_block *sb)
>  			printk(KERN_ERR
>  				"VFS:Filesystem thaw failed\n");
>  			sb->s_frozen = SB_FREEZE_TRANS;
> -			up_write(&sb->s_umount);
> -			return error;
> +			goto out;
>  		}
>  	}
>  
> -out:
> +out_thaw:
>  	sb->s_frozen = SB_UNFROZEN;
>  	smp_wmb();
>  	wake_up(&sb->s_wait_unfrozen);
> -	deactivate_locked_super(sb);
>  
> -	return 0;
> +	/*
> +	 * When called from emergency scope, we cannot grab the s_umount lock
> +	 * so we cannot deactivate the superblock. This may leave unbalanced
> +	 * superblock references which could prevent unmount, but given this is
> +	 * an emergency operation....
> +	 */
> +	if (!emergency)
> +		deactivate_locked_super(sb);
> +out:
> +	return error;
> +}
  This is just wrong. deactivate_locked_super() will unlock the superblock
for you (and may even free it). So just do this in thaw_super() instead of
up_write(&sb->s_umount) which is bogus there. That will also allow you to
get rid of that ugly 'emergency' argument...

> +
> +/**
> + * thaw_super -- unlock filesystem
> + * @sb: the super to thaw
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_super().
> + */
> +int thaw_super(struct super_block *sb)
> +{
> +	int res;
> +	down_write(&sb->s_umount);
> +	res = __thaw_super(sb, 0);
> +	up_write(&sb->s_umount);
> +	return res;
>  }
>  EXPORT_SYMBOL(thaw_super);
> +
> +/**
> + * thaw_super_emergency  -- unlock filesystem
> + * @sb: the super to thaw
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_super().
> + * The kernel gets here holding the sb->s_umount lock in read mode so we cannot
> + * grab it again in write mode.
> + */
> +int thaw_super_emergency(struct super_block *sb)
> +{
> +	return __thaw_super(sb, 1);
> +}
  And this function can be deleted as well. Just call __thaw_super() from
that one place.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together
  2012-07-12  9:05 ` [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
@ 2012-07-13 13:45   ` Jan Kara
  2012-08-09  9:00     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-07-13 13:45 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:05:43, Fernando Luis Vázquez Cao wrote:
> Changes from Dave Chinner's version:
>  - Remove s_frozen check in freeze_super which is not needed now that it is
>    re-entrant.
>  - Decrement freeze counter if the freeze_fs callback fails.
> 
> ---
> 
> thaw_bdev() has re-entrancy guards to allow freezes to nest
> together. That is, it ensures that the filesystem is not thawed
> until the last thaw command is issued. This is needed to prevent the
> filesystem from being unfrozen while an existing freezer is still
> operating on the filesystem in a frozen state (e.g. dm-snapshot).
> 
> Currently, freeze_super() and thaw_super() bypasses these guards,
> and as a result manual freezing and unfreezing via the ioctl methods
> do not nest correctly. hence mixing userspace directed freezes with
> block device level freezes result in inconsistency due to premature
> thawing of the filesystem.
> 
> Move the re-enterency guards to the superblock and into freeze_super
> and thaw_super() so that userspace directed freezes nest correctly
> again. Caveat: Things work as expected as long as direct calls to
> thaw_super are always in response to a previous sb level freeze. In
> other words an unpaired call to thaw_super can still thaw a
> filesystem frozen using freeze_bdev (this issue could be addressed
> in a follow-up patch if deemed necessary).
> 
> This patch retains the bdev level mutex and counter to keep the
> "feature" that we can freeze a block device that does not have a
> filesystem mounted yet.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
  Some mostly minor coments below. 

 
> diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
> --- vfs-orig/fs/block_dev.c	2012-07-12 14:31:38.936631141 +0900
> +++ vfs/fs/block_dev.c	2012-07-12 15:03:57.032627014 +0900
> @@ -257,16 +257,18 @@ int fsync_bdev(struct block_device *bdev
>  EXPORT_SYMBOL(fsync_bdev);
>  
>  /**
> - * freeze_bdev  --  lock a filesystem and force it into a consistent state
> + * freeze_bdev  --  lock a block device
>   * @bdev:	blockdevice to lock
>   *
> - * If a superblock is found on this device, we take the s_umount semaphore
> - * on it to make sure nobody unmounts until the snapshot creation is done.
> - * The reference counter (bd_fsfreeze_count) guarantees that only the last
> - * unfreeze process can unfreeze the frozen filesystem actually when multiple
> - * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
> - * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
> - * actually.
> + * Locks the block device and, if present, the associated filesystem too.
> + *
> + * The reference counter (bd_fsfreeze_count) is used to implement the feature
> + * that allows one to freeze a block device that does not have a filesystem
> + * mounted yet. For filesystems using mount_bdev the kernel takes care of
> + * things by preventing the mount operation from succeeding if the underlying
> + * block device is frozen. Other filesystems should check this counter or risk
> + * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race,
> + * which may lead to inconsistencies.
>   */
>  struct super_block *freeze_bdev(struct block_device *bdev)
>  {
> @@ -274,17 +276,7 @@ struct super_block *freeze_bdev(struct b
>  	int error = 0;
>  
>  	mutex_lock(&bdev->bd_fsfreeze_mutex);
> -	if (++bdev->bd_fsfreeze_count > 1) {
> -		/*
> -		 * We don't even need to grab a reference - the first call
> -		 * to freeze_bdev grab an active reference and only the last
> -		 * thaw_bdev drops it.
> -		 */
> -		sb = get_super(bdev);
> -		drop_super(sb);
> -		mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -		return sb;
> -	}
> +	bdev->bd_fsfreeze_count++;
>  
>  	sb = get_active_super(bdev);
>  	if (!sb)
> @@ -297,30 +289,33 @@ struct super_block *freeze_bdev(struct b
>  		return ERR_PTR(error);
>  	}
>  	deactivate_super(sb);
> - out:
> +out:
>  	sync_blockdev(bdev);
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -	return sb;	/* thaw_bdev releases s->s_umount */
> +	return sb;
>  }
>  EXPORT_SYMBOL(freeze_bdev);
>  
>  /**
> - * __thaw_bdev  -- unlock filesystem
> + * __thaw_bdev  -- unlock a block device
>   * @bdev:	blockdevice to unlock
>   * @sb:		associated superblock
>   * @emergency:	emergency thaw
>   *
> - * Unlocks the filesystem and marks it writeable again after freeze_bdev().
> + * Unlocks the block device and, if present, the associated filesystem too.
>   */
>  static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
>  {
>  	int error = -EINVAL;
>  
>  	mutex_lock(&bdev->bd_fsfreeze_mutex);
> +
>  	if (!bdev->bd_fsfreeze_count)
>  		goto out;
>  
> -	if (--bdev->bd_fsfreeze_count > 0 || !sb) {
> +	bdev->bd_fsfreeze_count--;
> +
> +	if (!sb) {
>  		error = 0;
>  		goto out;
>  	}
> @@ -336,13 +331,6 @@ out:
>  	return error;
>  }
>  
> -/**
> - * thaw_bdev  -- unlock filesystem
> - * @bdev:	blockdevice to unlock
> - * @sb:		associated superblock
> - *
> - * Unlocks the filesystem and marks it writeable again after freeze_bdev().
> - */
>  int thaw_bdev(struct block_device *bdev, struct super_block *sb)
>  {
>  	return __thaw_bdev(bdev, sb, 0);
  It's a bit confusing (for reviewer) to remove the documentation here when
you remove the whole function in a later patch...

...
> @@ -1233,29 +1240,33 @@ static int __thaw_super(struct super_blo
>  {
>  	int error = 0;
>  
> -	if (sb->s_frozen == SB_UNFROZEN) {
> +	mutex_lock(&sb->s_freeze_mutex);
> +	if (!sb->s_freeze_count) {
>  		error = -EINVAL;
> -		goto out;
> +		goto out_unlock;
>  	}
> +	sb->s_freeze_count = emergency ? 1 : sb->s_freeze_count;
  It would be cleaner to do this somewhere in do_thaw_one() and not here.
Also you won't have to pass the emergency parameter then... Also it might
be more logical for __thaw_super() to expect also s_freeze_mutex locked and
handle the locking in thaw_super().

								Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/8] fsfreeze: switch to using super methods where possible
  2012-07-12  9:07 ` [PATCH 4/8] fsfreeze: switch to using super methods where possible Fernando Luis Vázquez Cao
@ 2012-07-13 13:50   ` Jan Kara
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2012-07-13 13:50 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:07:32, Fernando Luis Vázquez Cao wrote:
> Changes from Dave Chinner's version:
> - Retain the freeze_bdev/thaw_bdev API so that we can keep the feature that
>   we can freeze a block device that does not have a filesystem mounted yet.
> 
> ---
> 
> In some places we are using freeze/thaw_bdev despite the fact the kernel
> is operating at the sb level there. Convert these users of the bdev
> interfaces to use the superblock interfaces instead.
  This looks good. So you can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c
> --- vfs-orig/fs/block_dev.c	2012-07-12 17:03:18.516624899 +0900
> +++ vfs/fs/block_dev.c	2012-07-12 17:04:35.244784436 +0900
> @@ -297,14 +297,14 @@ out:
>  EXPORT_SYMBOL(freeze_bdev);
>  
>  /**
> - * __thaw_bdev  -- unlock a block device
> + * thaw_bdev  -- unlock a block device
>   * @bdev:	blockdevice to unlock
>   * @sb:		associated superblock
>   * @emergency:	emergency thaw
>   *
>   * Unlocks the block device and, if present, the associated filesystem too.
>   */
> -static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
> +int thaw_bdev(struct block_device *bdev, struct super_block *sb)
>  {
>  	int error = -EINVAL;
>  
> @@ -320,28 +320,15 @@ static int __thaw_bdev(struct block_devi
>  		goto out;
>  	}
>  
> -	if (emergency)
> -		error = thaw_super_emergency(sb);
> -	else
> -		error = thaw_super(sb);
> +	error = thaw_super(sb);
>  	if (error)
>  		bdev->bd_fsfreeze_count++;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
>  }
> -
> -int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> -{
> -	return __thaw_bdev(bdev, sb, 0);
> -}
>  EXPORT_SYMBOL(thaw_bdev);
>  
> -int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
> -{
> -	return __thaw_bdev(bdev, sb, 1);
> -}
> -
>  static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
>  {
>  	return block_write_full_page(page, blkdev_get_block, wbc);
> diff -urNp vfs-orig/fs/buffer.c vfs/fs/buffer.c
> --- vfs-orig/fs/buffer.c	2012-07-12 14:31:38.944630298 +0900
> +++ vfs/fs/buffer.c	2012-07-12 17:04:35.244784436 +0900
> @@ -514,7 +514,7 @@ repeat:
>  static void do_thaw_one(struct super_block *sb, void *unused)
>  {
>  	char b[BDEVNAME_SIZE];
> -	while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
> +	while (sb->s_bdev && !thaw_super_emergency(sb))
>  		printk(KERN_WARNING "Emergency Thaw on %s\n",
>  		       bdevname(sb->s_bdev, b));
>  }
> diff -urNp vfs-orig/fs/xfs/xfs_fsops.c vfs/fs/xfs/xfs_fsops.c
> --- vfs-orig/fs/xfs/xfs_fsops.c	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/fs/xfs/xfs_fsops.c	2012-07-12 17:04:35.252780168 +0900
> @@ -664,16 +664,12 @@ xfs_fs_goingdown(
>  	__uint32_t	inflags)
>  {
>  	switch (inflags) {
> -	case XFS_FSOP_GOING_FLAGS_DEFAULT: {
> -		struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
> -
> -		if (sb && !IS_ERR(sb)) {
> +	case XFS_FSOP_GOING_FLAGS_DEFAULT:
> +		if (!freeze_super(mp->m_super)) {
>  			xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
> -			thaw_bdev(sb->s_bdev, sb);
> +			thaw_super(mp->m_super);
>  		}
> -
>  		break;
> -	}
>  	case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
>  		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
>  		break;
> diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
> --- vfs-orig/include/linux/fs.h	2012-07-12 17:03:18.596626998 +0900
> +++ vfs/include/linux/fs.h	2012-07-12 17:04:35.252780168 +0900
> @@ -2111,8 +2111,6 @@ extern void kill_bdev(struct block_devic
>  extern struct super_block *freeze_bdev(struct block_device *);
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> -extern int thaw_bdev_emergency(struct block_device *bdev,
> -			       struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
> @@ -2129,12 +2127,6 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> -
> -static inline int thaw_bdev_emergency(struct block_device *bdev,
> -				      struct super_block *sb)
> -{
> -	return 0;
> -}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c
  2012-07-12  9:09 ` [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
@ 2012-07-13 13:53   ` Jan Kara
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2012-07-13 13:53 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:09:00, Fernando Luis Vázquez Cao wrote:
> It makes no sense having the emergency thaw code in fs/buffer.c when all of
> it's operations are one superblocks and the code it executes is all in
> fs/super.c. Move the code there and clean it up.
  Looks sensinble. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp vfs-orig/fs/buffer.c vfs/fs/buffer.c
> --- vfs-orig/fs/buffer.c	2012-07-12 17:11:42.480606884 +0900
> +++ vfs/fs/buffer.c	2012-07-12 17:14:25.652627936 +0900
> @@ -511,37 +511,6 @@ repeat:
>  	return err;
>  }
>  
> -static void do_thaw_one(struct super_block *sb, void *unused)
> -{
> -	char b[BDEVNAME_SIZE];
> -	while (sb->s_bdev && !thaw_super_emergency(sb))
> -		printk(KERN_WARNING "Emergency Thaw on %s\n",
> -		       bdevname(sb->s_bdev, b));
> -}
> -
> -static void do_thaw_all(struct work_struct *work)
> -{
> -	iterate_supers(do_thaw_one, NULL);
> -	kfree(work);
> -	printk(KERN_WARNING "Emergency Thaw complete\n");
> -}
> -
> -/**
> - * emergency_thaw_all -- forcibly thaw every frozen filesystem
> - *
> - * Used for emergency unfreeze of all filesystems via SysRq
> - */
> -void emergency_thaw_all(void)
> -{
> -	struct work_struct *work;
> -
> -	work = kmalloc(sizeof(*work), GFP_ATOMIC);
> -	if (work) {
> -		INIT_WORK(work, do_thaw_all);
> -		schedule_work(work);
> -	}
> -}
> -
>  /**
>   * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
>   * @mapping: the mapping which wants those buffers written
> diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
> --- vfs-orig/fs/super.c	2012-07-12 17:03:18.568626969 +0900
> +++ vfs/fs/super.c	2012-07-12 17:21:40.768626516 +0900
> @@ -1304,7 +1304,35 @@ EXPORT_SYMBOL(thaw_super);
>   * The kernel gets here holding the sb->s_umount lock in read mode so we cannot
>   * grab it again in write mode.
>   */
> -int thaw_super_emergency(struct super_block *sb)
> +static void thaw_super_emergency(struct super_block *sb, void *unused)
>  {
> -	return __thaw_super(sb, 1);
> +	if (sb->s_bdev) {
> +		char b[BDEVNAME_SIZE];
> +		printk(KERN_WARNING "Emergency Thaw on %s.\n",
> +				    bdevname(sb->s_bdev, b));
> +	}
> +	while (!__thaw_super(sb, 1));
> +}
> +
> +static void do_thaw_all(struct work_struct *work)
> +{
> +	iterate_supers(thaw_super_emergency, NULL);
> +	kfree(work);
> +	printk(KERN_WARNING "Emergency Thaw complete\n");
> +}
> +
> +/**
> + * emergency_thaw_all -- forcibly thaw every frozen filesystem
> + *
> + * Used for emergency unfreeze of all filesystems via SysRq
> + */
> +void emergency_thaw_all(void)
> +{
> +	struct work_struct *work;
> +
> +	work = kmalloc(sizeof(*work), GFP_ATOMIC);
> +	if (work) {
> +		INIT_WORK(work, do_thaw_all);
> +		schedule_work(work);
> +	}
>  }
> diff -urNp vfs-orig/include/linux/fs.h vfs/include/linux/fs.h
> --- vfs-orig/include/linux/fs.h	2012-07-12 17:11:42.496609385 +0900
> +++ vfs/include/linux/fs.h	2012-07-12 17:22:14.640642715 +0900
> @@ -1947,7 +1947,6 @@ extern int fd_statfs(int, struct kstatfs
>  extern int vfs_ustat(dev_t, struct kstatfs *);
>  extern int freeze_super(struct super_block *super);
>  extern int thaw_super(struct super_block *super);
> -extern int thaw_super_emergency(struct super_block *super);
>  extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-07-12  9:10 ` [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
@ 2012-07-13 13:54   ` Jan Kara
  2012-07-15 22:45     ` Dave Chinner
  2012-09-13  6:11     ` Fernando Luis Vazquez Cao
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2012-07-13 13:54 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
> The FIISFROZEN ioctl can be use by HA and monitoring software to check
> the freeze state of a mounted filesystem.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> ---
> 
> diff -urNp vfs-orig/fs/compat_ioctl.c vfs/fs/compat_ioctl.c
> --- vfs-orig/fs/compat_ioctl.c	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/fs/compat_ioctl.c	2012-07-12 17:25:37.592626439 +0900
> @@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
>  /* 'X' - originally XFS but some now in the VFS */
>  COMPATIBLE_IOCTL(FIFREEZE)
>  COMPATIBLE_IOCTL(FITHAW)
> +COMPATIBLE_IOCTL(FIISFROZEN)
>  COMPATIBLE_IOCTL(KDGETKEYCODE)
>  COMPATIBLE_IOCTL(KDSETKEYCODE)
>  COMPATIBLE_IOCTL(KDGKBTYPE)
> diff -urNp vfs-orig/fs/ioctl.c vfs/fs/ioctl.c
> --- vfs-orig/fs/ioctl.c	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/fs/ioctl.c	2012-07-12 17:25:37.592626439 +0900
> @@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
>  	return thaw_super(sb);
>  }
>  
> +static int ioctl_fs_isfrozen(struct file *filp)
> +{
> +	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	return isfrozen_super(sb);
> +}
> +
  Is there any reason to restrict this to CAP_SYS_ADMIN?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking
  2012-07-12  9:12 ` [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking Fernando Luis Vázquez Cao
@ 2012-07-13 14:11   ` Jan Kara
  2012-07-17  1:42     ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2012-07-13 14:11 UTC (permalink / raw)
  To: Fernando Luis Vázquez Cao
  Cc: Al Viro, Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu 12-07-12 18:12:45, Fernando Luis Vázquez Cao wrote:
> Documentation and implementation got out of sync. This patch updates the
> documentation to reflect the current fsfreeze locking rules.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
  This is already part of my patch series... but whatever.

								Honza
> ---
> 
> diff -urNp vfs-orig/Documentation/filesystems/Locking vfs/Documentation/filesystems/Locking
> --- vfs-orig/Documentation/filesystems/Locking	2012-07-04 18:57:54.000000000 +0900
> +++ vfs/Documentation/filesystems/Locking	2012-07-12 17:33:44.792662348 +0900
> @@ -138,8 +138,8 @@ evict_inode:
>  put_super:		write
>  write_super:		read
>  sync_fs:		read
> -freeze_fs:		read
> -unfreeze_fs:		read
> +freeze_fs:		write
> +unfreeze_fs:		write
>  statfs:			maybe(read)	(see below)
>  remount_fs:		write
>  umount_begin:		no
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-07-13 13:54   ` Jan Kara
@ 2012-07-15 22:45     ` Dave Chinner
  2012-09-13  6:19       ` Fernando Luis Vazquez Cao
  2012-09-13  6:11     ` Fernando Luis Vazquez Cao
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2012-07-15 22:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: Fernando Luis Vázquez Cao, Al Viro, Dave Chinner,
	Christoph Hellwig, linux-fsdevel

On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
> On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
> > The FIISFROZEN ioctl can be use by HA and monitoring software to check
> > the freeze state of a mounted filesystem.

Can you explain in more detail why the HA system needs to check this?
And, for that matter, what it does with that information?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking
  2012-07-13 14:11   ` Jan Kara
@ 2012-07-17  1:42     ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-07-17  1:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

Hi Jan,

On 2012/07/13 23:11, Jan Kara wrote:
> On Thu 12-07-12 18:12:45, Fernando Luis Vázquez Cao wrote:
>> Documentation and implementation got out of sync. This patch updates the
>> documentation to reflect the current fsfreeze locking rules.
>>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>    This is already part of my patch series... but whatever.

Sorry, I overlooked that patch. I will drop mine for
the next iteration of this patch set.

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely
  2012-07-13 12:59   ` Jan Kara
@ 2012-07-17  5:13     ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-07-17  5:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

Hi Jan, Dave,

(2012/07/13 21:59), Jan Kara wrote:

> On Thu 12-07-12 18:02:07, Fernando Luis Vázquez Cao wrote:
>> The thawing of a filesystem through sysrq-j loops infinitely as it
>> incorrectly detects a thawed filesytsem as frozen and tries to
>> unfreeze repeatedly. This is a regression caused by
>> 4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
>> reference to frozen superblocks") in that it no longer returned
>> -EINVAL for superblocks that were not frozen.
>>
>> Return -EINVAL when the filesystem is already unfrozen to avoid this
>> problem.
>>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>    Looks good. You can add:
> Reviewed-by: Jan Kara <jack@suse.cz>

Jan, thank you for the detailed review of the patch set. I really appreciate it.

This week I am pretty busy with other stuff so I may not be able to address your
comments and Dave's until next Monday.

Regards,
Fernando

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount
  2012-07-13 13:17   ` Jan Kara
@ 2012-08-09  6:00     ` Fernando Luis Vazquez Cao
  2012-09-13  6:23     ` Fernando Luis Vazquez Cao
  1 sibling, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-08-09  6:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/07/13 22:17, Jan Kara wrote:
> On Thu 12-07-12 18:04:09, Fernando Luis Vázquez Cao wrote:
>> diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
>> --- vfs-orig/fs/super.c	2012-07-04 18:57:54.000000000 +0900
>> +++ vfs/fs/super.c	2012-07-12 14:21:09.736566768 +0900
>> @@ -1221,23 +1221,25 @@ int freeze_super(struct super_block *sb)
>>   EXPORT_SYMBOL(freeze_super);
>>   
>>   /**
>> - * thaw_super -- unlock filesystem
>> + * __thaw_super -- unlock filesystem
>>    * @sb: the super to thaw
>> + * @emergency:	emergency thaw
>>    *
>>    * Unlocks the filesystem and marks it writeable again after freeze_super().
>> + * This is the unlocked version of thaw_super and has to be called with the
>> + * sb->s_umount lock held in the non-emergency thaw case.
>>    */
>> -int thaw_super(struct super_block *sb)
>> +static int __thaw_super(struct super_block *sb, int emergency)
>>   {
>> -	int error;
>> +	int error = 0;
>>   
>> -	down_write(&sb->s_umount);
>>   	if (sb->s_frozen == SB_UNFROZEN) {
>> -		up_write(&sb->s_umount);
>> -		return -EINVAL;
>> +		error = -EINVAL;
>> +		goto out;
>>   	}
>>   
>>   	if (sb->s_flags & MS_RDONLY)
>> -		goto out;
>> +		goto out_thaw;
>>   
>>   	if (sb->s_op->unfreeze_fs) {
>>   		error = sb->s_op->unfreeze_fs(sb);
>> @@ -1245,17 +1247,52 @@ int thaw_super(struct super_block *sb)
>>   			printk(KERN_ERR
>>   				"VFS:Filesystem thaw failed\n");
>>   			sb->s_frozen = SB_FREEZE_TRANS;
>> -			up_write(&sb->s_umount);
>> -			return error;
>> +			goto out;
>>   		}
>>   	}
>>   
>> -out:
>> +out_thaw:
>>   	sb->s_frozen = SB_UNFROZEN;
>>   	smp_wmb();
>>   	wake_up(&sb->s_wait_unfrozen);
>> -	deactivate_locked_super(sb);
>>   
>> -	return 0;
>> +	/*
>> +	 * When called from emergency scope, we cannot grab the s_umount lock
>> +	 * so we cannot deactivate the superblock. This may leave unbalanced
>> +	 * superblock references which could prevent unmount, but given this is
>> +	 * an emergency operation....
>> +	 */
>> +	if (!emergency)
>> +		deactivate_locked_super(sb);
>> +out:
>> +	return error;
>> +}
>    This is just wrong. deactivate_locked_super() will unlock the superblock
> for you (and may even free it). So just do this in thaw_super() instead of
> up_write(&sb->s_umount) which is bogus there. That will also allow you to
> get rid of that ugly 'emergency' argument...

Ouch! I am sorry for sending a botched patch. In my local tree
the emergency argument is gone and thaw_super looks like this
(up_write(&sb->s_umount) is needed when __thaw_super() fails
because we do not want to release a reference to the superblock
in that case):

int thaw_super(struct super_block *sb)
{
         int res;
         down_write(&sb->s_umount);
         res = __thaw_super(sb);
         if (!res)
                 deactivate_locked_super(sb);
         else
                 up_write(&sb->s_umount);
         return res;
}


>> +/**
>> + * thaw_super_emergency  -- unlock filesystem
>> + * @sb: the super to thaw
>> + *
>> + * Unlocks the filesystem and marks it writeable again after freeze_super().
>> + * The kernel gets here holding the sb->s_umount lock in read mode so we cannot
>> + * grab it again in write mode.
>> + */
>> +int thaw_super_emergency(struct super_block *sb)
>> +{
>> +	return __thaw_super(sb, 1);
>> +}
>    And this function can be deleted as well. Just call __thaw_super() from
> that one place.

Good point. The code is easier to read that way.

Thanks!

- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together
  2012-07-13 13:45   ` Jan Kara
@ 2012-08-09  9:00     ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-08-09  9:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/07/13 22:45, Jan Kara wrote:
> On Thu 12-07-12 18:05:43, Fernando Luis Vázquez Cao wrote:
>> -/**
>> - * thaw_bdev  -- unlock filesystem
>> - * @bdev:	blockdevice to unlock
>> - * @sb:		associated superblock
>> - *
>> - * Unlocks the filesystem and marks it writeable again after freeze_bdev().
>> - */
>>   int thaw_bdev(struct block_device *bdev, struct super_block *sb)
>>   {
>>   	return __thaw_bdev(bdev, sb, 0);
>    It's a bit confusing (for reviewer) to remove the documentation here when
> you remove the whole function in a later patch...

Ok. I will get rid of the function and its documentation together
in a later patch.

>> @@ -1233,29 +1240,33 @@ static int __thaw_super(struct super_blo
>>   {
>>   	int error = 0;
>>   
>> -	if (sb->s_frozen == SB_UNFROZEN) {
>> +	mutex_lock(&sb->s_freeze_mutex);
>> +	if (!sb->s_freeze_count) {
>>   		error = -EINVAL;
>> -		goto out;
>> +		goto out_unlock;
>>   	}
>> +	sb->s_freeze_count = emergency ? 1 : sb->s_freeze_count;
>    It would be cleaner to do this somewhere in do_thaw_one() and not here.
> Also you won't have to pass the emergency parameter then...

Good point. In the next iteration sb->s_freeze_count is updated
by the caller when appropriate. I will remove the emergency
parameter too.


> Also it might
> be more logical for __thaw_super() to expect also s_freeze_mutex locked and
> handle the locking in thaw_super().

I think I will keep this as is so that I do not need to add explicit
locking to the emergency thaw code, which calls __thaw_super
directly.

Thank you for your feedback!

- Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-07-13 13:54   ` Jan Kara
  2012-07-15 22:45     ` Dave Chinner
@ 2012-09-13  6:11     ` Fernando Luis Vazquez Cao
  1 sibling, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-13  6:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/07/13 22:54, Jan Kara wrote:
> On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
>> The FIISFROZEN ioctl can be use by HA and monitoring software to check
>> the freeze state of a mounted filesystem.
>>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Dave Chinner <dchinner@redhat.com>
>> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
>> ---
>>
>> diff -urNp vfs-orig/fs/compat_ioctl.c vfs/fs/compat_ioctl.c
>> --- vfs-orig/fs/compat_ioctl.c	2012-07-04 18:57:54.000000000 +0900
>> +++ vfs/fs/compat_ioctl.c	2012-07-12 17:25:37.592626439 +0900
>> @@ -885,6 +885,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
>>   /* 'X' - originally XFS but some now in the VFS */
>>   COMPATIBLE_IOCTL(FIFREEZE)
>>   COMPATIBLE_IOCTL(FITHAW)
>> +COMPATIBLE_IOCTL(FIISFROZEN)
>>   COMPATIBLE_IOCTL(KDGETKEYCODE)
>>   COMPATIBLE_IOCTL(KDSETKEYCODE)
>>   COMPATIBLE_IOCTL(KDGKBTYPE)
>> diff -urNp vfs-orig/fs/ioctl.c vfs/fs/ioctl.c
>> --- vfs-orig/fs/ioctl.c	2012-07-04 18:57:54.000000000 +0900
>> +++ vfs/fs/ioctl.c	2012-07-12 17:25:37.592626439 +0900
>> @@ -536,6 +536,16 @@ static int ioctl_fsthaw(struct file *fil
>>   	return thaw_super(sb);
>>   }
>>   
>> +static int ioctl_fs_isfrozen(struct file *filp)
>> +{
>> +	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
>> +
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>> +
>> +	return isfrozen_super(sb);
>> +}
>> +
>    Is there any reason to restrict this to CAP_SYS_ADMIN?

It is just for consistency with what FIFREEZE and FITHAW require
in terms of ACLs. If it is deemed too strict for a check ioctl I can
remove it.

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-07-15 22:45     ` Dave Chinner
@ 2012-09-13  6:19       ` Fernando Luis Vazquez Cao
  2012-09-13  7:18         ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-13  6:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/07/16 07:45, Dave Chinner wrote:
> On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
>> On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
>>> The FIISFROZEN ioctl can be use by HA and monitoring software to check
>>> the freeze state of a mounted filesystem.
> Can you explain in more detail why the HA system needs to check this?
> And, for that matter, what it does with that information?

What our HA guys told me is that certain fencing scripts
try to umount filesystems that can be in frozen state. The
problem is that when we umount a frozen filesystem the
superblock still stays around which can lead to a split-brain
scenario.

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount
  2012-07-13 13:17   ` Jan Kara
  2012-08-09  6:00     ` Fernando Luis Vazquez Cao
@ 2012-09-13  6:23     ` Fernando Luis Vazquez Cao
  1 sibling, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-13  6:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/07/13 22:17, Jan Kara wrote:
> On Thu 12-07-12 18:04:09, Fernando Luis Vázquez Cao wrote:
>> diff -urNp vfs-orig/fs/super.c vfs/fs/super.c
>> --- vfs-orig/fs/super.c	2012-07-04 18:57:54.000000000 +0900
>> +++ vfs/fs/super.c	2012-07-12 14:21:09.736566768 +0900
>> @@ -1221,23 +1221,25 @@ int freeze_super(struct super_block *sb)
>>   EXPORT_SYMBOL(freeze_super);
>>   
>>   /**
>> - * thaw_super -- unlock filesystem
>> + * __thaw_super -- unlock filesystem
>>    * @sb: the super to thaw
>> + * @emergency:	emergency thaw
>>    *
>>    * Unlocks the filesystem and marks it writeable again after freeze_super().
>> + * This is the unlocked version of thaw_super and has to be called with the
>> + * sb->s_umount lock held in the non-emergency thaw case.
>>    */
>> -int thaw_super(struct super_block *sb)
>> +static int __thaw_super(struct super_block *sb, int emergency)
>>   {
>> -	int error;
>> +	int error = 0;
>>   
>> -	down_write(&sb->s_umount);
>>   	if (sb->s_frozen == SB_UNFROZEN) {
>> -		up_write(&sb->s_umount);
>> -		return -EINVAL;
>> +		error = -EINVAL;
>> +		goto out;
>>   	}
>>   
>>   	if (sb->s_flags & MS_RDONLY)
>> -		goto out;
>> +		goto out_thaw;
>>   
>>   	if (sb->s_op->unfreeze_fs) {
>>   		error = sb->s_op->unfreeze_fs(sb);
>> @@ -1245,17 +1247,52 @@ int thaw_super(struct super_block *sb)
>>   			printk(KERN_ERR
>>   				"VFS:Filesystem thaw failed\n");
>>   			sb->s_frozen = SB_FREEZE_TRANS;
>> -			up_write(&sb->s_umount);
>> -			return error;
>> +			goto out;
>>   		}
>>   	}
>>   
>> -out:
>> +out_thaw:
>>   	sb->s_frozen = SB_UNFROZEN;
>>   	smp_wmb();
>>   	wake_up(&sb->s_wait_unfrozen);
>> -	deactivate_locked_super(sb);
>>   
>> -	return 0;
>> +	/*
>> +	 * When called from emergency scope, we cannot grab the s_umount lock
>> +	 * so we cannot deactivate the superblock. This may leave unbalanced
>> +	 * superblock references which could prevent unmount, but given this is
>> +	 * an emergency operation....
>> +	 */
>> +	if (!emergency)
>> +		deactivate_locked_super(sb);
>> +out:
>> +	return error;
>> +}
>    This is just wrong. deactivate_locked_super() will unlock the superblock
> for you (and may even free it). So just do this in thaw_super() instead of
> up_write(&sb->s_umount) which is bogus there. That will also allow you to
> get rid of that ugly 'emergency' argument...

I just wanted to let you know that I rewrote this patch from scratch
taking a different approach which allows me to get locking right
in the emergency thaw case too. I will be sending the updated
patch set later today.

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-13  6:19       ` Fernando Luis Vazquez Cao
@ 2012-09-13  7:18         ` Dave Chinner
  2012-09-13  8:19           ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2012-09-13  7:18 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu, Sep 13, 2012 at 03:19:23PM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/07/16 07:45, Dave Chinner wrote:
> >On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
> >>On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
> >>>The FIISFROZEN ioctl can be use by HA and monitoring software to check
> >>>the freeze state of a mounted filesystem.
> >Can you explain in more detail why the HA system needs to check this?
> >And, for that matter, what it does with that information?
> 
> What our HA guys told me is that certain fencing scripts
> try to umount filesystems that can be in frozen state. The
> problem is that when we umount a frozen filesystem the
> superblock still stays around which can lead to a split-brain
> scenario.

Then the bug is that unmounting a frozen filesystem is not
working correctly. Fix the problem, don't add new APIs to try to
detect a state where the bug might get tripped over and avoid it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-13  7:18         ` Dave Chinner
@ 2012-09-13  8:19           ` Fernando Luis Vazquez Cao
  2012-09-14  0:15             ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-13  8:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/09/13 16:18, Dave Chinner wrote:
> On Thu, Sep 13, 2012 at 03:19:23PM +0900, Fernando Luis Vazquez Cao wrote:
>> On 2012/07/16 07:45, Dave Chinner wrote:
>>> On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
>>>> On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
>>>>> The FIISFROZEN ioctl can be use by HA and monitoring software to check
>>>>> the freeze state of a mounted filesystem.
>>> Can you explain in more detail why the HA system needs to check this?
>>> And, for that matter, what it does with that information?
>> What our HA guys told me is that certain fencing scripts
>> try to umount filesystems that can be in frozen state. The
>> problem is that when we umount a frozen filesystem the
>> superblock still stays around which can lead to a split-brain
>> scenario.
> Then the bug is that unmounting a frozen filesystem is not
> working correctly. Fix the problem, don't add new APIs to try to
> detect a state where the bug might get tripped over and avoid it.

The problem is that we allow users to umount a frozen
filesystem but we have neither  a bdev level fsfreeze
API to thaw it after that nor check ioctls.

I proposed returning EBUSY when userspace tries to umount a
frozen filesystem, but this would break lazy umount, which
I was told by Al Viro and Josef Bacik is a no go, so I discarded
this approach.

I will follow Al's advice a few years ago and do the following:
1- Let userspace umount frozen filesystems.
2- Provide a bdev level ioctl to unfreeze a umounted filesystem.
I will also:
3-  add the check ioctls so that users can check whether a
filesystem is frozen or not and avoid surprises.

The patch set I will be sending later today takes care of 1 and 3
(among other things) and a follow-up patch set will address 2
(need to make some changes to btrfs and get_active_super
before it is ready).

That said, even if the problem above did not exist it still would
be nice to have check ioctls from an API point of view.

Thanks,
Fernando
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-13  8:19           ` Fernando Luis Vazquez Cao
@ 2012-09-14  0:15             ` Dave Chinner
  2012-09-14  1:46               ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2012-09-14  0:15 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/09/13 16:18, Dave Chinner wrote:
> >On Thu, Sep 13, 2012 at 03:19:23PM +0900, Fernando Luis Vazquez Cao wrote:
> >>On 2012/07/16 07:45, Dave Chinner wrote:
> >>>On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
> >>>>On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
> >>>>>The FIISFROZEN ioctl can be use by HA and monitoring software to check
> >>>>>the freeze state of a mounted filesystem.
> >>>Can you explain in more detail why the HA system needs to check this?
> >>>And, for that matter, what it does with that information?
> >>What our HA guys told me is that certain fencing scripts
> >>try to umount filesystems that can be in frozen state. The
> >>problem is that when we umount a frozen filesystem the
> >>superblock still stays around which can lead to a split-brain
> >>scenario.
> >Then the bug is that unmounting a frozen filesystem is not
> >working correctly. Fix the problem, don't add new APIs to try to
> >detect a state where the bug might get tripped over and avoid it.
> 
> The problem is that we allow users to umount a frozen
> filesystem but we have neither  a bdev level fsfreeze
> API to thaw it after that nor check ioctls.

Yes, I know they don't exist, but you have to justify why they are
needed. So far you haven't.

> I proposed returning EBUSY when userspace tries to umount a
> frozen filesystem, but this would break lazy umount, which
> I was told by Al Viro and Josef Bacik is a no go, so I discarded
> this approach.
> 
> I will follow Al's advice a few years ago and do the following:
> 1- Let userspace umount frozen filesystems.

Which it already can.

> 2- Provide a bdev level ioctl to unfreeze a umounted filesystem.

The only time a block device knows about the frozen state of the
superblock is when dm-snapshot drives the freeze from the block
device. There are special hooks in, e.g. mount_bdev() for checking
whether there is an active bdev freeze and this prevents new mounts
from occurring while a bdev freeze is in progress. DM also
removes the freeze state itself, so this is not a persistent state.

IOWs, there are two specific freeze types - one a superblock (user)
level freeze, and the other is a block device (kernel) level freeze.
What you are proposing here means that the user, in certain
circumstances, needs to manipulate superblock level freezes from the
block level because they superblock is no longer visible to the
user.  It's a recipe for confusion and convoluted bugs, and it sure
as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
freeze_super and thaw_super for the fsfreeze ioctl") as the reason
why superblock level freezes exist and why trying to thaw from the
bdev level doesn't work.

Indeed, what happens when the superblock freeze is driven from
dm-snapshot, and the user unmounts the fs and runs the blockdev
ioctl to drop the freeze reference that dm-snapshot holds?
That would free the superblock out from underneath DM, so this is
a can of worms I'd prefer not to open.

> I will also:
> 3-  add the check ioctls so that users can check whether a
> filesystem is frozen or not and avoid surprises.

Which only solves the problem for users that know they have to check
the state in certain corner cases. That doesn't fix the underlying
problem.

The reason this problem exists is that a active superblock level
freeze holds a reference to the superblock. This has interesting
side effects:

$ sudo xfs_freeze -f /mnt/scratch
$ sudo umount /mnt/scratch
$ sudo mount /dev/vdc /mnt/scratch
$ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
<blocked on frozen state>
^Z
[1]+  Stopped                 sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
$ sudo xfs_freeze -u /mnt/scratch
$
$ fg
sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
wrote 65536/65536 bytes at offset 0
64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
$

The freeze persists over unmount/mount, but we write to the
filesystem during unmount, and run log recovery and write stuff to
the filesystem as part of the normal mount process. IOWs, no
filesystem checks for SB_UNFROZEN in either the unmount or mount
path and so we violate the freeze condition by writing to the block
device. This means that as it stands, an unmount or mount violates
the frozen state guarantee and unmount/mount effectively imply that
the filesystem is no longer frozen.  Hence silent snapshot image
corruption is very possible, and the fact that it is silent is very
worrying.

If you have a HA system that does user level freeze operations, then
you need a HA agent to handle the application that needs frozen
filesystems appropriately (i.e. tell it that whatever operation it
was doing has now failed and the filesystem is unfrozen, and, BTW,
shutdown so I can start you over there).  Trying to work around
weird freeze semantics at the filesystem unmount/blockdev level is
not going to make the application that froze the filesystem suddenly
avoid silent fs image corruption.

As it is, the requirement that we allow unmounting of frozen
filesystems implies that an unmount operation must also be a thaw
operation. The filesystem is no longer visible to the user, and they
have no method of controlling it's state anymore, so the user has
said to the kernel "it's all yours now". If the filesystem was
frozen when the unmount is issued, then the user is saying "I don't
care about the frozen state anymore". If they did care, then they
wouldn't be unmounting the filesystem without having first issued a
thaw operation.

FWIW, while we are unmounting the filesystem, we hold the s_umount lock
exclusively, so the filesystem cannot be thawed while an unmount is
in progress. Hence there is no reason why the unmount can't thaw the
superblock and take away it's reference as part of the unmount.

If we do this, all of the problems with persistent frozen superblock
state after unmount go away.  At this point, there is no need for
block device level ioctls to clean up a persistent frozen superblock
becuase they don't exist anymore. That means there is no need for
ioctls to check the frozen state of the filesystem before unmount,
either, because there is no problem to avoid....

> That said, even if the problem above did not exist it still would
> be nice to have check ioctls from an API point of view.

"Nice to have" is not a good enough reason to add new APIs. Fix the
underlying problems first, then we can discuss new APIs on their
merits.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-14  0:15             ` Dave Chinner
@ 2012-09-14  1:46               ` Fernando Luis Vazquez Cao
  2012-09-14  6:28                 ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-14  1:46 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On 2012/09/14 09:15, Dave Chinner wrote:
> On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
>> The problem is that we allow users to umount a frozen
>> filesystem but we have neither  a bdev level fsfreeze
>> API to thaw it after that nor check ioctls.
> Yes, I know they don't exist, but you have to justify why they are
> needed. So far you haven't.
>
>> I proposed returning EBUSY when userspace tries to umount a
>> frozen filesystem, but this would break lazy umount, which
>> I was told by Al Viro and Josef Bacik is a no go, so I discarded
>> this approach.
>>
>> I will follow Al's advice a few years ago and do the following:
>> 1- Let userspace umount frozen filesystems.
> Which it already can.

Of course. I just meant that I will leave things as they are in that regard.


>> 2- Provide a bdev level ioctl to unfreeze a umounted filesystem.
> The only time a block device knows about the frozen state of the
> superblock is when dm-snapshot drives the freeze from the block
> device. There are special hooks in, e.g. mount_bdev() for checking
> whether there is an active bdev freeze and this prevents new mounts
> from occurring while a bdev freeze is in progress.

Yes,  and I tried to retain that feature in my patch set.

> DM also removes the freeze state itself, so this is not a persistent state.

Unfortunately freeze_bdev and thaw bdev are symbols
exported to modules.


> IOWs, there are two specific freeze types - one a superblock (user)
> level freeze, and the other is a block device (kernel) level freeze.
> What you are proposing here means that the user, in certain
> circumstances, needs to manipulate superblock level freezes from the
> block level because they superblock is no longer visible to the
> user.  It's a recipe for confusion and convoluted bugs, and it sure
> as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
> freeze_super and thaw_super for the fsfreeze ioctl") as the reason
> why superblock level freezes exist and why trying to thaw from the
> bdev level doesn't work.

I discussed the issue mentioned in that commit with Josef
Bacik and Chris Mason and decided to fix those filesystems
for which it would not work (we need to fix the btrfs issue
anyway).


> Indeed, what happens when the superblock freeze is driven from
> dm-snapshot, and the user unmounts the fs and runs the blockdev
> ioctl to drop the freeze reference that dm-snapshot holds?
> That would free the superblock out from underneath DM, so this is
> a can of worms I'd prefer not to open.

The prototype I wrote for the blockdev ioctl
grabs the bdev->bd_fsfreeze_mutex and checks
bdev->bd_fsfreeze_count. If bd_fsfreeze_count>1
it returns EBUSY, else it calls freeze_super so
that the thawed and subsequently released.

I hope this addresses your concerns. That said
I am not married to this aproach,  if we decide
that filesystems should be thawed on
umount (which I was told is a no-go in a previous
LSF) I am willing to implement it.


>> I will also:
>> 3-  add the check ioctls so that users can check whether a
>> filesystem is frozen or not and avoid surprises.
> Which only solves the problem for users that know they have to check
> the state in certain corner cases. That doesn't fix the underlying
> problem.
>
> The reason this problem exists is that a active superblock level
> freeze holds a reference to the superblock. This has interesting
> side effects:
>
> $ sudo xfs_freeze -f /mnt/scratch
> $ sudo umount /mnt/scratch
> $ sudo mount /dev/vdc /mnt/scratch
> $ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> <blocked on frozen state>
> ^Z
> [1]+  Stopped                 sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> $ sudo xfs_freeze -u /mnt/scratch
> $
> $ fg
> sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> wrote 65536/65536 bytes at offset 0
> 64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
> $
>
> The freeze persists over unmount/mount, but we write to the
> filesystem during unmount, and run log recovery and write stuff to
> the filesystem as part of the normal mount process. IOWs, no
> filesystem checks for SB_UNFROZEN in either the unmount or mount
> path and so we violate the freeze condition by writing to the block
> device. This means that as it stands, an unmount or mount violates
> the frozen state guarantee and unmount/mount effectively imply that
> the filesystem is no longer frozen.  Hence silent snapshot image
> corruption is very possible, and the fact that it is silent is very
> worrying.

That is why my original proposal was to return EBUSY
when userspace tries to umount a frozen filesystem.
However this breaks lazy umount and I was told that
auto-thaw on umount is not acceptable, so I came
with the current patch set. If the behaviour implemented
in this patches is considered the right one I guess
filesystems would need to get fixed to check for
SB_UNFROZEN (I need to check whether this is
realistic or not).


> As it is, the requirement that we allow unmounting of frozen
> filesystems implies that an unmount operation must also be a thaw
> operation. The filesystem is no longer visible to the user, and they
> have no method of controlling it's state anymore, so the user has
> said to the kernel "it's all yours now". If the filesystem was
> frozen when the unmount is issued, then the user is saying "I don't
> care about the frozen state anymore". If they did care, then they
> wouldn't be unmounting the filesystem without having first issued a
> thaw operation.
>
> FWIW, while we are unmounting the filesystem, we hold the s_umount lock
> exclusively, so the filesystem cannot be thawed while an unmount is
> in progress. Hence there is no reason why the unmount can't thaw the
> superblock and take away it's reference as part of the unmount.

I was told that behaviour was too harsh during previous
discussions. I do not dislike it personally though. I guess
this is Al's call.


> If we do this, all of the problems with persistent frozen superblock
> state after unmount go away.  At this point, there is no need for
> block device level ioctls to clean up a persistent frozen superblock
> becuase they don't exist anymore. That means there is no need for
> ioctls to check the frozen state of the filesystem before unmount,
> either, because there is no problem to avoid....

Check ioctls are still needed IMHO. For example, there
is one issue we run into quite often in virtualization
environments:

We have Linux guest OS where fsfreeze in performed by a
guest agent controlled by the hypervisor. At the behest of
the hypervisor the agent freezes a guest filesystem and, for
whatever reason (a bug for example), the agent dies after
that. The filesystem was frozen without the guest's users
being aware of it, so these users will find that writes to
the filesystem block and they have no way to know
what is going on because we do not have check ioctls.
In such cases I was told by costumers that they would
like to use either emergency thaw (which this patch set
attempts to fix) or a check ioctl.

Thanks,
Fernando

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-14  1:46               ` Fernando Luis Vazquez Cao
@ 2012-09-14  6:28                 ` Dave Chinner
  2012-09-14  8:18                   ` Fernando Luis Vazquez Cao
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2012-09-14  6:28 UTC (permalink / raw)
  To: Fernando Luis Vazquez Cao
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel

On Fri, Sep 14, 2012 at 10:46:33AM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/09/14 09:15, Dave Chinner wrote:
> >On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
> >>The problem is that we allow users to umount a frozen
> >>filesystem but we have neither  a bdev level fsfreeze
> >>API to thaw it after that nor check ioctls.
> >>2- Provide a bdev level ioctl to unfreeze a umounted filesystem.
> >The only time a block device knows about the frozen state of the
> >superblock is when dm-snapshot drives the freeze from the block
> >device. There are special hooks in, e.g. mount_bdev() for checking
> >whether there is an active bdev freeze and this prevents new mounts
> >from occurring while a bdev freeze is in progress.
> 
> Yes,  and I tried to retain that feature in my patch set.
> 
> >DM also removes the freeze state itself, so this is not a persistent state.
> 
> Unfortunately freeze_bdev and thaw bdev are symbols
> exported to modules.

I don't see how that is relevant.

> >IOWs, there are two specific freeze types - one a superblock (user)
> >level freeze, and the other is a block device (kernel) level freeze.
> >What you are proposing here means that the user, in certain
> >circumstances, needs to manipulate superblock level freezes from the
> >block level because they superblock is no longer visible to the
> >user.  It's a recipe for confusion and convoluted bugs, and it sure
> >as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
> >freeze_super and thaw_super for the fsfreeze ioctl") as the reason
> >why superblock level freezes exist and why trying to thaw from the
> >bdev level doesn't work.
> 
> I discussed the issue mentioned in that commit with Josef
> Bacik and Chris Mason and decided to fix those filesystems
> for which it would not work (we need to fix the btrfs issue
> anyway).

Fixing btrfs doesn't address the layering problem.

> >Indeed, what happens when the superblock freeze is driven from
> >dm-snapshot, and the user unmounts the fs and runs the blockdev
> >ioctl to drop the freeze reference that dm-snapshot holds?
> >That would free the superblock out from underneath DM, so this is
> >a can of worms I'd prefer not to open.
> 
> The prototype I wrote for the blockdev ioctl
> grabs the bdev->bd_fsfreeze_mutex and checks
> bdev->bd_fsfreeze_count. If bd_fsfreeze_count>1
> it returns EBUSY, else it calls freeze_super so
> that the thawed and subsequently released.

So if I run FIFREEZE, then unmount, bdev->bd_fsfreeze_count is
zero so BLKTHAW won't thaw the superblock. If you decide to ignore
the bd_fsfreeze_count and thaw_super(), then you don't have any
nesting sanity at all.

> I hope this addresses your concerns. That said
> I am not married to this aproach,  if we decide
> that filesystems should be thawed on
> umount (which I was told is a no-go in a previous
> LSF) I am willing to implement it.

I don't recall it ever being discussed. If filesystems are supposed
to maintain frozen state during mount and unmount (which is what
this implies), then we've got a lot of work to do in every single
filesystem to audit and ensure that the mount/unmount paths never
write anything to disk...

It's far simpler, and IMO much saner, simply to saw unmount == thaw.

> >>I will also:
> >>3-  add the check ioctls so that users can check whether a
> >>filesystem is frozen or not and avoid surprises.
> >Which only solves the problem for users that know they have to check
> >the state in certain corner cases. That doesn't fix the underlying
> >problem.
> >
> >The reason this problem exists is that a active superblock level
> >freeze holds a reference to the superblock. This has interesting
> >side effects:
> >
> >$ sudo xfs_freeze -f /mnt/scratch
> >$ sudo umount /mnt/scratch
> >$ sudo mount /dev/vdc /mnt/scratch
> >$ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> ><blocked on frozen state>
> >^Z
> >[1]+  Stopped                 sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> >$ sudo xfs_freeze -u /mnt/scratch
> >$
> >$ fg
> >sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> >wrote 65536/65536 bytes at offset 0
> >64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
> >$
> >
> >The freeze persists over unmount/mount, but we write to the
> >filesystem during unmount, and run log recovery and write stuff to
> >the filesystem as part of the normal mount process. IOWs, no
> >filesystem checks for SB_UNFROZEN in either the unmount or mount
> >path and so we violate the freeze condition by writing to the block
> >device. This means that as it stands, an unmount or mount violates
> >the frozen state guarantee and unmount/mount effectively imply that
> >the filesystem is no longer frozen.  Hence silent snapshot image
> >corruption is very possible, and the fact that it is silent is very
> >worrying.
> 
> That is why my original proposal was to return EBUSY
> when userspace tries to umount a frozen filesystem.

Which was never going to fly, even though it makes sense.

> However this breaks lazy umount and I was told that
> auto-thaw on umount is not acceptable, so I came
> with the current patch set.

Silently corrupted snapshots due to unmount/mount when frozen is
pretty bad, and there's no way around it right now.

FWIW: "I was told...": A reference to the discussion so I can find
out the reasons why it is unacceptible would be handy....

> >If we do this, all of the problems with persistent frozen superblock
> >state after unmount go away.  At this point, there is no need for
> >block device level ioctls to clean up a persistent frozen superblock
> >becuase they don't exist anymore. That means there is no need for
> >ioctls to check the frozen state of the filesystem before unmount,
> >either, because there is no problem to avoid....
> 
> Check ioctls are still needed IMHO. For example, there
> is one issue we run into quite often in virtualization
> environments:
> 
> We have Linux guest OS where fsfreeze in performed by a
> guest agent controlled by the hypervisor. At the behest of
> the hypervisor the agent freezes a guest filesystem and, for
> whatever reason (a bug for example), the agent dies after
> that.

> The filesystem was frozen without the guest's users
> being aware of it, so these users will find that writes to
> the filesystem block and they have no way to know
> what is going on because we do not have check ioctls.

They should know what happened - the failure of the management
operation or agent should have been reported to the administrator.
It doesn't matter whether the filesystem is frozen or not - any
failure of a automated management operation that coul daffect the
operationof the guest VM shoul dbe raising alarms.  At that point,
the owner ofthe guest will know what failed before they even start
digging around in the guest VM and have some idea of what to look
for and what to resolve.

I'd be looking for a new VM infrastructure provider if the first I
knew about failed infrastructure management operations was customers
complaining about their VMs hanging....

> In such cases I was told by costumers that they would
> like to use either emergency thaw (which this patch set
> attempts to fix) or a check ioctl.

If they can run a check ioctl, then they don't need an emergency
thaw - they can just run fsfreeze -u.

Further, what those users want is to know if the filesystem is
frozen, they don't *want* an ioctl.  For them, something in
/proc/mount (e.g. add a "frozen" state to the mount options output)
or /sys/block/<dev>/freeze_count for block level freezes. The
information is easy to obtain and only requires cat to access.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
  2012-09-14  6:28                 ` Dave Chinner
@ 2012-09-14  8:18                   ` Fernando Luis Vazquez Cao
  0 siblings, 0 replies; 30+ messages in thread
From: Fernando Luis Vazquez Cao @ 2012-09-14  8:18 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Al Viro, Dave Chinner, Christoph Hellwig,
	linux-fsdevel, Fernando Luis Vázquez Cao

On 2012/09/14 15:28, Dave Chinner wrote:
> On Fri, Sep 14, 2012 at 10:46:33AM +0900, Fernando Luis Vazquez Cao wrote:
>>> Indeed, what happens when the superblock freeze is driven from
>>> dm-snapshot, and the user unmounts the fs and runs the blockdev
>>> ioctl to drop the freeze reference that dm-snapshot holds?
>>> That would free the superblock out from underneath DM, so this is
>>> a can of worms I'd prefer not to open.
>> The prototype I wrote for the blockdev ioctl
>> grabs the bdev->bd_fsfreeze_mutex and checks
>> bdev->bd_fsfreeze_count. If bd_fsfreeze_count>1
>> it returns EBUSY, else it calls freeze_super so
>> that the thawed and subsequently released.
> So if I run FIFREEZE, then unmount, bdev->bd_fsfreeze_count is
> zero so BLKTHAW won't thaw the superblock. If you decide to ignore
> the bd_fsfreeze_count and thaw_super(), then you don't have any
> nesting sanity at all.

No. In such a case, since bdev_fsfreeze_count is 0 BLKTHAW
would call thaw_super and thaw the superblock. It is only
when bdev_fsfreeze_count>0 that BLKTHAW would return
EBUSY. BLKTHAW's job is to thaw a superblock sitting on top
of a block device not the block device itself and only if
the block device is not frozen.


>> I hope this addresses your concerns. That said
>> I am not married to this aproach,  if we decide
>> that filesystems should be thawed on
>> umount (which I was told is a no-go in a previous
>> LSF) I am willing to implement it.
> I don't recall it ever being discussed. If filesystems are supposed
> to maintain frozen state during mount and unmount (which is what
> this implies), then we've got a lot of work to do in every single
> filesystem to audit and ensure that the mount/unmount paths never
> write anything to disk...

I agree it is a lot of work.


> It's far simpler, and IMO much saner, simply to saw unmount == thaw.

This issue was discussed at LSF 2010 and we have a nice
report from that day written by Jon Corbet:

http://lwn.net/Articles/399148/

Al suggested that the right solution is creating a new
freeze command which is described in the article above.
I implemented this new API and posted the patches
last year:

http://www.spinics.net/lists/linux-fsdevel/msg47238.html

The consensus was that this is the best long-term solution,
but since there are tools based on the older ioctl() commands
we also need a new ioctl() command which will cause the
thawing of an unmounted filesystem.


>> However this breaks lazy umount and I was told that
>> auto-thaw on umount is not acceptable, so I came
>> with the current patch set.
> Silently corrupted snapshots due to unmount/mount when frozen is
> pretty bad, and there's no way around it right now.

I agree, we need to fix that too.


> FWIW: "I was told...": A reference to the discussion so I can find
> out the reasons why it is unacceptible would be handy....

The only thing I could find is the article linked
above. The main reason was that (from my
notes of the day): "Umount has no business thawing
a filesystem since it could interfere with a backup
process that is scheduled periodically for example.
As a different approach we could block umount
until the filesystem is thawed but the filesystem
can be left frozen for an undetermined amount
of time, so this behaviour was deemed undesirable
too."

I just implemented the behaviour people seemed
to prefer, but I am fine with either approach as
long as it is not broken and is documented properly
(and of course I am willing to do the work if
necessary).

I guess this is Al's call. Al, which solution do you want
us to implement?


>> The filesystem was frozen without the guest's users
>> being aware of it, so these users will find that writes to
>> the filesystem block and they have no way to know
>> what is going on because we do not have check ioctls.
> They should know what happened - the failure of the management
> operation or agent should have been reported to the administrator.
> It doesn't matter whether the filesystem is frozen or not - any
> failure of a automated management operation that coul daffect the
> operationof the guest VM shoul dbe raising alarms.  At that point,
> the owner ofthe guest will know what failed before they even start
> digging around in the guest VM and have some idea of what to look
> for and what to resolve.

The owner of the guest only knows that the guest agent
is dead and it has no knowledge about the state in which
the filesystem(s) were left.


>> In such cases I was told by costumers that they would
>> like to use either emergency thaw (which this patch set
>> attempts to fix) or a check ioctl.
> If they can run a check ioctl, then they don't need an emergency
> thaw - they can just run fsfreeze -u.

That is one of the reasons I want a check ioctl.
I agree that if we had check ioctls the emergency
thaw feature would be less important.


> Further, what those users want is to know if the filesystem is
> frozen, they don't *want* an ioctl.  For them, something in
> /proc/mount (e.g. add a "frozen" state to the mount options output)
> or /sys/block/<dev>/freeze_count for block level freezes. The
> information is easy to obtain and only requires cat to access.

Either API is fine with me (I even implemented both!).
The reason that I proposed the ioctl is that since
we use an ioctl to freeze/thaw the filesystem our
customers (and most of the of the people I discussed
this issue with) wanted me to implement the check
functionality as an ioctl too, for the sake of consistency.

Thanks,
Fernando

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-09-14  8:18 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-12  8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-07-12  9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
2012-07-13 12:59   ` Jan Kara
2012-07-17  5:13     ` Fernando Luis Vazquez Cao
2012-07-12  9:04 ` [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
2012-07-13 13:17   ` Jan Kara
2012-08-09  6:00     ` Fernando Luis Vazquez Cao
2012-09-13  6:23     ` Fernando Luis Vazquez Cao
2012-07-12  9:05 ` [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2012-07-13 13:45   ` Jan Kara
2012-08-09  9:00     ` Fernando Luis Vazquez Cao
2012-07-12  9:07 ` [PATCH 4/8] fsfreeze: switch to using super methods where possible Fernando Luis Vázquez Cao
2012-07-13 13:50   ` Jan Kara
2012-07-12  9:09 ` [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
2012-07-13 13:53   ` Jan Kara
2012-07-12  9:10 ` [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-07-13 13:54   ` Jan Kara
2012-07-15 22:45     ` Dave Chinner
2012-09-13  6:19       ` Fernando Luis Vazquez Cao
2012-09-13  7:18         ` Dave Chinner
2012-09-13  8:19           ` Fernando Luis Vazquez Cao
2012-09-14  0:15             ` Dave Chinner
2012-09-14  1:46               ` Fernando Luis Vazquez Cao
2012-09-14  6:28                 ` Dave Chinner
2012-09-14  8:18                   ` Fernando Luis Vazquez Cao
2012-09-13  6:11     ` Fernando Luis Vazquez Cao
2012-07-12  9:11 ` [PATCH 7/8] fsfreeze: add block device " Fernando Luis Vázquez Cao
2012-07-12  9:12 ` [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking Fernando Luis Vázquez Cao
2012-07-13 14:11   ` Jan Kara
2012-07-17  1:42     ` Fernando Luis Vazquez Cao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.