All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage
@ 2010-06-10  7:19 Dave Chinner
  2010-06-10  7:19 ` [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely Dave Chinner
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

The following series is for to address bugs in the emergency thawing code, as
well as mismatcheѕ with freezing at the block layer and the superblock that
break the freeze/thaw nesting order.

The first two patches fix the emergency thaw infinite loop reported by Jeff
Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
be stable kernel candidates.

The remainder of the patches address the bdev/sb mismatch and the fact that sb
level freezing does not nest correctly. For all the places that the bdev
interfaces are used, we need a superblock anyway so we may as well make
freeze/thaw work only at the sb level. As such, this series moves all the
nesting code to the sb from the bdev level and removes the
freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
thaw to work at the superblock level such that it will now thaw manually frozen
filesystems.

A *big* outstanding problem still exists - freezing takes an active reference
to the superblock, so unmounting an frozen filesystem has some nasty and
unexpected side effects. The existing code results in an unmountable block
device:

# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
mount: /dev/vda already mounted or /mnt/test busy
#

At this point I can't get access to /dev/vda and needs a reboot to
get it and /mnt/test back.

This patch series results in the block device being mountable, but
remains frozen across unmount/mount:

# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
# touch /mnt/test/foo &
[1] 2647
#
# xfs_freeze -u /mnt/test
[1]+  Done                    sudo touch /mnt/test/foo
# umount /mnt/test
# mkfs.xfs -f -l size=128m /dev/vda
meta-data=/dev/vda               isize=256    agcount=4, agsize=262144 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
#

This behaviour is only marginally better than the existing behaviour
(at least you can release the references). However, I don't really
like either option - we used to disallow umount on a frozen
filesystems to avoid this problem.

So What is really supposed to happen when we unmount a frozen
superblock? Should unmount return EBUSY?  Should it be automatically
thawed so it doesn't affect block device behaviour after unmount?
Something else?

Cheers,

Dave.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
@ 2010-06-10  7:19 ` Dave Chinner
  2010-06-14 15:18   ` Christoph Hellwig
  2010-06-10  7:19 ` [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount Dave Chinner
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

From: Dave Chinner <dchinner@redhat.com>

The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.

Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/block_dev.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7346c96..366ac38 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -276,22 +276,19 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	if (!bdev->bd_fsfreeze_count)
 		goto out;
 
-	error = 0;
-	if (--bdev->bd_fsfreeze_count > 0)
+	if (!sb)
 		goto out;
 
-	if (!sb)
+	error = 0;
+	if (--bdev->bd_fsfreeze_count > 0)
 		goto out;
 
 	error = thaw_super(sb);
-	if (error) {
+	if (error)
 		bdev->bd_fsfreeze_count++;
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return error;
-	}
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return 0;
+	return error;
 }
 EXPORT_SYMBOL(thaw_bdev);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
  2010-06-10  7:19 ` [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely Dave Chinner
@ 2010-06-10  7:19 ` Dave Chinner
  2010-06-14 15:20   ` Christoph Hellwig
  2010-06-10  7:19 ` [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together Dave Chinner
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

From: Dave Chinner <dchinner@redhat.com>

The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.

Pass the emergency state into the thaw_bdev/thaw_super code to avoid
taking the s_umount lock in this case. We are running under the bdev
freeze mutex, so this is still serialised against freeze despite
only having a read lock on the sb->s_umount. Hence it should be safe
to execute in this manner, especially given that emergency thaw is a
rarely executed "get-out-of-jail" feature.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/block_dev.c     |   26 ++++++++++++++++++++--
 fs/buffer.c        |    2 +-
 fs/super.c         |   58 +++++++++++++++++++++++++++++++++++++++++++---------
 include/linux/fs.h |    9 ++++++++
 4 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 366ac38..a8c8224 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -262,13 +262,14 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * thaw_bdev  -- unlock filesystem
+ * __thaw_bdev  -- unlock filesystem
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
+ * @emergency:	emergency thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_bdev().
  */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
 {
 	int error = -EINVAL;
 
@@ -283,15 +284,34 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	if (--bdev->bd_fsfreeze_count > 0)
 		goto out;
 
-	error = thaw_super(sb);
+	if (emergency)
+		error = thaw_super_emergency(sb);
+	else
+		error = thaw_super(sb);
 	if (error)
 		bdev->bd_fsfreeze_count++;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
 }
+/**
+ * thaw_bdev  -- unlock filesystem
+ * @bdev:	blockdevice to unlock
+ * @sb:		associated superblock
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ */
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+{
+	return __thaw_bdev(bdev, sb, 0);
+}
 EXPORT_SYMBOL(thaw_bdev);
 
+int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
+{
+	return __thaw_bdev(bdev, sb, 1);
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff --git a/fs/buffer.c b/fs/buffer.c
index d54812b..f0c55d9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -564,7 +564,7 @@ repeat:
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
 	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
+	while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
 		printk(KERN_WARNING "Emergency Thaw on %s\n",
 		       bdevname(sb->s_bdev, b));
 }
diff --git a/fs/super.c b/fs/super.c
index 5c35bc7..76ed922 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -987,19 +987,24 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
+ * @emergency:	emergency thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * If we are doing an emergency thaw, we don't need to grab the sb->s_umount
+ * lock as it is already held.
  */
-int thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb, int emergency)
 {
-	int error;
+	int error = 0;
+
+	if (!emergency)
+		down_write(&sb->s_umount);
 
-	down_write(&sb->s_umount);
 	if (sb->s_frozen == SB_UNFROZEN) {
-		up_write(&sb->s_umount);
-		return -EINVAL;
+		error = -EINVAL;
+		goto out_unlock;
 	}
 
 	if (sb->s_flags & MS_RDONLY)
@@ -1011,8 +1016,7 @@ int thaw_super(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
 			sb->s_frozen = SB_FREEZE_TRANS;
-			up_write(&sb->s_umount);
-			return error;
+			goto out_unlock;
 		}
 	}
 
@@ -1020,12 +1024,46 @@ out:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
-	deactivate_locked_super(sb);
-
+	/*
+	 * When called from emergency scope, we cannot grab the s_umount lock
+	 * so we cannot deactivate the superblock. This may leave unbalanced
+	 * superblock references which could prevent unmount, but given this is
+	 * an emergency operation....
+	 */
+	if (!emergency)
+		deactivate_locked_super(sb);
 	return 0;
+
+out_unlock:
+	if (!emergency)
+		up_write(&sb->s_umount);
+	return error;
+}
+
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+	return __thaw_super(sb, 0);
 }
 EXPORT_SYMBOL(thaw_super);
 
+/**
+ * thaw_super_emergency  -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * This avoids taking the s_umount lock if it is already held.
+ */
+int thaw_super_emergency(struct super_block *sb)
+{
+	return __thaw_super(sb, 1);
+}
+
 static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
 {
 	int err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 471e1ff..e246389 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1803,6 +1803,7 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
 extern int vfs_statfs(struct dentry *, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int thaw_super_emergency(struct super_block *super);
 
 extern int current_umask(void);
 
@@ -1953,6 +1954,8 @@ extern int sync_blockdev(struct block_device *bdev);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int thaw_bdev_emergency(struct block_device *bdev,
+				struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
@@ -1968,6 +1971,12 @@ static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
 	return 0;
 }
+
+static inline int thaw_bdev_emergency(struct block_device *bdev,
+					struct super_block *sb)
+{
+	return 0;
+}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
  2010-06-10  7:19 ` [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely Dave Chinner
  2010-06-10  7:19 ` [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount Dave Chinner
@ 2010-06-10  7:19 ` Dave Chinner
  2010-06-14 15:22   ` Christoph Hellwig
  2010-06-10  7:19 ` [PATCH 4/5] fsfreeze: switch to using super methods everywhere Dave Chinner
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

From: Dave Chinner <dchinner@redhat.com>

thaw_bdev() has re-enterency guards to allow freezes to nest
together. That is, it ensures that the filesystem is not thawed
until the last thaw command is issued. This is needed to prevent the
filesystem from being unfrozen while an existing freezer is still
operating on the filesystem in a frozen state (e.g. dm-snapshot).

Currently, freeze_super() and thaw_super() bypasses these guards,
and as a result manual freezing and unfreezing via the ioctl methods
do not nest correctly. hence mixing userspace directed freezes with
block device level freezes result in inconsistency due to premature
thawing of the filesystem.

Move the re-enterency guards to the superblock and into freeze_super
and thaw_super() so that userspace directed freezes nest correctly
again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/block_dev.c       |   59 ++++----------------------------------
 fs/gfs2/ops_fstype.c |   12 --------
 fs/super.c           |   77 ++++++++++++++++++++++++++++++-------------------
 include/linux/fs.h   |    8 ++---
 4 files changed, 56 insertions(+), 100 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a8c8224..84899b3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -230,71 +230,22 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 	struct super_block *sb;
 	int error = 0;
 
-	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (++bdev->bd_fsfreeze_count > 1) {
-		/*
-		 * We don't even need to grab a reference - the first call
-		 * to freeze_bdev grab an active reference and only the last
-		 * thaw_bdev drops it.
-		 */
-		sb = get_super(bdev);
-		drop_super(sb);
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		return sb;
-	}
-
 	sb = get_active_super(bdev);
 	if (!sb)
 		goto out;
 	error = freeze_super(sb);
 	if (error) {
 		deactivate_super(sb);
-		bdev->bd_fsfreeze_count--;
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
 		return ERR_PTR(error);
 	}
 	deactivate_super(sb);
  out:
 	sync_blockdev(bdev);
-	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return sb;	/* thaw_bdev releases s->s_umount */
 }
 EXPORT_SYMBOL(freeze_bdev);
 
 /**
- * __thaw_bdev  -- unlock filesystem
- * @bdev:	blockdevice to unlock
- * @sb:		associated superblock
- * @emergency:	emergency thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
-static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
-{
-	int error = -EINVAL;
-
-	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (!bdev->bd_fsfreeze_count)
-		goto out;
-
-	if (!sb)
-		goto out;
-
-	error = 0;
-	if (--bdev->bd_fsfreeze_count > 0)
-		goto out;
-
-	if (emergency)
-		error = thaw_super_emergency(sb);
-	else
-		error = thaw_super(sb);
-	if (error)
-		bdev->bd_fsfreeze_count++;
-out:
-	mutex_unlock(&bdev->bd_fsfreeze_mutex);
-	return error;
-}
-/**
  * thaw_bdev  -- unlock filesystem
  * @bdev:	blockdevice to unlock
  * @sb:		associated superblock
@@ -303,13 +254,17 @@ out:
  */
 int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 {
-	return __thaw_bdev(bdev, sb, 0);
+	if (!sb)
+		return -EINVAL;
+	return thaw_super(sb);
 }
 EXPORT_SYMBOL(thaw_bdev);
 
 int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
 {
-	return __thaw_bdev(bdev, sb, 1);
+	if (!sb)
+		return -EINVAL;
+	return thaw_super_emergency(sb);
 }
 
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
@@ -434,8 +389,6 @@ static void init_once(void *foo)
 	INIT_LIST_HEAD(&bdev->bd_holder_list);
 #endif
 	inode_init_once(&ei->vfs_inode);
-	/* Initialize mutex for freeze. */
-	mutex_init(&bdev->bd_fsfreeze_mutex);
 }
 
 static inline void __bd_forget(struct inode *inode)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 3593b3a..5acc907 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1304,19 +1304,7 @@ static int gfs2_get_sb(struct file_system_type *fs_type, int flags,
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
-	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
-	 */
-	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (bdev->bd_fsfreeze_count > 0) {
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		error = -EBUSY;
-		goto error_bdev;
-	}
 	s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev);
-	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	error = PTR_ERR(s);
 	if (IS_ERR(s))
 		goto error_bdev;
diff --git a/fs/super.c b/fs/super.c
index 76ed922..5f13431 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -95,6 +95,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		s->s_maxbytes = MAX_NON_LFS;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
+		mutex_init(&s->s_freeze_mutex);
 	}
 out:
 	return s;
@@ -744,19 +745,7 @@ int get_sb_bdev(struct file_system_type *fs_type,
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
-	/*
-	 * once the super is inserted into the list by sget, s_umount
-	 * will protect the lockfs code from trying to start a snapshot
-	 * while we are mounting
-	 */
-	mutex_lock(&bdev->bd_fsfreeze_mutex);
-	if (bdev->bd_fsfreeze_count > 0) {
-		mutex_unlock(&bdev->bd_fsfreeze_mutex);
-		error = -EBUSY;
-		goto error_bdev;
-	}
 	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
-	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	if (IS_ERR(s))
 		goto error_s;
 
@@ -941,25 +930,31 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
  * @sb: the super to lock
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
- * -EBUSY.
+ * freeze_fs.  The reference counter (s_freeze_count) guarantees that only the
+ * last unfreeze process can unfreeze the frozen filesystem actually when
+ * multiple freeze requests arrive simultaneously. It counts up in
+ * freeze_super() and count down in thaw_super(). When it becomes 0,
+ * thaw_super() will execute the unfreeze.
  */
 int freeze_super(struct super_block *sb)
 {
-	int ret;
+	int ret = 0;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
+	mutex_lock(&sb->s_freeze_mutex);
+	if (++sb->s_freeze_count > 1)
+		goto out_deactivate;
+
 	if (sb->s_frozen) {
-		deactivate_locked_super(sb);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto out_deactivate;
 	}
 
 	if (sb->s_flags & MS_RDONLY) {
 		sb->s_frozen = SB_FREEZE_TRANS;
 		smp_wmb();
-		up_write(&sb->s_umount);
-		return 0;
+		goto out_active;
 	}
 
 	sb->s_frozen = SB_FREEZE_WRITE;
@@ -977,12 +972,18 @@ int freeze_super(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
 			sb->s_frozen = SB_UNFROZEN;
-			deactivate_locked_super(sb);
-			return ret;
+			goto out_deactivate;
 		}
 	}
+out_active:
 	up_write(&sb->s_umount);
-	return 0;
+out_unlock:
+	mutex_unlock(&sb->s_freeze_mutex);
+	return ret;
+
+out_deactivate:
+	deactivate_locked_super(sb);
+	goto out_unlock;
 }
 EXPORT_SYMBOL(freeze_super);
 
@@ -993,22 +994,34 @@ EXPORT_SYMBOL(freeze_super);
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
  * If we are doing an emergency thaw, we don't need to grab the sb->s_umount
- * lock as it is already held.
+ * lock as it is already held.  Return -EINVAL if @sb is not frozen, 0 if the
+ * unfreeze was executed and succeeded or the error if it failed. If the
+ * unfreeze fails, then leave @sb in the frozen state.
  */
 static int __thaw_super(struct super_block *sb, int emergency)
 {
-	int error = 0;
+	int error = -EINVAL;
 
-	if (!emergency)
+	if (!emergency) {
 		down_write(&sb->s_umount);
+		mutex_lock(&sb->s_freeze_mutex);
+		if (!sb->s_freeze_count)
+			goto out_unlock;
 
-	if (sb->s_frozen == SB_UNFROZEN) {
-		error = -EINVAL;
-		goto out_unlock;
+		if (--sb->s_freeze_count > 0) {
+			error = 0;
+			goto out_unlock;
+		}
+	} else {
+		/* already under down_read(&sb->s_umount) */
+		mutex_lock(&sb->s_freeze_mutex);
 	}
 
+	if (sb->s_frozen == SB_UNFROZEN)
+		goto out_unlock;
+
 	if (sb->s_flags & MS_RDONLY)
-		goto out;
+		goto out_unfreeze;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
@@ -1020,10 +1033,11 @@ static int __thaw_super(struct super_block *sb, int emergency)
 		}
 	}
 
-out:
+out_unfreeze:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
+	mutex_unlock(&sb->s_freeze_mutex);
 	/*
 	 * When called from emergency scope, we cannot grab the s_umount lock
 	 * so we cannot deactivate the superblock. This may leave unbalanced
@@ -1035,6 +1049,9 @@ out:
 	return 0;
 
 out_unlock:
+	if (error && error != -EINVAL)
+		sb->s_freeze_count++;
+	mutex_unlock(&sb->s_freeze_mutex);
 	if (!emergency)
 		up_write(&sb->s_umount);
 	return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e246389..f92b077 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -672,11 +672,6 @@ struct block_device {
 	 * care to not mess up bd_private for that case.
 	 */
 	unsigned long		bd_private;
-
-	/* The counter of freeze processes */
-	int			bd_fsfreeze_count;
-	/* Mutex for freeze */
-	struct mutex		bd_fsfreeze_mutex;
 };
 
 /*
@@ -1382,6 +1377,9 @@ struct super_block {
 	 * generic_show_options()
 	 */
 	char *s_options;
+
+	int			s_freeze_count;	/* nr of nested freezes */
+	struct mutex		s_freeze_mutex;	/* nesting lock */
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] fsfreeze: switch to using super methods everywhere
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
                   ` (2 preceding siblings ...)
  2010-06-10  7:19 ` [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together Dave Chinner
@ 2010-06-10  7:19 ` Dave Chinner
  2010-06-14 15:23   ` Christoph Hellwig
  2010-06-10  7:19 ` [PATCH 5/5] fsfreeze: move emergency thaw code to fs/super.c Dave Chinner
  2010-06-10 12:45   ` Josef Bacik
  5 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

From: Dave Chinner <dchinner@redhat.com>

freeze/thaw_bdev are now just trivial wrappers around
freeze/thaw_super(). Convert all users of the bdev interfaces to use
the superblock interfaces instead, and remove the bdev interfaces.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 drivers/md/dm.c    |   13 +++++++----
 fs/block_dev.c     |   54 ----------------------------------------------------
 fs/buffer.c        |    2 +-
 fs/super.c         |    1 +
 fs/xfs/xfs_fsops.c |   10 ++------
 include/linux/fs.h |   19 ------------------
 6 files changed, 13 insertions(+), 86 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d21e128..70d5fd6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2427,15 +2427,18 @@ static int lock_fs(struct mapped_device *md)
 
 	WARN_ON(md->frozen_sb);
 
-	md->frozen_sb = freeze_bdev(md->bdev);
-	if (IS_ERR(md->frozen_sb)) {
-		r = PTR_ERR(md->frozen_sb);
+	md->frozen_sb = get_active_super(md->bdev);
+	if (!md->frozen_sb)
+		return -EIO;
+	r = freeze_super(md->frozen_sb);
+	if (r) {
+		deactivate_super(md->frozen_sb);
 		md->frozen_sb = NULL;
 		return r;
 	}
+	deactivate_super(md->frozen_sb);
 
 	set_bit(DMF_FROZEN, &md->flags);
-
 	return 0;
 }
 
@@ -2444,7 +2447,7 @@ static void unlock_fs(struct mapped_device *md)
 	if (!test_bit(DMF_FROZEN, &md->flags))
 		return;
 
-	thaw_bdev(md->bdev, md->frozen_sb);
+	thaw_super(md->frozen_sb);
 	md->frozen_sb = NULL;
 	clear_bit(DMF_FROZEN, &md->flags);
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 84899b3..3c3d1fe 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -213,60 +213,6 @@ int fsync_bdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(fsync_bdev);
 
-/**
- * freeze_bdev  --  lock a filesystem and force it into a consistent state
- * @bdev:	blockdevice to lock
- *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
- */
-struct super_block *freeze_bdev(struct block_device *bdev)
-{
-	struct super_block *sb;
-	int error = 0;
-
-	sb = get_active_super(bdev);
-	if (!sb)
-		goto out;
-	error = freeze_super(sb);
-	if (error) {
-		deactivate_super(sb);
-		return ERR_PTR(error);
-	}
-	deactivate_super(sb);
- out:
-	sync_blockdev(bdev);
-	return sb;	/* thaw_bdev releases s->s_umount */
-}
-EXPORT_SYMBOL(freeze_bdev);
-
-/**
- * thaw_bdev  -- unlock filesystem
- * @bdev:	blockdevice to unlock
- * @sb:		associated superblock
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
-	if (!sb)
-		return -EINVAL;
-	return thaw_super(sb);
-}
-EXPORT_SYMBOL(thaw_bdev);
-
-int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
-{
-	if (!sb)
-		return -EINVAL;
-	return thaw_super_emergency(sb);
-}
-
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
diff --git a/fs/buffer.c b/fs/buffer.c
index f0c55d9..b095fc1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -564,7 +564,7 @@ repeat:
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
 	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
+	while (sb->s_bdev && !thaw_super_emergency(sb))
 		printk(KERN_WARNING "Emergency Thaw on %s\n",
 		       bdevname(sb->s_bdev, b));
 }
diff --git a/fs/super.c b/fs/super.c
index 5f13431..81a4034 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -976,6 +976,7 @@ int freeze_super(struct super_block *sb)
 		}
 	}
 out_active:
+	sync_blockdev(sb->s_bdev);
 	up_write(&sb->s_umount);
 out_unlock:
 	mutex_unlock(&sb->s_freeze_mutex);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 37a6f62..cc993a5 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -642,16 +642,12 @@ xfs_fs_goingdown(
 	__uint32_t	inflags)
 {
 	switch (inflags) {
-	case XFS_FSOP_GOING_FLAGS_DEFAULT: {
-		struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
-		if (sb && !IS_ERR(sb)) {
+	case XFS_FSOP_GOING_FLAGS_DEFAULT:
+		if (!freeze_super(mp->m_super)) {
 			xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
-			thaw_bdev(sb->s_bdev, sb);
+			thaw_super(mp->m_super);
 		}
-
 		break;
-	}
 	case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
 		xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
 		break;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f92b077..39bf4ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1951,30 +1951,11 @@ extern void invalidate_bdev(struct block_device *);
 extern int sync_blockdev(struct block_device *bdev);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
-extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
-extern int thaw_bdev_emergency(struct block_device *bdev,
-				struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
 static inline void invalidate_bdev(struct block_device *bdev) {}
-
-static inline struct super_block *freeze_bdev(struct block_device *sb)
-{
-	return NULL;
-}
-
-static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
-	return 0;
-}
-
-static inline int thaw_bdev_emergency(struct block_device *bdev,
-					struct super_block *sb)
-{
-	return 0;
-}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] fsfreeze: move emergency thaw code to fs/super.c
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
                   ` (3 preceding siblings ...)
  2010-06-10  7:19 ` [PATCH 4/5] fsfreeze: switch to using super methods everywhere Dave Chinner
@ 2010-06-10  7:19 ` Dave Chinner
  2010-06-14 15:25   ` Christoph Hellwig
  2010-06-10 12:45   ` Josef Bacik
  5 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-10  7:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, josef, jeffmerkey

From: Dave Chinner <dchinner@redhat.com>

It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/buffer.c        |   31 -------------------------------
 fs/super.c         |   37 +++++++++++++++++++++++++++++++++----
 include/linux/fs.h |    1 -
 3 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b095fc1..61bd994 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -561,37 +561,6 @@ repeat:
 	return err;
 }
 
-static void do_thaw_one(struct super_block *sb, void *unused)
-{
-	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_super_emergency(sb))
-		printk(KERN_WARNING "Emergency Thaw on %s\n",
-		       bdevname(sb->s_bdev, b));
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
-	iterate_supers(do_thaw_one, NULL);
-	kfree(work);
-	printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
-	struct work_struct *work;
-
-	work = kmalloc(sizeof(*work), GFP_ATOMIC);
-	if (work) {
-		INIT_WORK(work, do_thaw_all);
-		schedule_work(work);
-	}
-}
-
 /**
  * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
  * @mapping: the mapping which wants those buffers written
diff --git a/fs/super.c b/fs/super.c
index 81a4034..680b8d5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1059,7 +1059,7 @@ out_unlock:
 }
 
 /**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
  * @sb: the super to thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
@@ -1075,11 +1075,40 @@ EXPORT_SYMBOL(thaw_super);
  * @sb: the super to thaw
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super().
- * This avoids taking the s_umount lock if it is already held.
+ * This avoids taking the s_umount lock because it is already held.
+ */
+static void thaw_super_emergency(struct super_block *sb, void *unused)
+{
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
+						bdevname(sb->s_bdev, b));
+	}
+	while (!__thaw_super(sb, 1));
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+	iterate_supers(thaw_super_emergency, NULL);
+	kfree(work);
+	printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
  */
-int thaw_super_emergency(struct super_block *sb)
+void emergency_thaw_all(void)
 {
-	return __thaw_super(sb, 1);
+	struct work_struct *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(work, do_thaw_all);
+		schedule_work(work);
+	}
 }
 
 static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39bf4ac..a704062 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1801,7 +1801,6 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
 extern int vfs_statfs(struct dentry *, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
-extern int thaw_super_emergency(struct super_block *super);
 
 extern int current_umask(void);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage
  2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
@ 2010-06-10 12:45   ` Josef Bacik
  2010-06-10  7:19 ` [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount Dave Chinner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-06-10 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:49PM +1000, Dave Chinner wrote:
> The following series is for to address bugs in the emergency thawing code, as
> well as mismatcheѕ with freezing at the block layer and the superblock that
> break the freeze/thaw nesting order.
> 
> The first two patches fix the emergency thaw infinite loop reported by Jeff
> Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
> be stable kernel candidates.
> 
> The remainder of the patches address the bdev/sb mismatch and the fact that sb
> level freezing does not nest correctly. For all the places that the bdev
> interfaces are used, we need a superblock anyway so we may as well make
> freeze/thaw work only at the sb level. As such, this series moves all the
> nesting code to the sb from the bdev level and removes the
> freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
> thaw to work at the superblock level such that it will now thaw manually frozen
> filesystems.
> 
> A *big* outstanding problem still exists - freezing takes an active reference
> to the superblock, so unmounting an frozen filesystem has some nasty and
> unexpected side effects. The existing code results in an unmountable block
> device:
> 
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> mount: /dev/vda already mounted or /mnt/test busy
> #
> 
> At this point I can't get access to /dev/vda and needs a reboot to
> get it and /mnt/test back.
> 
> This patch series results in the block device being mountable, but
> remains frozen across unmount/mount:
> 
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> # touch /mnt/test/foo &
> [1] 2647
> #
> # xfs_freeze -u /mnt/test
> [1]+  Done                    sudo touch /mnt/test/foo
> # umount /mnt/test
> # mkfs.xfs -f -l size=128m /dev/vda
> meta-data=/dev/vda               isize=256    agcount=4, agsize=262144 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1048576, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> #
> 
> This behaviour is only marginally better than the existing behaviour
> (at least you can release the references). However, I don't really
> like either option - we used to disallow umount on a frozen
> filesystems to avoid this problem.
> 
> So What is really supposed to happen when we unmount a frozen
> superblock? Should unmount return EBUSY?  Should it be automatically
> thawed so it doesn't affect block device behaviour after unmount?
> Something else?
>

My vote is for EBUSY, that way we don't get this weird unexpected behavior
thing.  Plus if dm is doing a snapshot or whatever, unfreezing the fs to let it
unmount in the middle of it doing its thing could be unfun.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage
@ 2010-06-10 12:45   ` Josef Bacik
  0 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-06-10 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:49PM +1000, Dave Chinner wrote:
> The following series is for to address bugs in the emergency thawing code, as
> well as mismatcheѕ with freezing at the block layer and the superblock that
> break the freeze/thaw nesting order.
> 
> The first two patches fix the emergency thaw infinite loop reported by Jeff
> Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
> be stable kernel candidates.
> 
> The remainder of the patches address the bdev/sb mismatch and the fact that sb
> level freezing does not nest correctly. For all the places that the bdev
> interfaces are used, we need a superblock anyway so we may as well make
> freeze/thaw work only at the sb level. As such, this series moves all the
> nesting code to the sb from the bdev level and removes the
> freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
> thaw to work at the superblock level such that it will now thaw manually frozen
> filesystems.
> 
> A *big* outstanding problem still exists - freezing takes an active reference
> to the superblock, so unmounting an frozen filesystem has some nasty and
> unexpected side effects. The existing code results in an unmountable block
> device:
> 
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> mount: /dev/vda already mounted or /mnt/test busy
> #
> 
> At this point I can't get access to /dev/vda and needs a reboot to
> get it and /mnt/test back.
> 
> This patch series results in the block device being mountable, but
> remains frozen across unmount/mount:
> 
> # mount /dev/vda /mnt/test
> # xfs_freeze -f /mnt/test
> # umount /mnt/test
> # grep test /proc/mounts
> # mkfs.xfs -f -l size=128m /dev/vda
> mkfs.xfs: /dev/vda contains a mounted filesystem
> Usage: mkfs.xfs
> ....
> # mount /dev/vda /mnt/test
> # touch /mnt/test/foo &
> [1] 2647
> #
> # xfs_freeze -u /mnt/test
> [1]+  Done                    sudo touch /mnt/test/foo
> # umount /mnt/test
> # mkfs.xfs -f -l size=128m /dev/vda
> meta-data=/dev/vda               isize=256    agcount=4, agsize=262144 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1048576, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> #
> 
> This behaviour is only marginally better than the existing behaviour
> (at least you can release the references). However, I don't really
> like either option - we used to disallow umount on a frozen
> filesystems to avoid this problem.
> 
> So What is really supposed to happen when we unmount a frozen
> superblock? Should unmount return EBUSY?  Should it be automatically
> thawed so it doesn't affect block device behaviour after unmount?
> Something else?
>

My vote is for EBUSY, that way we don't get this weird unexpected behavior
thing.  Plus if dm is doing a snapshot or whatever, unfreezing the fs to let it
unmount in the middle of it doing its thing could be unfun.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely
  2010-06-10  7:19 ` [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely Dave Chinner
@ 2010-06-14 15:18   ` Christoph Hellwig
  2010-06-14 23:19     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-14 15:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> Return -EINVAL when the filesystem is already unfrozen to avoid this
> problem.


This includes some additional changes in addition to the description,
and at least one of them seems incorrect.

> -	error = 0;
> -	if (--bdev->bd_fsfreeze_count > 0)
> +	if (!sb)
>  		goto out;
>  
> -	if (!sb)
> +	error = 0;
> +	if (--bdev->bd_fsfreeze_count > 0)
>  		goto out;

Here you reorder the sb check to be before the counter decrement.  But
we do support calling freeze_bdev on a device without a superblock, and
you would leak bd_fsfreeze_count for that case and wrongly return
-EINVAL on unthaw for these now.

>  	error = thaw_super(sb);
> -	if (error) {
> +	if (error)
>  		bdev->bd_fsfreeze_count++;
> -		mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -		return error;
> -	}

Ok, useful cleanup.

>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
> -	return 0;
> +	return error;

And this is the actual fix of course, also looks good.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount
  2010-06-10  7:19 ` [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount Dave Chinner
@ 2010-06-14 15:20   ` Christoph Hellwig
  2010-06-14 23:21     ` Dave Chinner
  2010-06-21  1:57     ` Dave Chinner
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-14 15:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
> 
> Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> taking the s_umount lock in this case. We are running under the bdev
> freeze mutex, so this is still serialised against freeze despite
> only having a read lock on the sb->s_umount. Hence it should be safe
> to execute in this manner, especially given that emergency thaw is a
> rarely executed "get-out-of-jail" feature.

This is correct as long as no one calls thaw_super directly, which
is not the case currently.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together
  2010-06-10  7:19 ` [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together Dave Chinner
@ 2010-06-14 15:22   ` Christoph Hellwig
  2010-06-15  0:01     ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-14 15:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

This breaks the "feature" that we can freeze a block device that doesn't
have a filesystem mounted yet.  For filesystems using get_sb_bdev that
prevents a new filesystem to be mounted on them.

I'm not sure it's a particularly useful feature, but it's been there
since day 1 of the freeze support.  The easiest way to not break it
would be to keep the per-sb freeze count only for that case and only
check it during mount.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/5] fsfreeze: switch to using super methods everywhere
  2010-06-10  7:19 ` [PATCH 4/5] fsfreeze: switch to using super methods everywhere Dave Chinner
@ 2010-06-14 15:23   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-14 15:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> freeze/thaw_bdev are now just trivial wrappers around
> freeze/thaw_super(). Convert all users of the bdev interfaces to use
> the superblock interfaces instead, and remove the bdev interfaces.

This requires exporting get_active_super.  Buit if we want to keep suppoting
freezing of unmounted block devices we need to keep the helpers anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] fsfreeze: move emergency thaw code to fs/super.c
  2010-06-10  7:19 ` [PATCH 5/5] fsfreeze: move emergency thaw code to fs/super.c Dave Chinner
@ 2010-06-14 15:25   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-14 15:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Thu, Jun 10, 2010 at 05:19:54PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It makes no sense having the emergency thaw code in fs/buffer.c when all of
> it's operations are one superblocks and the code it executes is all in
> fs/super.c. Move the code there and clean it up.

> +static void thaw_super_emergency(struct super_block *sb, void *unused)
> +{
> +
> +	if (sb->s_bdev) {
> +		char b[BDEVNAME_SIZE];
> +		printk(KERN_WARNING "Emergency Thaw on %s.\n",
> +						bdevname(sb->s_bdev, b));
> +	}
> +	while (!__thaw_super(sb, 1));

Please move the ; to a separate line to make the loop better redable.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely
  2010-06-14 15:18   ` Christoph Hellwig
@ 2010-06-14 23:19     ` Dave Chinner
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2010-06-14 23:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Mon, Jun 14, 2010 at 11:18:15AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> > Return -EINVAL when the filesystem is already unfrozen to avoid this
> > problem.
> 
> 
> This includes some additional changes in addition to the description,
> and at least one of them seems incorrect.
> 
> > -	error = 0;
> > -	if (--bdev->bd_fsfreeze_count > 0)
> > +	if (!sb)
> >  		goto out;
> >  
> > -	if (!sb)
> > +	error = 0;
> > +	if (--bdev->bd_fsfreeze_count > 0)
> >  		goto out;
> 
> Here you reorder the sb check to be before the counter decrement.  But
> we do support calling freeze_bdev on a device without a superblock, and
> you would leak bd_fsfreeze_count for that case and wrongly return
> -EINVAL on unthaw for these now.

Ok, will fix it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount
  2010-06-14 15:20   ` Christoph Hellwig
@ 2010-06-14 23:21     ` Dave Chinner
  2010-06-21  1:57     ` Dave Chinner
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2010-06-14 23:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> > 
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
> 
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.

Yeah, the idea of the first two patches is that they can be applied
to a current tree and backported and prevent the infinite loop or
deadlock. The problem of thaw_bdev/thaw_super is what the rest of
the patches are supposed to address.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together
  2010-06-14 15:22   ` Christoph Hellwig
@ 2010-06-15  0:01     ` Dave Chinner
  2010-06-15  6:24       ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-15  0:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Mon, Jun 14, 2010 at 11:22:19AM -0400, Christoph Hellwig wrote:
> This breaks the "feature" that we can freeze a block device that doesn't
> have a filesystem mounted yet.  For filesystems using get_sb_bdev that
> prevents a new filesystem to be mounted on them.
>
> I'm not sure it's a particularly useful feature, but it's been there
> since day 1 of the freeze support.  The easiest way to not break it
> would be to keep the per-sb freeze count only for that case and only
> check it during mount.

You mean the per-bdev freeze count, right? So freeze/thaw_bdev would
have to remain?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together
  2010-06-15  0:01     ` Dave Chinner
@ 2010-06-15  6:24       ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-15  6:24 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Tue, Jun 15, 2010 at 10:01:07AM +1000, Dave Chinner wrote:
> On Mon, Jun 14, 2010 at 11:22:19AM -0400, Christoph Hellwig wrote:
> > This breaks the "feature" that we can freeze a block device that doesn't
> > have a filesystem mounted yet.  For filesystems using get_sb_bdev that
> > prevents a new filesystem to be mounted on them.
> >
> > I'm not sure it's a particularly useful feature, but it's been there
> > since day 1 of the freeze support.  The easiest way to not break it
> > would be to keep the per-sb freeze count only for that case and only
> > check it during mount.
> 
> You mean the per-bdev freeze count, right? So freeze/thaw_bdev would
> have to remain?

Yes.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount
  2010-06-14 15:20   ` Christoph Hellwig
  2010-06-14 23:21     ` Dave Chinner
@ 2010-06-21  1:57     ` Dave Chinner
  2010-06-21  7:47       ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2010-06-21  1:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> > 
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
> 
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.

This patch doesn't try to deal with the bdev/super mismatches; all
it does is prevent an obvious deadlock. Calling freeze/thaw_super
directly will serialise on the s_umount lock, calling
freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
we mix the two they'll serialise on the s_umount lock. So I think
with this patch serialisation will still occur correctly but avoid
the current deadlock.

I'll change the commit message to explain this better.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount
  2010-06-21  1:57     ` Dave Chinner
@ 2010-06-21  7:47       ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-06-21  7:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, viro, josef, jeffmerkey

On Mon, Jun 21, 2010 at 11:57:31AM +1000, Dave Chinner wrote:
> This patch doesn't try to deal with the bdev/super mismatches; all
> it does is prevent an obvious deadlock. Calling freeze/thaw_super
> directly will serialise on the s_umount lock, calling
> freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
> we mix the two they'll serialise on the s_umount lock. So I think
> with this patch serialisation will still occur correctly but avoid
> the current deadlock.
> 
> I'll change the commit message to explain this better.

I don;t think the explanation alone is enough.

Right now thaw_super itself is only serialized by exclusive shared
s_umount.   thaw_bdev it also serialized by bd_fsfreeze_mutex, but
there are callers of thaw_super that do not go through thaw_bdev, so
our locking is not enough here.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-06-21  7:47 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10  7:19 [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Dave Chinner
2010-06-10  7:19 ` [PATCH 1/5] fsfreeze: Prevent emergency thaw from looping infinitely Dave Chinner
2010-06-14 15:18   ` Christoph Hellwig
2010-06-14 23:19     ` Dave Chinner
2010-06-10  7:19 ` [PATCH 2/5] fsfreeze: emergency thaw will deadlock on s_umount Dave Chinner
2010-06-14 15:20   ` Christoph Hellwig
2010-06-14 23:21     ` Dave Chinner
2010-06-21  1:57     ` Dave Chinner
2010-06-21  7:47       ` Christoph Hellwig
2010-06-10  7:19 ` [PATCH 3/5] fsfreeze: freeze_super and thaw_bdev don't play well together Dave Chinner
2010-06-14 15:22   ` Christoph Hellwig
2010-06-15  0:01     ` Dave Chinner
2010-06-15  6:24       ` Christoph Hellwig
2010-06-10  7:19 ` [PATCH 4/5] fsfreeze: switch to using super methods everywhere Dave Chinner
2010-06-14 15:23   ` Christoph Hellwig
2010-06-10  7:19 ` [PATCH 5/5] fsfreeze: move emergency thaw code to fs/super.c Dave Chinner
2010-06-14 15:25   ` Christoph Hellwig
2010-06-10 12:45 ` [RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage Josef Bacik
2010-06-10 12:45   ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.