linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] vfs: provide automatic kernel freeze / resume
@ 2023-05-08  1:17 Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

It's been about 5 months since the last v3 RFC for fs freeze. Now
that some of us will have time at LSFMM to discuss stuff I figured
it would be good to try to get the last pieces, if any discussed,
and put out a new patch set based on the latests feedback. This
time I boot tested, and stress tested the patches. From what I can
tell I confirm nothing regresses, we just end up now with a new world
where if your platform does support s3  / s4 we'll kick into gear the
automatic kernel freeze.

To help with testing, as this is a rather tiny bit obscure area with
virtualization, I've gone ahead and extended kdevops with support for
always enabling s3 / s4, so it should be easy to test guest bring up
there.

I've picked out using stress-ng now to have fun stress testing suspend,
the insane will try:

./stress-ng --touch 8192 --touch-method creat

Resume works but eventually suspend will trickle tons of OOMs and so
we gotta find a sweet spot to automate this somehow in fstests somehow.
I am not sure how we're gonna test this moving forward on fstests but
perhaps worth talking about at LSFMM for ideas.

Anyway, your filesystem will not participate in the auto kernel
freeze / thaw unless your filesystem gets the kthread freezer junk
removed and sets a flag. I'll post 3 patches for the 3 main filesystems
after this. I've carried and advanced the SmPL patch for a few years
now, and magically it all still works.

[0] https://lkml.kernel.org/r/20230114003409.1168311-1-mcgrof@kernel.org

Luis Chamberlain (6):
  fs: unify locking semantics for fs freeze / thaw
  fs: add frozen sb state helpers
  fs: distinguish between user initiated freeze and kernel initiated
    freeze
  fs: move !SB_BORN check early on freeze and add for thaw
  fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
  fs: add automatic kernel fs freeze / thaw and remove kthread freezing

 block/bdev.c           |   9 +-
 fs/ext4/ext4_jbd2.c    |   2 +-
 fs/f2fs/gc.c           |   9 +-
 fs/gfs2/glops.c        |   2 +-
 fs/gfs2/super.c        |  11 +-
 fs/gfs2/sys.c          |  12 ++-
 fs/gfs2/util.c         |   7 +-
 fs/ioctl.c             |  14 ++-
 fs/quota/quota.c       |   4 +-
 fs/super.c             | 239 +++++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_trans.c     |   3 +-
 include/linux/fs.h     |  54 +++++++++-
 kernel/power/process.c |  15 ++-
 13 files changed, 313 insertions(+), 68 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-18  5:32   ` Darrick J. Wong
                     ` (2 more replies)
  2023-05-08  1:17 ` [PATCH 2/6] fs: add frozen sb state helpers Luis Chamberlain
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

Right now freeze_super()  and thaw_super() are called with
different locking contexts. To expand on this is messy, so
just unify the requirement to require grabbing an active
reference and keep the superblock locked.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c    |  5 +++-
 fs/f2fs/gc.c    |  5 ++++
 fs/gfs2/super.c |  9 +++++--
 fs/gfs2/sys.c   |  6 +++++
 fs/gfs2/util.c  |  5 ++++
 fs/ioctl.c      | 12 ++++++++--
 fs/super.c      | 62 ++++++++++++++++++-------------------------------
 7 files changed, 60 insertions(+), 44 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef323..dc54a2a1c46e 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -251,7 +251,7 @@ int freeze_bdev(struct block_device *bdev)
 		error = sb->s_op->freeze_super(sb);
 	else
 		error = freeze_super(sb);
-	deactivate_super(sb);
+	deactivate_locked_super(sb);
 
 	if (error) {
 		bdev->bd_fsfreeze_count--;
@@ -289,6 +289,8 @@ int thaw_bdev(struct block_device *bdev)
 	sb = bdev->bd_fsfreeze_sb;
 	if (!sb)
 		goto out;
+	if (!get_active_super(bdev))
+		goto out;
 
 	if (sb->s_op->thaw_super)
 		error = sb->s_op->thaw_super(sb);
@@ -298,6 +300,7 @@ int thaw_bdev(struct block_device *bdev)
 		bdev->bd_fsfreeze_count++;
 	else
 		bdev->bd_fsfreeze_sb = NULL;
+	deactivate_locked_super(sb);
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 61c5f9d26018..e31d6791d3e3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -2166,7 +2166,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
 	if (err)
 		return err;
 
+	if (!get_active_super(sbi->sb->s_bdev))
+		return -ENOTTY;
 	freeze_super(sbi->sb);
+
 	f2fs_down_write(&sbi->gc_lock);
 	f2fs_down_write(&sbi->cp_global_sem);
 
@@ -2217,6 +2220,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
 out_err:
 	f2fs_up_write(&sbi->cp_global_sem);
 	f2fs_up_write(&sbi->gc_lock);
+	/* We use the same active reference from freeze */
 	thaw_super(sbi->sb);
+	deactivate_locked_super(sbi->sb);
 	return err;
 }
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5eed8c237500..e57cb593e2f3 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -676,7 +676,12 @@ void gfs2_freeze_func(struct work_struct *work)
 	struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work);
 	struct super_block *sb = sdp->sd_vfs;
 
-	atomic_inc(&sb->s_active);
+	if (!get_active_super(sb->s_bdev)) {
+		fs_info(sdp, "GFS2: couldn't grap super for thaw for filesystem\n");
+		gfs2_assert_withdraw(sdp, 0);
+		return;
+	}
+
 	error = gfs2_freeze_lock(sdp, &freeze_gh, 0);
 	if (error) {
 		gfs2_assert_withdraw(sdp, 0);
@@ -690,7 +695,7 @@ void gfs2_freeze_func(struct work_struct *work)
 		}
 		gfs2_freeze_unlock(&freeze_gh);
 	}
-	deactivate_super(sb);
+	deactivate_locked_super(sb);
 	clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags);
 	wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN);
 	return;
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 454dc2ff8b5e..cbb71c3520c0 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -164,6 +164,9 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (!get_active_super(sb->s_bdev))
+		return -ENOTTY;
+
 	switch (n) {
 	case 0:
 		error = thaw_super(sdp->sd_vfs);
@@ -172,9 +175,12 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 		error = freeze_super(sdp->sd_vfs);
 		break;
 	default:
+		deactivate_locked_super(sb);
 		return -EINVAL;
 	}
 
+	deactivate_locked_super(sb);
+
 	if (error) {
 		fs_warn(sdp, "freeze %d error %d\n", n, error);
 		return error;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 7a6aeffcdf5c..3a0cd5e9ad84 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -345,10 +345,15 @@ int gfs2_withdraw(struct gfs2_sbd *sdp)
 	set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW) {
+		if (!get_active_super(sb->s_bdev)) {
+			fs_err(sdp, "could not grab super on withdraw for file system\n");
+			return -1;
+		}
 		fs_err(sdp, "about to withdraw this file system\n");
 		BUG_ON(sdp->sd_args.ar_debug);
 
 		signal_our_withdraw(sdp);
+		deactivate_locked_super(sb);
 
 		kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
 
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5b2481cd4750..1d20af762e0d 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -386,6 +386,7 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
 static int ioctl_fsfreeze(struct file *filp)
 {
 	struct super_block *sb = file_inode(filp)->i_sb;
+	int ret;
 
 	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
@@ -394,10 +395,17 @@ static int ioctl_fsfreeze(struct file *filp)
 	if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
 		return -EOPNOTSUPP;
 
+	if (!get_active_super(sb->s_bdev))
+		return -ENOTTY;
+
 	/* Freeze */
 	if (sb->s_op->freeze_super)
-		return sb->s_op->freeze_super(sb);
-	return freeze_super(sb);
+		ret = sb->s_op->freeze_super(sb);
+	ret = freeze_super(sb);
+
+	deactivate_locked_super(sb);
+
+	return ret;
 }
 
 static int ioctl_fsthaw(struct file *filp)
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2b..0e9d48846684 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -39,8 +39,6 @@
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
-static int thaw_super_locked(struct super_block *sb);
-
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
 
@@ -851,13 +849,13 @@ struct super_block *get_active_super(struct block_device *bdev)
 		if (sb->s_bdev == bdev) {
 			if (!grab_super(sb))
 				goto restart;
-			up_write(&sb->s_umount);
 			return sb;
 		}
 	}
 	spin_unlock(&sb_lock);
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(get_active_super);
 
 struct super_block *user_get_super(dev_t dev, bool excl)
 {
@@ -1024,13 +1022,13 @@ void emergency_remount(void)
 
 static void do_thaw_all_callback(struct super_block *sb)
 {
-	down_write(&sb->s_umount);
+	if (!get_active_super(sb->s_bdev))
+		return;
 	if (sb->s_root && sb->s_flags & SB_BORN) {
 		emergency_thaw_bdev(sb);
-		thaw_super_locked(sb);
-	} else {
-		up_write(&sb->s_umount);
+		thaw_super(sb);
 	}
+	deactivate_locked_super(sb);
 }
 
 static void do_thaw_all(struct work_struct *work)
@@ -1636,10 +1634,13 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
 }
 
 /**
- * freeze_super - lock the filesystem and force it into a consistent state
+ * freeze_super - force a filesystem backed by a block device into a consistent state
  * @sb: the super to lock
  *
- * Syncs the super to make sure the filesystem is consistent and calls the fs's
+ * Used by filesystems and the kernel to freeze a fileystem backed by a block
+ * device into a consistent state. Callers must use get_active_super(bdev) to
+ * lock the @sb and when done must unlock it with deactivate_locked_super().
+ * Syncs the filesystem backed by the @sb and calls the filesystem's optional
  * freeze_fs.  Subsequent calls to this without first thawing the fs will return
  * -EBUSY.
  *
@@ -1672,22 +1673,15 @@ int freeze_super(struct super_block *sb)
 {
 	int ret;
 
-	atomic_inc(&sb->s_active);
-	down_write(&sb->s_umount);
-	if (sb->s_writers.frozen != SB_UNFROZEN) {
-		deactivate_locked_super(sb);
+	if (sb->s_writers.frozen != SB_UNFROZEN)
 		return -EBUSY;
-	}
 
-	if (!(sb->s_flags & SB_BORN)) {
-		up_write(&sb->s_umount);
+	if (!(sb->s_flags & SB_BORN))
 		return 0;	/* sic - it's "nothing to do" */
-	}
 
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-		up_write(&sb->s_umount);
 		return 0;
 	}
 
@@ -1707,7 +1701,6 @@ int freeze_super(struct super_block *sb)
 		sb->s_writers.frozen = SB_UNFROZEN;
 		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
 		wake_up(&sb->s_writers.wait_unfrozen);
-		deactivate_locked_super(sb);
 		return ret;
 	}
 
@@ -1723,7 +1716,6 @@ int freeze_super(struct super_block *sb)
 			sb->s_writers.frozen = SB_UNFROZEN;
 			sb_freeze_unlock(sb, SB_FREEZE_FS);
 			wake_up(&sb->s_writers.wait_unfrozen);
-			deactivate_locked_super(sb);
 			return ret;
 		}
 	}
@@ -1733,19 +1725,25 @@ int freeze_super(struct super_block *sb)
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	lockdep_sb_freeze_release(sb);
-	up_write(&sb->s_umount);
 	return 0;
 }
 EXPORT_SYMBOL(freeze_super);
 
-static int thaw_super_locked(struct super_block *sb)
+/**
+ * thaw_super -- unlock a filesystem backed by a block device
+ * @sb: the super to thaw
+ *
+ * Used by filesystems and the kernel to thaw a fileystem backed by a block
+ * device. Callers must use get_active_super(bdev) to lock the @sb and when
+ * done must unlock it with deactivate_locked_super(). Once done, this marks
+ * the filesystem as writeable.
+ */
+int thaw_super(struct super_block *sb)
 {
 	int error;
 
-	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
-		up_write(&sb->s_umount);
+	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
 		return -EINVAL;
-	}
 
 	if (sb_rdonly(sb)) {
 		sb->s_writers.frozen = SB_UNFROZEN;
@@ -1760,7 +1758,6 @@ static int thaw_super_locked(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
 			lockdep_sb_freeze_release(sb);
-			up_write(&sb->s_umount);
 			return error;
 		}
 	}
@@ -1769,21 +1766,8 @@ static int thaw_super_locked(struct super_block *sb)
 	sb_freeze_unlock(sb, SB_FREEZE_FS);
 out:
 	wake_up(&sb->s_writers.wait_unfrozen);
-	deactivate_locked_super(sb);
 	return 0;
 }
-
-/**
- * thaw_super -- unlock filesystem
- * @sb: the super to thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
- */
-int thaw_super(struct super_block *sb)
-{
-	down_write(&sb->s_umount);
-	return thaw_super_locked(sb);
-}
 EXPORT_SYMBOL(thaw_super);
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/6] fs: add frozen sb state helpers
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-25 12:19   ` Jan Kara
  2023-06-08  5:05   ` Christoph Hellwig
  2023-05-08  1:17 ` [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze Luis Chamberlain
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

Provide helpers so that we can check a superblock frozen state.
This will make subsequent changes easier to read. This makes
no functional changes.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/ext4/ext4_jbd2.c |  2 +-
 fs/gfs2/sys.c       |  2 +-
 fs/quota/quota.c    |  4 ++--
 fs/super.c          |  6 +++---
 fs/xfs/xfs_trans.c  |  3 +--
 include/linux/fs.h  | 22 ++++++++++++++++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 77f318ec8abb..ef441f15053b 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -72,7 +72,7 @@ static int ext4_journal_check_start(struct super_block *sb)
 
 	if (sb_rdonly(sb))
 		return -EROFS;
-	WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
+	WARN_ON(sb_is_frozen(sb));
 	journal = EXT4_SB(sb)->s_journal;
 	/*
 	 * Special case here: if the journal has aborted behind our
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index cbb71c3520c0..e80c827acd09 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -148,7 +148,7 @@ static ssize_t uuid_show(struct gfs2_sbd *sdp, char *buf)
 static ssize_t freeze_show(struct gfs2_sbd *sdp, char *buf)
 {
 	struct super_block *sb = sdp->sd_vfs;
-	int frozen = (sb->s_writers.frozen == SB_UNFROZEN) ? 0 : 1;
+	int frozen = sb_is_unfrozen(sb) ? 0 : 1;
 
 	return snprintf(buf, PAGE_SIZE, "%d\n", frozen);
 }
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 052f143e2e0e..66ea23e15d93 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -890,13 +890,13 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
 	sb = user_get_super(dev, excl);
 	if (!sb)
 		return ERR_PTR(-ENODEV);
-	if (thawed && sb->s_writers.frozen != SB_UNFROZEN) {
+	if (thawed && !sb_is_unfrozen(sb)) {
 		if (excl)
 			up_write(&sb->s_umount);
 		else
 			up_read(&sb->s_umount);
 		wait_event(sb->s_writers.wait_unfrozen,
-			   sb->s_writers.frozen == SB_UNFROZEN);
+			   sb_is_unfrozen(sb));
 		put_super(sb);
 		goto retry;
 	}
diff --git a/fs/super.c b/fs/super.c
index 0e9d48846684..46c6475fc765 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -905,7 +905,7 @@ int reconfigure_super(struct fs_context *fc)
 
 	if (fc->sb_flags_mask & ~MS_RMT_MASK)
 		return -EINVAL;
-	if (sb->s_writers.frozen != SB_UNFROZEN)
+	if (!(sb_is_unfrozen(sb)))
 		return -EBUSY;
 
 	retval = security_sb_remount(sb, fc->security);
@@ -929,7 +929,7 @@ int reconfigure_super(struct fs_context *fc)
 			down_write(&sb->s_umount);
 			if (!sb->s_root)
 				return 0;
-			if (sb->s_writers.frozen != SB_UNFROZEN)
+			if (!sb_is_unfrozen(sb))
 				return -EBUSY;
 			remount_ro = !sb_rdonly(sb);
 		}
@@ -1673,7 +1673,7 @@ int freeze_super(struct super_block *sb)
 {
 	int ret;
 
-	if (sb->s_writers.frozen != SB_UNFROZEN)
+	if (!sb_is_unfrozen(sb))
 		return -EBUSY;
 
 	if (!(sb->s_flags & SB_BORN))
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 8afc0c080861..26caeafc572f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -267,8 +267,7 @@ xfs_trans_alloc(
 	 * Zero-reservation ("empty") transactions can't modify anything, so
 	 * they're allowed to run while we're frozen.
 	 */
-	WARN_ON(resp->tr_logres > 0 &&
-		mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
+	WARN_ON(resp->tr_logres > 0 && sb_is_frozen(mp->m_super));
 	ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) ||
 	       xfs_has_lazysbcount(mp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..90b5bdc4071a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1621,6 +1621,28 @@ static inline bool sb_start_intwrite_trylock(struct super_block *sb)
 	return __sb_start_write_trylock(sb, SB_FREEZE_FS);
 }
 
+/**
+ * sb_is_frozen - is superblock frozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is frozen.
+ */
+static inline bool sb_is_frozen(struct super_block *sb)
+{
+	return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
+}
+
+/**
+ * sb_is_unfrozen - is superblock unfrozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is unfrozen.
+ */
+static inline bool sb_is_unfrozen(struct super_block *sb)
+{
+	return sb->s_writers.frozen == SB_UNFROZEN;
+}
+
 bool inode_owner_or_capable(struct mnt_idmap *idmap,
 			    const struct inode *inode);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 2/6] fs: add frozen sb state helpers Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-16 15:23   ` Darrick J. Wong
  2023-05-22 23:42   ` Darrick J. Wong
  2023-05-08  1:17 ` [PATCH 4/6] fs: move !SB_BORN check early on freeze and add for thaw Luis Chamberlain
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

Userspace can initiate a freeze call using ioctls. If the kernel decides
to freeze a filesystem later it must be able to distinguish if userspace
had initiated the freeze, so that it does not unfreeze it later
automatically on resume.

Likewise if the kernel is initiating a freeze on its own it should *not*
fail to freeze a filesystem if a user had already frozen it on our behalf.
This same concept applies to thawing, even if its not possible for
userspace to beat the kernel in thawing a filesystem. This logic however
has never applied to userspace freezing and thawing, two consecutive
userspace freeze calls will results in only the first one succeeding, so
we must retain the same behaviour in userspace.

This doesn't implement yet kernel initiated filesystem freeze calls,
this will be done in subsequent calls. This change should introduce
no functional changes, it just extends the definitions of a frozen
filesystem to account for future kernel initiated filesystem freeze
and let's us keep record of when userpace initiated it so the kernel
can respect a userspace initiated freeze upon kernel initiated freeze
and its respective thaw cycle.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c       |  4 ++--
 fs/f2fs/gc.c       |  4 ++--
 fs/gfs2/glops.c    |  2 +-
 fs/gfs2/super.c    |  2 +-
 fs/gfs2/sys.c      |  4 ++--
 fs/gfs2/util.c     |  2 +-
 fs/ioctl.c         |  4 ++--
 fs/super.c         | 29 +++++++++++++++++++++++++----
 include/linux/fs.h | 16 ++++++++++++++--
 9 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index dc54a2a1c46e..04f7b2c99845 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -250,7 +250,7 @@ int freeze_bdev(struct block_device *bdev)
 	if (sb->s_op->freeze_super)
 		error = sb->s_op->freeze_super(sb);
 	else
-		error = freeze_super(sb);
+		error = freeze_super(sb, true);
 	deactivate_locked_super(sb);
 
 	if (error) {
@@ -295,7 +295,7 @@ int thaw_bdev(struct block_device *bdev)
 	if (sb->s_op->thaw_super)
 		error = sb->s_op->thaw_super(sb);
 	else
-		error = thaw_super(sb);
+		error = thaw_super(sb, true);
 	if (error)
 		bdev->bd_fsfreeze_count++;
 	else
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e31d6791d3e3..a5891055d85d 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -2168,7 +2168,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
 
 	if (!get_active_super(sbi->sb->s_bdev))
 		return -ENOTTY;
-	freeze_super(sbi->sb);
+	freeze_super(sbi->sb, true);
 
 	f2fs_down_write(&sbi->gc_lock);
 	f2fs_down_write(&sbi->cp_global_sem);
@@ -2221,7 +2221,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
 	f2fs_up_write(&sbi->cp_global_sem);
 	f2fs_up_write(&sbi->gc_lock);
 	/* We use the same active reference from freeze */
-	thaw_super(sbi->sb);
+	thaw_super(sbi->sb, true);
 	deactivate_locked_super(sbi->sb);
 	return err;
 }
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 01d433ed6ce7..8fd37508f9a0 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -584,7 +584,7 @@ static int freeze_go_sync(struct gfs2_glock *gl)
 	if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) &&
 	    !test_bit(SDF_NORECOVERY, &sdp->sd_flags)) {
 		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
-		error = freeze_super(sdp->sd_vfs);
+		error = freeze_super(sdp->sd_vfs, true);
 		if (error) {
 			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
 				error);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index e57cb593e2f3..f2641891de43 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -687,7 +687,7 @@ void gfs2_freeze_func(struct work_struct *work)
 		gfs2_assert_withdraw(sdp, 0);
 	} else {
 		atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
-		error = thaw_super(sb);
+		error = thaw_super(sb, true);
 		if (error) {
 			fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n",
 				error);
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index e80c827acd09..9e0398f99674 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -169,10 +169,10 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 
 	switch (n) {
 	case 0:
-		error = thaw_super(sdp->sd_vfs);
+		error = thaw_super(sdp->sd_vfs, true);
 		break;
 	case 1:
-		error = freeze_super(sdp->sd_vfs);
+		error = freeze_super(sdp->sd_vfs, true);
 		break;
 	default:
 		deactivate_locked_super(sb);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 3a0cd5e9ad84..be9705d618ec 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -191,7 +191,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 		/* Make sure gfs2_unfreeze works if partially-frozen */
 		flush_work(&sdp->sd_freeze_work);
 		atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
-		thaw_super(sdp->sd_vfs);
+		thaw_super(sdp->sd_vfs, true);
 	} else {
 		wait_on_bit(&i_gl->gl_flags, GLF_DEMOTE,
 			    TASK_UNINTERRUPTIBLE);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 1d20af762e0d..3cc79b82a5dc 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -401,7 +401,7 @@ static int ioctl_fsfreeze(struct file *filp)
 	/* Freeze */
 	if (sb->s_op->freeze_super)
 		ret = sb->s_op->freeze_super(sb);
-	ret = freeze_super(sb);
+	ret = freeze_super(sb, true);
 
 	deactivate_locked_super(sb);
 
@@ -418,7 +418,7 @@ static int ioctl_fsthaw(struct file *filp)
 	/* Thaw */
 	if (sb->s_op->thaw_super)
 		return sb->s_op->thaw_super(sb);
-	return thaw_super(sb);
+	return thaw_super(sb, true);
 }
 
 static int ioctl_file_dedupe_range(struct file *file,
diff --git a/fs/super.c b/fs/super.c
index 46c6475fc765..16ccbb9dd230 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1026,7 +1026,7 @@ static void do_thaw_all_callback(struct super_block *sb)
 		return;
 	if (sb->s_root && sb->s_flags & SB_BORN) {
 		emergency_thaw_bdev(sb);
-		thaw_super(sb);
+		thaw_super(sb, true);
 	}
 	deactivate_locked_super(sb);
 }
@@ -1636,6 +1636,8 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
 /**
  * freeze_super - force a filesystem backed by a block device into a consistent state
  * @sb: the super to lock
+ * @usercall: whether or not userspace initiated this via an ioctl or if it
+ * 	was a kernel freeze
  *
  * Used by filesystems and the kernel to freeze a fileystem backed by a block
  * device into a consistent state. Callers must use get_active_super(bdev) to
@@ -1669,10 +1671,13 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
  *
  * sb->s_writers.frozen is protected by sb->s_umount.
  */
-int freeze_super(struct super_block *sb)
+int freeze_super(struct super_block *sb, bool usercall)
 {
 	int ret;
 
+	if (!usercall && sb_is_frozen(sb))
+		return 0;
+
 	if (!sb_is_unfrozen(sb))
 		return -EBUSY;
 
@@ -1682,6 +1687,7 @@ int freeze_super(struct super_block *sb)
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+		sb->s_writers.frozen_by_user = usercall;
 		return 0;
 	}
 
@@ -1699,6 +1705,7 @@ int freeze_super(struct super_block *sb)
 	ret = sync_filesystem(sb);
 	if (ret) {
 		sb->s_writers.frozen = SB_UNFROZEN;
+		sb->s_writers.frozen_by_user = false;
 		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
 		wake_up(&sb->s_writers.wait_unfrozen);
 		return ret;
@@ -1724,6 +1731,7 @@ int freeze_super(struct super_block *sb)
 	 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb->s_writers.frozen_by_user = usercall;
 	lockdep_sb_freeze_release(sb);
 	return 0;
 }
@@ -1732,21 +1740,33 @@ EXPORT_SYMBOL(freeze_super);
 /**
  * thaw_super -- unlock a filesystem backed by a block device
  * @sb: the super to thaw
+ * @usercall: whether or not userspace initiated this thaw or if it was the
+ * 	kernel which initiated it
  *
  * Used by filesystems and the kernel to thaw a fileystem backed by a block
  * device. Callers must use get_active_super(bdev) to lock the @sb and when
  * done must unlock it with deactivate_locked_super(). Once done, this marks
  * the filesystem as writeable.
  */
-int thaw_super(struct super_block *sb)
+int thaw_super(struct super_block *sb, bool usercall)
 {
 	int error;
 
-	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
+	if (!usercall) {
+		/*
+		 * If userspace initiated the freeze don't let the kernel
+		 * thaw it on return from a kernel initiated freeze.
+		 */
+		if (sb_is_unfrozen(sb) || sb_is_frozen_by_user(sb))
+			return 0;
+	}
+
+	if (!sb_is_frozen(sb))
 		return -EINVAL;
 
 	if (sb_rdonly(sb)) {
 		sb->s_writers.frozen = SB_UNFROZEN;
+		sb->s_writers.frozen_by_user = false;
 		goto out;
 	}
 
@@ -1763,6 +1783,7 @@ int thaw_super(struct super_block *sb)
 	}
 
 	sb->s_writers.frozen = SB_UNFROZEN;
+	sb->s_writers.frozen_by_user = false;
 	sb_freeze_unlock(sb, SB_FREEZE_FS);
 out:
 	wake_up(&sb->s_writers.wait_unfrozen);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 90b5bdc4071a..d9b46c858103 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1146,6 +1146,7 @@ enum {
 
 struct sb_writers {
 	int				frozen;		/* Is sb frozen? */
+	bool				frozen_by_user;	/* User freeze? */
 	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
@@ -1632,6 +1633,17 @@ static inline bool sb_is_frozen(struct super_block *sb)
 	return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
 }
 
+/**
+ * sb_is_frozen_by_user - was the superblock frozen by userspace?
+ * @sb: the super to check
+ *
+ * Returns true if the super is frozen by userspace, such as an ioctl.
+ */
+static inline bool sb_is_frozen_by_user(struct super_block *sb)
+{
+	return sb_is_frozen(sb) && sb->s_writers.frozen_by_user;
+}
+
 /**
  * sb_is_unfrozen - is superblock unfrozen
  * @sb: the super to check
@@ -2308,8 +2320,8 @@ extern int unregister_filesystem(struct file_system_type *);
 extern int vfs_statfs(const struct path *, struct kstatfs *);
 extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
-extern int freeze_super(struct super_block *super);
-extern int thaw_super(struct super_block *super);
+extern int freeze_super(struct super_block *super, bool usercall);
+extern int thaw_super(struct super_block *super, bool usercall);
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/6] fs: move !SB_BORN check early on freeze and add for thaw
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
                   ` (2 preceding siblings ...)
  2023-05-08  1:17 ` [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 5/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

The SB_BORN flag added on vfs_get_tree() when a filesystem is about to be
mounted. If a super_block lacks this flag there's nothing to do for that
filesystem in terms of freezing or thawing.

In subsequent patches support will be added to allow detecting failures for
iterating over all super_blocks and freezing or thawing. Although that
functionality will be be skipped when sb->s_bdi == &noop_backing_dev_info,
and so SB_BORN will not be set, since we already check for SB_BORN on
freeze just move that up earlier and to be consistent do the same on
thaw too. We do this as these are user facing APIs, and although it
would be incorrect to issue a freeze on a non-mounted sb, it is even
stranger to get -EBUSY.

Be consistent about this and bail early for !SB_BORN for freeze and thaw
without failing.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/super.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 16ccbb9dd230..28c633b81f8f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1678,12 +1678,13 @@ int freeze_super(struct super_block *sb, bool usercall)
 	if (!usercall && sb_is_frozen(sb))
 		return 0;
 
+	/* If the filesystem was not going to be mounted there is nothing to do */
+	if (!(sb->s_flags & SB_BORN))
+		return 0;
+
 	if (!sb_is_unfrozen(sb))
 		return -EBUSY;
 
-	if (!(sb->s_flags & SB_BORN))
-		return 0;	/* sic - it's "nothing to do" */
-
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
@@ -1761,6 +1762,10 @@ int thaw_super(struct super_block *sb, bool usercall)
 			return 0;
 	}
 
+	/* If the filesystem was not going to be mounted there is nothing to do */
+	if (!(sb->s_flags & SB_BORN))
+		return 0;
+
 	if (!sb_is_frozen(sb))
 		return -EINVAL;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
                   ` (3 preceding siblings ...)
  2023-05-08  1:17 ` [PATCH 4/6] fs: move !SB_BORN check early on freeze and add for thaw Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-08  1:17 ` [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
  2023-05-08  1:21 ` [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
  6 siblings, 0 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

There are use cases where we wish to traverse the superblock list
but also capture errors, and in which case we want to avoid having
our callers issue a lock themselves since we can do the locking for
the callers. Provide a iterate_supers_excl() which calls a function
with the write lock held. If an error occurs we capture it and
propagate it.

Likewise there are use cases where we wish to traverse the superblock
list but in reverse order. The new iterate_supers_reverse_excl() helpers
does this but also also captures any errors encountered.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/super.c         | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  2 +
 2 files changed, 93 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 28c633b81f8f..d5eab6b38b03 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -753,6 +753,97 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 	spin_unlock(&sb_lock);
 }
 
+/**
+ *	iterate_supers_excl - exclusively call func for all active superblocks
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	locked superblock and given argument. Returns 0 unless an error
+ *	occurred on calling the function on any superblock.
+ */
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (hlist_unhashed(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		down_write(&sb->s_umount);
+		if (sb->s_root && (sb->s_flags & SB_BORN)) {
+			error = f(sb, arg);
+			if (error) {
+				up_write(&sb->s_umount);
+				spin_lock(&sb_lock);
+				__put_super(sb);
+				break;
+			}
+		}
+		up_write(&sb->s_umount);
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
+/**
+ *	iterate_supers_reverse_excl - exclusively calls func in reverse order
+ *	@f: function to call
+ *	@arg: argument to pass to it
+ *
+ *	Scans the superblock list and calls given function, passing it
+ *	locked superblock and given argument, in reverse order, and holding
+ *	the s_umount write lock. Returns if an error occurred.
+ */
+int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
+					 void *arg)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (hlist_unhashed(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		down_write(&sb->s_umount);
+		if (sb->s_root && (sb->s_flags & SB_BORN)) {
+			error = f(sb, arg);
+			if (error) {
+				up_write(&sb->s_umount);
+				spin_lock(&sb_lock);
+				__put_super(sb);
+				break;
+			}
+		}
+		up_write(&sb->s_umount);
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
 /**
  *	iterate_supers_type - call function for superblocks of given type
  *	@type: fs type
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d9b46c858103..22dd697ab703 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2935,6 +2935,8 @@ extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void drop_super_exclusive(struct super_block *sb);
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg);
+extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
                   ` (4 preceding siblings ...)
  2023-05-08  1:17 ` [PATCH 5/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
@ 2023-05-08  1:17 ` Luis Chamberlain
  2023-05-09  1:20   ` Dave Chinner
  2023-05-08  1:21 ` [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
  6 siblings, 1 reply; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:17 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel, Luis Chamberlain

Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.

This is needed so that we properly really stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) have also
been stopped as well.

A long term advantage of also adding filesystem freeze / thawing
supporting during suspend / hibernation is that long term we may
be able to eventually drop the kernel's thread freezing completely
as it was originally added to stop disk IO in flight as we hibernate
or suspend.

This does not remove the superfluous freezer calls on all filesystems.
Each filesystem must remove all the kthread freezer stuff and peg
the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
flag.

Subsequent patches remove the kthread freezer usage from each
filesystem, one at a time to make all this work bisectable.
Once all filesystems remove the usage of the kthread freezer we
can remove the FS_AUTOFREEZE flag.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/super.c             | 50 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     | 14 ++++++++++++
 kernel/power/process.c | 15 ++++++++++++-
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index d5eab6b38b03..148674429ab8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1910,3 +1910,53 @@ int sb_init_dio_done_wq(struct super_block *sb)
 		destroy_workqueue(wq);
 	return 0;
 }
+
+#ifdef CONFIG_PM_SLEEP
+static bool super_should_freeze(struct super_block *sb)
+{
+	if (!(sb->s_type->fs_flags & FS_AUTOFREEZE))
+		return false;
+	/*
+	 * We don't freeze virtual filesystems, we skip those filesystems with
+	 * no backing device.
+	 */
+	if (sb->s_bdi == &noop_backing_dev_info)
+		return false;
+
+	return true;
+}
+
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
+
+	error = freeze_super(sb, false);
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to freeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+out:
+	return error;
+}
+
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	int error = 0;
+
+	if (!super_should_freeze(sb))
+		goto out;
+
+	pr_info("%s (%s): thawing\n", sb->s_type->name, sb->s_id);
+
+	error = thaw_super(sb, false);
+	if (error && error != -EBUSY)
+		pr_notice("%s (%s): Unable to unfreeze, error=%d",
+			  sb->s_type->name, sb->s_id, error);
+out:
+	return error;
+}
+#endif /* CONFIG_PM_SLEEP */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 22dd697ab703..92c85c8ec1ed 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2247,6 +2247,7 @@ struct file_system_type {
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
+#define FS_AUTOFREEZE           (1<<16)	/*  temporary as we phase kthread freezer out */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
 	struct dentry *(*mount) (struct file_system_type *, int,
@@ -2322,6 +2323,19 @@ extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 extern int freeze_super(struct super_block *super, bool usercall);
 extern int thaw_super(struct super_block *super, bool usercall);
+#ifdef CONFIG_PM_SLEEP
+int fs_suspend_freeze_sb(struct super_block *sb, void *priv);
+int fs_suspend_thaw_sb(struct super_block *sb, void *priv);
+#else
+static inline int fs_suspend_freeze_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+static inline int fs_suspend_thaw_sb(struct super_block *sb, void *priv)
+{
+	return 0;
+}
+#endif
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);
diff --git a/kernel/power/process.c b/kernel/power/process.c
index cae81a87cc91..7ca7688f0b5d 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -140,6 +140,16 @@ int freeze_processes(void)
 
 	BUG_ON(in_atomic());
 
+	pr_info("Freezing filesystems ... ");
+	error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
+	if (error) {
+		pr_cont("failed\n");
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
+		thaw_processes();
+		return error;
+	}
+	pr_cont("done.\n");
+
 	/*
 	 * Now that the whole userspace is frozen we need to disable
 	 * the OOM killer to disallow any further interference with
@@ -149,8 +159,10 @@ int freeze_processes(void)
 	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
 		error = -EBUSY;
 
-	if (error)
+	if (error) {
+		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
 		thaw_processes();
+	}
 	return error;
 }
 
@@ -188,6 +200,7 @@ void thaw_processes(void)
 	pm_nosig_freezing = false;
 
 	oom_killer_enable();
+	iterate_supers_excl(fs_suspend_thaw_sb, NULL);
 
 	pr_info("Restarting tasks ... ");
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] vfs: provide automatic kernel freeze / resume
  2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
                   ` (5 preceding siblings ...)
  2023-05-08  1:17 ` [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
@ 2023-05-08  1:21 ` Luis Chamberlain
  6 siblings, 0 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-05-08  1:21 UTC (permalink / raw)
  To: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm
  Cc: mchehab, keescook, p.raghav, da.gomez, linux-fsdevel, kernel,
	kexec, linux-kernel

I forgot to mention if you wanna test on a git tree:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20230507-fs-freeze

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
  2023-05-08  1:17 ` [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
@ 2023-05-09  1:20   ` Dave Chinner
  2023-05-16 15:17     ` Darrick J. Wong
  0 siblings, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2023-05-09  1:20 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun, May 07, 2023 at 06:17:17PM -0700, Luis Chamberlain wrote:
> Add support to automatically handle freezing and thawing filesystems
> during the kernel's suspend/resume cycle.
> 
> This is needed so that we properly really stop IO in flight without
> races after userspace has been frozen. Without this we rely on
> kthread freezing and its semantics are loose and error prone.
> For instance, even though a kthread may use try_to_freeze() and end
> up being frozen we have no way of being sure that everything that
> has been spawned asynchronously from it (such as timers) have also
> been stopped as well.
> 
> A long term advantage of also adding filesystem freeze / thawing
> supporting during suspend / hibernation is that long term we may
> be able to eventually drop the kernel's thread freezing completely
> as it was originally added to stop disk IO in flight as we hibernate
> or suspend.
> 
> This does not remove the superfluous freezer calls on all filesystems.
> Each filesystem must remove all the kthread freezer stuff and peg
> the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> flag.
> 
> Subsequent patches remove the kthread freezer usage from each
> filesystem, one at a time to make all this work bisectable.
> Once all filesystems remove the usage of the kthread freezer we
> can remove the FS_AUTOFREEZE flag.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  fs/super.c             | 50 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     | 14 ++++++++++++
>  kernel/power/process.c | 15 ++++++++++++-
>  3 files changed, 78 insertions(+), 1 deletion(-)

.....

> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index cae81a87cc91..7ca7688f0b5d 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -140,6 +140,16 @@ int freeze_processes(void)
>  
>  	BUG_ON(in_atomic());
>  
> +	pr_info("Freezing filesystems ... ");
> +	error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
> +	if (error) {
> +		pr_cont("failed\n");
> +		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> +		thaw_processes();
> +		return error;

That looks wrong. i.e. if the sb iteration fails to freeze a
filesystem (for whatever reason) then every userspace frozen
filesystem will be thawed by this call, right? i.e. it will thaw
more than just the filesystems frozen by the suspend freeze
iteration before it failed.

Don't we only want to thaw the superblocks we froze before the
failure occurred? i.e. the "undo" iteration needs to start from the
last superblock we successfully froze and then only walk to the tail
of the list we started from?

> +	}
> +	pr_cont("done.\n");
> +
>  	/*
>  	 * Now that the whole userspace is frozen we need to disable
>  	 * the OOM killer to disallow any further interference with
> @@ -149,8 +159,10 @@ int freeze_processes(void)
>  	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
>  		error = -EBUSY;
>  
> -	if (error)
> +	if (error) {
> +		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
>  		thaw_processes();
> +	}

Does this also have the same problem? i.e. if fs_suspend_freeze_sb()
skips over superblocks that are already userspace frozen without any
error, then this will incorrectly thaw those userspace frozen
filesystems.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing
  2023-05-09  1:20   ` Dave Chinner
@ 2023-05-16 15:17     ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-05-16 15:17 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro, jack,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Tue, May 09, 2023 at 11:20:13AM +1000, Dave Chinner wrote:
> On Sun, May 07, 2023 at 06:17:17PM -0700, Luis Chamberlain wrote:
> > Add support to automatically handle freezing and thawing filesystems
> > during the kernel's suspend/resume cycle.
> > 
> > This is needed so that we properly really stop IO in flight without
> > races after userspace has been frozen. Without this we rely on
> > kthread freezing and its semantics are loose and error prone.
> > For instance, even though a kthread may use try_to_freeze() and end
> > up being frozen we have no way of being sure that everything that
> > has been spawned asynchronously from it (such as timers) have also
> > been stopped as well.
> > 
> > A long term advantage of also adding filesystem freeze / thawing
> > supporting during suspend / hibernation is that long term we may
> > be able to eventually drop the kernel's thread freezing completely
> > as it was originally added to stop disk IO in flight as we hibernate
> > or suspend.
> > 
> > This does not remove the superfluous freezer calls on all filesystems.
> > Each filesystem must remove all the kthread freezer stuff and peg
> > the fs_type flags as supporting auto-freezing with the FS_AUTOFREEZE
> > flag.
> > 
> > Subsequent patches remove the kthread freezer usage from each
> > filesystem, one at a time to make all this work bisectable.
> > Once all filesystems remove the usage of the kthread freezer we
> > can remove the FS_AUTOFREEZE flag.
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  fs/super.c             | 50 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     | 14 ++++++++++++
> >  kernel/power/process.c | 15 ++++++++++++-
> >  3 files changed, 78 insertions(+), 1 deletion(-)
> 
> .....
> 
> > diff --git a/kernel/power/process.c b/kernel/power/process.c
> > index cae81a87cc91..7ca7688f0b5d 100644
> > --- a/kernel/power/process.c
> > +++ b/kernel/power/process.c
> > @@ -140,6 +140,16 @@ int freeze_processes(void)
> >  
> >  	BUG_ON(in_atomic());
> >  
> > +	pr_info("Freezing filesystems ... ");
> > +	error = iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
> > +	if (error) {
> > +		pr_cont("failed\n");
> > +		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> > +		thaw_processes();
> > +		return error;
> 
> That looks wrong. i.e. if the sb iteration fails to freeze a
> filesystem (for whatever reason) then every userspace frozen
> filesystem will be thawed by this call, right? i.e. it will thaw
> more than just the filesystems frozen by the suspend freeze
> iteration before it failed.
> 
> Don't we only want to thaw the superblocks we froze before the
> failure occurred? i.e. the "undo" iteration needs to start from the
> last superblock we successfully froze and then only walk to the tail
> of the list we started from?

I think fs_suspend_thaw_sb calls thaw_super(..., false), which will not
undo a userspace freeze.  So strictly speaking the answer to your
question is (AFAICT) "no it won't"

That said, I read this and also had a raised-eyebrow moment -- we
shouldn't (un?)touch superblocks that fs_suspend_freeze_sb didn't touch
in the first place.

I wonder if we should be using that NULL parameter to keep track of the
last super that fs_suspend_freeze_sb didn't fail on?

--D

> > +	}
> > +	pr_cont("done.\n");
> > +
> >  	/*
> >  	 * Now that the whole userspace is frozen we need to disable
> >  	 * the OOM killer to disallow any further interference with
> > @@ -149,8 +159,10 @@ int freeze_processes(void)
> >  	if (!error && !oom_killer_disable(msecs_to_jiffies(freeze_timeout_msecs)))
> >  		error = -EBUSY;
> >  
> > -	if (error)
> > +	if (error) {
> > +		iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> >  		thaw_processes();
> > +	}
> 
> Does this also have the same problem? i.e. if fs_suspend_freeze_sb()
> skips over superblocks that are already userspace frozen without any
> error, then this will incorrectly thaw those userspace frozen
> filesystems.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-08  1:17 ` [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze Luis Chamberlain
@ 2023-05-16 15:23   ` Darrick J. Wong
  2023-05-22 23:42   ` Darrick J. Wong
  1 sibling, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-05-16 15:23 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun, May 07, 2023 at 06:17:14PM -0700, Luis Chamberlain wrote:
> Userspace can initiate a freeze call using ioctls. If the kernel decides
> to freeze a filesystem later it must be able to distinguish if userspace
> had initiated the freeze, so that it does not unfreeze it later
> automatically on resume.
> 
> Likewise if the kernel is initiating a freeze on its own it should *not*
> fail to freeze a filesystem if a user had already frozen it on our behalf.
> This same concept applies to thawing, even if its not possible for
> userspace to beat the kernel in thawing a filesystem. This logic however
> has never applied to userspace freezing and thawing, two consecutive
> userspace freeze calls will results in only the first one succeeding, so
> we must retain the same behaviour in userspace.
> 
> This doesn't implement yet kernel initiated filesystem freeze calls,
> this will be done in subsequent calls. This change should introduce
> no functional changes, it just extends the definitions of a frozen
> filesystem to account for future kernel initiated filesystem freeze
> and let's us keep record of when userpace initiated it so the kernel
> can respect a userspace initiated freeze upon kernel initiated freeze
> and its respective thaw cycle.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  block/bdev.c       |  4 ++--
>  fs/f2fs/gc.c       |  4 ++--
>  fs/gfs2/glops.c    |  2 +-
>  fs/gfs2/super.c    |  2 +-
>  fs/gfs2/sys.c      |  4 ++--
>  fs/gfs2/util.c     |  2 +-
>  fs/ioctl.c         |  4 ++--
>  fs/super.c         | 29 +++++++++++++++++++++++++----
>  include/linux/fs.h | 16 ++++++++++++++--
>  9 files changed, 50 insertions(+), 17 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index dc54a2a1c46e..04f7b2c99845 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -250,7 +250,7 @@ int freeze_bdev(struct block_device *bdev)
>  	if (sb->s_op->freeze_super)
>  		error = sb->s_op->freeze_super(sb);
>  	else
> -		error = freeze_super(sb);
> +		error = freeze_super(sb, true);
>  	deactivate_locked_super(sb);
>  
>  	if (error) {
> @@ -295,7 +295,7 @@ int thaw_bdev(struct block_device *bdev)
>  	if (sb->s_op->thaw_super)
>  		error = sb->s_op->thaw_super(sb);
>  	else
> -		error = thaw_super(sb);
> +		error = thaw_super(sb, true);
>  	if (error)
>  		bdev->bd_fsfreeze_count++;
>  	else
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index e31d6791d3e3..a5891055d85d 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -2168,7 +2168,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>  
>  	if (!get_active_super(sbi->sb->s_bdev))
>  		return -ENOTTY;
> -	freeze_super(sbi->sb);
> +	freeze_super(sbi->sb, true);
>  
>  	f2fs_down_write(&sbi->gc_lock);
>  	f2fs_down_write(&sbi->cp_global_sem);
> @@ -2221,7 +2221,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>  	f2fs_up_write(&sbi->cp_global_sem);
>  	f2fs_up_write(&sbi->gc_lock);
>  	/* We use the same active reference from freeze */
> -	thaw_super(sbi->sb);
> +	thaw_super(sbi->sb, true);
>  	deactivate_locked_super(sbi->sb);
>  	return err;
>  }
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 01d433ed6ce7..8fd37508f9a0 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -584,7 +584,7 @@ static int freeze_go_sync(struct gfs2_glock *gl)
>  	if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) &&
>  	    !test_bit(SDF_NORECOVERY, &sdp->sd_flags)) {
>  		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
> -		error = freeze_super(sdp->sd_vfs);
> +		error = freeze_super(sdp->sd_vfs, true);
>  		if (error) {
>  			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
>  				error);
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index e57cb593e2f3..f2641891de43 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -687,7 +687,7 @@ void gfs2_freeze_func(struct work_struct *work)
>  		gfs2_assert_withdraw(sdp, 0);
>  	} else {
>  		atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
> -		error = thaw_super(sb);
> +		error = thaw_super(sb, true);
>  		if (error) {
>  			fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n",
>  				error);
> diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> index e80c827acd09..9e0398f99674 100644
> --- a/fs/gfs2/sys.c
> +++ b/fs/gfs2/sys.c
> @@ -169,10 +169,10 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
>  
>  	switch (n) {
>  	case 0:
> -		error = thaw_super(sdp->sd_vfs);
> +		error = thaw_super(sdp->sd_vfs, true);
>  		break;
>  	case 1:
> -		error = freeze_super(sdp->sd_vfs);
> +		error = freeze_super(sdp->sd_vfs, true);
>  		break;
>  	default:
>  		deactivate_locked_super(sb);
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 3a0cd5e9ad84..be9705d618ec 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -191,7 +191,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
>  		/* Make sure gfs2_unfreeze works if partially-frozen */
>  		flush_work(&sdp->sd_freeze_work);
>  		atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
> -		thaw_super(sdp->sd_vfs);
> +		thaw_super(sdp->sd_vfs, true);
>  	} else {
>  		wait_on_bit(&i_gl->gl_flags, GLF_DEMOTE,
>  			    TASK_UNINTERRUPTIBLE);
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 1d20af762e0d..3cc79b82a5dc 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -401,7 +401,7 @@ static int ioctl_fsfreeze(struct file *filp)
>  	/* Freeze */
>  	if (sb->s_op->freeze_super)
>  		ret = sb->s_op->freeze_super(sb);
> -	ret = freeze_super(sb);
> +	ret = freeze_super(sb, true);
>  
>  	deactivate_locked_super(sb);
>  
> @@ -418,7 +418,7 @@ static int ioctl_fsthaw(struct file *filp)
>  	/* Thaw */
>  	if (sb->s_op->thaw_super)
>  		return sb->s_op->thaw_super(sb);
> -	return thaw_super(sb);
> +	return thaw_super(sb, true);
>  }
>  
>  static int ioctl_file_dedupe_range(struct file *file,
> diff --git a/fs/super.c b/fs/super.c
> index 46c6475fc765..16ccbb9dd230 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1026,7 +1026,7 @@ static void do_thaw_all_callback(struct super_block *sb)
>  		return;
>  	if (sb->s_root && sb->s_flags & SB_BORN) {
>  		emergency_thaw_bdev(sb);
> -		thaw_super(sb);
> +		thaw_super(sb, true);
>  	}
>  	deactivate_locked_super(sb);
>  }
> @@ -1636,6 +1636,8 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>  /**
>   * freeze_super - force a filesystem backed by a block device into a consistent state
>   * @sb: the super to lock
> + * @usercall: whether or not userspace initiated this via an ioctl or if it
> + * 	was a kernel freeze
>   *
>   * Used by filesystems and the kernel to freeze a fileystem backed by a block
>   * device into a consistent state. Callers must use get_active_super(bdev) to
> @@ -1669,10 +1671,13 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>   *
>   * sb->s_writers.frozen is protected by sb->s_umount.
>   */
> -int freeze_super(struct super_block *sb)
> +int freeze_super(struct super_block *sb, bool usercall)
>  {
>  	int ret;
>  
> +	if (!usercall && sb_is_frozen(sb))
> +		return 0;

If we try a kernel freeze but the fs was already frozen by userspace, we
return ... zero?

What if userspace thaws the fs immediately afterwards?  The kernel
caller is still running, and now it erroneously thinks the fs is frozen.
Won't that break the suspend freezer?

TBH I was more thinking about fscounters scrub, which will report false
corruptions if the fs gets unfrozen while it is running.

--D

> +
>  	if (!sb_is_unfrozen(sb))
>  		return -EBUSY;
>  
> @@ -1682,6 +1687,7 @@ int freeze_super(struct super_block *sb)
>  	if (sb_rdonly(sb)) {
>  		/* Nothing to do really... */
>  		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> +		sb->s_writers.frozen_by_user = usercall;
>  		return 0;
>  	}
>  
> @@ -1699,6 +1705,7 @@ int freeze_super(struct super_block *sb)
>  	ret = sync_filesystem(sb);
>  	if (ret) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
> +		sb->s_writers.frozen_by_user = false;
>  		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
>  		wake_up(&sb->s_writers.wait_unfrozen);
>  		return ret;
> @@ -1724,6 +1731,7 @@ int freeze_super(struct super_block *sb)
>  	 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
>  	 */
>  	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> +	sb->s_writers.frozen_by_user = usercall;
>  	lockdep_sb_freeze_release(sb);
>  	return 0;
>  }
> @@ -1732,21 +1740,33 @@ EXPORT_SYMBOL(freeze_super);
>  /**
>   * thaw_super -- unlock a filesystem backed by a block device
>   * @sb: the super to thaw
> + * @usercall: whether or not userspace initiated this thaw or if it was the
> + * 	kernel which initiated it
>   *
>   * Used by filesystems and the kernel to thaw a fileystem backed by a block
>   * device. Callers must use get_active_super(bdev) to lock the @sb and when
>   * done must unlock it with deactivate_locked_super(). Once done, this marks
>   * the filesystem as writeable.
>   */
> -int thaw_super(struct super_block *sb)
> +int thaw_super(struct super_block *sb, bool usercall)
>  {
>  	int error;
>  
> -	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
> +	if (!usercall) {
> +		/*
> +		 * If userspace initiated the freeze don't let the kernel
> +		 * thaw it on return from a kernel initiated freeze.
> +		 */
> +		if (sb_is_unfrozen(sb) || sb_is_frozen_by_user(sb))
> +			return 0;
> +	}
> +
> +	if (!sb_is_frozen(sb))
>  		return -EINVAL;
>  
>  	if (sb_rdonly(sb)) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
> +		sb->s_writers.frozen_by_user = false;
>  		goto out;
>  	}
>  
> @@ -1763,6 +1783,7 @@ int thaw_super(struct super_block *sb)
>  	}
>  
>  	sb->s_writers.frozen = SB_UNFROZEN;
> +	sb->s_writers.frozen_by_user = false;
>  	sb_freeze_unlock(sb, SB_FREEZE_FS);
>  out:
>  	wake_up(&sb->s_writers.wait_unfrozen);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 90b5bdc4071a..d9b46c858103 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1146,6 +1146,7 @@ enum {
>  
>  struct sb_writers {
>  	int				frozen;		/* Is sb frozen? */
> +	bool				frozen_by_user;	/* User freeze? */
>  	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
>  	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
>  };
> @@ -1632,6 +1633,17 @@ static inline bool sb_is_frozen(struct super_block *sb)
>  	return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
>  }
>  
> +/**
> + * sb_is_frozen_by_user - was the superblock frozen by userspace?
> + * @sb: the super to check
> + *
> + * Returns true if the super is frozen by userspace, such as an ioctl.
> + */
> +static inline bool sb_is_frozen_by_user(struct super_block *sb)
> +{
> +	return sb_is_frozen(sb) && sb->s_writers.frozen_by_user;
> +}
> +
>  /**
>   * sb_is_unfrozen - is superblock unfrozen
>   * @sb: the super to check
> @@ -2308,8 +2320,8 @@ extern int unregister_filesystem(struct file_system_type *);
>  extern int vfs_statfs(const struct path *, struct kstatfs *);
>  extern int user_statfs(const char __user *, struct kstatfs *);
>  extern int fd_statfs(int, struct kstatfs *);
> -extern int freeze_super(struct super_block *super);
> -extern int thaw_super(struct super_block *super);
> +extern int freeze_super(struct super_block *super, bool usercall);
> +extern int thaw_super(struct super_block *super, bool usercall);
>  extern __printf(2, 3)
>  int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
>  extern int super_setup_bdi(struct super_block *sb);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
@ 2023-05-18  5:32   ` Darrick J. Wong
  2023-05-25 12:17   ` Jan Kara
  2023-06-08  5:01   ` Christoph Hellwig
  2 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-05-18  5:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun, May 07, 2023 at 06:17:12PM -0700, Luis Chamberlain wrote:
> Right now freeze_super()  and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  block/bdev.c    |  5 +++-
>  fs/f2fs/gc.c    |  5 ++++
>  fs/gfs2/super.c |  9 +++++--
>  fs/gfs2/sys.c   |  6 +++++
>  fs/gfs2/util.c  |  5 ++++
>  fs/ioctl.c      | 12 ++++++++--
>  fs/super.c      | 62 ++++++++++++++++++-------------------------------
>  7 files changed, 60 insertions(+), 44 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 21c63bfef323..dc54a2a1c46e 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -251,7 +251,7 @@ int freeze_bdev(struct block_device *bdev)
>  		error = sb->s_op->freeze_super(sb);
>  	else
>  		error = freeze_super(sb);
> -	deactivate_super(sb);
> +	deactivate_locked_super(sb);
>  
>  	if (error) {
>  		bdev->bd_fsfreeze_count--;
> @@ -289,6 +289,8 @@ int thaw_bdev(struct block_device *bdev)
>  	sb = bdev->bd_fsfreeze_sb;
>  	if (!sb)
>  		goto out;
> +	if (!get_active_super(bdev))
> +		goto out;
>  
>  	if (sb->s_op->thaw_super)
>  		error = sb->s_op->thaw_super(sb);
> @@ -298,6 +300,7 @@ int thaw_bdev(struct block_device *bdev)
>  		bdev->bd_fsfreeze_count++;
>  	else
>  		bdev->bd_fsfreeze_sb = NULL;
> +	deactivate_locked_super(sb);
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 61c5f9d26018..e31d6791d3e3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -2166,7 +2166,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>  	if (err)
>  		return err;
>  
> +	if (!get_active_super(sbi->sb->s_bdev))
> +		return -ENOTTY;
>  	freeze_super(sbi->sb);
> +
>  	f2fs_down_write(&sbi->gc_lock);
>  	f2fs_down_write(&sbi->cp_global_sem);
>  
> @@ -2217,6 +2220,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>  out_err:
>  	f2fs_up_write(&sbi->cp_global_sem);
>  	f2fs_up_write(&sbi->gc_lock);
> +	/* We use the same active reference from freeze */
>  	thaw_super(sbi->sb);
> +	deactivate_locked_super(sbi->sb);
>  	return err;
>  }
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 5eed8c237500..e57cb593e2f3 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -676,7 +676,12 @@ void gfs2_freeze_func(struct work_struct *work)
>  	struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_freeze_work);
>  	struct super_block *sb = sdp->sd_vfs;
>  
> -	atomic_inc(&sb->s_active);
> +	if (!get_active_super(sb->s_bdev)) {
> +		fs_info(sdp, "GFS2: couldn't grap super for thaw for filesystem\n");
> +		gfs2_assert_withdraw(sdp, 0);
> +		return;
> +	}
> +
>  	error = gfs2_freeze_lock(sdp, &freeze_gh, 0);
>  	if (error) {
>  		gfs2_assert_withdraw(sdp, 0);
> @@ -690,7 +695,7 @@ void gfs2_freeze_func(struct work_struct *work)
>  		}
>  		gfs2_freeze_unlock(&freeze_gh);
>  	}
> -	deactivate_super(sb);
> +	deactivate_locked_super(sb);
>  	clear_bit_unlock(SDF_FS_FROZEN, &sdp->sd_flags);
>  	wake_up_bit(&sdp->sd_flags, SDF_FS_FROZEN);
>  	return;
> diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> index 454dc2ff8b5e..cbb71c3520c0 100644
> --- a/fs/gfs2/sys.c
> +++ b/fs/gfs2/sys.c
> @@ -164,6 +164,9 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	if (!get_active_super(sb->s_bdev))
> +		return -ENOTTY;
> +
>  	switch (n) {
>  	case 0:
>  		error = thaw_super(sdp->sd_vfs);
> @@ -172,9 +175,12 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
>  		error = freeze_super(sdp->sd_vfs);
>  		break;
>  	default:
> +		deactivate_locked_super(sb);
>  		return -EINVAL;
>  	}
>  
> +	deactivate_locked_super(sb);
> +
>  	if (error) {
>  		fs_warn(sdp, "freeze %d error %d\n", n, error);
>  		return error;
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 7a6aeffcdf5c..3a0cd5e9ad84 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -345,10 +345,15 @@ int gfs2_withdraw(struct gfs2_sbd *sdp)
>  	set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
>  
>  	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW) {
> +		if (!get_active_super(sb->s_bdev)) {
> +			fs_err(sdp, "could not grab super on withdraw for file system\n");
> +			return -1;
> +		}
>  		fs_err(sdp, "about to withdraw this file system\n");
>  		BUG_ON(sdp->sd_args.ar_debug);
>  
>  		signal_our_withdraw(sdp);
> +		deactivate_locked_super(sb);
>  
>  		kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
>  
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5b2481cd4750..1d20af762e0d 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -386,6 +386,7 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
>  static int ioctl_fsfreeze(struct file *filp)
>  {
>  	struct super_block *sb = file_inode(filp)->i_sb;
> +	int ret;
>  
>  	if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>  		return -EPERM;
> @@ -394,10 +395,17 @@ static int ioctl_fsfreeze(struct file *filp)
>  	if (sb->s_op->freeze_fs == NULL && sb->s_op->freeze_super == NULL)
>  		return -EOPNOTSUPP;
>  
> +	if (!get_active_super(sb->s_bdev))

Um... doesn't this mean that we now cannot freeze network and pseudo
filesystems?

> +		return -ENOTTY;
> +
>  	/* Freeze */
>  	if (sb->s_op->freeze_super)
> -		return sb->s_op->freeze_super(sb);
> -	return freeze_super(sb);
> +		ret = sb->s_op->freeze_super(sb);
> +	ret = freeze_super(sb);
> +
> +	deactivate_locked_super(sb);
> +
> +	return ret;
>  }
>  
>  static int ioctl_fsthaw(struct file *filp)
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2b..0e9d48846684 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -39,8 +39,6 @@
>  #include <uapi/linux/mount.h>
>  #include "internal.h"
>  
> -static int thaw_super_locked(struct super_block *sb);
> -
>  static LIST_HEAD(super_blocks);
>  static DEFINE_SPINLOCK(sb_lock);
>  
> @@ -851,13 +849,13 @@ struct super_block *get_active_super(struct block_device *bdev)
>  		if (sb->s_bdev == bdev) {
>  			if (!grab_super(sb))
>  				goto restart;
> -			up_write(&sb->s_umount);
>  			return sb;
>  		}
>  	}
>  	spin_unlock(&sb_lock);
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(get_active_super);
>  
>  struct super_block *user_get_super(dev_t dev, bool excl)
>  {
> @@ -1024,13 +1022,13 @@ void emergency_remount(void)
>  
>  static void do_thaw_all_callback(struct super_block *sb)
>  {
> -	down_write(&sb->s_umount);
> +	if (!get_active_super(sb->s_bdev))
> +		return;
>  	if (sb->s_root && sb->s_flags & SB_BORN) {
>  		emergency_thaw_bdev(sb);
> -		thaw_super_locked(sb);
> -	} else {
> -		up_write(&sb->s_umount);
> +		thaw_super(sb);
>  	}
> +	deactivate_locked_super(sb);
>  }
>  
>  static void do_thaw_all(struct work_struct *work)
> @@ -1636,10 +1634,13 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>  }
>  
>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * freeze_super - force a filesystem backed by a block device into a consistent state
>   * @sb: the super to lock
>   *
> - * Syncs the super to make sure the filesystem is consistent and calls the fs's
> + * Used by filesystems and the kernel to freeze a fileystem backed by a block
> + * device into a consistent state. Callers must use get_active_super(bdev) to
> + * lock the @sb and when done must unlock it with deactivate_locked_super().
> + * Syncs the filesystem backed by the @sb and calls the filesystem's optional
>   * freeze_fs.  Subsequent calls to this without first thawing the fs will return
>   * -EBUSY.
>   *
> @@ -1672,22 +1673,15 @@ int freeze_super(struct super_block *sb)
>  {
>  	int ret;
>  
> -	atomic_inc(&sb->s_active);
> -	down_write(&sb->s_umount);
> -	if (sb->s_writers.frozen != SB_UNFROZEN) {
> -		deactivate_locked_super(sb);
> +	if (sb->s_writers.frozen != SB_UNFROZEN)
>  		return -EBUSY;
> -	}
>  
> -	if (!(sb->s_flags & SB_BORN)) {
> -		up_write(&sb->s_umount);
> +	if (!(sb->s_flags & SB_BORN))
>  		return 0;	/* sic - it's "nothing to do" */
> -	}
>  
>  	if (sb_rdonly(sb)) {
>  		/* Nothing to do really... */
>  		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> -		up_write(&sb->s_umount);
>  		return 0;
>  	}
>  
> @@ -1707,7 +1701,6 @@ int freeze_super(struct super_block *sb)
>  		sb->s_writers.frozen = SB_UNFROZEN;
>  		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
>  		wake_up(&sb->s_writers.wait_unfrozen);
> -		deactivate_locked_super(sb);
>  		return ret;
>  	}
>  
> @@ -1723,7 +1716,6 @@ int freeze_super(struct super_block *sb)
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			sb_freeze_unlock(sb, SB_FREEZE_FS);
>  			wake_up(&sb->s_writers.wait_unfrozen);
> -			deactivate_locked_super(sb);
>  			return ret;
>  		}
>  	}
> @@ -1733,19 +1725,25 @@ int freeze_super(struct super_block *sb)
>  	 */
>  	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
>  	lockdep_sb_freeze_release(sb);
> -	up_write(&sb->s_umount);
>  	return 0;

Also ... before this change, a successful freeze means that we retain
the elevated s_active during the freeze.  Now the callers of this
function always call deactivate_locked_super, which decrements the
active count, even if the fs is actually frozen.  The active count no
longer works the way it used to.  Can that lead to freeing a frozen
super?

Alternately, I guess we could take our own s_active reference here.

--D

>  }
>  EXPORT_SYMBOL(freeze_super);
>  
> -static int thaw_super_locked(struct super_block *sb)
> +/**
> + * thaw_super -- unlock a filesystem backed by a block device
> + * @sb: the super to thaw
> + *
> + * Used by filesystems and the kernel to thaw a fileystem backed by a block
> + * device. Callers must use get_active_super(bdev) to lock the @sb and when
> + * done must unlock it with deactivate_locked_super(). Once done, this marks
> + * the filesystem as writeable.
> + */
> +int thaw_super(struct super_block *sb)
>  {
>  	int error;
>  
> -	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
> -		up_write(&sb->s_umount);
> +	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
>  		return -EINVAL;
> -	}
>  
>  	if (sb_rdonly(sb)) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
> @@ -1760,7 +1758,6 @@ static int thaw_super_locked(struct super_block *sb)
>  			printk(KERN_ERR
>  				"VFS:Filesystem thaw failed\n");
>  			lockdep_sb_freeze_release(sb);
> -			up_write(&sb->s_umount);
>  			return error;
>  		}
>  	}
> @@ -1769,21 +1766,8 @@ static int thaw_super_locked(struct super_block *sb)
>  	sb_freeze_unlock(sb, SB_FREEZE_FS);
>  out:
>  	wake_up(&sb->s_writers.wait_unfrozen);
> -	deactivate_locked_super(sb);
>  	return 0;
>  }
> -
> -/**
> - * thaw_super -- unlock filesystem
> - * @sb: the super to thaw
> - *
> - * Unlocks the filesystem and marks it writeable again after freeze_super().
> - */
> -int thaw_super(struct super_block *sb)
> -{
> -	down_write(&sb->s_umount);
> -	return thaw_super_locked(sb);
> -}
>  EXPORT_SYMBOL(thaw_super);
>  
>  /*
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-08  1:17 ` [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze Luis Chamberlain
  2023-05-16 15:23   ` Darrick J. Wong
@ 2023-05-22 23:42   ` Darrick J. Wong
  2023-05-25 14:14     ` Jan Kara
                       ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-05-22 23:42 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

How about this as an alternative patch?  Kernel and userspace freeze
state are stored in s_writers; each type cannot block the other (though
you still can't have nested kernel or userspace freezes); and the freeze
is maintained until /both/ freeze types are dropped.

AFAICT this should work for the two other usecases (quiescing pagefaults
for fsdax pmem pre-removal; and freezing fses during suspend) besides
online fsck for xfs.

--D

From: Darrick J. Wong <djwong@kernel.org>
Subject: fs: distinguish between user initiated freeze and kernel initiated freeze

Userspace can freeze a filesystem using the FIFREEZE ioctl or by
suspending the block device; this state persists until userspace thaws
the filesystem with the FITHAW ioctl or resuming the block device.
Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
the fsfreeze ioctl") we only allow the first freeze command to succeed.

The kernel may decide that it is necessary to freeze a filesystem for
its own internal purposes, such as suspends in progress, filesystem fsck
activities, or quiescing a device prior to removal.  Userspace thaw
commands must never break a kernel freeze, and kernel thaw commands
shouldn't undo userspace's freeze command.

Introduce a couple of freeze holder flags and wire it into the
sb_writers state.  One kernel and one userspace freeze are allowed to
coexist at the same time; the filesystem will not thaw until both are
lifted.

Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/super.c         |  172 +++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h |    8 ++
 2 files changed, 170 insertions(+), 10 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2b..7496d51affb9 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -39,7 +39,7 @@
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
-static int thaw_super_locked(struct super_block *sb);
+static int thaw_super_locked(struct super_block *sb, unsigned short who);
 
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
@@ -1027,7 +1027,7 @@ static void do_thaw_all_callback(struct super_block *sb)
 	down_write(&sb->s_umount);
 	if (sb->s_root && sb->s_flags & SB_BORN) {
 		emergency_thaw_bdev(sb);
-		thaw_super_locked(sb);
+		thaw_super_locked(sb, FREEZE_HOLDER_USERSPACE);
 	} else {
 		up_write(&sb->s_umount);
 	}
@@ -1636,13 +1636,21 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
 }
 
 /**
- * freeze_super - lock the filesystem and force it into a consistent state
+ * __freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
+ * @who: FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
+ * FREEZE_HOLDER_KERNEL if the kernel wants to freeze it
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will return
+ * freeze_fs.  Subsequent calls to this without first thawing the fs may return
  * -EBUSY.
  *
+ * The @who argument distinguishes between the kernel and userspace trying to
+ * freeze the filesystem.  Although there cannot be multiple kernel freezes or
+ * multiple userspace freezes in effect at any given time, the kernel and
+ * userspace can both hold a filesystem frozen.  The filesystem remains frozen
+ * until there are no kernel or userspace freezes in effect.
+ *
  * During this function, sb->s_writers.frozen goes through these values:
  *
  * SB_UNFROZEN: File system is normal, all writes progress as usual.
@@ -1668,12 +1676,61 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
  *
  * sb->s_writers.frozen is protected by sb->s_umount.
  */
-int freeze_super(struct super_block *sb)
+static int __freeze_super(struct super_block *sb, unsigned short who)
 {
+	struct sb_writers *sbw = &sb->s_writers;
 	int ret;
 
 	atomic_inc(&sb->s_active);
 	down_write(&sb->s_umount);
+
+	if (sbw->frozen == SB_FREEZE_COMPLETE) {
+		switch (who) {
+		case FREEZE_HOLDER_KERNEL:
+			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
+				/*
+				 * Kernel freeze already in effect; caller can
+				 * try again.
+				 */
+				deactivate_locked_super(sb);
+				return -EBUSY;
+			}
+			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
+				/*
+				 * Share the freeze state with the userspace
+				 * freeze already in effect.
+				 */
+				sbw->freeze_holders |= who;
+				deactivate_locked_super(sb);
+				return 0;
+			}
+			break;
+		case FREEZE_HOLDER_USERSPACE:
+			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
+				/*
+				 * Userspace freeze already in effect; tell
+				 * the caller we're busy.
+				 */
+				deactivate_locked_super(sb);
+				return -EBUSY;
+			}
+			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
+				/*
+				 * Share the freeze state with the kernel
+				 * freeze already in effect.
+				 */
+				sbw->freeze_holders |= who;
+				deactivate_locked_super(sb);
+				return 0;
+			}
+			break;
+		default:
+			BUG();
+			deactivate_locked_super(sb);
+			return -EINVAL;
+		}
+	}
+
 	if (sb->s_writers.frozen != SB_UNFROZEN) {
 		deactivate_locked_super(sb);
 		return -EBUSY;
@@ -1686,6 +1743,7 @@ int freeze_super(struct super_block *sb)
 
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
+		sb->s_writers.freeze_holders |= who;
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 		up_write(&sb->s_umount);
 		return 0;
@@ -1731,23 +1789,103 @@ int freeze_super(struct super_block *sb)
 	 * For debugging purposes so that fs can warn if it sees write activity
 	 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 	 */
+	sb->s_writers.freeze_holders |= who;
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	lockdep_sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
+
+/*
+ * freeze_super - lock the filesystem and force it into a consistent state
+ * @sb: the super to lock
+ *
+ * Syncs the super to make sure the filesystem is consistent and calls the fs's
+ * freeze_fs.  Subsequent calls to this without first calling thaw_super will
+ * return -EBUSY.  See the comment for __freeze_super for more information.
+ */
+int freeze_super(struct super_block *sb)
+{
+	return __freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+}
 EXPORT_SYMBOL(freeze_super);
 
-static int thaw_super_locked(struct super_block *sb)
+/**
+ * freeze_super_kernel - lock the filesystem for an internal kernel operation
+ * and force it into a consistent state.
+ * @sb: the super to lock
+ *
+ * Syncs the super to make sure the filesystem is consistent and calls the fs's
+ * freeze_fs.  Subsequent calls to this without first calling thaw_super_excl
+ * will return -EBUSY.
+ */
+int freeze_super_kernel(struct super_block *sb)
 {
+	return __freeze_super(sb, FREEZE_HOLDER_KERNEL);
+}
+EXPORT_SYMBOL_GPL(freeze_super_kernel);
+
+/*
+ * Undoes the effect of a freeze_super_locked call.  If the filesystem is
+ * frozen both by userspace and the kernel, a thaw call from either source
+ * removes that state without releasing the other state or unlocking the
+ * filesystem.
+ */
+static int thaw_super_locked(struct super_block *sb, unsigned short who)
+{
+	struct sb_writers *sbw = &sb->s_writers;
 	int error;
 
+	if (sbw->frozen == SB_FREEZE_COMPLETE) {
+		switch (who) {
+		case FREEZE_HOLDER_KERNEL:
+			if (!(sbw->freeze_holders & FREEZE_HOLDER_KERNEL)) {
+				/* Caller doesn't hold a kernel freeze. */
+				up_write(&sb->s_umount);
+				return -EINVAL;
+			}
+			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
+				/*
+				 * We were sharing the freeze with userspace,
+				 * so drop the userspace freeze but exit
+				 * without unfreezing.
+				 */
+				sbw->freeze_holders &= ~who;
+				up_write(&sb->s_umount);
+				return 0;
+			}
+			break;
+		case FREEZE_HOLDER_USERSPACE:
+			if (!(sbw->freeze_holders & FREEZE_HOLDER_USERSPACE)) {
+				/* Caller doesn't hold a userspace freeze. */
+				up_write(&sb->s_umount);
+				return -EINVAL;
+			}
+			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
+				/*
+				 * We were sharing the freeze with the kernel,
+				 * so drop the kernel freeze but exit without
+				 * unfreezing.
+				 */
+				sbw->freeze_holders &= ~who;
+				up_write(&sb->s_umount);
+				return 0;
+			}
+			break;
+		default:
+			BUG();
+			up_write(&sb->s_umount);
+			return -EINVAL;
+		}
+	}
+
 	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
 		up_write(&sb->s_umount);
 		return -EINVAL;
 	}
 
 	if (sb_rdonly(sb)) {
+		sb->s_writers.freeze_holders &= ~who;
 		sb->s_writers.frozen = SB_UNFROZEN;
 		goto out;
 	}
@@ -1765,6 +1903,7 @@ static int thaw_super_locked(struct super_block *sb)
 		}
 	}
 
+	sb->s_writers.freeze_holders &= ~who;
 	sb->s_writers.frozen = SB_UNFROZEN;
 	sb_freeze_unlock(sb, SB_FREEZE_FS);
 out:
@@ -1774,18 +1913,33 @@ static int thaw_super_locked(struct super_block *sb)
 }
 
 /**
- * thaw_super -- unlock filesystem
+ * thaw_super -- unlock filesystem frozen with freeze_super
  * @sb: the super to thaw
  *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * Unlocks the filesystem after freeze_super, and make it writeable again if
+ * there is not a freeze_super_kernel still in effect.
  */
 int thaw_super(struct super_block *sb)
 {
 	down_write(&sb->s_umount);
-	return thaw_super_locked(sb);
+	return thaw_super_locked(sb, FREEZE_HOLDER_USERSPACE);
 }
 EXPORT_SYMBOL(thaw_super);
 
+/**
+ * thaw_super_kernel -- unlock filesystem frozen with freeze_super_kernel
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem after freeze_super_kernel, and make it writeable
+ * again if there is not a freeze_super still in effect.
+ */
+int thaw_super_kernel(struct super_block *sb)
+{
+	down_write(&sb->s_umount);
+	return thaw_super_locked(sb, FREEZE_HOLDER_KERNEL);
+}
+EXPORT_SYMBOL_GPL(thaw_super_kernel);
+
 /*
  * Create workqueue for deferred direct IO completions. We allocate the
  * workqueue when it's first needed. This avoids creating workqueue for
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..147644b5d648 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1145,11 +1145,15 @@ enum {
 #define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
 
 struct sb_writers {
-	int				frozen;		/* Is sb frozen? */
+	unsigned short			frozen;		/* Is sb frozen? */
+	unsigned short			freeze_holders;	/* Who froze fs? */
 	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
+#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
+#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -2288,6 +2292,8 @@ extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern int freeze_super_kernel(struct super_block *super);
+extern int thaw_super_kernel(struct super_block *super);
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
  2023-05-18  5:32   ` Darrick J. Wong
@ 2023-05-25 12:17   ` Jan Kara
  2023-06-08  5:01   ` Christoph Hellwig
  2 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2023-05-25 12:17 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun 07-05-23 18:17:12, Luis Chamberlain wrote:
> Right now freeze_super()  and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Finally got around to looking at this. Sorry for the delay. In principle I
like the direction but see below:

> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 61c5f9d26018..e31d6791d3e3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -2166,7 +2166,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>  	if (err)
>  		return err;
>  
> +	if (!get_active_super(sbi->sb->s_bdev))
> +		return -ENOTTY;

Calling get_active_super() like this is just sick. You rather want to
provide a helper for grabbing another active sb reference and locking the
sb when you already have sb reference. Because that is what is needed in
the vast majority of the places. Something like

void grab_active_super(struct super_block *sb)
{
	down_write(sb->s_umount);
	atomic_inc(&s->s_active);
}

> @@ -851,13 +849,13 @@ struct super_block *get_active_super(struct block_device *bdev)
>  		if (sb->s_bdev == bdev) {
>  			if (!grab_super(sb))
>  				goto restart;
> -			up_write(&sb->s_umount);
>  			return sb;
>  		}
>  	}
>  	spin_unlock(&sb_lock);
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(get_active_super);

And I'd call this grab_bdev_super() and no need to export it when you have
grab_active_super().

> @@ -1636,10 +1634,13 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>  }
>  
>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * freeze_super - force a filesystem backed by a block device into a consistent state
>   * @sb: the super to lock
>   *
> - * Syncs the super to make sure the filesystem is consistent and calls the fs's
> + * Used by filesystems and the kernel to freeze a fileystem backed by a block
> + * device into a consistent state. Callers must use get_active_super(bdev) to
> + * lock the @sb and when done must unlock it with deactivate_locked_super().
> + * Syncs the filesystem backed by the @sb and calls the filesystem's optional
>   * freeze_fs.  Subsequent calls to this without first thawing the fs will return
>   * -EBUSY.
>   *
> @@ -1672,22 +1673,15 @@ int freeze_super(struct super_block *sb)
>  {
>  	int ret;
>  
> -	atomic_inc(&sb->s_active);
> -	down_write(&sb->s_umount);
> -	if (sb->s_writers.frozen != SB_UNFROZEN) {
> -		deactivate_locked_super(sb);

At least add a warning for s_umount not being held here?

> +	if (sb->s_writers.frozen != SB_UNFROZEN)
>  		return -EBUSY;
> -	}
>  
> -	if (!(sb->s_flags & SB_BORN)) {
> -		up_write(&sb->s_umount);
> +	if (!(sb->s_flags & SB_BORN))
>  		return 0;	/* sic - it's "nothing to do" */
> -	}
>  
>  	if (sb_rdonly(sb)) {
>  		/* Nothing to do really... */
>  		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> -		up_write(&sb->s_umount);
>  		return 0;
>  	}

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] fs: add frozen sb state helpers
  2023-05-08  1:17 ` [PATCH 2/6] fs: add frozen sb state helpers Luis Chamberlain
@ 2023-05-25 12:19   ` Jan Kara
  2023-06-08  5:05   ` Christoph Hellwig
  1 sibling, 0 replies; 34+ messages in thread
From: Jan Kara @ 2023-05-25 12:19 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun 07-05-23 18:17:13, Luis Chamberlain wrote:
> Provide helpers so that we can check a superblock frozen state.
> This will make subsequent changes easier to read. This makes
> no functional changes.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Just noticed one nit...

> diff --git a/fs/super.c b/fs/super.c
> index 0e9d48846684..46c6475fc765 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -905,7 +905,7 @@ int reconfigure_super(struct fs_context *fc)
>  
>  	if (fc->sb_flags_mask & ~MS_RMT_MASK)
>  		return -EINVAL;
> -	if (sb->s_writers.frozen != SB_UNFROZEN)
> +	if (!(sb_is_unfrozen(sb)))
             ^ unneeded parenthesis here

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-22 23:42   ` Darrick J. Wong
@ 2023-05-25 14:14     ` Jan Kara
  2023-06-06 17:19       ` Darrick J. Wong
                         ` (2 more replies)
  2023-06-08  5:24     ` Christoph Hellwig
  2023-06-08 20:26     ` Luis Chamberlain
  2 siblings, 3 replies; 34+ messages in thread
From: Jan Kara @ 2023-05-25 14:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro, jack,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4471 bytes --]

On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> How about this as an alternative patch?  Kernel and userspace freeze
> state are stored in s_writers; each type cannot block the other (though
> you still can't have nested kernel or userspace freezes); and the freeze
> is maintained until /both/ freeze types are dropped.
> 
> AFAICT this should work for the two other usecases (quiescing pagefaults
> for fsdax pmem pre-removal; and freezing fses during suspend) besides
> online fsck for xfs.
> 
> --D
> 
> From: Darrick J. Wong <djwong@kernel.org>
> Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> 
> Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> suspending the block device; this state persists until userspace thaws
> the filesystem with the FITHAW ioctl or resuming the block device.
> Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> the fsfreeze ioctl") we only allow the first freeze command to succeed.
> 
> The kernel may decide that it is necessary to freeze a filesystem for
> its own internal purposes, such as suspends in progress, filesystem fsck
> activities, or quiescing a device prior to removal.  Userspace thaw
> commands must never break a kernel freeze, and kernel thaw commands
> shouldn't undo userspace's freeze command.
> 
> Introduce a couple of freeze holder flags and wire it into the
> sb_writers state.  One kernel and one userspace freeze are allowed to
> coexist at the same time; the filesystem will not thaw until both are
> lifted.
> 
> Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Yes, this is exactly how I'd imagine it. Thanks for writing the patch!

I'd just note that this would need rebasing on top of Luis' patches 1 and
2. Also:

> +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> +		switch (who) {
> +		case FREEZE_HOLDER_KERNEL:
> +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> +				/*
> +				 * Kernel freeze already in effect; caller can
> +				 * try again.
> +				 */
> +				deactivate_locked_super(sb);
> +				return -EBUSY;
> +			}
> +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> +				/*
> +				 * Share the freeze state with the userspace
> +				 * freeze already in effect.
> +				 */
> +				sbw->freeze_holders |= who;
> +				deactivate_locked_super(sb);
> +				return 0;
> +			}
> +			break;
> +		case FREEZE_HOLDER_USERSPACE:
> +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> +				/*
> +				 * Userspace freeze already in effect; tell
> +				 * the caller we're busy.
> +				 */
> +				deactivate_locked_super(sb);
> +				return -EBUSY;
> +			}
> +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> +				/*
> +				 * Share the freeze state with the kernel
> +				 * freeze already in effect.
> +				 */
> +				sbw->freeze_holders |= who;
> +				deactivate_locked_super(sb);
> +				return 0;
> +			}
> +			break;
> +		default:
> +			BUG();
> +			deactivate_locked_super(sb);
> +			return -EINVAL;
> +		}
> +	}

Can't this be simplified to:

	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
	       !(who & FREEZE_HOLDER_KERNEL)));
retry:
	if (sb->s_writers.freeze_holders & who)
		return -EBUSY;
	/* Already frozen by someone else? */
	if (sb->s_writers.freeze_holders & ~who) {
		sb->s_writers.freeze_holders |= who;
		return 0;
	}

Now the only remaining issue with the code is that the two different
holders can be attempting to freeze the filesystem at once and in that case
one of them has to wait for the other one instead of returning -EBUSY as
would happen currently. This can happen because we temporarily drop
s_umount in freeze_super() due to lock ordering issues. I think we could
do something like:

	if (!sb_unfrozen(sb)) {
		up_write(&sb->s_umount);
		wait_var_event(&sb->s_writers.frozen,
			       sb_unfrozen(sb) || sb_frozen(sb));
		down_write(&sb->s_umount);
		goto retry;
	}

and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
in freeze_super().

BTW, when reading this code, I've spotted attached cleanup opportunity but
I'll queue that separately so that is JFYI.

> +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */

Why not start from 1U << 0? And bonus points for using BIT() macro :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fs-Drop-wait_unfrozen-wait-queue.patch --]
[-- Type: text/x-patch, Size: 2637 bytes --]

From 9fce35f21f9a62470e764463c84373fb013108fd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 25 May 2023 15:56:19 +0200
Subject: [PATCH] fs: Drop wait_unfrozen wait queue

wait_unfrozen waitqueue is used only in quota code to wait for
filesystem to become unfrozen. In that place we can just use
sb_start_write() - sb_end_write() pair to achieve the same. So just
remove the waitqueue.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/quota/quota.c   | 5 +++--
 fs/super.c         | 4 ----
 include/linux/fs.h | 1 -
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 052f143e2e0e..0e41fb84060f 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -895,8 +895,9 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
 			up_write(&sb->s_umount);
 		else
 			up_read(&sb->s_umount);
-		wait_event(sb->s_writers.wait_unfrozen,
-			   sb->s_writers.frozen == SB_UNFROZEN);
+		/* Wait for sb to unfreeze */
+		sb_start_write(sb);
+		sb_end_write(sb);
 		put_super(sb);
 		goto retry;
 	}
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2b..6283cea67280 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -236,7 +236,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 					&type->s_writers_key[i]))
 			goto fail;
 	}
-	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
 	s->s_flags = flags;
 	if (s->s_user_ns != &init_user_ns)
@@ -1706,7 +1705,6 @@ int freeze_super(struct super_block *sb)
 	if (ret) {
 		sb->s_writers.frozen = SB_UNFROZEN;
 		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
-		wake_up(&sb->s_writers.wait_unfrozen);
 		deactivate_locked_super(sb);
 		return ret;
 	}
@@ -1722,7 +1720,6 @@ int freeze_super(struct super_block *sb)
 				"VFS:Filesystem freeze failed\n");
 			sb->s_writers.frozen = SB_UNFROZEN;
 			sb_freeze_unlock(sb, SB_FREEZE_FS);
-			wake_up(&sb->s_writers.wait_unfrozen);
 			deactivate_locked_super(sb);
 			return ret;
 		}
@@ -1768,7 +1765,6 @@ static int thaw_super_locked(struct super_block *sb)
 	sb->s_writers.frozen = SB_UNFROZEN;
 	sb_freeze_unlock(sb, SB_FREEZE_FS);
 out:
-	wake_up(&sb->s_writers.wait_unfrozen);
 	deactivate_locked_super(sb);
 	return 0;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..3b65a6194485 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1146,7 +1146,6 @@ enum {
 
 struct sb_writers {
 	int				frozen;		/* Is sb frozen? */
-	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-25 14:14     ` Jan Kara
@ 2023-06-06 17:19       ` Darrick J. Wong
  2023-06-07  9:22         ` Jan Kara
  2023-06-08 20:30         ` Luis Chamberlain
  2023-06-07 16:31       ` Darrick J. Wong
  2023-06-08  5:29       ` Christoph Hellwig
  2 siblings, 2 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-06 17:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > How about this as an alternative patch?  Kernel and userspace freeze
> > state are stored in s_writers; each type cannot block the other (though
> > you still can't have nested kernel or userspace freezes); and the freeze
> > is maintained until /both/ freeze types are dropped.
> > 
> > AFAICT this should work for the two other usecases (quiescing pagefaults
> > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > online fsck for xfs.
> > 
> > --D
> > 
> > From: Darrick J. Wong <djwong@kernel.org>
> > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > 
> > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > suspending the block device; this state persists until userspace thaws
> > the filesystem with the FITHAW ioctl or resuming the block device.
> > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > 
> > The kernel may decide that it is necessary to freeze a filesystem for
> > its own internal purposes, such as suspends in progress, filesystem fsck
> > activities, or quiescing a device prior to removal.  Userspace thaw
> > commands must never break a kernel freeze, and kernel thaw commands
> > shouldn't undo userspace's freeze command.
> > 
> > Introduce a couple of freeze holder flags and wire it into the
> > sb_writers state.  One kernel and one userspace freeze are allowed to
> > coexist at the same time; the filesystem will not thaw until both are
> > lifted.
> > 
> > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> 
> I'd just note that this would need rebasing on top of Luis' patches 1 and
> 2. Also:

I started doing that, but I noticed that after patch 1, freeze_super no
longer leaves s_active elevated if the freeze is successful.  The
callers drop the s_active ref that they themselves obtained, which
means that we've now changed that behavior, right?  ioctl_fsfreeze now
does:

	if (!get_active_super(sb->s_bdev))
		return -ENOTTY;

(Increase ref)

        /* Freeze */
        if (sb->s_op->freeze_super)
		ret = sb->s_op->freeze_super(sb);
	ret = freeze_super(sb);

(Not sure why we can do both here?)

	deactivate_locked_super(sb);

(Decrease ref; net change to s_active is zero)

	return ret;

Luis hasn't responded to my question, so I stopped.

> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> > +		case FREEZE_HOLDER_KERNEL:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Kernel freeze already in effect; caller can
> > +				 * try again.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Share the freeze state with the userspace
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		case FREEZE_HOLDER_USERSPACE:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Userspace freeze already in effect; tell
> > +				 * the caller we're busy.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Share the freeze state with the kernel
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		default:
> > +			BUG();
> > +			deactivate_locked_super(sb);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> Can't this be simplified to:
> 
> 	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
> 	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
> 	       !(who & FREEZE_HOLDER_KERNEL)));
> retry:
> 	if (sb->s_writers.freeze_holders & who)
> 		return -EBUSY;
> 	/* Already frozen by someone else? */
> 	if (sb->s_writers.freeze_holders & ~who) {
> 		sb->s_writers.freeze_holders |= who;
> 		return 0;
> 	}

Yes, it can.

> Now the only remaining issue with the code is that the two different
> holders can be attempting to freeze the filesystem at once and in that case
> one of them has to wait for the other one instead of returning -EBUSY as
> would happen currently. This can happen because we temporarily drop
> s_umount in freeze_super() due to lock ordering issues. I think we could
> do something like:
> 
> 	if (!sb_unfrozen(sb)) {
> 		up_write(&sb->s_umount);
> 		wait_var_event(&sb->s_writers.frozen,
> 			       sb_unfrozen(sb) || sb_frozen(sb));
> 		down_write(&sb->s_umount);
> 		goto retry;
> 	}
> 
> and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> in freeze_super().

I think that'd work.  Let me try that.

> BTW, when reading this code, I've spotted attached cleanup opportunity but
> I'll queue that separately so that is JFYI.
> 
> > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> 
> Why not start from 1U << 0? And bonus points for using BIT() macro :).

I didn't think filesystem code was supposed to be using stuff from
vdso.h...

--D

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> From 9fce35f21f9a62470e764463c84373fb013108fd Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 25 May 2023 15:56:19 +0200
> Subject: [PATCH] fs: Drop wait_unfrozen wait queue
> 
> wait_unfrozen waitqueue is used only in quota code to wait for
> filesystem to become unfrozen. In that place we can just use
> sb_start_write() - sb_end_write() pair to achieve the same. So just
> remove the waitqueue.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/quota/quota.c   | 5 +++--
>  fs/super.c         | 4 ----
>  include/linux/fs.h | 1 -
>  3 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 052f143e2e0e..0e41fb84060f 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -895,8 +895,9 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
>  			up_write(&sb->s_umount);
>  		else
>  			up_read(&sb->s_umount);
> -		wait_event(sb->s_writers.wait_unfrozen,
> -			   sb->s_writers.frozen == SB_UNFROZEN);
> +		/* Wait for sb to unfreeze */
> +		sb_start_write(sb);
> +		sb_end_write(sb);
>  		put_super(sb);
>  		goto retry;
>  	}
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2b..6283cea67280 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -236,7 +236,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  					&type->s_writers_key[i]))
>  			goto fail;
>  	}
> -	init_waitqueue_head(&s->s_writers.wait_unfrozen);
>  	s->s_bdi = &noop_backing_dev_info;
>  	s->s_flags = flags;
>  	if (s->s_user_ns != &init_user_ns)
> @@ -1706,7 +1705,6 @@ int freeze_super(struct super_block *sb)
>  	if (ret) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
>  		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
> -		wake_up(&sb->s_writers.wait_unfrozen);
>  		deactivate_locked_super(sb);
>  		return ret;
>  	}
> @@ -1722,7 +1720,6 @@ int freeze_super(struct super_block *sb)
>  				"VFS:Filesystem freeze failed\n");
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			sb_freeze_unlock(sb, SB_FREEZE_FS);
> -			wake_up(&sb->s_writers.wait_unfrozen);
>  			deactivate_locked_super(sb);
>  			return ret;
>  		}
> @@ -1768,7 +1765,6 @@ static int thaw_super_locked(struct super_block *sb)
>  	sb->s_writers.frozen = SB_UNFROZEN;
>  	sb_freeze_unlock(sb, SB_FREEZE_FS);
>  out:
> -	wake_up(&sb->s_writers.wait_unfrozen);
>  	deactivate_locked_super(sb);
>  	return 0;
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21a981680856..3b65a6194485 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1146,7 +1146,6 @@ enum {
>  
>  struct sb_writers {
>  	int				frozen;		/* Is sb frozen? */
> -	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
>  	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
>  };
>  
> -- 
> 2.35.3
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-06 17:19       ` Darrick J. Wong
@ 2023-06-07  9:22         ` Jan Kara
  2023-06-07 14:50           ` Darrick J. Wong
  2023-06-08 20:30         ` Luis Chamberlain
  1 sibling, 1 reply; 34+ messages in thread
From: Jan Kara @ 2023-06-07  9:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, Luis Chamberlain, hch, sandeen, song, rafael, gregkh,
	viro, jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Tue 06-06-23 10:19:56, Darrick J. Wong wrote:
> On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > > How about this as an alternative patch?  Kernel and userspace freeze
> > > state are stored in s_writers; each type cannot block the other (though
> > > you still can't have nested kernel or userspace freezes); and the freeze
> > > is maintained until /both/ freeze types are dropped.
> > > 
> > > AFAICT this should work for the two other usecases (quiescing pagefaults
> > > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > > online fsck for xfs.
> > > 
> > > --D
> > > 
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > > 
> > > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > > suspending the block device; this state persists until userspace thaws
> > > the filesystem with the FITHAW ioctl or resuming the block device.
> > > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > > 
> > > The kernel may decide that it is necessary to freeze a filesystem for
> > > its own internal purposes, such as suspends in progress, filesystem fsck
> > > activities, or quiescing a device prior to removal.  Userspace thaw
> > > commands must never break a kernel freeze, and kernel thaw commands
> > > shouldn't undo userspace's freeze command.
> > > 
> > > Introduce a couple of freeze holder flags and wire it into the
> > > sb_writers state.  One kernel and one userspace freeze are allowed to
> > > coexist at the same time; the filesystem will not thaw until both are
> > > lifted.
> > > 
> > > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > 
> > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > 2. Also:
> 
> I started doing that, but I noticed that after patch 1, freeze_super no
> longer leaves s_active elevated if the freeze is successful.  The
> callers drop the s_active ref that they themselves obtained, which
> means that we've now changed that behavior, right?  ioctl_fsfreeze now
> does:
> 
> 	if (!get_active_super(sb->s_bdev))
> 		return -ENOTTY;
> 
> (Increase ref)
> 
>         /* Freeze */
>         if (sb->s_op->freeze_super)
> 		ret = sb->s_op->freeze_super(sb);
> 	ret = freeze_super(sb);
> 
> (Not sure why we can do both here?)
> 
> 	deactivate_locked_super(sb);
> 
> (Decrease ref; net change to s_active is zero)
> 
> 	return ret;
> 
> Luis hasn't responded to my question, so I stopped.

Right. I kind of like how he's moved the locking out of freeze_super() /
thaw_super() but I agree this semantic change is problematic and needs much
more thought - e.g. with Luis' version the fs could be unmounted while
frozen which is going to spectacularly deadlock. So yeah let's just ignore
patch 1 for now.

Longer term I believe if the sb is frozen by userspace, we should refuse to
unmount it but that's a separate discussion for some other time.

> > BTW, when reading this code, I've spotted attached cleanup opportunity but
> > I'll queue that separately so that is JFYI.
> > 
> > > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> > 
> > Why not start from 1U << 0? And bonus points for using BIT() macro :).
> 
> I didn't think filesystem code was supposed to be using stuff from
> vdso.h...

Hum, so BIT() macro is quite widely used in include/linux/ (from generic
headers e.g. in trace.h). include/linux/bits.h seems to be the right
include to use but I'm pretty sure include/linux/fs.h already gets this
header through something.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-07  9:22         ` Jan Kara
@ 2023-06-07 14:50           ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-07 14:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Wed, Jun 07, 2023 at 11:22:43AM +0200, Jan Kara wrote:
> On Tue 06-06-23 10:19:56, Darrick J. Wong wrote:
> > On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > > On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > > > How about this as an alternative patch?  Kernel and userspace freeze
> > > > state are stored in s_writers; each type cannot block the other (though
> > > > you still can't have nested kernel or userspace freezes); and the freeze
> > > > is maintained until /both/ freeze types are dropped.
> > > > 
> > > > AFAICT this should work for the two other usecases (quiescing pagefaults
> > > > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > > > online fsck for xfs.
> > > > 
> > > > --D
> > > > 
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > > > 
> > > > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > > > suspending the block device; this state persists until userspace thaws
> > > > the filesystem with the FITHAW ioctl or resuming the block device.
> > > > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > > > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > > > 
> > > > The kernel may decide that it is necessary to freeze a filesystem for
> > > > its own internal purposes, such as suspends in progress, filesystem fsck
> > > > activities, or quiescing a device prior to removal.  Userspace thaw
> > > > commands must never break a kernel freeze, and kernel thaw commands
> > > > shouldn't undo userspace's freeze command.
> > > > 
> > > > Introduce a couple of freeze holder flags and wire it into the
> > > > sb_writers state.  One kernel and one userspace freeze are allowed to
> > > > coexist at the same time; the filesystem will not thaw until both are
> > > > lifted.
> > > > 
> > > > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > > 
> > > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > > 2. Also:
> > 
> > I started doing that, but I noticed that after patch 1, freeze_super no
> > longer leaves s_active elevated if the freeze is successful.  The
> > callers drop the s_active ref that they themselves obtained, which
> > means that we've now changed that behavior, right?  ioctl_fsfreeze now
> > does:
> > 
> > 	if (!get_active_super(sb->s_bdev))
> > 		return -ENOTTY;
> > 
> > (Increase ref)
> > 
> >         /* Freeze */
> >         if (sb->s_op->freeze_super)
> > 		ret = sb->s_op->freeze_super(sb);
> > 	ret = freeze_super(sb);
> > 
> > (Not sure why we can do both here?)
> > 
> > 	deactivate_locked_super(sb);
> > 
> > (Decrease ref; net change to s_active is zero)
> > 
> > 	return ret;
> > 
> > Luis hasn't responded to my question, so I stopped.
> 
> Right. I kind of like how he's moved the locking out of freeze_super() /
> thaw_super() but I agree this semantic change is problematic and needs much
> more thought - e.g. with Luis' version the fs could be unmounted while
> frozen which is going to spectacularly deadlock. So yeah let's just ignore
> patch 1 for now.

Agreed, I like moving the locking out of freeze_super too.

I'm less enthused about patch 2's helpers since there are those
intermediate freezer states whose existence are only hinted at if you
get to the point of asking yourself "Why would there be both _is_frozen
and _is_unfrozen predicates?".

> Longer term I believe if the sb is frozen by userspace, we should refuse to
> unmount it but that's a separate discussion for some other time.
> 
> > > BTW, when reading this code, I've spotted attached cleanup opportunity but
> > > I'll queue that separately so that is JFYI.
> > > 
> > > > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > > > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> > > 
> > > Why not start from 1U << 0? And bonus points for using BIT() macro :).
> > 
> > I didn't think filesystem code was supposed to be using stuff from
> > vdso.h...
> 
> Hum, so BIT() macro is quite widely used in include/linux/ (from generic
> headers e.g. in trace.h). include/linux/bits.h seems to be the right
> include to use but I'm pretty sure include/linux/fs.h already gets this
> header through something.

Ok, will do.

--D

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-25 14:14     ` Jan Kara
  2023-06-06 17:19       ` Darrick J. Wong
@ 2023-06-07 16:31       ` Darrick J. Wong
  2023-06-07 20:46         ` Jan Kara
  2023-06-08  5:29       ` Christoph Hellwig
  2 siblings, 1 reply; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-07 16:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > How about this as an alternative patch?  Kernel and userspace freeze
> > state are stored in s_writers; each type cannot block the other (though
> > you still can't have nested kernel or userspace freezes); and the freeze
> > is maintained until /both/ freeze types are dropped.
> > 
> > AFAICT this should work for the two other usecases (quiescing pagefaults
> > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > online fsck for xfs.
> > 
> > --D
> > 
> > From: Darrick J. Wong <djwong@kernel.org>
> > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > 
> > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > suspending the block device; this state persists until userspace thaws
> > the filesystem with the FITHAW ioctl or resuming the block device.
> > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > 
> > The kernel may decide that it is necessary to freeze a filesystem for
> > its own internal purposes, such as suspends in progress, filesystem fsck
> > activities, or quiescing a device prior to removal.  Userspace thaw
> > commands must never break a kernel freeze, and kernel thaw commands
> > shouldn't undo userspace's freeze command.
> > 
> > Introduce a couple of freeze holder flags and wire it into the
> > sb_writers state.  One kernel and one userspace freeze are allowed to
> > coexist at the same time; the filesystem will not thaw until both are
> > lifted.
> > 
> > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> 
> I'd just note that this would need rebasing on top of Luis' patches 1 and
> 2. Also:
> 
> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> > +		case FREEZE_HOLDER_KERNEL:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Kernel freeze already in effect; caller can
> > +				 * try again.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Share the freeze state with the userspace
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		case FREEZE_HOLDER_USERSPACE:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Userspace freeze already in effect; tell
> > +				 * the caller we're busy.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Share the freeze state with the kernel
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		default:
> > +			BUG();
> > +			deactivate_locked_super(sb);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> Can't this be simplified to:
> 
> 	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
> 	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
> 	       !(who & FREEZE_HOLDER_KERNEL)));
> retry:
> 	if (sb->s_writers.freeze_holders & who)
> 		return -EBUSY;
> 	/* Already frozen by someone else? */
> 	if (sb->s_writers.freeze_holders & ~who) {
> 		sb->s_writers.freeze_holders |= who;
> 		return 0;
> 	}
> 
> Now the only remaining issue with the code is that the two different
> holders can be attempting to freeze the filesystem at once and in that case
> one of them has to wait for the other one instead of returning -EBUSY as
> would happen currently. This can happen because we temporarily drop
> s_umount in freeze_super() due to lock ordering issues. I think we could
> do something like:
> 
> 	if (!sb_unfrozen(sb)) {
> 		up_write(&sb->s_umount);
> 		wait_var_event(&sb->s_writers.frozen,
> 			       sb_unfrozen(sb) || sb_frozen(sb));
> 		down_write(&sb->s_umount);
> 		goto retry;
> 	}
> 
> and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> in freeze_super().

If we implemented this behavior change, it ought to be a separate patch.

For the case where the kernel is freezing the fs and userspace wants to
start freezing the fs, we could make userspace wait and then share the
kernel freeze.

For any case where the fs is !unfrozen and the kernel wants to start
freezing the fs, I think I'd rather return EBUSY immediately and let the
caller decide to wait and/or call back.

For the case where one userspace thread is freezing the fs and another
userspace thread wants to start freezing the fs, I think the current
behavior of returning EBUSY immediately is ok.

--D

> BTW, when reading this code, I've spotted attached cleanup opportunity but
> I'll queue that separately so that is JFYI.
> 
> > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> 
> Why not start from 1U << 0? And bonus points for using BIT() macro :).
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> From 9fce35f21f9a62470e764463c84373fb013108fd Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 25 May 2023 15:56:19 +0200
> Subject: [PATCH] fs: Drop wait_unfrozen wait queue
> 
> wait_unfrozen waitqueue is used only in quota code to wait for
> filesystem to become unfrozen. In that place we can just use
> sb_start_write() - sb_end_write() pair to achieve the same. So just
> remove the waitqueue.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/quota/quota.c   | 5 +++--
>  fs/super.c         | 4 ----
>  include/linux/fs.h | 1 -
>  3 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 052f143e2e0e..0e41fb84060f 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -895,8 +895,9 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
>  			up_write(&sb->s_umount);
>  		else
>  			up_read(&sb->s_umount);
> -		wait_event(sb->s_writers.wait_unfrozen,
> -			   sb->s_writers.frozen == SB_UNFROZEN);
> +		/* Wait for sb to unfreeze */
> +		sb_start_write(sb);
> +		sb_end_write(sb);
>  		put_super(sb);
>  		goto retry;
>  	}
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2b..6283cea67280 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -236,7 +236,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  					&type->s_writers_key[i]))
>  			goto fail;
>  	}
> -	init_waitqueue_head(&s->s_writers.wait_unfrozen);
>  	s->s_bdi = &noop_backing_dev_info;
>  	s->s_flags = flags;
>  	if (s->s_user_ns != &init_user_ns)
> @@ -1706,7 +1705,6 @@ int freeze_super(struct super_block *sb)
>  	if (ret) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
>  		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
> -		wake_up(&sb->s_writers.wait_unfrozen);
>  		deactivate_locked_super(sb);
>  		return ret;
>  	}
> @@ -1722,7 +1720,6 @@ int freeze_super(struct super_block *sb)
>  				"VFS:Filesystem freeze failed\n");
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			sb_freeze_unlock(sb, SB_FREEZE_FS);
> -			wake_up(&sb->s_writers.wait_unfrozen);
>  			deactivate_locked_super(sb);
>  			return ret;
>  		}
> @@ -1768,7 +1765,6 @@ static int thaw_super_locked(struct super_block *sb)
>  	sb->s_writers.frozen = SB_UNFROZEN;
>  	sb_freeze_unlock(sb, SB_FREEZE_FS);
>  out:
> -	wake_up(&sb->s_writers.wait_unfrozen);
>  	deactivate_locked_super(sb);
>  	return 0;
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21a981680856..3b65a6194485 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1146,7 +1146,6 @@ enum {
>  
>  struct sb_writers {
>  	int				frozen;		/* Is sb frozen? */
> -	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
>  	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
>  };
>  
> -- 
> 2.35.3
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-07 16:31       ` Darrick J. Wong
@ 2023-06-07 20:46         ` Jan Kara
  2023-06-08 18:58           ` Darrick J. Wong
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Kara @ 2023-06-07 20:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, Luis Chamberlain, hch, sandeen, song, rafael, gregkh,
	viro, jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Wed 07-06-23 09:31:10, Darrick J. Wong wrote:
> On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > > How about this as an alternative patch?  Kernel and userspace freeze
> > > state are stored in s_writers; each type cannot block the other (though
> > > you still can't have nested kernel or userspace freezes); and the freeze
> > > is maintained until /both/ freeze types are dropped.
> > > 
> > > AFAICT this should work for the two other usecases (quiescing pagefaults
> > > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > > online fsck for xfs.
> > > 
> > > --D
> > > 
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > > 
> > > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > > suspending the block device; this state persists until userspace thaws
> > > the filesystem with the FITHAW ioctl or resuming the block device.
> > > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > > 
> > > The kernel may decide that it is necessary to freeze a filesystem for
> > > its own internal purposes, such as suspends in progress, filesystem fsck
> > > activities, or quiescing a device prior to removal.  Userspace thaw
> > > commands must never break a kernel freeze, and kernel thaw commands
> > > shouldn't undo userspace's freeze command.
> > > 
> > > Introduce a couple of freeze holder flags and wire it into the
> > > sb_writers state.  One kernel and one userspace freeze are allowed to
> > > coexist at the same time; the filesystem will not thaw until both are
> > > lifted.
> > > 
> > > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > 
> > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > 2. Also:
> > 
> > > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > > +		switch (who) {
> > > +		case FREEZE_HOLDER_KERNEL:
> > > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > > +				/*
> > > +				 * Kernel freeze already in effect; caller can
> > > +				 * try again.
> > > +				 */
> > > +				deactivate_locked_super(sb);
> > > +				return -EBUSY;
> > > +			}
> > > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > > +				/*
> > > +				 * Share the freeze state with the userspace
> > > +				 * freeze already in effect.
> > > +				 */
> > > +				sbw->freeze_holders |= who;
> > > +				deactivate_locked_super(sb);
> > > +				return 0;
> > > +			}
> > > +			break;
> > > +		case FREEZE_HOLDER_USERSPACE:
> > > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > > +				/*
> > > +				 * Userspace freeze already in effect; tell
> > > +				 * the caller we're busy.
> > > +				 */
> > > +				deactivate_locked_super(sb);
> > > +				return -EBUSY;
> > > +			}
> > > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > > +				/*
> > > +				 * Share the freeze state with the kernel
> > > +				 * freeze already in effect.
> > > +				 */
> > > +				sbw->freeze_holders |= who;
> > > +				deactivate_locked_super(sb);
> > > +				return 0;
> > > +			}
> > > +			break;
> > > +		default:
> > > +			BUG();
> > > +			deactivate_locked_super(sb);
> > > +			return -EINVAL;
> > > +		}
> > > +	}
> > 
> > Can't this be simplified to:
> > 
> > 	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
> > 	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
> > 	       !(who & FREEZE_HOLDER_KERNEL)));
> > retry:
> > 	if (sb->s_writers.freeze_holders & who)
> > 		return -EBUSY;
> > 	/* Already frozen by someone else? */
> > 	if (sb->s_writers.freeze_holders & ~who) {
> > 		sb->s_writers.freeze_holders |= who;
> > 		return 0;
> > 	}
> > 
> > Now the only remaining issue with the code is that the two different
> > holders can be attempting to freeze the filesystem at once and in that case
> > one of them has to wait for the other one instead of returning -EBUSY as
> > would happen currently. This can happen because we temporarily drop
> > s_umount in freeze_super() due to lock ordering issues. I think we could
> > do something like:
> > 
> > 	if (!sb_unfrozen(sb)) {
> > 		up_write(&sb->s_umount);
> > 		wait_var_event(&sb->s_writers.frozen,
> > 			       sb_unfrozen(sb) || sb_frozen(sb));
> > 		down_write(&sb->s_umount);
> > 		goto retry;
> > 	}
> > 
> > and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> > in freeze_super().
> 
> If we implemented this behavior change, it ought to be a separate patch.
> 
> For the case where the kernel is freezing the fs and userspace wants to
> start freezing the fs, we could make userspace wait and then share the
> kernel freeze.

Yes.

> For any case where the fs is !unfrozen and the kernel wants to start
> freezing the fs, I think I'd rather return EBUSY immediately and let the
> caller decide to wait and/or call back.

Possibly, although I thought that if userspace has frozen the fs and kernel
wants to freeze, we want to return success? At least that was what I think
your patches were doing. And then I don't see the point why we should be
returning EBUSY if userspace is in the middle of the freeze. So what's the
intended semantics?
 
> For the case where one userspace thread is freezing the fs and another
> userspace thread wants to start freezing the fs, I think the current
> behavior of returning EBUSY immediately is ok.

Yes.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
  2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
  2023-05-18  5:32   ` Darrick J. Wong
  2023-05-25 12:17   ` Jan Kara
@ 2023-06-08  5:01   ` Christoph Hellwig
  2023-06-08 19:55     ` Luis Chamberlain
  2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-08  5:01 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun, May 07, 2023 at 06:17:12PM -0700, Luis Chamberlain wrote:
> Right now freeze_super()  and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Maybe I'm just getting old, but where did I suggest this?

That being said, holding an active reference over any operation is a
good thing.  As Jan said it can be done way simpler than this, though.

Also please explain the actual different locking contexts and what
semantics you unify in the commit log please.

> - * freeze_super - lock the filesystem and force it into a consistent state
> + * freeze_super - force a filesystem backed by a block device into a consistent state
>   * @sb: the super to lock

This is not actually true.  Nothing in here has anything to do
with block devices, and it is used on a at least one non-block file
system.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] fs: add frozen sb state helpers
  2023-05-08  1:17 ` [PATCH 2/6] fs: add frozen sb state helpers Luis Chamberlain
  2023-05-25 12:19   ` Jan Kara
@ 2023-06-08  5:05   ` Christoph Hellwig
  2023-06-08 15:05     ` Darrick J. Wong
  1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-08  5:05 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Sun, May 07, 2023 at 06:17:13PM -0700, Luis Chamberlain wrote:
> Provide helpers so that we can check a superblock frozen state.
> This will make subsequent changes easier to read. This makes
> no functional changes.

I'll look at the further changes, but having frozen/unfrozen helpers
that sound binary while we have 4 shades of gray seems rather confusing
to me.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-22 23:42   ` Darrick J. Wong
  2023-05-25 14:14     ` Jan Kara
@ 2023-06-08  5:24     ` Christoph Hellwig
  2023-06-08 18:15       ` Darrick J. Wong
  2023-06-08 20:26     ` Luis Chamberlain
  2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-08  5:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro, jack,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Mon, May 22, 2023 at 04:42:00PM -0700, Darrick J. Wong wrote:
> How about this as an alternative patch?  Kernel and userspace freeze
> state are stored in s_writers; each type cannot block the other (though
> you still can't have nested kernel or userspace freezes); and the freeze
> is maintained until /both/ freeze types are dropped.
> 
> AFAICT this should work for the two other usecases (quiescing pagefaults
> for fsdax pmem pre-removal; and freezing fses during suspend) besides
> online fsck for xfs.

I think this is fundamentally the right thing.  Can you send this as
a standalone thread in a separate thread to make it sure it gets
expedited?

> -static int thaw_super_locked(struct super_block *sb);
> +static int thaw_super_locked(struct super_block *sb, unsigned short who);

Is the unsigned short really worth it?  Even if it's just two values
I'd also go for a __bitwise type to get the typechecking as that tends
to help a lot goind down the road.

>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * __freeze_super - lock the filesystem and force it into a consistent state
>   * @sb: the super to lock
> + * @who: FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
> + * FREEZE_HOLDER_KERNEL if the kernel wants to freeze it
>   *
>   * Syncs the super to make sure the filesystem is consistent and calls the fs's
> - * freeze_fs.  Subsequent calls to this without first thawing the fs will return
> + * freeze_fs.  Subsequent calls to this without first thawing the fs may return
>   * -EBUSY.
>   *
> + * The @who argument distinguishes between the kernel and userspace trying to
> + * freeze the filesystem.  Although there cannot be multiple kernel freezes or
> + * multiple userspace freezes in effect at any given time, the kernel and
> + * userspace can both hold a filesystem frozen.  The filesystem remains frozen
> + * until there are no kernel or userspace freezes in effect.
> + *
>   * During this function, sb->s_writers.frozen goes through these values:
>   *
>   * SB_UNFROZEN: File system is normal, all writes progress as usual.
> @@ -1668,12 +1676,61 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>   *
>   * sb->s_writers.frozen is protected by sb->s_umount.
>   */

There's really no point in having a kerneldoc for a static function.
Either this moves to the actual exported functions, or it should
become a normal non-kerneldoc comment.  But I'm not even sre this split
makes much sense to start with.  I'd just add a the who argument
to freeze_super given that we have only very few callers anyway,
and it is way easier to follow than thse rappers hardoding the argument.

> +static int __freeze_super(struct super_block *sb, unsigned short who)
>  {
> +	struct sb_writers *sbw = &sb->s_writers;
>  	int ret;
>  
>  	atomic_inc(&sb->s_active);
>  	down_write(&sb->s_umount);
> +
> +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> +		switch (who) {

Nit, but maybe split evetything inside this branch into a
freeze_frozen_super helper?

> +static int thaw_super_locked(struct super_block *sb, unsigned short who)
> +{
> +	struct sb_writers *sbw = &sb->s_writers;
>  	int error;
>  
> +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> +		switch (who) {
> +		case FREEZE_HOLDER_KERNEL:
> +			if (!(sbw->freeze_holders & FREEZE_HOLDER_KERNEL)) {
> +				/* Caller doesn't hold a kernel freeze. */
> +				up_write(&sb->s_umount);
> +				return -EINVAL;
> +			}
> +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> +				/*
> +				 * We were sharing the freeze with userspace,
> +				 * so drop the userspace freeze but exit
> +				 * without unfreezing.
> +				 */
> +				sbw->freeze_holders &= ~who;
> +				up_write(&sb->s_umount);
> +				return 0;
> +			}
> +			break;
> +		case FREEZE_HOLDER_USERSPACE:
> +			if (!(sbw->freeze_holders & FREEZE_HOLDER_USERSPACE)) {
> +				/* Caller doesn't hold a userspace freeze. */
> +				up_write(&sb->s_umount);
> +				return -EINVAL;
> +			}
> +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> +				/*
> +				 * We were sharing the freeze with the kernel,
> +				 * so drop the kernel freeze but exit without
> +				 * unfreezing.
> +				 */
> +				sbw->freeze_holders &= ~who;
> +				up_write(&sb->s_umount);
> +				return 0;
> +			}
> +			break;
> +		default:
> +			BUG();
> +			up_write(&sb->s_umount);
> +			return -EINVAL;
> +		}

To me this screams for another 'is_partial_thaw' or so helper, whith
which we could doe something like:

	if (sbw->frozen != SB_FREEZE_COMPLETE) {
		ret = -EINVAL;
		goto out_unlock;
	}
	ret = is_partial_thaw(sb, who);
	if (ret) {
		if (ret == 1) {
			sbw->freeze_holders &= ~who;
			ret = 0
		}
		goto out_unlock;
	}

Btw, same comment about the wrappers as on the freeze side.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-25 14:14     ` Jan Kara
  2023-06-06 17:19       ` Darrick J. Wong
  2023-06-07 16:31       ` Darrick J. Wong
@ 2023-06-08  5:29       ` Christoph Hellwig
  2023-06-08  9:11         ` Jan Kara
  2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-08  5:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Darrick J. Wong, Luis Chamberlain, hch, sandeen, song, rafael,
	gregkh, viro, jikos, bvanassche, ebiederm, mchehab, keescook,
	p.raghav, da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> 
> I'd just note that this would need rebasing on top of Luis' patches 1 and
> 2. Also:

I'd not do that for now.  1 needs a lot more work, and 2 seems rather
questionable.

> Now the only remaining issue with the code is that the two different
> holders can be attempting to freeze the filesystem at once and in that case
> one of them has to wait for the other one instead of returning -EBUSY as
> would happen currently. This can happen because we temporarily drop
> s_umount in freeze_super() due to lock ordering issues. I think we could
> do something like:
> 
> 	if (!sb_unfrozen(sb)) {
> 		up_write(&sb->s_umount);
> 		wait_var_event(&sb->s_writers.frozen,
> 			       sb_unfrozen(sb) || sb_frozen(sb));
> 		down_write(&sb->s_umount);
> 		goto retry;
> 	}
> 
> and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> in freeze_super().

Let's do that separately as a follow on..

> 
> BTW, when reading this code, I've spotted attached cleanup opportunity but
> I'll queue that separately so that is JFYI.
> 
> > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> 
> Why not start from 1U << 0? And bonus points for using BIT() macro :).

BIT() is a nasty thing and actually makes code harder to read. And it
doesn't interact very well with the __bitwise annotation that might
actually be useful here.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-08  5:29       ` Christoph Hellwig
@ 2023-06-08  9:11         ` Jan Kara
  2023-06-08 18:16           ` Darrick J. Wong
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Kara @ 2023-06-08  9:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Darrick J. Wong, Luis Chamberlain, sandeen, song,
	rafael, gregkh, viro, jikos, bvanassche, ebiederm, mchehab,
	keescook, p.raghav, da.gomez, linux-fsdevel, kernel, kexec,
	linux-kernel

On Wed 07-06-23 22:29:04, Christoph Hellwig wrote:
> On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > 
> > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > 2. Also:
> 
> I'd not do that for now.  1 needs a lot more work, and 2 seems rather
> questionable.

OK, I agree the wrappers could be confusing (they didn't confuse me but
when you spelled it out, I agree).

> > Now the only remaining issue with the code is that the two different
> > holders can be attempting to freeze the filesystem at once and in that case
> > one of them has to wait for the other one instead of returning -EBUSY as
> > would happen currently. This can happen because we temporarily drop
> > s_umount in freeze_super() due to lock ordering issues. I think we could
> > do something like:
> > 
> > 	if (!sb_unfrozen(sb)) {
> > 		up_write(&sb->s_umount);
> > 		wait_var_event(&sb->s_writers.frozen,
> > 			       sb_unfrozen(sb) || sb_frozen(sb));
> > 		down_write(&sb->s_umount);
> > 		goto retry;
> > 	}
> > 
> > and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> > in freeze_super().
> 
> Let's do that separately as a follow on..

Well, we need to somehow settle on how to deal with a race when both kernel
& userspace race to freeze the filesystem and make the result consistent
with the situation when the fs is already frozen by someone.

> > BTW, when reading this code, I've spotted attached cleanup opportunity but
> > I'll queue that separately so that is JFYI.
> > 
> > > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> > 
> > Why not start from 1U << 0? And bonus points for using BIT() macro :).
> 
> BIT() is a nasty thing and actually makes code harder to read. And it
> doesn't interact very well with the __bitwise annotation that might
> actually be useful here.

OK. I'm not too hung up on BIT() macro.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] fs: add frozen sb state helpers
  2023-06-08  5:05   ` Christoph Hellwig
@ 2023-06-08 15:05     ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-08 15:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Luis Chamberlain, sandeen, song, rafael, gregkh, viro, jack,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Wed, Jun 07, 2023 at 10:05:46PM -0700, Christoph Hellwig wrote:
> On Sun, May 07, 2023 at 06:17:13PM -0700, Luis Chamberlain wrote:
> > Provide helpers so that we can check a superblock frozen state.
> > This will make subsequent changes easier to read. This makes
> > no functional changes.
> 
> I'll look at the further changes, but having frozen/unfrozen helpers
> that sound binary while we have 4 shades of gray seems rather confusing
> to me.

That was my first reaction too.

Then it occurred to me that *some* people might still be clued into that
subtlety if they happened to ask themselves why there are predicates for
_is_frozen and _is_unfrozen.

But in the end I think I decided that an enum isn't subtle like that at
all and clearly forgot to reply with that.

--D

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-08  5:24     ` Christoph Hellwig
@ 2023-06-08 18:15       ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-08 18:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Luis Chamberlain, sandeen, song, rafael, gregkh, viro, jack,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Wed, Jun 07, 2023 at 10:24:32PM -0700, Christoph Hellwig wrote:
> On Mon, May 22, 2023 at 04:42:00PM -0700, Darrick J. Wong wrote:
> > How about this as an alternative patch?  Kernel and userspace freeze
> > state are stored in s_writers; each type cannot block the other (though
> > you still can't have nested kernel or userspace freezes); and the freeze
> > is maintained until /both/ freeze types are dropped.
> > 
> > AFAICT this should work for the two other usecases (quiescing pagefaults
> > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > online fsck for xfs.
> 
> I think this is fundamentally the right thing.  Can you send this as
> a standalone thread in a separate thread to make it sure it gets
> expedited?

Yeah, I'll do that.

> > -static int thaw_super_locked(struct super_block *sb);
> > +static int thaw_super_locked(struct super_block *sb, unsigned short who);
> 
> Is the unsigned short really worth it?  Even if it's just two values
> I'd also go for a __bitwise type to get the typechecking as that tends
> to help a lot goind down the road.

Instead of __bitwise, I'll make freeze_super() take an enum, since
callers can only initiate one at a time, and the compiler can (in
theory) catch people passing garbage inputs.

> >  /**
> > - * freeze_super - lock the filesystem and force it into a consistent state
> > + * __freeze_super - lock the filesystem and force it into a consistent state
> >   * @sb: the super to lock
> > + * @who: FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
> > + * FREEZE_HOLDER_KERNEL if the kernel wants to freeze it
> >   *
> >   * Syncs the super to make sure the filesystem is consistent and calls the fs's
> > - * freeze_fs.  Subsequent calls to this without first thawing the fs will return
> > + * freeze_fs.  Subsequent calls to this without first thawing the fs may return
> >   * -EBUSY.
> >   *
> > + * The @who argument distinguishes between the kernel and userspace trying to
> > + * freeze the filesystem.  Although there cannot be multiple kernel freezes or
> > + * multiple userspace freezes in effect at any given time, the kernel and
> > + * userspace can both hold a filesystem frozen.  The filesystem remains frozen
> > + * until there are no kernel or userspace freezes in effect.
> > + *
> >   * During this function, sb->s_writers.frozen goes through these values:
> >   *
> >   * SB_UNFROZEN: File system is normal, all writes progress as usual.
> > @@ -1668,12 +1676,61 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
> >   *
> >   * sb->s_writers.frozen is protected by sb->s_umount.
> >   */
> 
> There's really no point in having a kerneldoc for a static function.
> Either this moves to the actual exported functions, or it should
> become a normal non-kerneldoc comment.  But I'm not even sre this split
> makes much sense to start with.  I'd just add a the who argument
> to freeze_super given that we have only very few callers anyway,
> and it is way easier to follow than thse rappers hardoding the argument.

Agreed.

> > +static int __freeze_super(struct super_block *sb, unsigned short who)
> >  {
> > +	struct sb_writers *sbw = &sb->s_writers;
> >  	int ret;
> >  
> >  	atomic_inc(&sb->s_active);
> >  	down_write(&sb->s_umount);
> > +
> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> 
> Nit, but maybe split evetything inside this branch into a
> freeze_frozen_super helper?

Yes, will do.  I think Jan's simplification will condense this too.

> > +static int thaw_super_locked(struct super_block *sb, unsigned short who)
> > +{
> > +	struct sb_writers *sbw = &sb->s_writers;
> >  	int error;
> >  
> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> > +		case FREEZE_HOLDER_KERNEL:
> > +			if (!(sbw->freeze_holders & FREEZE_HOLDER_KERNEL)) {
> > +				/* Caller doesn't hold a kernel freeze. */
> > +				up_write(&sb->s_umount);
> > +				return -EINVAL;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * We were sharing the freeze with userspace,
> > +				 * so drop the userspace freeze but exit
> > +				 * without unfreezing.
> > +				 */
> > +				sbw->freeze_holders &= ~who;
> > +				up_write(&sb->s_umount);
> > +				return 0;
> > +			}
> > +			break;
> > +		case FREEZE_HOLDER_USERSPACE:
> > +			if (!(sbw->freeze_holders & FREEZE_HOLDER_USERSPACE)) {
> > +				/* Caller doesn't hold a userspace freeze. */
> > +				up_write(&sb->s_umount);
> > +				return -EINVAL;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * We were sharing the freeze with the kernel,
> > +				 * so drop the kernel freeze but exit without
> > +				 * unfreezing.
> > +				 */
> > +				sbw->freeze_holders &= ~who;
> > +				up_write(&sb->s_umount);
> > +				return 0;
> > +			}
> > +			break;
> > +		default:
> > +			BUG();
> > +			up_write(&sb->s_umount);
> > +			return -EINVAL;
> > +		}
> 
> To me this screams for another 'is_partial_thaw' or so helper, whith
> which we could doe something like:
> 
> 	if (sbw->frozen != SB_FREEZE_COMPLETE) {
> 		ret = -EINVAL;
> 		goto out_unlock;
> 	}
> 	ret = is_partial_thaw(sb, who);
> 	if (ret) {
> 		if (ret == 1) {
> 			sbw->freeze_holders &= ~who;
> 			ret = 0
> 		}
> 		goto out_unlock;
> 	}

<nod>

> Btw, same comment about the wrappers as on the freeze side.

<nod>

--D

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-08  9:11         ` Jan Kara
@ 2023-06-08 18:16           ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-08 18:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Luis Chamberlain, sandeen, song, rafael,
	gregkh, viro, jikos, bvanassche, ebiederm, mchehab, keescook,
	p.raghav, da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Thu, Jun 08, 2023 at 11:11:30AM +0200, Jan Kara wrote:
> On Wed 07-06-23 22:29:04, Christoph Hellwig wrote:
> > On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > > 
> > > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > > 2. Also:
> > 
> > I'd not do that for now.  1 needs a lot more work, and 2 seems rather
> > questionable.
> 
> OK, I agree the wrappers could be confusing (they didn't confuse me but
> when you spelled it out, I agree).
> 
> > > Now the only remaining issue with the code is that the two different
> > > holders can be attempting to freeze the filesystem at once and in that case
> > > one of them has to wait for the other one instead of returning -EBUSY as
> > > would happen currently. This can happen because we temporarily drop
> > > s_umount in freeze_super() due to lock ordering issues. I think we could
> > > do something like:
> > > 
> > > 	if (!sb_unfrozen(sb)) {
> > > 		up_write(&sb->s_umount);
> > > 		wait_var_event(&sb->s_writers.frozen,
> > > 			       sb_unfrozen(sb) || sb_frozen(sb));
> > > 		down_write(&sb->s_umount);
> > > 		goto retry;
> > > 	}
> > > 
> > > and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> > > in freeze_super().
> > 
> > Let's do that separately as a follow on..
> 
> Well, we need to somehow settle on how to deal with a race when both kernel
> & userspace race to freeze the filesystem and make the result consistent
> with the situation when the fs is already frozen by someone.

<nod> I'll see what I can do about that.

> > > BTW, when reading this code, I've spotted attached cleanup opportunity but
> > > I'll queue that separately so that is JFYI.
> > > 
> > > > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > > > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> > > 
> > > Why not start from 1U << 0? And bonus points for using BIT() macro :).
> > 
> > BIT() is a nasty thing and actually makes code harder to read. And it
> > doesn't interact very well with the __bitwise annotation that might
> > actually be useful here.
> 
> OK. I'm not too hung up on BIT() macro.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-07 20:46         ` Jan Kara
@ 2023-06-08 18:58           ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-08 18:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Luis Chamberlain, hch, sandeen, song, rafael, gregkh, viro,
	jikos, bvanassche, ebiederm, mchehab, keescook, p.raghav,
	da.gomez, linux-fsdevel, kernel, kexec, linux-kernel

On Wed, Jun 07, 2023 at 10:46:10PM +0200, Jan Kara wrote:
> On Wed 07-06-23 09:31:10, Darrick J. Wong wrote:
> > On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> > > On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > > > How about this as an alternative patch?  Kernel and userspace freeze
> > > > state are stored in s_writers; each type cannot block the other (though
> > > > you still can't have nested kernel or userspace freezes); and the freeze
> > > > is maintained until /both/ freeze types are dropped.
> > > > 
> > > > AFAICT this should work for the two other usecases (quiescing pagefaults
> > > > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > > > online fsck for xfs.
> > > > 
> > > > --D
> > > > 
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > > > 
> > > > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > > > suspending the block device; this state persists until userspace thaws
> > > > the filesystem with the FITHAW ioctl or resuming the block device.
> > > > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > > > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > > > 
> > > > The kernel may decide that it is necessary to freeze a filesystem for
> > > > its own internal purposes, such as suspends in progress, filesystem fsck
> > > > activities, or quiescing a device prior to removal.  Userspace thaw
> > > > commands must never break a kernel freeze, and kernel thaw commands
> > > > shouldn't undo userspace's freeze command.
> > > > 
> > > > Introduce a couple of freeze holder flags and wire it into the
> > > > sb_writers state.  One kernel and one userspace freeze are allowed to
> > > > coexist at the same time; the filesystem will not thaw until both are
> > > > lifted.
> > > > 
> > > > Inspired-by: Luis Chamberlain <mcgrof@kernel.org>
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> > > 
> > > I'd just note that this would need rebasing on top of Luis' patches 1 and
> > > 2. Also:
> > > 
> > > > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > > > +		switch (who) {
> > > > +		case FREEZE_HOLDER_KERNEL:
> > > > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > > > +				/*
> > > > +				 * Kernel freeze already in effect; caller can
> > > > +				 * try again.
> > > > +				 */
> > > > +				deactivate_locked_super(sb);
> > > > +				return -EBUSY;
> > > > +			}
> > > > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > > > +				/*
> > > > +				 * Share the freeze state with the userspace
> > > > +				 * freeze already in effect.
> > > > +				 */
> > > > +				sbw->freeze_holders |= who;
> > > > +				deactivate_locked_super(sb);
> > > > +				return 0;
> > > > +			}
> > > > +			break;
> > > > +		case FREEZE_HOLDER_USERSPACE:
> > > > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > > > +				/*
> > > > +				 * Userspace freeze already in effect; tell
> > > > +				 * the caller we're busy.
> > > > +				 */
> > > > +				deactivate_locked_super(sb);
> > > > +				return -EBUSY;
> > > > +			}
> > > > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > > > +				/*
> > > > +				 * Share the freeze state with the kernel
> > > > +				 * freeze already in effect.
> > > > +				 */
> > > > +				sbw->freeze_holders |= who;
> > > > +				deactivate_locked_super(sb);
> > > > +				return 0;
> > > > +			}
> > > > +			break;
> > > > +		default:
> > > > +			BUG();
> > > > +			deactivate_locked_super(sb);
> > > > +			return -EINVAL;
> > > > +		}
> > > > +	}
> > > 
> > > Can't this be simplified to:
> > > 
> > > 	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
> > > 	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
> > > 	       !(who & FREEZE_HOLDER_KERNEL)));
> > > retry:
> > > 	if (sb->s_writers.freeze_holders & who)
> > > 		return -EBUSY;
> > > 	/* Already frozen by someone else? */
> > > 	if (sb->s_writers.freeze_holders & ~who) {
> > > 		sb->s_writers.freeze_holders |= who;
> > > 		return 0;
> > > 	}
> > > 
> > > Now the only remaining issue with the code is that the two different
> > > holders can be attempting to freeze the filesystem at once and in that case
> > > one of them has to wait for the other one instead of returning -EBUSY as
> > > would happen currently. This can happen because we temporarily drop
> > > s_umount in freeze_super() due to lock ordering issues. I think we could
> > > do something like:
> > > 
> > > 	if (!sb_unfrozen(sb)) {
> > > 		up_write(&sb->s_umount);
> > > 		wait_var_event(&sb->s_writers.frozen,
> > > 			       sb_unfrozen(sb) || sb_frozen(sb));
> > > 		down_write(&sb->s_umount);
> > > 		goto retry;
> > > 	}
> > > 
> > > and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> > > in freeze_super().
> > 
> > If we implemented this behavior change, it ought to be a separate patch.
> > 
> > For the case where the kernel is freezing the fs and userspace wants to
> > start freezing the fs, we could make userspace wait and then share the
> > kernel freeze.
> 
> Yes.
> 
> > For any case where the fs is !unfrozen and the kernel wants to start
> > freezing the fs, I think I'd rather return EBUSY immediately and let the
> > caller decide to wait and/or call back.
> 
> Possibly, although I thought that if userspace has frozen the fs and kernel
> wants to freeze, we want to return success?

Yes.  Apologies, I tripped over the four-states-of-gray thing and forgot
that frozen != !unfrozen.

> At least that was what I think
> your patches were doing. And then I don't see the point why we should be
> returning EBUSY if userspace is in the middle of the freeze. So what's the
> intended semantics?

Let me try again:

"For the case where one kernel thread is freezing the fs and another
kernel thread wants to start freezing the fs, return -EBUSY immediately.

"For the case where userspace is freezing the fs and kernel wants to
start freezing the fs, return -EBUSY immediately.  Callers decide if
they want to sleep and/or retry the operation."

--D

> > For the case where one userspace thread is freezing the fs and another
> > userspace thread wants to start freezing the fs, I think the current
> > behavior of returning EBUSY immediately is ok.
> 
> Yes.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
  2023-06-08  5:01   ` Christoph Hellwig
@ 2023-06-08 19:55     ` Luis Chamberlain
  0 siblings, 0 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-06-08 19:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: djwong, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Wed, Jun 07, 2023 at 10:01:14PM -0700, Christoph Hellwig wrote:
> On Sun, May 07, 2023 at 06:17:12PM -0700, Luis Chamberlain wrote:
> > Right now freeze_super()  and thaw_super() are called with
> > different locking contexts. To expand on this is messy, so
> > just unify the requirement to require grabbing an active
> > reference and keep the superblock locked.
> > 
> > Suggested-by: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> 
> Maybe I'm just getting old, but where did I suggest this?

https://lore.kernel.org/all/20210420120335.GA3604224@infradead.org/

"I don't think we need both variants, just move the locking and s_active
acquisition out of free_super.  Same for the thaw side."

> That being said, holding an active reference over any operation is a
> good thing.  As Jan said it can be done way simpler than this, though.

Great.


  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-05-22 23:42   ` Darrick J. Wong
  2023-05-25 14:14     ` Jan Kara
  2023-06-08  5:24     ` Christoph Hellwig
@ 2023-06-08 20:26     ` Luis Chamberlain
  2023-06-08 21:10       ` Darrick J. Wong
  2 siblings, 1 reply; 34+ messages in thread
From: Luis Chamberlain @ 2023-06-08 20:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hch, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Mon, May 22, 2023 at 04:42:00PM -0700, Darrick J. Wong wrote:
> How about this as an alternative patch?

I'm all for it, this is low hanging fruit and I try to get back to it
as no one else does, so I'm glad someone else is looking and trying too!

Hopefully dropping patch 1 and 2 would help with this.

Comments below.

> From: Darrick J. Wong <djwong@kernel.org>
> Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> 
> Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> suspending the block device; this state persists until userspace thaws
> the filesystem with the FITHAW ioctl or resuming the block device.
> Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> the fsfreeze ioctl") we only allow the first freeze command to succeed.
> 
> The kernel may decide that it is necessary to freeze a filesystem for
> its own internal purposes, such as suspends in progress, filesystem fsck
> activities, or quiescing a device prior to removal.  Userspace thaw
> commands must never break a kernel freeze, and kernel thaw commands
> shouldn't undo userspace's freeze command.
> 
> Introduce a couple of freeze holder flags and wire it into the
> sb_writers state.  One kernel and one userspace freeze are allowed to
> coexist at the same time; the filesystem will not thaw until both are
> lifted.

This mix-match stuff is also important to document so we can get
userspace to understand what is allowed and we get a sense of direction
written / documented. Without this trying to navigate around this is
all implied. We may need to adjust things with time for thing we may
not have considered.

> -int freeze_super(struct super_block *sb)
> +static int __freeze_super(struct super_block *sb, unsigned short who)
>  {
> +	struct sb_writers *sbw = &sb->s_writers;
>  	int ret;
>  
>  	atomic_inc(&sb->s_active);
>  	down_write(&sb->s_umount);
> +
> +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> +		switch (who) {

<-- snip -->

> +		case FREEZE_HOLDER_USERSPACE:
> +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> +				/*
> +				 * Userspace freeze already in effect; tell
> +				 * the caller we're busy.
> +				 */
> +				deactivate_locked_super(sb);
> +				return -EBUSY;

I'm thinking some userspace might find this OK so thought maybe
something like -EALREADY would be better, to then allow userspace
to decide, however, since userspace would not control the thaw it
seems like risky business to support that.

Anyway, I'm all for any alternative!

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-06 17:19       ` Darrick J. Wong
  2023-06-07  9:22         ` Jan Kara
@ 2023-06-08 20:30         ` Luis Chamberlain
  1 sibling, 0 replies; 34+ messages in thread
From: Luis Chamberlain @ 2023-06-08 20:30 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, hch, sandeen, song, rafael, gregkh, viro, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Tue, Jun 06, 2023 at 10:19:56AM -0700, Darrick J. Wong wrote:
> Luis hasn't responded to my question, so I stopped.

Sorry for the delay, by all means alternatives are hugely
welcomed. I just worked on it as it was back logged work and
not a high priority thing for me, so I try to get to it when I
can. Having someone with more immediate needs give it a stab
is hugely appreciated!

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze
  2023-06-08 20:26     ` Luis Chamberlain
@ 2023-06-08 21:10       ` Darrick J. Wong
  0 siblings, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2023-06-08 21:10 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hch, sandeen, song, rafael, gregkh, viro, jack, jikos,
	bvanassche, ebiederm, mchehab, keescook, p.raghav, da.gomez,
	linux-fsdevel, kernel, kexec, linux-kernel

On Thu, Jun 08, 2023 at 01:26:19PM -0700, Luis Chamberlain wrote:
> On Mon, May 22, 2023 at 04:42:00PM -0700, Darrick J. Wong wrote:
> > How about this as an alternative patch?
> 
> I'm all for it, this is low hanging fruit and I try to get back to it
> as no one else does, so I'm glad someone else is looking and trying too!
> 
> Hopefully dropping patch 1 and 2 would help with this.
> 
> Comments below.
> 
> > From: Darrick J. Wong <djwong@kernel.org>
> > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > 
> > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > suspending the block device; this state persists until userspace thaws
> > the filesystem with the FITHAW ioctl or resuming the block device.
> > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > 
> > The kernel may decide that it is necessary to freeze a filesystem for
> > its own internal purposes, such as suspends in progress, filesystem fsck
> > activities, or quiescing a device prior to removal.  Userspace thaw
> > commands must never break a kernel freeze, and kernel thaw commands
> > shouldn't undo userspace's freeze command.
> > 
> > Introduce a couple of freeze holder flags and wire it into the
> > sb_writers state.  One kernel and one userspace freeze are allowed to
> > coexist at the same time; the filesystem will not thaw until both are
> > lifted.
> 
> This mix-match stuff is also important to document so we can get
> userspace to understand what is allowed and we get a sense of direction
> written / documented. Without this trying to navigate around this is
> all implied. We may need to adjust things with time for thing we may
> not have considered.

That's captured in the kernledoc for freeze_super, which is no longer
getting cut up into __freeze_super here.

> > -int freeze_super(struct super_block *sb)
> > +static int __freeze_super(struct super_block *sb, unsigned short who)
> >  {
> > +	struct sb_writers *sbw = &sb->s_writers;
> >  	int ret;
> >  
> >  	atomic_inc(&sb->s_active);
> >  	down_write(&sb->s_umount);
> > +
> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> 
> <-- snip -->
> 
> > +		case FREEZE_HOLDER_USERSPACE:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Userspace freeze already in effect; tell
> > +				 * the caller we're busy.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> 
> I'm thinking some userspace might find this OK so thought maybe
> something like -EALREADY would be better, to then allow userspace
> to decide, however, since userspace would not control the thaw it
> seems like risky business to support that.

It already has to, since we've been returning EBUSY for "fs already
frozen or being frozen" for years.

--D

> Anyway, I'm all for any alternative!
> 
>   Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-06-08 21:10 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-08  1:17 [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain
2023-05-08  1:17 ` [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw Luis Chamberlain
2023-05-18  5:32   ` Darrick J. Wong
2023-05-25 12:17   ` Jan Kara
2023-06-08  5:01   ` Christoph Hellwig
2023-06-08 19:55     ` Luis Chamberlain
2023-05-08  1:17 ` [PATCH 2/6] fs: add frozen sb state helpers Luis Chamberlain
2023-05-25 12:19   ` Jan Kara
2023-06-08  5:05   ` Christoph Hellwig
2023-06-08 15:05     ` Darrick J. Wong
2023-05-08  1:17 ` [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze Luis Chamberlain
2023-05-16 15:23   ` Darrick J. Wong
2023-05-22 23:42   ` Darrick J. Wong
2023-05-25 14:14     ` Jan Kara
2023-06-06 17:19       ` Darrick J. Wong
2023-06-07  9:22         ` Jan Kara
2023-06-07 14:50           ` Darrick J. Wong
2023-06-08 20:30         ` Luis Chamberlain
2023-06-07 16:31       ` Darrick J. Wong
2023-06-07 20:46         ` Jan Kara
2023-06-08 18:58           ` Darrick J. Wong
2023-06-08  5:29       ` Christoph Hellwig
2023-06-08  9:11         ` Jan Kara
2023-06-08 18:16           ` Darrick J. Wong
2023-06-08  5:24     ` Christoph Hellwig
2023-06-08 18:15       ` Darrick J. Wong
2023-06-08 20:26     ` Luis Chamberlain
2023-06-08 21:10       ` Darrick J. Wong
2023-05-08  1:17 ` [PATCH 4/6] fs: move !SB_BORN check early on freeze and add for thaw Luis Chamberlain
2023-05-08  1:17 ` [PATCH 5/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl() Luis Chamberlain
2023-05-08  1:17 ` [PATCH 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing Luis Chamberlain
2023-05-09  1:20   ` Dave Chinner
2023-05-16 15:17     ` Darrick J. Wong
2023-05-08  1:21 ` [PATCH 0/6] vfs: provide automatic kernel freeze / resume Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).