All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/10] btrfs: free space tree mounting fixes
@ 2020-11-18 23:06 Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This patch set cleans up issues surrounding enabling and disabling various
free space cache features and their reporting in /proc/mounts.  Because the
improvements became somewhat complex, the series starts by lifting rw mount
logic into a single place.

The first patch is a setup patch that unifies very similar logic between a
normal rw mount and a ro->rw remount. This is a useful setup step for adding
more functionality to ro->rw remounts.

The second patch fixes the omission of orphan inode cleanup on a few trees
during ro->rw remount.

The third patch stops marking block groups with need_free_space when the
free space tree is not yet enabled.

The fourth patch adds enabling the free space tree to ro->rw remount.

The fifth patch adds a method for clearing oneshot mount options after mount.

The sixth patch adds support for clearing the free space tree on ro->rw remount.

The seventh patch sets up for more accurate /proc/mounts by ensuring that
cache_generation > 0 iff space_cache is enabled.

The eigth patch is the more accurate /proc/mounts logic.

The ninth patch is a convenience kernel message that complains when we skip
changing the free space tree on remount.

The tenth patch removes the space cache v1 free space item and free space
inodes when space cache v1 is disabled (nospace_cache or space_cache=v2).

The eleventh patch stops re-creating the free space objects when we are not
using space_cache=v1

The twelfth patch fixes a lockdep failure in creating the free space tree.

changes for v7:
Rebase onto newer misc-next
Patch 1/12: No change.
Patch 2/12: No change.
Patch 3/12: New.
Patch 4/12: No change.
Patch 5/12: Moved unused cache_opt variable to patch 7.
Patch 6/12: No change.
Patch 7/12: Declare cache_opt here rather than Patch 5. Actually clear
clear_cache as in the patch description.
Patch 8/12: No change.
Patch 9/12: No change.
Patch 10/12: No change.
Patch 11/12: No change.
Patch 12/12: New.

changes for v6:
Rebase all patches on newer misc-next
Patch 1/10: Fix unnecessary re-ordering of discard resume and most of mount_rw.
Patch 2/10: No change.
Patch 3/10: No change.
Patch 4/10: Handle unclean rebase.
Patch 5/10: No change.
Patch 6/10: No change.
Patch 7/10: No change.
Patch 8/10: No change.
Patch 9/10: No change.
Patch 10/10: No change.

changes for v5:
Patch 1/10: No change.
Patch 2/10: No change.
Patch 3/10: No change.
Patch 4/10: New.
Patch 5/10: New.
Patch 6/10: Handles remounts for no<->v1. Was Patch 4.
Patch 7/10: No change. Was Patch 5
Patch 8/10: Warns/undoes flags on skipped clear as well as skipped create.
            Was Patch 6.
Patch 9/10: No change. Was Patch 7.
Patch 10/10: No change. Was Patch 8.

changes for v4:
Patch 1/8: New
Patch 2/8: New
Patch 3/8: (was Patch 1) Made much simpler by lifting of Patch 1. Simply add
           free space tree creation to lifted rw mount logic.
Patch 4/8: Split out from old Patch 2. Add an fs_info flag to avoid surprises
           when a different transaction updates the cache_generation.
Patch 5/8: Split out from old Patch 2, but no change logically.
Patch 6/8: Split out from old Patch 1. Formatting fixes.
Patch 7/8: (was Patch 3) Rely on delayed_iput running after orphan cleanup
           (setup in patch 2) to pull iputs out of mount path, per review.
Patch 8/8: No change.

changes for v3:
Patch 1/4: Change failure to warning logging.
Patch 2/4: New; fixes mount option printing.
Patch 3/4: Fix orphan inode vs. delayed iput bug, change remove function
           to take inode as a sink.
Patch 4/4: No changes.

changes for v2:
Patch 1/3: made remount _only_ work in ro->rw case, added comment.
Patch 2/3: added btrfs_ prefix to non-static function, removed bad
           whitespace.

Boris Burkov (12):
  btrfs: lift rw mount setup from mount and remount
  btrfs: cleanup all orphan inodes on ro->rw remount
  btrfs: only mark bg->needs_free_space if free space tree is on
  btrfs: create free space tree on ro->rw remount
  btrfs: clear oneshot options on mount and remount
  btrfs: clear free space tree on ro->rw remount
  btrfs: keep sb cache_generation consistent with space_cache
  btrfs: use sb state to print space_cache mount option
  btrfs: warn when remount will not change the free space tree
  btrfs: remove free space items when disabling space cache v1
  btrfs: skip space_cache v1 setup when not using it
  btrfs: fix lockdep error creating free space tree

 fs/btrfs/block-group.c      |  45 ++------
 fs/btrfs/ctree.h            |   3 +
 fs/btrfs/disk-io.c          | 205 +++++++++++++++++++++---------------
 fs/btrfs/disk-io.h          |   2 +
 fs/btrfs/free-space-cache.c | 121 +++++++++++++++++++++
 fs/btrfs/free-space-cache.h |   6 ++
 fs/btrfs/super.c            |  70 ++++++------
 fs/btrfs/transaction.c      |   2 +
 8 files changed, 294 insertions(+), 160 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-23 16:50   ` David Sterba
  2020-11-23 16:57   ` David Sterba
  2020-11-18 23:06 ` [PATCH v7 02/12] btrfs: cleanup all orphan inodes on ro->rw remount Boris Burkov
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Mounting rw and remounting from ro to rw naturally share invariants and
functionality which result in a correctly setup rw filesystem. Luckily,
there is even a strong unity in the code which implements them. In
mount's open_ctree, these operations mostly happen after an early return
for ro file systems, and in remount, they happen in a section devoted to
remounting ro->rw, after some remount specific validation passes.

However, there are unfortunately a few differences. There are small
deviations in the order of some of the operations, remount does not
cleanup orphan inodes in root_tree or fs_tree, remount does not create
the free space tree, and remount does not handle "one-shot" mount
options like clear_cache and uuid tree rescan.

Since we want to add building the free space tree to remount, and since
it is possible to leak orphans on a filesystem mounted as ro then
remounted rw (common for the root filesystem when booting), we would
benefit from unifying the logic between the two codepaths.

This patch only lifts the existing common functionality, and leaves a
natural path for fixing the discrepancies.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 91 ++++++++++++++++++++++++++--------------------
 fs/btrfs/disk-io.h |  1 +
 fs/btrfs/super.c   | 37 +++----------------
 3 files changed, 59 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c87a1caefff..3e0de4986dbc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2884,6 +2884,53 @@ static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+/*
+ * Mounting logic specific to read-write file systems. Shared by open_ctree
+ * and btrfs_remount when remounting from read-only to read-write.
+ */
+int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+
+	ret = btrfs_cleanup_fs_roots(fs_info);
+	if (ret)
+		goto out;
+
+	mutex_lock(&fs_info->cleaner_mutex);
+	ret = btrfs_recover_relocation(fs_info->tree_root);
+	mutex_unlock(&fs_info->cleaner_mutex);
+	if (ret < 0) {
+		btrfs_warn(fs_info, "failed to recover relocation: %d", ret);
+		goto out;
+	}
+
+	ret = btrfs_resume_balance_async(fs_info);
+	if (ret)
+		goto out;
+
+	ret = btrfs_resume_dev_replace_async(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to resume dev_replace");
+		goto out;
+	}
+
+	btrfs_qgroup_rescan_resume(fs_info);
+
+	if (!fs_info->uuid_root) {
+		btrfs_info(fs_info, "creating UUID tree");
+		ret = btrfs_create_uuid_tree(fs_info);
+		if (ret) {
+			btrfs_warn(fs_info,
+				   "failed to create the UUID tree %d",
+				   ret);
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
 int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices,
 		      char *options)
 {
@@ -3290,22 +3337,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (ret)
 		goto fail_qgroup;
 
-	if (!sb_rdonly(sb)) {
-		ret = btrfs_cleanup_fs_roots(fs_info);
-		if (ret)
-			goto fail_qgroup;
-
-		mutex_lock(&fs_info->cleaner_mutex);
-		ret = btrfs_recover_relocation(tree_root);
-		mutex_unlock(&fs_info->cleaner_mutex);
-		if (ret < 0) {
-			btrfs_warn(fs_info, "failed to recover relocation: %d",
-					ret);
-			err = -EINVAL;
-			goto fail_qgroup;
-		}
-	}
-
 	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
 	if (IS_ERR(fs_info->fs_root)) {
 		err = PTR_ERR(fs_info->fs_root);
@@ -3358,35 +3389,17 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	}
 	up_read(&fs_info->cleanup_work_sem);
 
-	ret = btrfs_resume_balance_async(fs_info);
-	if (ret) {
-		btrfs_warn(fs_info, "failed to resume balance: %d", ret);
-		close_ctree(fs_info);
-		return ret;
-	}
-
-	ret = btrfs_resume_dev_replace_async(fs_info);
+	ret = btrfs_mount_rw(fs_info);
 	if (ret) {
-		btrfs_warn(fs_info, "failed to resume device replace: %d", ret);
 		close_ctree(fs_info);
 		return ret;
 	}
-
-	btrfs_qgroup_rescan_resume(fs_info);
 	btrfs_discard_resume(fs_info);
 
-	if (!fs_info->uuid_root) {
-		btrfs_info(fs_info, "creating UUID tree");
-		ret = btrfs_create_uuid_tree(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info,
-				"failed to create the UUID tree: %d", ret);
-			close_ctree(fs_info);
-			return ret;
-		}
-	} else if (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
-		   fs_info->generation !=
-				btrfs_super_uuid_tree_generation(disk_super)) {
+	if (fs_info->uuid_root &&
+	    (btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
+	     fs_info->generation !=
+			btrfs_super_uuid_tree_generation(disk_super))) {
 		btrfs_info(fs_info, "checking UUID tree");
 		ret = btrfs_check_uuid_tree(fs_info);
 		if (ret) {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index e75ea6092942..945914d1747f 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,6 +50,7 @@ struct extent_buffer *btrfs_find_create_tree_block(
 						u64 bytenr, u64 owner_root,
 						int level);
 void btrfs_clean_tree_block(struct extent_buffer *buf);
+int btrfs_mount_rw(struct btrfs_fs_info *fs_info);
 int __cold open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6693cfc14dfd..ad5a78970389 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1878,7 +1878,6 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
-	struct btrfs_root *root = fs_info->tree_root;
 	unsigned old_flags = sb->s_flags;
 	unsigned long old_opts = fs_info->mount_opt;
 	unsigned long old_compress_type = fs_info->compress_type;
@@ -1971,39 +1970,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		ret = btrfs_cleanup_fs_roots(fs_info);
-		if (ret)
-			goto restore;
-
-		/* recover relocation */
-		mutex_lock(&fs_info->cleaner_mutex);
-		ret = btrfs_recover_relocation(root);
-		mutex_unlock(&fs_info->cleaner_mutex);
-		if (ret)
-			goto restore;
-
-		ret = btrfs_resume_balance_async(fs_info);
+		/*
+		 * NOTE: when remounting with a change that does writes, don't
+		 * put it anywhere above this point, as we are not sure to be
+		 * safe to write until we pass the above checks.
+		 */
+		ret = btrfs_mount_rw(fs_info);
 		if (ret)
 			goto restore;
 
-		ret = btrfs_resume_dev_replace_async(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info, "failed to resume dev_replace");
-			goto restore;
-		}
-
-		btrfs_qgroup_rescan_resume(fs_info);
-
-		if (!fs_info->uuid_root) {
-			btrfs_info(fs_info, "creating UUID tree");
-			ret = btrfs_create_uuid_tree(fs_info);
-			if (ret) {
-				btrfs_warn(fs_info,
-					   "failed to create the UUID tree %d",
-					   ret);
-				goto restore;
-			}
-		}
 		sb->s_flags &= ~SB_RDONLY;
 
 		set_bit(BTRFS_FS_OPEN, &fs_info->flags);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 02/12] btrfs: cleanup all orphan inodes on ro->rw remount
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 03/12] btrfs: only mark bg->needs_free_space if free space tree is on Boris Burkov
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When we mount a rw file system, we clean the orphan inodes in the
filesystem trees, and also on the tree_root and fs_root. However, when
we remount a ro file system rw, we only clean the former. Move the calls
to btrfs_orphan_cleanup() on tree_root and fs_root to the shared rw
mount routine to effectively add them on ro->rw remount.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e0de4986dbc..2cf81a4e9393 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2896,6 +2896,14 @@ int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
 	if (ret)
 		goto out;
 
+	down_read(&fs_info->cleanup_work_sem);
+	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
+	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
+		up_read(&fs_info->cleanup_work_sem);
+		goto out;
+	}
+	up_read(&fs_info->cleanup_work_sem);
+
 	mutex_lock(&fs_info->cleaner_mutex);
 	ret = btrfs_recover_relocation(fs_info->tree_root);
 	mutex_unlock(&fs_info->cleaner_mutex);
@@ -3380,15 +3388,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		}
 	}
 
-	down_read(&fs_info->cleanup_work_sem);
-	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
-	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
-		up_read(&fs_info->cleanup_work_sem);
-		close_ctree(fs_info);
-		return ret;
-	}
-	up_read(&fs_info->cleanup_work_sem);
-
 	ret = btrfs_mount_rw(fs_info);
 	if (ret) {
 		close_ctree(fs_info);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 03/12] btrfs: only mark bg->needs_free_space if free space tree is on
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 02/12] btrfs: cleanup all orphan inodes on ro->rw remount Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 04/12] btrfs: create free space tree on ro->rw remount Boris Burkov
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If we attempt to create a free space tree while any block groups have
needs_free_space set, we will double add the new free space item
and hit EEXIST. Previously, we only created the free space tree on a new
mount, so we never hit the case, but if we try to create it on a
remount, such block groups could exist and trip us up.

We don't do anything with this field unless the free space tree is
enabled, so there is no harm in not setting it.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ccc3271c20ca..00e2fe1d0f32 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2156,7 +2156,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
 	cache->flags = type;
 	cache->last_byte_to_unpin = (u64)-1;
 	cache->cached = BTRFS_CACHE_FINISHED;
-	cache->needs_free_space = 1;
+	if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
+		cache->needs_free_space = 1;
 	ret = exclude_super_stripes(cache);
 	if (ret) {
 		/* We may have excluded something, so call this just in case */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 04/12] btrfs: create free space tree on ro->rw remount
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (2 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 03/12] btrfs: only mark bg->needs_free_space if free space tree is on Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 05/12] btrfs: clear oneshot options on mount and remount Boris Burkov
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When a user attempts to remount a btrfs filesystem with
'mount -o remount,space_cache=v2', that operation silently succeeds.
Unfortunately, this is misleading, because the remount does not create
the free space tree. /proc/mounts will incorrectly show space_cache=v2,
but on the next mount, the file system will revert to the old
space_cache.

For now, we handle only the easier case, where the existing mount is
read-only and the new mount is read-write. In that case, we can create
the free space tree without contending with the block groups changing
as we go.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2cf81a4e9393..d934eb80ff49 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2912,6 +2912,17 @@ int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
 		goto out;
 	}
 
+	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) &&
+	    !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
+		btrfs_info(fs_info, "creating free space tree");
+		ret = btrfs_create_free_space_tree(fs_info);
+		if (ret) {
+			btrfs_warn(fs_info,
+				"failed to create free space tree: %d", ret);
+			goto out;
+		}
+	}
+
 	ret = btrfs_resume_balance_async(fs_info);
 	if (ret)
 		goto out;
@@ -3376,18 +3387,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		}
 	}
 
-	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) &&
-	    !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
-		btrfs_info(fs_info, "creating free space tree");
-		ret = btrfs_create_free_space_tree(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info,
-				"failed to create free space tree: %d", ret);
-			close_ctree(fs_info);
-			return ret;
-		}
-	}
-
 	ret = btrfs_mount_rw(fs_info);
 	if (ret) {
 		close_ctree(fs_info);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 05/12] btrfs: clear oneshot options on mount and remount
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (3 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 04/12] btrfs: create free space tree on ro->rw remount Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount Boris Burkov
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Some options only apply during mount time and are cleared at the end
of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also
fits the bill, and this is a preparation patch for also clearing that
option.

One subtlety is that the current code only resets USEBACKUPROOT on rw
mounts, but the option is meaningfully "consumed" by a ro mount, so it
feels appropriate to clear in that case as well. A subsequent read-write
remount would not go through open_ctree, which is the only place that
checks the option, so the change should be benign.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 14 +++++++++++++-
 fs/btrfs/disk-io.h |  1 +
 fs/btrfs/super.c   |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d934eb80ff49..0bc7d9766f8c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2884,6 +2884,16 @@ static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 	return 0;
 }
 
+/*
+ * Some options only have meaning at mount time and shouldn't persist across
+ * remounts, or be displayed. Clear these at the end of mount and remount
+ * code paths.
+ */
+void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
+{
+	btrfs_clear_opt(fs_info->mount_opt, USEBACKUPROOT);
+}
+
 /*
  * Mounting logic specific to read-write file systems. Shared by open_ctree
  * and btrfs_remount when remounting from read-only to read-write.
@@ -3365,7 +3375,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	}
 
 	if (sb_rdonly(sb))
-		return 0;
+		goto clear_oneshot;
 
 	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
 	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
@@ -3409,6 +3419,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	}
 	set_bit(BTRFS_FS_OPEN, &fs_info->flags);
 
+clear_oneshot:
+	btrfs_clear_oneshot_options(fs_info);
 	return 0;
 
 fail_qgroup:
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 945914d1747f..3bb097f22314 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -50,6 +50,7 @@ struct extent_buffer *btrfs_find_create_tree_block(
 						u64 bytenr, u64 owner_root,
 						int level);
 void btrfs_clean_tree_block(struct extent_buffer *buf);
+void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info);
 int btrfs_mount_rw(struct btrfs_fs_info *fs_info);
 int __cold open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ad5a78970389..cca00cc0c98c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1992,6 +1992,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 
 	wake_up_process(fs_info->transaction_kthread);
 	btrfs_remount_cleanup(fs_info, old_opts);
+	btrfs_clear_oneshot_options(fs_info);
 	clear_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
 
 	return 0;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (4 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 05/12] btrfs: clear oneshot options on mount and remount Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-30 19:49   ` David Sterba
  2020-11-18 23:06 ` [PATCH v7 07/12] btrfs: keep sb cache_generation consistent with space_cache Boris Burkov
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

A user might want to revert to v1 or nospace_cache on a root filesystem,
and much like turning on the free space tree, that can only be done
remounting from ro->rw. Support clearing the free space tree on such
mounts by moving it into the shared remount logic.

Since the CLEAR_CACHE option sticks around across remounts, this change
would result in clearing the tree for ever on every remount, which is
not desirable. To fix that, add CLEAR_CACHE to the oneshot options we
clear at mount end, which has the other bonus of not cluttering the
/proc/mounts output with clear_cache.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0bc7d9766f8c..64e5707f008b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2892,6 +2892,7 @@ static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
 {
 	btrfs_clear_opt(fs_info->mount_opt, USEBACKUPROOT);
+	btrfs_clear_opt(fs_info->mount_opt, CLEAR_CACHE);
 }
 
 /*
@@ -2901,6 +2902,27 @@ void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
 int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
 {
 	int ret;
+	bool clear_free_space_tree = false;
+
+	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
+	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
+		clear_free_space_tree = true;
+	} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+		   !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID)) {
+		btrfs_warn(fs_info, "free space tree is invalid");
+		clear_free_space_tree = true;
+	}
+
+	if (clear_free_space_tree) {
+		btrfs_info(fs_info, "clearing free space tree");
+		ret = btrfs_clear_free_space_tree(fs_info);
+		if (ret) {
+			btrfs_warn(fs_info,
+				   "failed to clear free space tree: %d", ret);
+			close_ctree(fs_info);
+			return ret;
+		}
+	}
 
 	ret = btrfs_cleanup_fs_roots(fs_info);
 	if (ret)
@@ -2975,7 +2997,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	struct btrfs_root *chunk_root;
 	int ret;
 	int err = -EINVAL;
-	int clear_free_space_tree = 0;
 	int level;
 
 	ret = init_mount_fs_info(fs_info, sb);
@@ -3377,26 +3398,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (sb_rdonly(sb))
 		goto clear_oneshot;
 
-	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
-	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
-		clear_free_space_tree = 1;
-	} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
-		   !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID)) {
-		btrfs_warn(fs_info, "free space tree is invalid");
-		clear_free_space_tree = 1;
-	}
-
-	if (clear_free_space_tree) {
-		btrfs_info(fs_info, "clearing free space tree");
-		ret = btrfs_clear_free_space_tree(fs_info);
-		if (ret) {
-			btrfs_warn(fs_info,
-				   "failed to clear free space tree: %d", ret);
-			close_ctree(fs_info);
-			return ret;
-		}
-	}
-
 	ret = btrfs_mount_rw(fs_info);
 	if (ret) {
 		close_ctree(fs_info);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 07/12] btrfs: keep sb cache_generation consistent with space_cache
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (5 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 08/12] btrfs: use sb state to print space_cache mount option Boris Burkov
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When mounting, btrfs uses the cache_generation in the super block to
determine if space cache v1 is in use. However, by mounting with
nospace_cache or space_cache=v2, it is possible to disable space cache
v1, which does not result in un-setting cache_generation back to 0.

In order to base some logic, like mount option printing in /proc/mounts,
on the current state of the space cache rather than just the values of
the mount option, keep the value of cache_generation consistent with the
status of space cache v1.

We ensure that cache_generation > 0 iff the file system is using
space_cache v1. This requires committing a transaction on any mount
which changes whether we are using v1. (v1->nospace_cache, v1->v2,
nospace_cache->v1, v2->v1).

Since the mechanism for writing out the cache generation is transaction
commit, but we want some finer grained control over when we un-set it,
we can't just rely on the SPACE_CACHE mount option, and introduce an
fs_info flag that mount can use when it wants to unset the generation.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/ctree.h            |  3 +++
 fs/btrfs/disk-io.c          |  8 ++++++++
 fs/btrfs/free-space-cache.c | 28 ++++++++++++++++++++++++++++
 fs/btrfs/free-space-cache.h |  3 +++
 fs/btrfs/super.c            | 10 +++++++---
 fs/btrfs/transaction.c      |  2 ++
 6 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index cb90a870b235..318303f53529 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -559,6 +559,9 @@ enum {
 
 	/* Indicate that the discard workqueue can service discards. */
 	BTRFS_FS_DISCARD_RUNNING,
+
+	/* Indicate that we need to cleanup space cache v1 */
+	BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 64e5707f008b..fea467c421e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2902,6 +2902,7 @@ void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
 int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
 {
 	int ret;
+	bool cache_opt = btrfs_test_opt(fs_info, SPACE_CACHE);
 	bool clear_free_space_tree = false;
 
 	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
@@ -2955,6 +2956,12 @@ int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
 		}
 	}
 
+	if (cache_opt != btrfs_free_space_cache_v1_active(fs_info)) {
+		ret = btrfs_set_free_space_cache_v1_active(fs_info, cache_opt);
+		if (ret)
+			goto out;
+	}
+
 	ret = btrfs_resume_balance_async(fs_info);
 	if (ret)
 		goto out;
@@ -3418,6 +3425,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 			return ret;
 		}
 	}
+
 	set_bit(BTRFS_FS_OPEN, &fs_info->flags);
 
 clear_oneshot:
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 572c75d2169b..48f7bd050909 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3971,6 +3971,34 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 	return ret;
 }
 
+bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info)
+{
+	return btrfs_super_cache_generation(fs_info->super_copy);
+}
+
+int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info,
+					 bool active)
+{
+	struct btrfs_trans_handle *trans;
+	int ret;
+
+	/*
+	 * update_super_roots will appropriately set
+	 * fs_info->super_copy->cache_generation based on the SPACE_CACHE
+	 * option, so all we have to do is trigger a transaction commit.
+	 */
+	trans = btrfs_start_transaction(fs_info->tree_root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	if (!active)
+		set_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags);
+
+	ret = btrfs_commit_transaction(trans);
+	clear_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags);
+	return ret;
+}
+
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 /*
  * Use this if you need to make a bitmap or extent entry specifically, it
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index bf8d127d2407..8f4bc5781cd3 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -149,6 +149,9 @@ int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
 				   u64 *trimmed, u64 start, u64 end, u64 minlen,
 				   u64 maxlen, bool async);
 
+bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info);
+int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info,
+					 bool active);
 /* Support functions for running our sanity tests */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 int test_add_free_space_entry(struct btrfs_block_group *cache,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cca00cc0c98c..ff19f900cee1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -547,7 +547,6 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 {
 	substring_t args[MAX_OPT_ARGS];
 	char *p, *num;
-	u64 cache_gen;
 	int intarg;
 	int ret = 0;
 	char *compress_type;
@@ -557,10 +556,9 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 	bool saved_compress_force;
 	int no_compress = 0;
 
-	cache_gen = btrfs_super_cache_generation(info->super_copy);
 	if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
 		btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
-	else if (cache_gen)
+	else if (btrfs_free_space_cache_v1_active(info))
 		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
 
 	/*
@@ -1857,6 +1855,8 @@ static inline void btrfs_remount_begin(struct btrfs_fs_info *fs_info,
 static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 					 unsigned long old_opts)
 {
+	bool cache_opt = btrfs_test_opt(fs_info, SPACE_CACHE);
+
 	/*
 	 * We need to cleanup all defragable inodes if the autodefragment is
 	 * close or the filesystem is read only.
@@ -1873,6 +1873,10 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 	else if (btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) &&
 		 !btrfs_test_opt(fs_info, DISCARD_ASYNC))
 		btrfs_discard_cleanup(fs_info);
+
+	/* If we toggled space cache */
+	if (cache_opt != btrfs_free_space_cache_v1_active(fs_info))
+		btrfs_set_free_space_cache_v1_active(fs_info, cache_opt);
 }
 
 static int btrfs_remount(struct super_block *sb, int *flags, char *data)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0e4063651047..43253b9996c3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1801,6 +1801,8 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
 	super->root_level = root_item->level;
 	if (btrfs_test_opt(fs_info, SPACE_CACHE))
 		super->cache_generation = root_item->generation;
+	else if (test_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags))
+		super->cache_generation = 0;
 	if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
 		super->uuid_tree_generation = root_item->generation;
 }
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 08/12] btrfs: use sb state to print space_cache mount option
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (6 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 07/12] btrfs: keep sb cache_generation consistent with space_cache Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree Boris Burkov
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

To make the contents of /proc/mounts better match the actual state of
the file system, base the display of the space cache mount options off
the contents of the super block rather than the last mount options
passed in. Since there are many scenarios where the mount will ignore a
space cache option, simply showing the passed in option is misleading.

For example, if we mount with -o remount,space_cache=v2 on a read-write
file system without an existing free space tree, we won't build a free
space tree, but /proc/mounts will read space_cache=v2 (until we mount
again and it goes away)

cache_generation is set iff space_cache=v1, FREE_SPACE_TREE is set iff
space_cache=v2, and if neither is the case, we print nospace_cache.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ff19f900cee1..e2a186d254c5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1474,9 +1474,9 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",discard=async");
 	if (!(info->sb->s_flags & SB_POSIXACL))
 		seq_puts(seq, ",noacl");
-	if (btrfs_test_opt(info, SPACE_CACHE))
+	if (btrfs_free_space_cache_v1_active(info))
 		seq_puts(seq, ",space_cache");
-	else if (btrfs_test_opt(info, FREE_SPACE_TREE))
+	else if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
 		seq_puts(seq, ",space_cache=v2");
 	else
 		seq_puts(seq, ",nospace_cache");
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (7 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 08/12] btrfs: use sb state to print space_cache mount option Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-30 20:05   ` David Sterba
  2020-11-18 23:06 ` [PATCH v7 10/12] btrfs: remove free space items when disabling space cache v1 Boris Burkov
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If the remount is ro->ro, rw->ro, or rw->rw, we will not create or
clear the free space tree. This can be surprising, so print a warning
to dmesg to make the failure more visible. It is also important to
ensure that the space cache options (SPACE_CACHE, FREE_SPACE_TREE) are
consistent, so ensure those are set to properly match the current on
disk state (which won't be changing).

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/super.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e2a186d254c5..5e88ae69e2e6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1912,6 +1912,24 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 	btrfs_resize_thread_pool(fs_info,
 		fs_info->thread_pool_size, old_thread_pool_size);
 
+	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) !=
+	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+	    ((!sb_rdonly(sb) || *flags & SB_RDONLY))) {
+		btrfs_warn(fs_info,
+	   "remount supports changing free space tree only from ro to rw");
+		/*
+		 * Make sure free space cache options match the state on disk
+		 */
+		if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
+			btrfs_set_opt(fs_info->mount_opt, FREE_SPACE_TREE);
+			btrfs_clear_opt(fs_info->mount_opt, SPACE_CACHE);
+		}
+		if (btrfs_free_space_cache_v1_active(fs_info)) {
+			btrfs_clear_opt(fs_info->mount_opt, FREE_SPACE_TREE);
+			btrfs_set_opt(fs_info->mount_opt, SPACE_CACHE);
+		}
+	}
+
 	if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
 		goto out;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 10/12] btrfs: remove free space items when disabling space cache v1
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (8 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 11/12] btrfs: skip space_cache v1 setup when not using it Boris Burkov
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When the file system transitions from space cache v1 to v2 or to
nospace_cache, it removes the old cached data, but does not remove
the FREE_SPACE items nor the free space inodes they point to. This
doesn't cause any issues besides being a bit inefficient, since these
items no longer do anything useful.

To fix it, when we are mounting, and plan to disable the space cache,
destroy each block group's free space item and free space inode.
The code to remove the items is lifted from the existing use case of
removing the block group, with a light adaptation to handle whether or
not we have already looked up the free space inode.

References: https://github.com/btrfs/btrfs-todo/issues/5
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c      |  39 +-------------
 fs/btrfs/free-space-cache.c | 101 ++++++++++++++++++++++++++++++++++--
 fs/btrfs/free-space-cache.h |   3 ++
 3 files changed, 102 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 00e2fe1d0f32..d5104560dfc9 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -848,8 +848,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_block_group *block_group;
 	struct btrfs_free_cluster *cluster;
-	struct btrfs_root *tree_root = fs_info->tree_root;
-	struct btrfs_key key;
 	struct inode *inode;
 	struct kobject *kobj = NULL;
 	int ret;
@@ -927,42 +925,9 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
 	mutex_unlock(&trans->transaction->cache_write_mutex);
 
-	if (!IS_ERR(inode)) {
-		ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-		if (ret) {
-			btrfs_add_delayed_iput(inode);
-			goto out;
-		}
-		clear_nlink(inode);
-		/* One for the block groups ref */
-		spin_lock(&block_group->lock);
-		if (block_group->iref) {
-			block_group->iref = 0;
-			block_group->inode = NULL;
-			spin_unlock(&block_group->lock);
-			iput(inode);
-		} else {
-			spin_unlock(&block_group->lock);
-		}
-		/* One for our lookup ref */
-		btrfs_add_delayed_iput(inode);
-	}
-
-	key.objectid = BTRFS_FREE_SPACE_OBJECTID;
-	key.type = 0;
-	key.offset = block_group->start;
-
-	ret = btrfs_search_slot(trans, tree_root, &key, path, -1, 1);
-	if (ret < 0)
+	ret = btrfs_remove_free_space_inode(trans, inode, block_group);
+	if (ret)
 		goto out;
-	if (ret > 0)
-		btrfs_release_path(path);
-	if (ret == 0) {
-		ret = btrfs_del_item(trans, tree_root, path);
-		if (ret)
-			goto out;
-		btrfs_release_path(path);
-	}
 
 	spin_lock(&fs_info->block_group_cache_lock);
 	rb_erase(&block_group->cache_node,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 48f7bd050909..3c226a436c7a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -213,6 +213,65 @@ int create_free_space_inode(struct btrfs_trans_handle *trans,
 					 ino, block_group->start);
 }
 
+/*
+ * inode is an optional sink: if it is NULL, btrfs_remove_free_space_inode
+ * handles lookup, otherwise it takes ownership and iputs the inode.
+ * Don't reuse an inode pointer after passing it into this function.
+ */
+int btrfs_remove_free_space_inode(struct btrfs_trans_handle *trans,
+				  struct inode *inode,
+				  struct btrfs_block_group *block_group)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	if (!inode)
+		inode = lookup_free_space_inode(block_group, path);
+	if (IS_ERR(inode)) {
+		if (PTR_ERR(inode) != -ENOENT)
+			ret = PTR_ERR(inode);
+		goto out;
+	}
+	ret = btrfs_orphan_add(trans, BTRFS_I(inode));
+	if (ret) {
+		btrfs_add_delayed_iput(inode);
+		goto out;
+	}
+	clear_nlink(inode);
+	/* One for the block groups ref */
+	spin_lock(&block_group->lock);
+	if (block_group->iref) {
+		block_group->iref = 0;
+		block_group->inode = NULL;
+		spin_unlock(&block_group->lock);
+		iput(inode);
+	} else {
+		spin_unlock(&block_group->lock);
+	}
+	/* One for the lookup ref */
+	btrfs_add_delayed_iput(inode);
+
+	key.objectid = BTRFS_FREE_SPACE_OBJECTID;
+	key.type = 0;
+	key.offset = block_group->start;
+	ret = btrfs_search_slot(trans, trans->fs_info->tree_root, &key, path,
+				-1, 1);
+	if (ret) {
+		if (ret > 0)
+			ret = 0;
+		goto out;
+	}
+	ret = btrfs_del_item(trans, trans->fs_info->tree_root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *rsv)
 {
@@ -3976,6 +4035,28 @@ bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info)
 	return btrfs_super_cache_generation(fs_info->super_copy);
 }
 
+static int cleanup_free_space_cache_v1(struct btrfs_fs_info *fs_info,
+				       struct btrfs_trans_handle *trans)
+{
+	struct btrfs_block_group *block_group;
+	struct rb_node *node;
+	int ret;
+
+	btrfs_info(fs_info, "cleaning free space cache v1");
+
+	node = rb_first(&fs_info->block_group_cache_tree);
+	while (node) {
+		block_group = rb_entry(node, struct btrfs_block_group,
+				       cache_node);
+		ret = btrfs_remove_free_space_inode(trans, NULL, block_group);
+		if (ret)
+			goto out;
+		node = rb_next(node);
+	}
+out:
+	return ret;
+}
+
 int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info,
 					 bool active)
 {
@@ -3983,18 +4064,30 @@ int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info,
 	int ret;
 
 	/*
-	 * update_super_roots will appropriately set
-	 * fs_info->super_copy->cache_generation based on the SPACE_CACHE
-	 * option, so all we have to do is trigger a transaction commit.
+	 * update_super_roots will appropriately set or unset
+	 * fs_info->super_copy->cache_generation based on SPACE_CACHE and
+	 * BTRFS_FS_CLEANUP_SPACE_CACHE_V1. For this reason, we need a
+	 * transaction commit whether we are enabling space cache v1 and don't
+	 * have any other work to do, or are disabling it and removing free
+	 * space inodes.
 	 */
 	trans = btrfs_start_transaction(fs_info->tree_root, 0);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
 
-	if (!active)
+	if (!active) {
 		set_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags);
+		ret = cleanup_free_space_cache_v1(fs_info, trans);
+		if (ret)
+			goto abort;
+	}
 
 	ret = btrfs_commit_transaction(trans);
+	goto out;
+abort:
+	btrfs_abort_transaction(trans, ret);
+	btrfs_end_transaction(trans);
+out:
 	clear_bit(BTRFS_FS_CLEANUP_SPACE_CACHE_V1, &fs_info->flags);
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 8f4bc5781cd3..93522d22c195 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -84,6 +84,9 @@ struct inode *lookup_free_space_inode(struct btrfs_block_group *block_group,
 int create_free_space_inode(struct btrfs_trans_handle *trans,
 			    struct btrfs_block_group *block_group,
 			    struct btrfs_path *path);
+int btrfs_remove_free_space_inode(struct btrfs_trans_handle *trans,
+				  struct inode *inode,
+				  struct btrfs_block_group *block_group);
 
 int btrfs_check_trunc_cache_free_space(struct btrfs_fs_info *fs_info,
 				       struct btrfs_block_rsv *rsv);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 11/12] btrfs: skip space_cache v1 setup when not using it
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (9 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 10/12] btrfs: remove free space items when disabling space cache v1 Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-18 23:06 ` [PATCH v7 12/12] btrfs: fix lockdep error creating free space tree Boris Burkov
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If we are not using space cache v1, we should not create the free space
object or free space inodes. This comes up when we delete the existing
free space objects/inodes when migrating to v2, only to see them get
recreated for every dirtied block group.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d5104560dfc9..24e3eb13f173 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2331,6 +2331,9 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
 	int retries = 0;
 	int ret = 0;
 
+	if (!btrfs_test_opt(fs_info, SPACE_CACHE))
+		return 0;
+
 	/*
 	 * If this block group is smaller than 100 megs don't bother caching the
 	 * block group.
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v7 12/12] btrfs: fix lockdep error creating free space tree
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (10 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 11/12] btrfs: skip space_cache v1 setup when not using it Boris Burkov
@ 2020-11-18 23:06 ` Boris Burkov
  2020-11-20 21:32 ` [PATCH v7 00/10] btrfs: free space tree mounting fixes David Sterba
  2020-11-30 20:33 ` David Sterba
  13 siblings, 0 replies; 25+ messages in thread
From: Boris Burkov @ 2020-11-18 23:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

A lock dependency loop exists between the root tree lock, the extent tree
lock, and the free space tree lock.

The root tree lock depends on the free space tree lock because
btrfs_create_tree holds the new tree's lock while adding it to the root
tree.

The extent tree lock depends on the root tree lock because during umount,
we write out space cache v1, which writes inodes in the root tree, which
results in holding the root tree lock while doing a lookup in the extent tree.

Finally, the free space tree depends on the extent tree because
populate_free_space_tree holds a locked path in the extent tree and then
does a lookup in the free space tree to add the new item.

The simplest of the three to break is the one during tree creation: we
unlock the leaf before inserting the tree node into the root tree, which
fixes the lockdep.

Full contents of lockdep error for reference:

[   30.480136] ======================================================
[   30.480830] WARNING: possible circular locking dependency detected
[   30.481457] 5.9.0-rc8+ #76 Not tainted
[   30.481897] ------------------------------------------------------
[   30.482500] mount/520 is trying to acquire lock:
[   30.483064] ffff9babebe03908 (btrfs-free-space-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[   30.484054]
               but task is already holding lock:
[   30.484637] ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[   30.485581]
               which lock already depends on the new lock.

[   30.486397]
               the existing dependency chain (in reverse order) is:
[   30.487205]
               -> #2 (btrfs-extent-01#2){++++}-{3:3}:
[   30.487825]        down_read_nested+0x43/0x150
[   30.488306]        __btrfs_tree_read_lock+0x39/0x180
[   30.488868]        __btrfs_read_lock_root_node+0x3a/0x50
[   30.489477]        btrfs_search_slot+0x464/0x9b0
[   30.490009]        check_committed_ref+0x59/0x1d0
[   30.490603]        btrfs_cross_ref_exist+0x65/0xb0
[   30.491108]        run_delalloc_nocow+0x405/0x930
[   30.491651]        btrfs_run_delalloc_range+0x60/0x6b0
[   30.492203]        writepage_delalloc+0xd4/0x150
[   30.492688]        __extent_writepage+0x18d/0x3a0
[   30.493199]        extent_write_cache_pages+0x2af/0x450
[   30.493743]        extent_writepages+0x34/0x70
[   30.494231]        do_writepages+0x31/0xd0
[   30.494642]        __filemap_fdatawrite_range+0xad/0xe0
[   30.495194]        btrfs_fdatawrite_range+0x1b/0x50
[   30.495677]        __btrfs_write_out_cache+0x40d/0x460
[   30.496227]        btrfs_write_out_cache+0x8b/0x110
[   30.496716]        btrfs_start_dirty_block_groups+0x211/0x4e0
[   30.497317]        btrfs_commit_transaction+0xc0/0xba0
[   30.497861]        sync_filesystem+0x71/0x90
[   30.498303]        btrfs_remount+0x81/0x433
[   30.498767]        reconfigure_super+0x9f/0x210
[   30.499261]        path_mount+0x9d1/0xa30
[   30.499722]        do_mount+0x55/0x70
[   30.500158]        __x64_sys_mount+0xc4/0xe0
[   30.500616]        do_syscall_64+0x33/0x40
[   30.501091]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.501629]
               -> #1 (btrfs-root-00){++++}-{3:3}:
[   30.502241]        down_read_nested+0x43/0x150
[   30.502727]        __btrfs_tree_read_lock+0x39/0x180
[   30.503291]        __btrfs_read_lock_root_node+0x3a/0x50
[   30.503903]        btrfs_search_slot+0x464/0x9b0
[   30.504405]        btrfs_insert_empty_items+0x60/0xa0
[   30.504973]        btrfs_insert_item+0x60/0xd0
[   30.505412]        btrfs_create_tree+0x1b6/0x210
[   30.505913]        btrfs_create_free_space_tree+0x54/0x110
[   30.506460]        btrfs_mount_rw+0x15d/0x20f
[   30.506937]        btrfs_remount+0x356/0x433
[   30.507369]        reconfigure_super+0x9f/0x210
[   30.507868]        path_mount+0x9d1/0xa30
[   30.508264]        do_mount+0x55/0x70
[   30.508668]        __x64_sys_mount+0xc4/0xe0
[   30.509186]        do_syscall_64+0x33/0x40
[   30.509652]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.510271]
               -> #0 (btrfs-free-space-00){++++}-{3:3}:
[   30.510972]        __lock_acquire+0x11ad/0x1b60
[   30.511432]        lock_acquire+0xa2/0x360
[   30.511917]        down_read_nested+0x43/0x150
[   30.512383]        __btrfs_tree_read_lock+0x39/0x180
[   30.512947]        __btrfs_read_lock_root_node+0x3a/0x50
[   30.513455]        btrfs_search_slot+0x464/0x9b0
[   30.513947]        search_free_space_info+0x45/0x90
[   30.514465]        __add_to_free_space_tree+0x92/0x39d
[   30.515010]        btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
[   30.515639]        btrfs_mount_rw+0x15d/0x20f
[   30.516142]        btrfs_remount+0x356/0x433
[   30.516538]        reconfigure_super+0x9f/0x210
[   30.517065]        path_mount+0x9d1/0xa30
[   30.517438]        do_mount+0x55/0x70
[   30.517824]        __x64_sys_mount+0xc4/0xe0
[   30.518293]        do_syscall_64+0x33/0x40
[   30.518776]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.519335]
               other info that might help us debug this:

[   30.520210] Chain exists of:
                 btrfs-free-space-00 --> btrfs-root-00 --> btrfs-extent-01#2

[   30.521407]  Possible unsafe locking scenario:

[   30.522037]        CPU0                    CPU1
[   30.522456]        ----                    ----
[   30.522941]   lock(btrfs-extent-01#2);
[   30.523311]                                lock(btrfs-root-00);
[   30.523952]                                lock(btrfs-extent-01#2);
[   30.524620]   lock(btrfs-free-space-00);
[   30.525068]
                *** DEADLOCK ***

[   30.525669] 5 locks held by mount/520:
[   30.526116]  #0: ffff9babebc520e0 (&type->s_umount_key#37){+.+.}-{3:3}, at: path_mount+0x7ef/0xa30
[   30.527056]  #1: ffff9babebc52640 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x3d5/0x5c0
[   30.527960]  #2: ffff9babeae8f2e8 (&cache->free_space_lock#2){+.+.}-{3:3}, at: btrfs_create_free_space_tree.cold.22+0x101/0x45d
[   30.529118]  #3: ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[   30.530113]  #4: ffff9babebd52eb8 (btrfs-extent-00){++++}-{3:3}, at: btrfs_try_tree_read_lock+0x16/0x100
[   30.531124]
               stack backtrace:
[   30.531528] CPU: 0 PID: 520 Comm: mount Not tainted 5.9.0-rc8+ #76
[   30.532166] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module_el8.1.0+248+298dec18 04/01/2014
[   30.533215] Call Trace:
[   30.533452]  dump_stack+0x8d/0xc0
[   30.533797]  check_noncircular+0x13c/0x150
[   30.534233]  __lock_acquire+0x11ad/0x1b60
[   30.534667]  lock_acquire+0xa2/0x360
[   30.535063]  ? __btrfs_tree_read_lock+0x39/0x180
[   30.535525]  down_read_nested+0x43/0x150
[   30.535939]  ? __btrfs_tree_read_lock+0x39/0x180
[   30.536400]  __btrfs_tree_read_lock+0x39/0x180
[   30.536862]  __btrfs_read_lock_root_node+0x3a/0x50
[   30.537304]  btrfs_search_slot+0x464/0x9b0
[   30.537713]  ? trace_hardirqs_on+0x1c/0xf0
[   30.538148]  search_free_space_info+0x45/0x90
[   30.538572]  __add_to_free_space_tree+0x92/0x39d
[   30.539071]  ? printk+0x48/0x4a
[   30.539367]  btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
[   30.539972]  btrfs_mount_rw+0x15d/0x20f
[   30.540350]  btrfs_remount+0x356/0x433
[   30.540773]  ? shrink_dcache_sb+0xd9/0x100
[   30.541203]  reconfigure_super+0x9f/0x210
[   30.541642]  path_mount+0x9d1/0xa30
[   30.542040]  do_mount+0x55/0x70
[   30.542366]  __x64_sys_mount+0xc4/0xe0
[   30.542822]  do_syscall_64+0x33/0x40
[   30.543197]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.543691] RIP: 0033:0x7f109f7ab93a
[   30.544053] Code: 48 8b 0d 49 e5 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 16 e5 0b 00 f7 d8 64 89 01 48
[   30.546042] RSP: 002b:00007ffc47c4f858 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   30.546770] RAX: ffffffffffffffda RBX: 00007f109f8cf264 RCX: 00007f109f7ab93a
[   30.547485] RDX: 0000557e6fc10770 RSI: 0000557e6fc19cf0 RDI: 0000557e6fc19cd0
[   30.548185] RBP: 0000557e6fc10520 R08: 0000557e6fc18e30 R09: 0000557e6fc18cb0
[   30.548911] R10: 0000000000200020 R11: 0000000000000246 R12: 0000000000000000
[   30.549606] R13: 0000557e6fc19cd0 R14: 0000557e6fc10770 R15: 0000557e6fc10520

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/disk-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fea467c421e7..1934bf6a99da 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1142,7 +1142,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 	if (IS_ERR(leaf)) {
 		ret = PTR_ERR(leaf);
 		leaf = NULL;
-		goto fail;
+		goto fail_unlock;
 	}
 
 	root->node = leaf;
@@ -1166,6 +1166,8 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 		export_guid(root->root_item.uuid, &guid_null);
 	btrfs_set_root_drop_level(&root->root_item, 0);
 
+	btrfs_tree_unlock(leaf);
+
 	key.objectid = objectid;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 	key.offset = 0;
@@ -1173,13 +1175,12 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto fail;
 
-	btrfs_tree_unlock(leaf);
-
 	return root;
 
-fail:
+fail_unlock:
 	if (leaf)
 		btrfs_tree_unlock(leaf);
+fail:
 	btrfs_put_root(root);
 
 	return ERR_PTR(ret);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 00/10] btrfs: free space tree mounting fixes
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (11 preceding siblings ...)
  2020-11-18 23:06 ` [PATCH v7 12/12] btrfs: fix lockdep error creating free space tree Boris Burkov
@ 2020-11-20 21:32 ` David Sterba
  2020-11-30 18:24   ` Boris Burkov
  2020-11-30 20:33 ` David Sterba
  13 siblings, 1 reply; 25+ messages in thread
From: David Sterba @ 2020-11-20 21:32 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:15PM -0800, Boris Burkov wrote:
> This patch set cleans up issues surrounding enabling and disabling various
> free space cache features and their reporting in /proc/mounts.  Because the
> improvements became somewhat complex, the series starts by lifting rw mount
> logic into a single place.
> 
> The first patch is a setup patch that unifies very similar logic between a
> normal rw mount and a ro->rw remount. This is a useful setup step for adding
> more functionality to ro->rw remounts.
> 
> The second patch fixes the omission of orphan inode cleanup on a few trees
> during ro->rw remount.
> 
> The third patch stops marking block groups with need_free_space when the
> free space tree is not yet enabled.
> 
> The fourth patch adds enabling the free space tree to ro->rw remount.
> 
> The fifth patch adds a method for clearing oneshot mount options after mount.
> 
> The sixth patch adds support for clearing the free space tree on ro->rw remount.
> 
> The seventh patch sets up for more accurate /proc/mounts by ensuring that
> cache_generation > 0 iff space_cache is enabled.
> 
> The eigth patch is the more accurate /proc/mounts logic.
> 
> The ninth patch is a convenience kernel message that complains when we skip
> changing the free space tree on remount.
> 
> The tenth patch removes the space cache v1 free space item and free space
> inodes when space cache v1 is disabled (nospace_cache or space_cache=v2).
> 
> The eleventh patch stops re-creating the free space objects when we are not
> using space_cache=v1
> 
> The twelfth patch fixes a lockdep failure in creating the free space tree.

Is this fixing a problem caused by some patches in this series? Because
if yes, the fix should be folded there. A standalone patch makese sense
in case we can't fold it there (eg. after merging to Linus' tree),
otherwise the merged patchsets should be made of complete patches,
without fixes-to-fixes. Even if the patchset is in misc-next, fixups are
still doable.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
@ 2020-11-23 16:50   ` David Sterba
  2020-11-30 23:31     ` Boris Burkov
  2020-11-23 16:57   ` David Sterba
  1 sibling, 1 reply; 25+ messages in thread
From: David Sterba @ 2020-11-23 16:50 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> Mounting rw and remounting from ro to rw naturally share invariants and
> functionality which result in a correctly setup rw filesystem. Luckily,
> there is even a strong unity in the code which implements them. In
> mount's open_ctree, these operations mostly happen after an early return
> for ro file systems, and in remount, they happen in a section devoted to
> remounting ro->rw, after some remount specific validation passes.
> 
> However, there are unfortunately a few differences. There are small
> deviations in the order of some of the operations, remount does not
> cleanup orphan inodes in root_tree or fs_tree, remount does not create
> the free space tree, and remount does not handle "one-shot" mount
> options like clear_cache and uuid tree rescan.
> 
> Since we want to add building the free space tree to remount, and since
> it is possible to leak orphans on a filesystem mounted as ro then
> remounted rw

The statement is not specific if the orphans are files or roots. But I
don't agree that a leak is possible, or need a proof of the claim above.

The mount-time orphan cleanup will start early, but otherwise orphan
cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
Deleted but not clened tree roorts are all found and removed, regardless
of rw or ro->rw mount.

So I wonder if you claim there's a leak just by lack of an explicit call
on the remount path.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
  2020-11-23 16:50   ` David Sterba
@ 2020-11-23 16:57   ` David Sterba
  2020-12-01  0:01     ` Boris Burkov
  1 sibling, 1 reply; 25+ messages in thread
From: David Sterba @ 2020-11-23 16:57 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> +/*
> + * Mounting logic specific to read-write file systems. Shared by open_ctree
> + * and btrfs_remount when remounting from read-only to read-write.
> + */
> +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)

The function could be named better, to reflect that it's starting some
operations prior to the rw mount. As its now it would read as "mount the
filesystem as read-write", we already have 'btrfs_mount' as callback for
mount.

As this is a cosmetic fix it's not blocking the patchset but I'd like to
have that fixed for the final version. I don't have a suggestion for
now.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 00/10] btrfs: free space tree mounting fixes
  2020-11-20 21:32 ` [PATCH v7 00/10] btrfs: free space tree mounting fixes David Sterba
@ 2020-11-30 18:24   ` Boris Burkov
  2020-11-30 20:29     ` David Sterba
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Burkov @ 2020-11-30 18:24 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On Fri, Nov 20, 2020 at 10:32:22PM +0100, David Sterba wrote:
> On Wed, Nov 18, 2020 at 03:06:15PM -0800, Boris Burkov wrote:
> > This patch set cleans up issues surrounding enabling and disabling various
> > free space cache features and their reporting in /proc/mounts.  Because the
> > improvements became somewhat complex, the series starts by lifting rw mount
> > logic into a single place.
> > 
> > The first patch is a setup patch that unifies very similar logic between a
> > normal rw mount and a ro->rw remount. This is a useful setup step for adding
> > more functionality to ro->rw remounts.
> > 
> > The second patch fixes the omission of orphan inode cleanup on a few trees
> > during ro->rw remount.
> > 
> > The third patch stops marking block groups with need_free_space when the
> > free space tree is not yet enabled.
> > 
> > The fourth patch adds enabling the free space tree to ro->rw remount.
> > 
> > The fifth patch adds a method for clearing oneshot mount options after mount.
> > 
> > The sixth patch adds support for clearing the free space tree on ro->rw remount.
> > 
> > The seventh patch sets up for more accurate /proc/mounts by ensuring that
> > cache_generation > 0 iff space_cache is enabled.
> > 
> > The eigth patch is the more accurate /proc/mounts logic.
> > 
> > The ninth patch is a convenience kernel message that complains when we skip
> > changing the free space tree on remount.
> > 
> > The tenth patch removes the space cache v1 free space item and free space
> > inodes when space cache v1 is disabled (nospace_cache or space_cache=v2).
> > 
> > The eleventh patch stops re-creating the free space objects when we are not
> > using space_cache=v1
> > 
> > The twelfth patch fixes a lockdep failure in creating the free space tree.
> 
> Is this fixing a problem caused by some patches in this series? Because
> if yes, the fix should be folded there. A standalone patch makese sense
> in case we can't fold it there (eg. after merging to Linus' tree),
> otherwise the merged patchsets should be made of complete patches,
> without fixes-to-fixes. Even if the patchset is in misc-next, fixups are
> still doable.

The new 'needs_free_space' patch (#3) fixes the bug in this series that
you caught, so if I understand correctly, I should appropriately fold
that one into one of the existing patches.

The lockdep issue exists in misc-next as far as I can tell, so I think a
standalone patch makes sense.

Let me know if I misunderstood you on either of those.

Thanks,
Boris

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount
  2020-11-18 23:06 ` [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount Boris Burkov
@ 2020-11-30 19:49   ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-11-30 19:49 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:21PM -0800, Boris Burkov wrote:
> A user might want to revert to v1 or nospace_cache on a root filesystem,
> and much like turning on the free space tree, that can only be done
> remounting from ro->rw. Support clearing the free space tree on such
> mounts by moving it into the shared remount logic.
> 
> Since the CLEAR_CACHE option sticks around across remounts, this change
> would result in clearing the tree for ever on every remount, which is
> not desirable. To fix that, add CLEAR_CACHE to the oneshot options we
> clear at mount end, which has the other bonus of not cluttering the
> /proc/mounts output with clear_cache.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/disk-io.c | 43 ++++++++++++++++++++++---------------------
>  1 file changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 0bc7d9766f8c..64e5707f008b 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2892,6 +2892,7 @@ static int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
>  void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
>  {
>  	btrfs_clear_opt(fs_info->mount_opt, USEBACKUPROOT);
> +	btrfs_clear_opt(fs_info->mount_opt, CLEAR_CACHE);
>  }
>  
>  /*
> @@ -2901,6 +2902,27 @@ void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info)
>  int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
>  {
>  	int ret;
> +	bool clear_free_space_tree = false;
> +
> +	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
> +	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
> +		clear_free_space_tree = true;
> +	} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> +		   !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID)) {
> +		btrfs_warn(fs_info, "free space tree is invalid");
> +		clear_free_space_tree = true;
> +	}
> +
> +	if (clear_free_space_tree) {
> +		btrfs_info(fs_info, "clearing free space tree");
> +		ret = btrfs_clear_free_space_tree(fs_info);
> +		if (ret) {
> +			btrfs_warn(fs_info,
> +				   "failed to clear free space tree: %d", ret);
> +			close_ctree(fs_info);

This is probably a copy&paste error, this was originally in open_ctree
that calls close_ctree after errors, but here it's not supposed to be
called.

> +			return ret;
> +		}
> +	}
>  
>  	ret = btrfs_cleanup_fs_roots(fs_info);
>  	if (ret)
> @@ -2975,7 +2997,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  	struct btrfs_root *chunk_root;
>  	int ret;
>  	int err = -EINVAL;
> -	int clear_free_space_tree = 0;
>  	int level;
>  
>  	ret = init_mount_fs_info(fs_info, sb);
> @@ -3377,26 +3398,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  	if (sb_rdonly(sb))
>  		goto clear_oneshot;
>  
> -	if (btrfs_test_opt(fs_info, CLEAR_CACHE) &&
> -	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
> -		clear_free_space_tree = 1;
> -	} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> -		   !btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID)) {
> -		btrfs_warn(fs_info, "free space tree is invalid");
> -		clear_free_space_tree = 1;
> -	}
> -
> -	if (clear_free_space_tree) {
> -		btrfs_info(fs_info, "clearing free space tree");
> -		ret = btrfs_clear_free_space_tree(fs_info);
> -		if (ret) {
> -			btrfs_warn(fs_info,
> -				   "failed to clear free space tree: %d", ret);
> -			close_ctree(fs_info);
> -			return ret;
> -		}
> -	}
> -
>  	ret = btrfs_mount_rw(fs_info);
>  	if (ret) {
>  		close_ctree(fs_info);
> -- 
> 2.24.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree
  2020-11-18 23:06 ` [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree Boris Burkov
@ 2020-11-30 20:05   ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-11-30 20:05 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:24PM -0800, Boris Burkov wrote:
> If the remount is ro->ro, rw->ro, or rw->rw, we will not create or
> clear the free space tree. This can be surprising, so print a warning
> to dmesg to make the failure more visible. It is also important to
> ensure that the space cache options (SPACE_CACHE, FREE_SPACE_TREE) are
> consistent, so ensure those are set to properly match the current on
> disk state (which won't be changing).
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/super.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index e2a186d254c5..5e88ae69e2e6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1912,6 +1912,24 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>  	btrfs_resize_thread_pool(fs_info,
>  		fs_info->thread_pool_size, old_thread_pool_size);
>  
> +	if (btrfs_test_opt(fs_info, FREE_SPACE_TREE) !=
> +	    btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> +	    ((!sb_rdonly(sb) || *flags & SB_RDONLY))) {

	(!sb_rdonly(sb) || (*flags & SB_RDONLY))) {

Ie. the parens around the & operator, not (( )) around the whole
expression.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 00/10] btrfs: free space tree mounting fixes
  2020-11-30 18:24   ` Boris Burkov
@ 2020-11-30 20:29     ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-11-30 20:29 UTC (permalink / raw)
  To: Boris Burkov; +Cc: dsterba, linux-btrfs, kernel-team

On Mon, Nov 30, 2020 at 10:24:16AM -0800, Boris Burkov wrote:
> On Fri, Nov 20, 2020 at 10:32:22PM +0100, David Sterba wrote:
> > On Wed, Nov 18, 2020 at 03:06:15PM -0800, Boris Burkov wrote:
> > Is this fixing a problem caused by some patches in this series? Because
> > if yes, the fix should be folded there. A standalone patch makese sense
> > in case we can't fold it there (eg. after merging to Linus' tree),
> > otherwise the merged patchsets should be made of complete patches,
> > without fixes-to-fixes. Even if the patchset is in misc-next, fixups are
> > still doable.
> 
> The new 'needs_free_space' patch (#3) fixes the bug in this series that
> you caught, so if I understand correctly, I should appropriately fold
> that one into one of the existing patches.

I'll fold it but which one, 1 is only refactoring code and 2 starting
orphan cleanup, so I think it's 2. I have run the test on each commit
with the same test steps as be fore but can't reproduce it (in a VM,
previously it was another box).

> The lockdep issue exists in misc-next as far as I can tell, so I think a
> standalone patch makes sense.

Ok.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 00/10] btrfs: free space tree mounting fixes
  2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
                   ` (12 preceding siblings ...)
  2020-11-20 21:32 ` [PATCH v7 00/10] btrfs: free space tree mounting fixes David Sterba
@ 2020-11-30 20:33 ` David Sterba
  13 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-11-30 20:33 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Nov 18, 2020 at 03:06:15PM -0800, Boris Burkov wrote:
> This patch set cleans up issues surrounding enabling and disabling various
> free space cache features and their reporting in /proc/mounts.  Because the
> improvements became somewhat complex, the series starts by lifting rw mount
> logic into a single place.
> 
> Boris Burkov (12):
>   btrfs: lift rw mount setup from mount and remount
>   btrfs: cleanup all orphan inodes on ro->rw remount
>   btrfs: only mark bg->needs_free_space if free space tree is on
>   btrfs: create free space tree on ro->rw remount
>   btrfs: clear oneshot options on mount and remount
>   btrfs: clear free space tree on ro->rw remount
>   btrfs: keep sb cache_generation consistent with space_cache
>   btrfs: use sb state to print space_cache mount option
>   btrfs: warn when remount will not change the free space tree
>   btrfs: remove free space items when disabling space cache v1
>   btrfs: skip space_cache v1 setup when not using it
>   btrfs: fix lockdep error creating free space tree

Moved from topic branch to misc-next, with some more fixups. I can do
some fixups until the end of this week. We need test coverage for
various ro->rw and v1/v2/no transitions, I'm aware you sent something
for fstests but haven't looked so far. The whole core for the v1->v2
conversion is there, from now on please send fixes as new patches.
Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-23 16:50   ` David Sterba
@ 2020-11-30 23:31     ` Boris Burkov
  2020-12-15 18:01       ` David Sterba
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Burkov @ 2020-11-30 23:31 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On Mon, Nov 23, 2020 at 05:50:40PM +0100, David Sterba wrote:
> On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > Mounting rw and remounting from ro to rw naturally share invariants and
> > functionality which result in a correctly setup rw filesystem. Luckily,
> > there is even a strong unity in the code which implements them. In
> > mount's open_ctree, these operations mostly happen after an early return
> > for ro file systems, and in remount, they happen in a section devoted to
> > remounting ro->rw, after some remount specific validation passes.
> > 
> > However, there are unfortunately a few differences. There are small
> > deviations in the order of some of the operations, remount does not
> > cleanup orphan inodes in root_tree or fs_tree, remount does not create
> > the free space tree, and remount does not handle "one-shot" mount
> > options like clear_cache and uuid tree rescan.
> > 
> > Since we want to add building the free space tree to remount, and since
> > it is possible to leak orphans on a filesystem mounted as ro then
> > remounted rw
> 
> The statement is not specific if the orphans are files or roots. But I
> don't agree that a leak is possible, or need a proof of the claim above.
> 
> The mount-time orphan cleanup will start early, but otherwise orphan
> cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
> Deleted but not clened tree roorts are all found and removed, regardless
> of rw or ro->rw mount.
> 
> So I wonder if you claim there's a leak just by lack of an explicit call
> on the remount path.

For what it's worth, the example I had in mind is the free space inode
orphans after a block_group delete or the new "clear v1 space cache"
code in this stack.

I hadn't considered btrfs_lookup_dentry because I was focused on those
specific inodes, but it's possible that gets called in a way that would
clean them too.

However, another thing I think I overlooked is that it doesn't look
like remount would affect the set of delayed_iputs, so that mechanism for
removing the orphans should still work. Further, the new function only
runs when going from ro->rw, but any ro mount would run delayed iputs
before completing as part of btrfs_commit_super.

So with all that, I agree with you that there isn't a leak. Going
forward with this, I can certainly fix the commit messages, or even get
rid of the patch that does the orphan cleanup in remount. I can't think
of a reason that the cleanup would be bad, but on the other hand, just
"unity" is a flimsy justification for adding it. Let me know what you
prefer.

Thanks for the review,
Boris

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-23 16:57   ` David Sterba
@ 2020-12-01  0:01     ` Boris Burkov
  2020-12-15 17:50       ` David Sterba
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Burkov @ 2020-12-01  0:01 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On Mon, Nov 23, 2020 at 05:57:04PM +0100, David Sterba wrote:
> On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > +/*
> > + * Mounting logic specific to read-write file systems. Shared by open_ctree
> > + * and btrfs_remount when remounting from read-only to read-write.
> > + */
> > +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
> 
> The function could be named better, to reflect that it's starting some
> operations prior to the rw mount. As its now it would read as "mount the
> filesystem as read-write", we already have 'btrfs_mount' as callback for
> mount.
> 
> As this is a cosmetic fix it's not blocking the patchset but I'd like to
> have that fixed for the final version. I don't have a suggestion for
> now.

I agree, the name implies a function which does a rw mount end to end.

Some alternative ideas:
btrfs_prepare_rw_mount
btrfs_setup_rw_mount
btrfs_handle_rw_mount
btrfs_process_rw_mount

I think I personally like 'prepare' the most out of those.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-12-01  0:01     ` Boris Burkov
@ 2020-12-15 17:50       ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-12-15 17:50 UTC (permalink / raw)
  To: Boris Burkov; +Cc: dsterba, linux-btrfs, kernel-team

On Mon, Nov 30, 2020 at 04:01:38PM -0800, Boris Burkov wrote:
> On Mon, Nov 23, 2020 at 05:57:04PM +0100, David Sterba wrote:
> > On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > > +/*
> > > + * Mounting logic specific to read-write file systems. Shared by open_ctree
> > > + * and btrfs_remount when remounting from read-only to read-write.
> > > + */
> > > +int btrfs_mount_rw(struct btrfs_fs_info *fs_info)
> > 
> > The function could be named better, to reflect that it's starting some
> > operations prior to the rw mount. As its now it would read as "mount the
> > filesystem as read-write", we already have 'btrfs_mount' as callback for
> > mount.
> > 
> > As this is a cosmetic fix it's not blocking the patchset but I'd like to
> > have that fixed for the final version. I don't have a suggestion for
> > now.
> 
> I agree, the name implies a function which does a rw mount end to end.
> 
> Some alternative ideas:
> btrfs_prepare_rw_mount
> btrfs_setup_rw_mount
> btrfs_handle_rw_mount
> btrfs_process_rw_mount
> 
> I think I personally like 'prepare' the most out of those.

I had not noticed your reply when I committed the patch, the function is
btrfs_start_pre_rw_mount. btrfs_prepare_rw_mount would be also good, we
can rename it eventually if it needed.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount
  2020-11-30 23:31     ` Boris Burkov
@ 2020-12-15 18:01       ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2020-12-15 18:01 UTC (permalink / raw)
  To: Boris Burkov; +Cc: dsterba, linux-btrfs, kernel-team

On Mon, Nov 30, 2020 at 03:31:42PM -0800, Boris Burkov wrote:
> On Mon, Nov 23, 2020 at 05:50:40PM +0100, David Sterba wrote:
> > On Wed, Nov 18, 2020 at 03:06:16PM -0800, Boris Burkov wrote:
> > > Mounting rw and remounting from ro to rw naturally share invariants and
> > > functionality which result in a correctly setup rw filesystem. Luckily,
> > > there is even a strong unity in the code which implements them. In
> > > mount's open_ctree, these operations mostly happen after an early return
> > > for ro file systems, and in remount, they happen in a section devoted to
> > > remounting ro->rw, after some remount specific validation passes.
> > > 
> > > However, there are unfortunately a few differences. There are small
> > > deviations in the order of some of the operations, remount does not
> > > cleanup orphan inodes in root_tree or fs_tree, remount does not create
> > > the free space tree, and remount does not handle "one-shot" mount
> > > options like clear_cache and uuid tree rescan.
> > > 
> > > Since we want to add building the free space tree to remount, and since
> > > it is possible to leak orphans on a filesystem mounted as ro then
> > > remounted rw
> > 
> > The statement is not specific if the orphans are files or roots. But I
> > don't agree that a leak is possible, or need a proof of the claim above.
> > 
> > The mount-time orphan cleanup will start early, but otherwise orphan
> > cleanup is checked and started on dentry lookups (btrfs_lookup_dentry).
> > Deleted but not clened tree roorts are all found and removed, regardless
> > of rw or ro->rw mount.
> > 
> > So I wonder if you claim there's a leak just by lack of an explicit call
> > on the remount path.
> 
> For what it's worth, the example I had in mind is the free space inode
> orphans after a block_group delete or the new "clear v1 space cache"
> code in this stack.
> 
> I hadn't considered btrfs_lookup_dentry because I was focused on those
> specific inodes, but it's possible that gets called in a way that would
> clean them too.
> 
> However, another thing I think I overlooked is that it doesn't look
> like remount would affect the set of delayed_iputs, so that mechanism for
> removing the orphans should still work. Further, the new function only
> runs when going from ro->rw, but any ro mount would run delayed iputs
> before completing as part of btrfs_commit_super.
> 
> So with all that, I agree with you that there isn't a leak. Going
> forward with this, I can certainly fix the commit messages, or even get
> rid of the patch that does the orphan cleanup in remount. I can't think
> of a reason that the cleanup would be bad, but on the other hand, just
> "unity" is a flimsy justification for adding it. Let me know what you
> prefer.

Thanks for checking, the committed changelog does not contain 'leak' and
I slighthly rephrased only that sentence.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-12-15 18:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18 23:06 [PATCH v7 00/10] btrfs: free space tree mounting fixes Boris Burkov
2020-11-18 23:06 ` [PATCH v7 01/12] btrfs: lift rw mount setup from mount and remount Boris Burkov
2020-11-23 16:50   ` David Sterba
2020-11-30 23:31     ` Boris Burkov
2020-12-15 18:01       ` David Sterba
2020-11-23 16:57   ` David Sterba
2020-12-01  0:01     ` Boris Burkov
2020-12-15 17:50       ` David Sterba
2020-11-18 23:06 ` [PATCH v7 02/12] btrfs: cleanup all orphan inodes on ro->rw remount Boris Burkov
2020-11-18 23:06 ` [PATCH v7 03/12] btrfs: only mark bg->needs_free_space if free space tree is on Boris Burkov
2020-11-18 23:06 ` [PATCH v7 04/12] btrfs: create free space tree on ro->rw remount Boris Burkov
2020-11-18 23:06 ` [PATCH v7 05/12] btrfs: clear oneshot options on mount and remount Boris Burkov
2020-11-18 23:06 ` [PATCH v7 06/12] btrfs: clear free space tree on ro->rw remount Boris Burkov
2020-11-30 19:49   ` David Sterba
2020-11-18 23:06 ` [PATCH v7 07/12] btrfs: keep sb cache_generation consistent with space_cache Boris Burkov
2020-11-18 23:06 ` [PATCH v7 08/12] btrfs: use sb state to print space_cache mount option Boris Burkov
2020-11-18 23:06 ` [PATCH v7 09/12] btrfs: warn when remount will not change the free space tree Boris Burkov
2020-11-30 20:05   ` David Sterba
2020-11-18 23:06 ` [PATCH v7 10/12] btrfs: remove free space items when disabling space cache v1 Boris Burkov
2020-11-18 23:06 ` [PATCH v7 11/12] btrfs: skip space_cache v1 setup when not using it Boris Burkov
2020-11-18 23:06 ` [PATCH v7 12/12] btrfs: fix lockdep error creating free space tree Boris Burkov
2020-11-20 21:32 ` [PATCH v7 00/10] btrfs: free space tree mounting fixes David Sterba
2020-11-30 18:24   ` Boris Burkov
2020-11-30 20:29     ` David Sterba
2020-11-30 20:33 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.