All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool
@ 2020-06-16  2:17 Qu Wenruo
  2020-06-16  2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16  2:17 UTC (permalink / raw)
  To: linux-btrfs

There is a user report about transaction abort with a rare error number,
-EMFILE:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -24)
  WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
  RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
  Call Trace:
   create_pending_snapshots+0x82/0xa0 [btrfs]
   btrfs_commit_transaction+0x275/0x8c0 [btrfs]
   btrfs_mksubvol+0x4b9/0x500 [btrfs]
   btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
   btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
   btrfs_ioctl+0x11a4/0x2da0 [btrfs]
   do_vfs_ioctl+0xa9/0x640
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x5a/0x110
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace 33f2f83f3d5250e9 ]---
  BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
  BTRFS info (device sda1): forced readonly
  BTRFS warning (device sda1): Skipping commit of aborted transaction.
  BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown

It turns out that, it's caused by exhausted anonymous block device pool.
For that case, we don't have any good way to enlarge that pool to match
btrfs subvolume limits.

But still we can improve the situation not to abort transaction, but
report error gracefully to user.

This patchset will:
- Reduce the user of anonymous block device pool
  Although it can only reduce the user by data reloc tree and orphan
  roots, it should still save save several ids.

- Catch the unintialized btrfs_root::anon_dev for user visible roots
  Just an extra safe net for previous patch

- Preallocate anon_dev for snapshot/subvolume creation
  So user will get a graceful error other than transaction abort
  This patch should be the core of the patchset, as the remaining part
  won't have much user visible effect.

- Free anon_dev earlier to reduce the lifespan of anon_dev

Qu Wenruo (4):
  btrfs: disk-io: don't allocate anonymous block device for user
    invisible roots
  btrfs: detect uninitialized btrfs_root::anon_dev for user visible
    subvolumes
  btrfs: preallocate anon_dev for subvolume and snapshot creation
  btrfs: free anon_dev earlier to prevent exhausting anonymous block
    device pool

 fs/btrfs/disk-io.c     | 59 +++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/disk-io.h     |  2 ++
 fs/btrfs/inode.c       | 11 +++++++-
 fs/btrfs/ioctl.c       | 21 ++++++++++++++-
 fs/btrfs/transaction.c |  3 ++-
 fs/btrfs/transaction.h |  2 ++
 6 files changed, 88 insertions(+), 10 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots
  2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
@ 2020-06-16  2:17 ` Qu Wenruo
  2020-06-16 19:21   ` Josef Bacik
  2020-06-16  2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16  2:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Greed Rong

[BUG]
When a lot of subvolumes are created, there is a user report about
transaction aborted:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -24)
  WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
  RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
  Call Trace:
   create_pending_snapshots+0x82/0xa0 [btrfs]
   btrfs_commit_transaction+0x275/0x8c0 [btrfs]
   btrfs_mksubvol+0x4b9/0x500 [btrfs]
   btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
   btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
   btrfs_ioctl+0x11a4/0x2da0 [btrfs]
   do_vfs_ioctl+0xa9/0x640
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x5a/0x110
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace 33f2f83f3d5250e9 ]---
  BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
  BTRFS info (device sda1): forced readonly
  BTRFS warning (device sda1): Skipping commit of aborted transaction.
  BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown

[CAUSE]
The root cause is we don't have unlimited resource for anonymous block
device number.
The anonymous block device pool only contains 1<<20 devices, and is
shared across a several fses, like ceph and overlayfs.

While btrfs has support for 1<<48 subvolumes, so it's just a problem of
time to hit such limit.

[WORKAROUND]
Since it's not possible to completely solve the problem, we can only
workaround it.

Firstly, we can reduce the user of anon_dev. Data reloc tree is not visible
to users, thus it doesn't need anon_dev at all.

This patch will do extra check on root objectid, to rule out roots who
don't need anon_dev.
Although currently it's only data reloc tree and orphan roots.

Reported-by: Greed Rong <greedrong@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c70d47b8090a..cfc0ff288238 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1428,9 +1428,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
 	spin_lock_init(&root->ino_cache_lock);
 	init_waitqueue_head(&root->ino_cache_wait);
 
-	ret = get_anon_bdev(&root->anon_dev);
-	if (ret)
-		goto fail;
+	/*
+	 * Anonymous block device pool has limited size (1M), which is way
+	 * smaller than btrfs subvolumes limits (1<<48).
+	 * We shouldn't allocate any if it's not a user visible subvolume.
+	 */
+	if (is_fstree(root->root_key.objectid) &&
+	    btrfs_root_refs(&root->root_item)) {
+		ret = get_anon_bdev(&root->anon_dev);
+		if (ret)
+			goto fail;
+	}
 
 	mutex_lock(&root->objectid_mutex);
 	ret = btrfs_find_highest_objectid(root,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
  2020-06-16  2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo
@ 2020-06-16  2:17 ` Qu Wenruo
  2020-06-16 19:25   ` Josef Bacik
  2020-06-16  2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16  2:17 UTC (permalink / raw)
  To: linux-btrfs

btrfs_root::anon_dev is an indicator for different subvolume namespaces.
Thus it should always be initialized to an anonymous block device.

Add a safe net to catch such uninitialized values.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 00647e8cf2df..195aac71fe32 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8697,6 +8697,7 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat,
 {
 	u64 delalloc_bytes;
 	struct inode *inode = d_inode(path->dentry);
+	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u32 blocksize = inode->i_sb->s_blocksize;
 	u32 bi_flags = BTRFS_I(inode)->flags;
 
@@ -8718,7 +8719,13 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_NODUMP);
 
 	generic_fillattr(inode, stat);
-	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	if (unlikely(!root->anon_dev)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_warn(root->fs_info,
+		"uninitialized btrfs_root::anon_dev detected, root=%llu",
+			   root->root_key.objectid);
+	}
+	stat->dev = root->anon_dev;
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
  2020-06-16  2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo
  2020-06-16  2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo
@ 2020-06-16  2:17 ` Qu Wenruo
  2020-06-16 15:10   ` David Sterba
  2020-06-16  2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo
  2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba
  4 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16  2:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Greed Rong

[BUG]
When a lot of subvolumes are created, there is a user report about
transaction aborted:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -24)
  WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
  RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
  Call Trace:
   create_pending_snapshots+0x82/0xa0 [btrfs]
   btrfs_commit_transaction+0x275/0x8c0 [btrfs]
   btrfs_mksubvol+0x4b9/0x500 [btrfs]
   btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
   btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
   btrfs_ioctl+0x11a4/0x2da0 [btrfs]
   do_vfs_ioctl+0xa9/0x640
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x5a/0x110
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace 33f2f83f3d5250e9 ]---
  BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
  BTRFS info (device sda1): forced readonly
  BTRFS warning (device sda1): Skipping commit of aborted transaction.
  BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown

[CAUSE]
When the global anonymous block device pool is exhausted, the following
call chain will fail, and lead to transaction abort:

 btrfs_ioctl_snap_create_v2()
 |- btrfs_ioctl_snap_create_transid()
    |- btrfs_mksubvol()
       |- btrfs_commit_transaction()
          |- create_pending_snapshot()
             |- btrfs_get_fs_root()
                |- btrfs_init_fs_root()
                   |- get_anon_bdev()

[FIX]
Although we can't enlarge the anonymous block device pool, at least we
can preallocate anon_dev for subvolume/snapshot creation.
So that when the pool is exhausted, user will get an error other than
aborting transaction later.

Reported-by: Greed Rong <greedrong@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c     | 51 ++++++++++++++++++++++++++++++++++++------
 fs/btrfs/disk-io.h     |  2 ++
 fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
 fs/btrfs/transaction.c |  3 ++-
 fs/btrfs/transaction.h |  2 ++
 5 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cfc0ff288238..14fd69b71cb8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 	goto out;
 }
 
-static int btrfs_init_fs_root(struct btrfs_root *root)
+static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
 {
 	int ret;
 	unsigned int nofs_flag;
@@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
 	 */
 	if (is_fstree(root->root_key.objectid) &&
 	    btrfs_root_refs(&root->root_item)) {
-		ret = get_anon_bdev(&root->anon_dev);
-		if (ret)
-			goto fail;
+		if (!anon_dev) {
+			ret = get_anon_bdev(&root->anon_dev);
+			if (ret)
+				goto fail;
+		} else {
+			root->anon_dev = anon_dev;
+		}
 	}
 
 	mutex_lock(&root->objectid_mutex);
@@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 }
 
 
-struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
-				     u64 objectid, bool check_ref)
+/*
+ * Get a fs root.
+ *
+ * For essential trees like root/extent tree, we grab it from fs_info directly.
+ * For subvolume trees, we check the cached fs roots first. If miss then
+ * read it from disk and add it to cached fs roots.
+ *
+ * Caller should release the root by calling btrfs_put_root() after the usage.
+ *
+ * NOTE: Reloc and log trees can't be read by this function as they share the
+ *	 same root objectid.
+ *
+ * @objectid:	Root (subvolume) id
+ * @anon_dev:	Preallocated anonymous block device number for new roots.
+ * 		Pass 0 for automatic allocation.
+ * @check_ref:	Whether to check root refs. If true, return -ENOENT for orphan
+ * 		roots.
+ */
+static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info,
+					u64 objectid, dev_t anon_dev,
+					bool check_ref)
 {
 	struct btrfs_root *root;
 	struct btrfs_path *path;
@@ -1572,6 +1595,8 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 again:
 	root = btrfs_lookup_fs_root(fs_info, objectid);
 	if (root) {
+		/* Shouldn't get preallocated anon_dev for cached roots */
+		ASSERT(!anon_dev);
 		if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
 			btrfs_put_root(root);
 			return ERR_PTR(-ENOENT);
@@ -1591,7 +1616,7 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 		goto fail;
 	}
 
-	ret = btrfs_init_fs_root(root);
+	ret = btrfs_init_fs_root(root, anon_dev);
 	if (ret)
 		goto fail;
 
@@ -1624,6 +1649,18 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 	return ERR_PTR(ret);
 }
 
+struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
+				     u64 objectid, bool check_ref)
+{
+	return __get_fs_root(fs_info, objectid, 0, check_ref);
+}
+
+struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
+					 u64 objectid, dev_t anon_dev)
+{
+	return __get_fs_root(fs_info, objectid, anon_dev, true);
+}
+
 static int btrfs_congested_fn(void *congested_data, int bdi_bits)
 {
 	struct btrfs_fs_info *info = (struct btrfs_fs_info *)congested_data;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index bf43245406c4..00dc39d47ed3 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -67,6 +67,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref);
+struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
+					 u64 objectid, dev_t anon_dev);
 
 void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
 int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 168deb8ef68a..644bfe53ff61 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -566,6 +566,7 @@ static noinline int create_subvol(struct inode *dir,
 	struct inode *inode;
 	int ret;
 	int err;
+	dev_t anon_dev = 0;
 	u64 objectid;
 	u64 new_dirid = BTRFS_FIRST_FREE_OBJECTID;
 	u64 index = 0;
@@ -578,6 +579,10 @@ static noinline int create_subvol(struct inode *dir,
 	if (ret)
 		goto fail_free;
 
+	ret = get_anon_bdev(&anon_dev);
+	if (ret < 0)
+		goto fail_free;
+
 	/*
 	 * Don't create subvolume whose level is not zero. Or qgroup will be
 	 * screwed up since it assumes subvolume qgroup's level to be 0.
@@ -660,12 +665,15 @@ static noinline int create_subvol(struct inode *dir,
 		goto fail;
 
 	key.offset = (u64)-1;
-	new_root = btrfs_get_fs_root(fs_info, objectid, true);
+	new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
 	if (IS_ERR(new_root)) {
+		free_anon_bdev(anon_dev);
 		ret = PTR_ERR(new_root);
 		btrfs_abort_transaction(trans, ret);
 		goto fail;
 	}
+	/* Freeing would be done in btrfs_put_root() of @new_root */
+	anon_dev = 0;
 
 	btrfs_record_root_in_trans(trans, new_root);
 
@@ -735,6 +743,8 @@ static noinline int create_subvol(struct inode *dir,
 	return ret;
 
 fail_free:
+	if (anon_dev)
+		free_anon_bdev(anon_dev);
 	kfree(root_item);
 	return ret;
 }
@@ -762,6 +772,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	if (!pending_snapshot)
 		return -ENOMEM;
 
+	ret = get_anon_bdev(&pending_snapshot->anon_dev);
+	if (ret < 0)
+		goto free_pending;
 	pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
 			GFP_KERNEL);
 	pending_snapshot->path = btrfs_alloc_path();
@@ -823,10 +836,16 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 
 	d_instantiate(dentry, inode);
 	ret = 0;
+	pending_snapshot->anon_dev = 0;
 fail:
+	/* Prevent doulbe freeing of anon_dev */
+	if (ret && pending_snapshot->snap)
+		pending_snapshot->snap->anon_dev = 0;
 	btrfs_put_root(pending_snapshot->snap);
 	btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv);
 free_pending:
+	if (pending_snapshot->anon_dev)
+		free_anon_bdev(pending_snapshot->anon_dev);
 	kfree(pending_snapshot->root_item);
 	btrfs_free_path(pending_snapshot->path);
 	kfree(pending_snapshot);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b359d4b17658..48f3db64cd40 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1630,7 +1630,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	}
 
 	key.offset = (u64)-1;
-	pending->snap = btrfs_get_fs_root(fs_info, objectid, true);
+	pending->snap = btrfs_get_new_fs_root(fs_info, objectid,
+					      pending->anon_dev);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
 		btrfs_abort_transaction(trans, ret);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index bf102e64bfb2..a122a712f5cc 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -151,6 +151,8 @@ struct btrfs_pending_snapshot {
 	struct btrfs_block_rsv block_rsv;
 	/* extra metadata reservation for relocation */
 	int error;
+	/* Preallocated anonymous block device number */
+	dev_t anon_dev;
 	bool readonly;
 	struct list_head list;
 };
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool
  2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-06-16  2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo
@ 2020-06-16  2:17 ` Qu Wenruo
  2020-06-16 19:23   ` Josef Bacik
  2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba
  4 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16  2:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Greed Rong

[BUG]
When a lot of subvolumes are created, there is a user report about
transaction aborted:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -24)
  WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
  RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
  Call Trace:
   create_pending_snapshots+0x82/0xa0 [btrfs]
   btrfs_commit_transaction+0x275/0x8c0 [btrfs]
   btrfs_mksubvol+0x4b9/0x500 [btrfs]
   btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
   btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
   btrfs_ioctl+0x11a4/0x2da0 [btrfs]
   do_vfs_ioctl+0xa9/0x640
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x5a/0x110
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace 33f2f83f3d5250e9 ]---
  BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
  BTRFS info (device sda1): forced readonly
  BTRFS warning (device sda1): Skipping commit of aborted transaction.
  BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown

[CAUSE]
The root cause is we don't have unlimited resource for anonymous block
device number.
The anonymous block device pool only contains 1<<20 devices, and is
shared across a several fses, like ceph and overlayfs.

While btrfs has support for 1<<48 subvolumes, so it's just a problem of
time to hit such limit.

[WORKAROUND]
Here we can free btrfs_root::anon_dev as long as the subvolume is no
longer visible to users.

By freeing it earlier we reclaim the anon_dev quicker, hopefully to
reduce the chance of exhausting the pool.

Reported-by: Greed Rong <greedrong@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 195aac71fe32..e9f5a166d056 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4045,6 +4045,8 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
 		}
 	}
 
+	free_anon_bdev(dest->anon_dev);
+	dest->anon_dev = 0;
 out_end_trans:
 	trans->block_rsv = NULL;
 	trans->bytes_reserved = 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-06-16  2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo
@ 2020-06-16 15:10   ` David Sterba
  2020-06-16 22:54     ` Qu Wenruo
  2020-07-01  3:25     ` Qu Wenruo
  0 siblings, 2 replies; 27+ messages in thread
From: David Sterba @ 2020-06-16 15:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Greed Rong

On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote:
> [BUG]
> When a lot of subvolumes are created, there is a user report about
> transaction aborted:
> 
>   ------------[ cut here ]------------
>   BTRFS: Transaction aborted (error -24)
>   WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
>   RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
>   Call Trace:
>    create_pending_snapshots+0x82/0xa0 [btrfs]
>    btrfs_commit_transaction+0x275/0x8c0 [btrfs]
>    btrfs_mksubvol+0x4b9/0x500 [btrfs]
>    btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
>    btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
>    btrfs_ioctl+0x11a4/0x2da0 [btrfs]
>    do_vfs_ioctl+0xa9/0x640
>    ksys_ioctl+0x67/0x90
>    __x64_sys_ioctl+0x1a/0x20
>    do_syscall_64+0x5a/0x110
>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   ---[ end trace 33f2f83f3d5250e9 ]---
>   BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
>   BTRFS info (device sda1): forced readonly
>   BTRFS warning (device sda1): Skipping commit of aborted transaction.
>   BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
> 
> [CAUSE]
> When the global anonymous block device pool is exhausted, the following
> call chain will fail, and lead to transaction abort:
> 
>  btrfs_ioctl_snap_create_v2()
>  |- btrfs_ioctl_snap_create_transid()
>     |- btrfs_mksubvol()
>        |- btrfs_commit_transaction()
>           |- create_pending_snapshot()
>              |- btrfs_get_fs_root()
>                 |- btrfs_init_fs_root()
>                    |- get_anon_bdev()
> 
> [FIX]
> Although we can't enlarge the anonymous block device pool, at least we
> can preallocate anon_dev for subvolume/snapshot creation.
> So that when the pool is exhausted, user will get an error other than
> aborting transaction later.
> 
> Reported-by: Greed Rong <greedrong@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c     | 51 ++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/disk-io.h     |  2 ++
>  fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
>  fs/btrfs/transaction.c |  3 ++-
>  fs/btrfs/transaction.h |  2 ++
>  5 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index cfc0ff288238..14fd69b71cb8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>  	goto out;
>  }
>  
> -static int btrfs_init_fs_root(struct btrfs_root *root)
> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>  {
>  	int ret;
>  	unsigned int nofs_flag;
> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
>  	 */
>  	if (is_fstree(root->root_key.objectid) &&
>  	    btrfs_root_refs(&root->root_item)) {
> -		ret = get_anon_bdev(&root->anon_dev);
> -		if (ret)
> -			goto fail;
> +		if (!anon_dev) {
> +			ret = get_anon_bdev(&root->anon_dev);
> +			if (ret)
> +				goto fail;
> +		} else {
> +			root->anon_dev = anon_dev;
> +		}
>  	}
>  
>  	mutex_lock(&root->objectid_mutex);
> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>  }
>  
>  
> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
> -				     u64 objectid, bool check_ref)
> +/*
> + * Get a fs root.
> + *
> + * For essential trees like root/extent tree, we grab it from fs_info directly.
> + * For subvolume trees, we check the cached fs roots first. If miss then
> + * read it from disk and add it to cached fs roots.
> + *
> + * Caller should release the root by calling btrfs_put_root() after the usage.
> + *
> + * NOTE: Reloc and log trees can't be read by this function as they share the
> + *	 same root objectid.
> + *
> + * @objectid:	Root (subvolume) id
> + * @anon_dev:	Preallocated anonymous block device number for new roots.
> + * 		Pass 0 for automatic allocation.
> + * @check_ref:	Whether to check root refs. If true, return -ENOENT for orphan
> + * 		roots.
> + */
> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info,
> +					u64 objectid, dev_t anon_dev,
> +					bool check_ref)


> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
> +				     u64 objectid, bool check_ref)
> +{
> +	return __get_fs_root(fs_info, objectid, 0, check_ref);
> +}
> +
> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> +					 u64 objectid, dev_t anon_dev)
> +{
> +	return __get_fs_root(fs_info, objectid, anon_dev, true);
> +}

This does not look like a good API, we should keep btrfs_get_fs_root and
add the anon_bdev initialization to the callers, there are only a few.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots
  2020-06-16  2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo
@ 2020-06-16 19:21   ` Josef Bacik
  0 siblings, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2020-06-16 19:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Greed Rong

On 6/15/20 10:17 PM, Qu Wenruo wrote:
> [BUG]
> When a lot of subvolumes are created, there is a user report about
> transaction aborted:
> 
>    ------------[ cut here ]------------
>    BTRFS: Transaction aborted (error -24)
>    WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
>    RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
>    Call Trace:
>     create_pending_snapshots+0x82/0xa0 [btrfs]
>     btrfs_commit_transaction+0x275/0x8c0 [btrfs]
>     btrfs_mksubvol+0x4b9/0x500 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
>     btrfs_ioctl+0x11a4/0x2da0 [btrfs]
>     do_vfs_ioctl+0xa9/0x640
>     ksys_ioctl+0x67/0x90
>     __x64_sys_ioctl+0x1a/0x20
>     do_syscall_64+0x5a/0x110
>     entry_SYSCALL_64_after_hwframe+0x44/0xa9
>    ---[ end trace 33f2f83f3d5250e9 ]---
>    BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
>    BTRFS info (device sda1): forced readonly
>    BTRFS warning (device sda1): Skipping commit of aborted transaction.
>    BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
> 
> [CAUSE]
> The root cause is we don't have unlimited resource for anonymous block
> device number.
> The anonymous block device pool only contains 1<<20 devices, and is
> shared across a several fses, like ceph and overlayfs.
> 
> While btrfs has support for 1<<48 subvolumes, so it's just a problem of
> time to hit such limit.
> 
> [WORKAROUND]
> Since it's not possible to completely solve the problem, we can only
> workaround it.
> 
> Firstly, we can reduce the user of anon_dev. Data reloc tree is not visible
> to users, thus it doesn't need anon_dev at all.
> 
> This patch will do extra check on root objectid, to rule out roots who
> don't need anon_dev.
> Although currently it's only data reloc tree and orphan roots.
> 
> Reported-by: Greed Rong <greedrong@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool
  2020-06-16  2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo
@ 2020-06-16 19:23   ` Josef Bacik
  2020-06-16 22:48     ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2020-06-16 19:23 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Greed Rong

On 6/15/20 10:17 PM, Qu Wenruo wrote:
> [BUG]
> When a lot of subvolumes are created, there is a user report about
> transaction aborted:
> 
>    ------------[ cut here ]------------
>    BTRFS: Transaction aborted (error -24)
>    WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
>    RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
>    Call Trace:
>     create_pending_snapshots+0x82/0xa0 [btrfs]
>     btrfs_commit_transaction+0x275/0x8c0 [btrfs]
>     btrfs_mksubvol+0x4b9/0x500 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
>     btrfs_ioctl+0x11a4/0x2da0 [btrfs]
>     do_vfs_ioctl+0xa9/0x640
>     ksys_ioctl+0x67/0x90
>     __x64_sys_ioctl+0x1a/0x20
>     do_syscall_64+0x5a/0x110
>     entry_SYSCALL_64_after_hwframe+0x44/0xa9
>    ---[ end trace 33f2f83f3d5250e9 ]---
>    BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
>    BTRFS info (device sda1): forced readonly
>    BTRFS warning (device sda1): Skipping commit of aborted transaction.
>    BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
> 
> [CAUSE]
> The root cause is we don't have unlimited resource for anonymous block
> device number.
> The anonymous block device pool only contains 1<<20 devices, and is
> shared across a several fses, like ceph and overlayfs.
> 
> While btrfs has support for 1<<48 subvolumes, so it's just a problem of
> time to hit such limit.
> 
> [WORKAROUND]
> Here we can free btrfs_root::anon_dev as long as the subvolume is no
> longer visible to users.
> 
> By freeing it earlier we reclaim the anon_dev quicker, hopefully to
> reduce the chance of exhausting the pool.
> 

Why isn't this happening as part of the root teardown once all the references to 
it are gone?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16  2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo
@ 2020-06-16 19:25   ` Josef Bacik
  2020-06-16 22:49     ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2020-06-16 19:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 6/15/20 10:17 PM, Qu Wenruo wrote:
> btrfs_root::anon_dev is an indicator for different subvolume namespaces.
> Thus it should always be initialized to an anonymous block device.
> 
> Add a safe net to catch such uninitialized values.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Can we handle stat->dev not having a device set?  Or will this blow up in other 
ways?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool
  2020-06-16 19:23   ` Josef Bacik
@ 2020-06-16 22:48     ` David Sterba
  2020-06-16 23:31       ` Josef Bacik
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-06-16 22:48 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, Greed Rong

On Tue, Jun 16, 2020 at 03:23:03PM -0400, Josef Bacik wrote:
> > By freeing it earlier we reclaim the anon_dev quicker, hopefully to
> > reduce the chance of exhausting the pool.
> 
> Why isn't this happening as part of the root teardown once all the references to 
> it are gone?  Thanks,

This is where it happens now and is correct. The problem is that deleted
subvolumes keep the id allocated until they are cleaned up, ie. all the
dead roots consume the id though we don't need it anymore. Creating and
deleting snapshots at large will produce a long list of dead subvolumes.
THis patch will return the ids at the earlies possible moment so they
can get reused.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16 19:25   ` Josef Bacik
@ 2020-06-16 22:49     ` Qu Wenruo
  2020-06-16 23:32       ` Josef Bacik
  0 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16 22:49 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 746 bytes --]



On 2020/6/17 上午3:25, Josef Bacik wrote:
> On 6/15/20 10:17 PM, Qu Wenruo wrote:
>> btrfs_root::anon_dev is an indicator for different subvolume namespaces.
>> Thus it should always be initialized to an anonymous block device.
>>
>> Add a safe net to catch such uninitialized values.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Can we handle stat->dev not having a device set?  Or will this blow up
> in other ways?  Thanks,

We can handle it without any problem, just users get confused.

As a common practice, we use different bdev as a namespace for different
subvolumes.
Without a valid bdev, some user space tools may not be able to
distinguish inodes in different subvolumes.

Thanks,
Qu
> 
> Josef


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-06-16 15:10   ` David Sterba
@ 2020-06-16 22:54     ` Qu Wenruo
  2020-07-01  3:25     ` Qu Wenruo
  1 sibling, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16 22:54 UTC (permalink / raw)
  To: dsterba, linux-btrfs, Greed Rong



On 2020/6/16 下午11:10, David Sterba wrote:
> On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote:
>> [BUG]
>> When a lot of subvolumes are created, there is a user report about
>> transaction aborted:
>>
>>   ------------[ cut here ]------------
>>   BTRFS: Transaction aborted (error -24)
>>   WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
>>   RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
>>   Call Trace:
>>    create_pending_snapshots+0x82/0xa0 [btrfs]
>>    btrfs_commit_transaction+0x275/0x8c0 [btrfs]
>>    btrfs_mksubvol+0x4b9/0x500 [btrfs]
>>    btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
>>    btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
>>    btrfs_ioctl+0x11a4/0x2da0 [btrfs]
>>    do_vfs_ioctl+0xa9/0x640
>>    ksys_ioctl+0x67/0x90
>>    __x64_sys_ioctl+0x1a/0x20
>>    do_syscall_64+0x5a/0x110
>>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   ---[ end trace 33f2f83f3d5250e9 ]---
>>   BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
>>   BTRFS info (device sda1): forced readonly
>>   BTRFS warning (device sda1): Skipping commit of aborted transaction.
>>   BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
>>
>> [CAUSE]
>> When the global anonymous block device pool is exhausted, the following
>> call chain will fail, and lead to transaction abort:
>>
>>  btrfs_ioctl_snap_create_v2()
>>  |- btrfs_ioctl_snap_create_transid()
>>     |- btrfs_mksubvol()
>>        |- btrfs_commit_transaction()
>>           |- create_pending_snapshot()
>>              |- btrfs_get_fs_root()
>>                 |- btrfs_init_fs_root()
>>                    |- get_anon_bdev()
>>
>> [FIX]
>> Although we can't enlarge the anonymous block device pool, at least we
>> can preallocate anon_dev for subvolume/snapshot creation.
>> So that when the pool is exhausted, user will get an error other than
>> aborting transaction later.
>>
>> Reported-by: Greed Rong <greedrong@gmail.com>
>> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/disk-io.c     | 51 ++++++++++++++++++++++++++++++++++++------
>>  fs/btrfs/disk-io.h     |  2 ++
>>  fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
>>  fs/btrfs/transaction.c |  3 ++-
>>  fs/btrfs/transaction.h |  2 ++
>>  5 files changed, 70 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index cfc0ff288238..14fd69b71cb8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>>  	goto out;
>>  }
>>  
>> -static int btrfs_init_fs_root(struct btrfs_root *root)
>> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>>  {
>>  	int ret;
>>  	unsigned int nofs_flag;
>> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
>>  	 */
>>  	if (is_fstree(root->root_key.objectid) &&
>>  	    btrfs_root_refs(&root->root_item)) {
>> -		ret = get_anon_bdev(&root->anon_dev);
>> -		if (ret)
>> -			goto fail;
>> +		if (!anon_dev) {
>> +			ret = get_anon_bdev(&root->anon_dev);
>> +			if (ret)
>> +				goto fail;
>> +		} else {
>> +			root->anon_dev = anon_dev;
>> +		}
>>  	}
>>  
>>  	mutex_lock(&root->objectid_mutex);
>> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>>  }
>>  
>>  
>> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>> -				     u64 objectid, bool check_ref)
>> +/*
>> + * Get a fs root.
>> + *
>> + * For essential trees like root/extent tree, we grab it from fs_info directly.
>> + * For subvolume trees, we check the cached fs roots first. If miss then
>> + * read it from disk and add it to cached fs roots.
>> + *
>> + * Caller should release the root by calling btrfs_put_root() after the usage.
>> + *
>> + * NOTE: Reloc and log trees can't be read by this function as they share the
>> + *	 same root objectid.
>> + *
>> + * @objectid:	Root (subvolume) id
>> + * @anon_dev:	Preallocated anonymous block device number for new roots.
>> + * 		Pass 0 for automatic allocation.
>> + * @check_ref:	Whether to check root refs. If true, return -ENOENT for orphan
>> + * 		roots.
>> + */
>> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info,
>> +					u64 objectid, dev_t anon_dev,
>> +					bool check_ref)
> 
> 
>> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>> +				     u64 objectid, bool check_ref)
>> +{
>> +	return __get_fs_root(fs_info, objectid, 0, check_ref);
>> +}
>> +
>> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>> +					 u64 objectid, dev_t anon_dev)
>> +{
>> +	return __get_fs_root(fs_info, objectid, anon_dev, true);
>> +}
> 
> This does not look like a good API, we should keep btrfs_get_fs_root and
> add the anon_bdev initialization to the callers, there are only a few.
> 
OK, I'd go that direction.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool
  2020-06-16 22:48     ` David Sterba
@ 2020-06-16 23:31       ` Josef Bacik
  0 siblings, 0 replies; 27+ messages in thread
From: Josef Bacik @ 2020-06-16 23:31 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong

On 6/16/20 6:48 PM, David Sterba wrote:
> On Tue, Jun 16, 2020 at 03:23:03PM -0400, Josef Bacik wrote:
>>> By freeing it earlier we reclaim the anon_dev quicker, hopefully to
>>> reduce the chance of exhausting the pool.
>>
>> Why isn't this happening as part of the root teardown once all the references to
>> it are gone?  Thanks,
> 
> This is where it happens now and is correct. The problem is that deleted
> subvolumes keep the id allocated until they are cleaned up, ie. all the
> dead roots consume the id though we don't need it anymore. Creating and
> deleting snapshots at large will produce a long list of dead subvolumes.
> THis patch will return the ids at the earlies possible moment so they
> can get reused.
> 

Oh ok I misread, we're doing it earlier on purpose.  Alright that's fine, you 
can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16 22:49     ` Qu Wenruo
@ 2020-06-16 23:32       ` Josef Bacik
  2020-06-16 23:49         ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2020-06-16 23:32 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs

On 6/16/20 6:49 PM, Qu Wenruo wrote:
> 
> 
> On 2020/6/17 上午3:25, Josef Bacik wrote:
>> On 6/15/20 10:17 PM, Qu Wenruo wrote:
>>> btrfs_root::anon_dev is an indicator for different subvolume namespaces.
>>> Thus it should always be initialized to an anonymous block device.
>>>
>>> Add a safe net to catch such uninitialized values.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>> Can we handle stat->dev not having a device set?  Or will this blow up
>> in other ways?  Thanks,
> 
> We can handle it without any problem, just users get confused.
> 
> As a common practice, we use different bdev as a namespace for different
> subvolumes.
> Without a valid bdev, some user space tools may not be able to
> distinguish inodes in different subvolumes.
> 

Alright that's fine then.  But I feel like stat is one of those things that'll 
flood the console, can we put this somewhere else that's going to be hit less? 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16 23:32       ` Josef Bacik
@ 2020-06-16 23:49         ` Qu Wenruo
  2020-06-17 11:31           ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-06-16 23:49 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/6/17 上午7:32, Josef Bacik wrote:
> On 6/16/20 6:49 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/6/17 上午3:25, Josef Bacik wrote:
>>> On 6/15/20 10:17 PM, Qu Wenruo wrote:
>>>> btrfs_root::anon_dev is an indicator for different subvolume
>>>> namespaces.
>>>> Thus it should always be initialized to an anonymous block device.
>>>>
>>>> Add a safe net to catch such uninitialized values.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>
>>> Can we handle stat->dev not having a device set?  Or will this blow up
>>> in other ways?  Thanks,
>>
>> We can handle it without any problem, just users get confused.
>>
>> As a common practice, we use different bdev as a namespace for different
>> subvolumes.
>> Without a valid bdev, some user space tools may not be able to
>> distinguish inodes in different subvolumes.
>>
> 
> Alright that's fine then.  But I feel like stat is one of those things
> that'll flood the console, can we put this somewhere else that's going
> to be hit less? Thanks,

Unfortunately, stat() is the only user of btrfs_root::anon_dev.

While fortunately, the logical is pretty simple, even without the safe
net we can understand the lifespan pretty well.

I'm fine to drop this patch if you're concerned about the possible
warning flood, as the benefit is really not that much.

Thanks,
Qu

> 
> Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-16 23:49         ` Qu Wenruo
@ 2020-06-17 11:31           ` David Sterba
  2020-06-17 13:37             ` Josef Bacik
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-06-17 11:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, Qu Wenruo, linux-btrfs

On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote:
> >>> Can we handle stat->dev not having a device set?  Or will this blow up
> >>> in other ways?  Thanks,
> >>
> >> We can handle it without any problem, just users get confused.
> >>
> >> As a common practice, we use different bdev as a namespace for different
> >> subvolumes.
> >> Without a valid bdev, some user space tools may not be able to
> >> distinguish inodes in different subvolumes.
> >>
> > 
> > Alright that's fine then.  But I feel like stat is one of those things
> > that'll flood the console, can we put this somewhere else that's going
> > to be hit less? Thanks,
> 
> Unfortunately, stat() is the only user of btrfs_root::anon_dev.
> 
> While fortunately, the logical is pretty simple, even without the safe
> net we can understand the lifespan pretty well.
> 
> I'm fine to drop this patch if you're concerned about the possible
> warning flood, as the benefit is really not that much.

It could be a developer-only warning but if there's a root with a bad
anon_dev, a simple 'ls -l' would flood the log for sure.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-17 11:31           ` David Sterba
@ 2020-06-17 13:37             ` Josef Bacik
  2020-06-17 23:39               ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: Josef Bacik @ 2020-06-17 13:37 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On 6/17/20 7:31 AM, David Sterba wrote:
> On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote:
>>>>> Can we handle stat->dev not having a device set?  Or will this blow up
>>>>> in other ways?  Thanks,
>>>>
>>>> We can handle it without any problem, just users get confused.
>>>>
>>>> As a common practice, we use different bdev as a namespace for different
>>>> subvolumes.
>>>> Without a valid bdev, some user space tools may not be able to
>>>> distinguish inodes in different subvolumes.
>>>>
>>>
>>> Alright that's fine then.  But I feel like stat is one of those things
>>> that'll flood the console, can we put this somewhere else that's going
>>> to be hit less? Thanks,
>>
>> Unfortunately, stat() is the only user of btrfs_root::anon_dev.
>>
>> While fortunately, the logical is pretty simple, even without the safe
>> net we can understand the lifespan pretty well.
>>
>> I'm fine to drop this patch if you're concerned about the possible
>> warning flood, as the benefit is really not that much.
> 
> It could be a developer-only warning but if there's a root with a bad
> anon_dev, a simple 'ls -l' would flood the log for sure.
> 

We'll know in btrfs_init_fs_root() when get_anon_bdev() fails right?  Can't we 
just complain then?  That seems less spammy.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes
  2020-06-17 13:37             ` Josef Bacik
@ 2020-06-17 23:39               ` Qu Wenruo
  0 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2020-06-17 23:39 UTC (permalink / raw)
  To: Josef Bacik, dsterba, Qu Wenruo, linux-btrfs



On 2020/6/17 下午9:37, Josef Bacik wrote:
> On 6/17/20 7:31 AM, David Sterba wrote:
>> On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote:
>>>>>> Can we handle stat->dev not having a device set?  Or will this
>>>>>> blow up
>>>>>> in other ways?  Thanks,
>>>>>
>>>>> We can handle it without any problem, just users get confused.
>>>>>
>>>>> As a common practice, we use different bdev as a namespace for
>>>>> different
>>>>> subvolumes.
>>>>> Without a valid bdev, some user space tools may not be able to
>>>>> distinguish inodes in different subvolumes.
>>>>>
>>>>
>>>> Alright that's fine then.  But I feel like stat is one of those things
>>>> that'll flood the console, can we put this somewhere else that's going
>>>> to be hit less? Thanks,
>>>
>>> Unfortunately, stat() is the only user of btrfs_root::anon_dev.
>>>
>>> While fortunately, the logical is pretty simple, even without the safe
>>> net we can understand the lifespan pretty well.
>>>
>>> I'm fine to drop this patch if you're concerned about the possible
>>> warning flood, as the benefit is really not that much.
>>
>> It could be a developer-only warning but if there's a root with a bad
>> anon_dev, a simple 'ls -l' would flood the log for sure.
>>
>
> We'll know in btrfs_init_fs_root() when get_anon_bdev() fails right? 
> Can't we just complain then?  That seems less spammy.  Thanks,

That's exactly where we do the allocation, that would be something like
warning about the allocation result.

That's why I'm fine to drop this patch if it's too spammy.

Thanks,
Qu
>
> Josef

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool
  2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-06-16  2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo
@ 2020-06-30 14:14 ` David Sterba
  4 siblings, 0 replies; 27+ messages in thread
From: David Sterba @ 2020-06-30 14:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jun 16, 2020 at 10:17:33AM +0800, Qu Wenruo wrote:
> Qu Wenruo (4):
>   btrfs: disk-io: don't allocate anonymous block device for user
>     invisible roots
>   btrfs: detect uninitialized btrfs_root::anon_dev for user visible
>     subvolumes
>   btrfs: preallocate anon_dev for subvolume and snapshot creation
>   btrfs: free anon_dev earlier to prevent exhausting anonymous block
>     device pool

I haven't seen an update to this patchset and would really like to get
it fixed. Let me know if I missed an update or you need help, there
weren't any big issues left.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-06-16 15:10   ` David Sterba
  2020-06-16 22:54     ` Qu Wenruo
@ 2020-07-01  3:25     ` Qu Wenruo
  2020-07-01 17:39       ` David Sterba
  1 sibling, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-07-01  3:25 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong


[-- Attachment #1.1: Type: text/plain, Size: 6198 bytes --]



On 2020/6/16 下午11:10, David Sterba wrote:
> On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote:
>> [BUG]
>> When a lot of subvolumes are created, there is a user report about
>> transaction aborted:
>>
>>   ------------[ cut here ]------------
>>   BTRFS: Transaction aborted (error -24)
>>   WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
>>   RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
>>   Call Trace:
>>    create_pending_snapshots+0x82/0xa0 [btrfs]
>>    btrfs_commit_transaction+0x275/0x8c0 [btrfs]
>>    btrfs_mksubvol+0x4b9/0x500 [btrfs]
>>    btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
>>    btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
>>    btrfs_ioctl+0x11a4/0x2da0 [btrfs]
>>    do_vfs_ioctl+0xa9/0x640
>>    ksys_ioctl+0x67/0x90
>>    __x64_sys_ioctl+0x1a/0x20
>>    do_syscall_64+0x5a/0x110
>>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   ---[ end trace 33f2f83f3d5250e9 ]---
>>   BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
>>   BTRFS info (device sda1): forced readonly
>>   BTRFS warning (device sda1): Skipping commit of aborted transaction.
>>   BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown
>>
>> [CAUSE]
>> When the global anonymous block device pool is exhausted, the following
>> call chain will fail, and lead to transaction abort:
>>
>>  btrfs_ioctl_snap_create_v2()
>>  |- btrfs_ioctl_snap_create_transid()
>>     |- btrfs_mksubvol()
>>        |- btrfs_commit_transaction()
>>           |- create_pending_snapshot()
>>              |- btrfs_get_fs_root()
>>                 |- btrfs_init_fs_root()
>>                    |- get_anon_bdev()
>>
>> [FIX]
>> Although we can't enlarge the anonymous block device pool, at least we
>> can preallocate anon_dev for subvolume/snapshot creation.
>> So that when the pool is exhausted, user will get an error other than
>> aborting transaction later.
>>
>> Reported-by: Greed Rong <greedrong@gmail.com>
>> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/disk-io.c     | 51 ++++++++++++++++++++++++++++++++++++------
>>  fs/btrfs/disk-io.h     |  2 ++
>>  fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
>>  fs/btrfs/transaction.c |  3 ++-
>>  fs/btrfs/transaction.h |  2 ++
>>  5 files changed, 70 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index cfc0ff288238..14fd69b71cb8 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>>  	goto out;
>>  }
>>  
>> -static int btrfs_init_fs_root(struct btrfs_root *root)
>> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>>  {
>>  	int ret;
>>  	unsigned int nofs_flag;
>> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root)
>>  	 */
>>  	if (is_fstree(root->root_key.objectid) &&
>>  	    btrfs_root_refs(&root->root_item)) {
>> -		ret = get_anon_bdev(&root->anon_dev);
>> -		if (ret)
>> -			goto fail;
>> +		if (!anon_dev) {
>> +			ret = get_anon_bdev(&root->anon_dev);
>> +			if (ret)
>> +				goto fail;
>> +		} else {
>> +			root->anon_dev = anon_dev;
>> +		}
>>  	}
>>  
>>  	mutex_lock(&root->objectid_mutex);
>> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>>  }
>>  
>>  
>> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>> -				     u64 objectid, bool check_ref)
>> +/*
>> + * Get a fs root.
>> + *
>> + * For essential trees like root/extent tree, we grab it from fs_info directly.
>> + * For subvolume trees, we check the cached fs roots first. If miss then
>> + * read it from disk and add it to cached fs roots.
>> + *
>> + * Caller should release the root by calling btrfs_put_root() after the usage.
>> + *
>> + * NOTE: Reloc and log trees can't be read by this function as they share the
>> + *	 same root objectid.
>> + *
>> + * @objectid:	Root (subvolume) id
>> + * @anon_dev:	Preallocated anonymous block device number for new roots.
>> + * 		Pass 0 for automatic allocation.
>> + * @check_ref:	Whether to check root refs. If true, return -ENOENT for orphan
>> + * 		roots.
>> + */
>> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info,
>> +					u64 objectid, dev_t anon_dev,
>> +					bool check_ref)
> 
> 
>> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>> +				     u64 objectid, bool check_ref)
>> +{
>> +	return __get_fs_root(fs_info, objectid, 0, check_ref);
>> +}
>> +
>> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>> +					 u64 objectid, dev_t anon_dev)
>> +{
>> +	return __get_fs_root(fs_info, objectid, anon_dev, true);
>> +}
> 
> This does not look like a good API, we should keep btrfs_get_fs_root and
> add the anon_bdev initialization to the callers, there are only a few.
> 

A few = over 25?

I have switched to keep btrfs_get_fs_root(), but you won't like the summary:

Old:
 fs/btrfs/disk-io.h     |  2 ++
 fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
 fs/btrfs/transaction.c |  3 ++-
 fs/btrfs/transaction.h |  2 ++
 5 files changed, 70 insertions(+), 9 deletions(-)

New:
 fs/btrfs/backref.c     |  4 ++--
 fs/btrfs/disk-io.c     | 42 ++++++++++++++++++++++++++++++++++--------
 fs/btrfs/disk-io.h     |  3 ++-
 fs/btrfs/export.c      |  2 +-
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode.c       |  2 +-
 fs/btrfs/ioctl.c       | 31 +++++++++++++++++++++++++------
 fs/btrfs/relocation.c  | 11 ++++++-----
 fs/btrfs/root-tree.c   |  2 +-
 fs/btrfs/scrub.c       |  2 +-
 fs/btrfs/send.c        |  4 ++--
 fs/btrfs/super.c       |  2 +-
 fs/btrfs/transaction.c |  3 ++-
 fs/btrfs/transaction.h |  2 ++
 fs/btrfs/tree-log.c    |  2 +-
 fs/btrfs/uuid-tree.c   |  2 +-
 16 files changed, 83 insertions(+), 33 deletions(-)

Do we really go that direction?

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-01  3:25     ` Qu Wenruo
@ 2020-07-01 17:39       ` David Sterba
  2020-07-01 23:56         ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-07-01 17:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong

On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote:
> >> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> >> +					 u64 objectid, dev_t anon_dev)
> >> +{
> >> +	return __get_fs_root(fs_info, objectid, anon_dev, true);
> >> +}
> > 
> > This does not look like a good API, we should keep btrfs_get_fs_root and
> > add the anon_bdev initialization to the callers, there are only a few.
> > 
> 
> A few = over 25?
> 
> I have switched to keep btrfs_get_fs_root(), but you won't like the summary:
> 
> Old:
>  fs/btrfs/disk-io.h     |  2 ++
>  fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
>  fs/btrfs/transaction.c |  3 ++-
>  fs/btrfs/transaction.h |  2 ++
>  5 files changed, 70 insertions(+), 9 deletions(-)
> 
> New:
>  fs/btrfs/backref.c     |  4 ++--
>  fs/btrfs/disk-io.c     | 42 ++++++++++++++++++++++++++++++++++--------
>  fs/btrfs/disk-io.h     |  3 ++-
>  fs/btrfs/export.c      |  2 +-
>  fs/btrfs/file.c        |  2 +-
>  fs/btrfs/inode.c       |  2 +-
>  fs/btrfs/ioctl.c       | 31 +++++++++++++++++++++++++------
>  fs/btrfs/relocation.c  | 11 ++++++-----
>  fs/btrfs/root-tree.c   |  2 +-
>  fs/btrfs/scrub.c       |  2 +-
>  fs/btrfs/send.c        |  4 ++--
>  fs/btrfs/super.c       |  2 +-
>  fs/btrfs/transaction.c |  3 ++-
>  fs/btrfs/transaction.h |  2 ++
>  fs/btrfs/tree-log.c    |  2 +-
>  fs/btrfs/uuid-tree.c   |  2 +-
>  16 files changed, 83 insertions(+), 33 deletions(-)
> 
> Do we really go that direction?

You're right, I don't like the summary and I don't like the code either.

Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have
never suggested that. What I meant is to put the actual id allocation
to the callers where the subvolume is created, ie only 2 places.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-01 17:39       ` David Sterba
@ 2020-07-01 23:56         ` Qu Wenruo
  2020-07-02 16:08           ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-07-01 23:56 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong


[-- Attachment #1.1: Type: text/plain, Size: 2186 bytes --]



On 2020/7/2 上午1:39, David Sterba wrote:
> On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote:
>>>> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>>>> +					 u64 objectid, dev_t anon_dev)
>>>> +{
>>>> +	return __get_fs_root(fs_info, objectid, anon_dev, true);
>>>> +}
>>>
>>> This does not look like a good API, we should keep btrfs_get_fs_root and
>>> add the anon_bdev initialization to the callers, there are only a few.
>>>
>>
>> A few = over 25?
>>
>> I have switched to keep btrfs_get_fs_root(), but you won't like the summary:
>>
>> Old:
>>  fs/btrfs/disk-io.h     |  2 ++
>>  fs/btrfs/ioctl.c       | 21 ++++++++++++++++-
>>  fs/btrfs/transaction.c |  3 ++-
>>  fs/btrfs/transaction.h |  2 ++
>>  5 files changed, 70 insertions(+), 9 deletions(-)
>>
>> New:
>>  fs/btrfs/backref.c     |  4 ++--
>>  fs/btrfs/disk-io.c     | 42 ++++++++++++++++++++++++++++++++++--------
>>  fs/btrfs/disk-io.h     |  3 ++-
>>  fs/btrfs/export.c      |  2 +-
>>  fs/btrfs/file.c        |  2 +-
>>  fs/btrfs/inode.c       |  2 +-
>>  fs/btrfs/ioctl.c       | 31 +++++++++++++++++++++++++------
>>  fs/btrfs/relocation.c  | 11 ++++++-----
>>  fs/btrfs/root-tree.c   |  2 +-
>>  fs/btrfs/scrub.c       |  2 +-
>>  fs/btrfs/send.c        |  4 ++--
>>  fs/btrfs/super.c       |  2 +-
>>  fs/btrfs/transaction.c |  3 ++-
>>  fs/btrfs/transaction.h |  2 ++
>>  fs/btrfs/tree-log.c    |  2 +-
>>  fs/btrfs/uuid-tree.c   |  2 +-
>>  16 files changed, 83 insertions(+), 33 deletions(-)
>>
>> Do we really go that direction?
> 
> You're right, I don't like the summary and I don't like the code either.
> 
> Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have
> never suggested that. What I meant is to put the actual id allocation
> to the callers where the subvolume is created, ie only 2 places.
> 

You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()?

That looks a little risky and I can't find any good solution to make it
more elegant than the current one.

Although I would definitely remove the "__" prefix as we shouldn't add
such prefix anymore.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-01 23:56         ` Qu Wenruo
@ 2020-07-02 16:08           ` David Sterba
  2020-07-02 23:46             ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-07-02 16:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong

On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote:
> On 2020/7/2 上午1:39, David Sterba wrote:
> > On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote:
> > Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have
> > never suggested that. What I meant is to put the actual id allocation
> > to the callers where the subvolume is created, ie only 2 places.
> 
> You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()?
> 
> That looks a little risky and I can't find any good solution to make it
> more elegant than the current one.

I spent more time reading through the get-fs-root functions and the main
problem is that btrfs_get_fs_root is doing several things, and it makes
a lot of code simple, I certainly want to keep it that way.

The idea was to pre-insert the new root (similar to the root item
insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to
btrfs_init_fs_info where the anon_bdev allocation happens for all the
other non-ioctl cases.

Which could be done by factoring out btrfs_init_fs_root from
btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root
arguments with the anon_bdev, and keep btrfs_get_fs_root intact.
So this is splitting the API from the end.

What you originally proposed is a split from the begnning, ie. add a
common implementation for existing and new and provide btrfs_get_fs_root
and btrfs_get_new_fs_root that would hide the additional parameters.

Both ways are IMO valid but I thought it would be easier to pass the
anon bdev inside ioctl callbacks. The problem that makes my proposal
less appealing is that btrfs_read_tree_root gets called earlier than
I'd like so factoring everything after btrfs_init_fs_root would not be
so straightforward.

In conclusion, your proposal is better and I'm going to merge it.

> Although I would definitely remove the "__" prefix as we shouldn't add
> such prefix anymore.

Yeah with the small naming fixups.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-02 16:08           ` David Sterba
@ 2020-07-02 23:46             ` David Sterba
  2020-07-03  5:19               ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-07-02 23:46 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs, Greed Rong

On Thu, Jul 02, 2020 at 06:08:21PM +0200, David Sterba wrote:
> On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote:
> > On 2020/7/2 上午1:39, David Sterba wrote:
> > > On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote:
> > > Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have
> > > never suggested that. What I meant is to put the actual id allocation
> > > to the callers where the subvolume is created, ie only 2 places.
> > 
> > You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()?
> > 
> > That looks a little risky and I can't find any good solution to make it
> > more elegant than the current one.
> 
> I spent more time reading through the get-fs-root functions and the main
> problem is that btrfs_get_fs_root is doing several things, and it makes
> a lot of code simple, I certainly want to keep it that way.
> 
> The idea was to pre-insert the new root (similar to the root item
> insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to
> btrfs_init_fs_info where the anon_bdev allocation happens for all the
> other non-ioctl cases.
> 
> Which could be done by factoring out btrfs_init_fs_root from
> btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root
> arguments with the anon_bdev, and keep btrfs_get_fs_root intact.
> So this is splitting the API from the end.
> 
> What you originally proposed is a split from the begnning, ie. add a
> common implementation for existing and new and provide btrfs_get_fs_root
> and btrfs_get_new_fs_root that would hide the additional parameters.
> 
> Both ways are IMO valid but I thought it would be easier to pass the
> anon bdev inside ioctl callbacks. The problem that makes my proposal
> less appealing is that btrfs_read_tree_root gets called earlier than
> I'd like so factoring everything after btrfs_init_fs_root would not be
> so straightforward.
> 
> In conclusion, your proposal is better and I'm going to merge it.
> 
> > Although I would definitely remove the "__" prefix as we shouldn't add
> > such prefix anymore.
> 
> Yeah with the small naming fixups.

It's in for-next-20200703. I've updated the changelogs to reflect what
we found during debugging the issue, the __ function renamed to
btrfs_get_root_ref and some function comments added. All patches
reordered and tagged for stable though the preallocation is not within
the size limit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-02 23:46             ` David Sterba
@ 2020-07-03  5:19               ` Qu Wenruo
  2020-07-03 12:29                 ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2020-07-03  5:19 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong


[-- Attachment #1.1: Type: text/plain, Size: 2868 bytes --]



On 2020/7/3 上午7:46, David Sterba wrote:
> On Thu, Jul 02, 2020 at 06:08:21PM +0200, David Sterba wrote:
>> On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote:
>>> On 2020/7/2 上午1:39, David Sterba wrote:
>>>> On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote:
>>>> Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have
>>>> never suggested that. What I meant is to put the actual id allocation
>>>> to the callers where the subvolume is created, ie only 2 places.
>>>
>>> You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()?
>>>
>>> That looks a little risky and I can't find any good solution to make it
>>> more elegant than the current one.
>>
>> I spent more time reading through the get-fs-root functions and the main
>> problem is that btrfs_get_fs_root is doing several things, and it makes
>> a lot of code simple, I certainly want to keep it that way.
>>
>> The idea was to pre-insert the new root (similar to the root item
>> insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to
>> btrfs_init_fs_info where the anon_bdev allocation happens for all the
>> other non-ioctl cases.
>>
>> Which could be done by factoring out btrfs_init_fs_root from
>> btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root
>> arguments with the anon_bdev, and keep btrfs_get_fs_root intact.
>> So this is splitting the API from the end.
>>
>> What you originally proposed is a split from the begnning, ie. add a
>> common implementation for existing and new and provide btrfs_get_fs_root
>> and btrfs_get_new_fs_root that would hide the additional parameters.
>>
>> Both ways are IMO valid but I thought it would be easier to pass the
>> anon bdev inside ioctl callbacks. The problem that makes my proposal
>> less appealing is that btrfs_read_tree_root gets called earlier than
>> I'd like so factoring everything after btrfs_init_fs_root would not be
>> so straightforward.
>>
>> In conclusion, your proposal is better and I'm going to merge it.
>>
>>> Although I would definitely remove the "__" prefix as we shouldn't add
>>> such prefix anymore.
>>
>> Yeah with the small naming fixups.
> 
> It's in for-next-20200703. I've updated the changelogs to reflect what
> we found during debugging the issue, the __ function renamed to
> btrfs_get_root_ref and some function comments added. All patches
> reordered and tagged for stable though the preallocation is not within
> the size limit.
> 

Thanks for the merge and dropping the unneeded check patch.

All the modification looks good to me.

Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate
anonymous block device for user invisible roots"), there is an
unnecessary new line after "[CAUSE]".

Thanks for your daily work of maintaining btrfs,
Qu



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-03  5:19               ` Qu Wenruo
@ 2020-07-03 12:29                 ` David Sterba
  2020-07-03 12:39                   ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: David Sterba @ 2020-07-03 12:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong

On Fri, Jul 03, 2020 at 01:19:14PM +0800, Qu Wenruo wrote:
> >> In conclusion, your proposal is better and I'm going to merge it.
> >>
> >>> Although I would definitely remove the "__" prefix as we shouldn't add
> >>> such prefix anymore.
> >>
> >> Yeah with the small naming fixups.
> > 
> > It's in for-next-20200703. I've updated the changelogs to reflect what
> > we found during debugging the issue, the __ function renamed to
> > btrfs_get_root_ref and some function comments added. All patches
> > reordered and tagged for stable though the preallocation is not within
> > the size limit.
> > 
> 
> Thanks for the merge and dropping the unneeded check patch.
> 
> All the modification looks good to me.
> 
> Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate
> anonymous block device for user invisible roots"), there is an
> unnecessary new line after "[CAUSE]".

Fixed, thanks for checking. I'm not entirely satisfied with the name of
btrfs_get_root_ref, as it could be confused with the on-disk
btrfs_root_ref. The get-ref functions could use some cleanup as
btrfs_get_fs_root is sometimes used for non-filesystem roots. Adding a
generic get_any_root that would accept any tree and btrfs_get_fs_root
would be only for subvolume roots or perhaps related trees like data
reloc. But this is not essential for the anon bdev fixes so that's for
later.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation
  2020-07-03 12:29                 ` David Sterba
@ 2020-07-03 12:39                   ` Qu Wenruo
  0 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2020-07-03 12:39 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong


[-- Attachment #1.1: Type: text/plain, Size: 1601 bytes --]



On 2020/7/3 下午8:29, David Sterba wrote:
> On Fri, Jul 03, 2020 at 01:19:14PM +0800, Qu Wenruo wrote:
>>>> In conclusion, your proposal is better and I'm going to merge it.
>>>>
>>>>> Although I would definitely remove the "__" prefix as we shouldn't add
>>>>> such prefix anymore.
>>>>
>>>> Yeah with the small naming fixups.
>>>
>>> It's in for-next-20200703. I've updated the changelogs to reflect what
>>> we found during debugging the issue, the __ function renamed to
>>> btrfs_get_root_ref and some function comments added. All patches
>>> reordered and tagged for stable though the preallocation is not within
>>> the size limit.
>>>
>>
>> Thanks for the merge and dropping the unneeded check patch.
>>
>> All the modification looks good to me.
>>
>> Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate
>> anonymous block device for user invisible roots"), there is an
>> unnecessary new line after "[CAUSE]".
> 
> Fixed, thanks for checking. I'm not entirely satisfied with the name of
> btrfs_get_root_ref, as it could be confused with the on-disk
> btrfs_root_ref.

My original plan is even simpler, just get_fs_root(), remove the btrfs_
prefix...

> The get-ref functions could use some cleanup as
> btrfs_get_fs_root is sometimes used for non-filesystem roots. Adding a
> generic get_any_root that would accept any tree and btrfs_get_fs_root
> would be only for subvolume roots or perhaps related trees like data
> reloc. But this is not essential for the anon bdev fixes so that's for
> later.

No problem.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-07-03 12:39 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16  2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo
2020-06-16  2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo
2020-06-16 19:21   ` Josef Bacik
2020-06-16  2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo
2020-06-16 19:25   ` Josef Bacik
2020-06-16 22:49     ` Qu Wenruo
2020-06-16 23:32       ` Josef Bacik
2020-06-16 23:49         ` Qu Wenruo
2020-06-17 11:31           ` David Sterba
2020-06-17 13:37             ` Josef Bacik
2020-06-17 23:39               ` Qu Wenruo
2020-06-16  2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo
2020-06-16 15:10   ` David Sterba
2020-06-16 22:54     ` Qu Wenruo
2020-07-01  3:25     ` Qu Wenruo
2020-07-01 17:39       ` David Sterba
2020-07-01 23:56         ` Qu Wenruo
2020-07-02 16:08           ` David Sterba
2020-07-02 23:46             ` David Sterba
2020-07-03  5:19               ` Qu Wenruo
2020-07-03 12:29                 ` David Sterba
2020-07-03 12:39                   ` Qu Wenruo
2020-06-16  2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo
2020-06-16 19:23   ` Josef Bacik
2020-06-16 22:48     ` David Sterba
2020-06-16 23:31       ` Josef Bacik
2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.