All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
@ 2022-12-05  9:51 robbieko
  2022-12-05  9:51 ` [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: robbieko @ 2022-12-05  9:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

[Issue]
When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM

  WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
  CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
  Call Trace:
    create_pending_snapshot+0xe30/0xe70 [btrfs]
    create_pending_snapshots+0x89/0xb0 [btrfs]
    btrfs_commit_transaction+0x469/0xc60 [btrfs]
    btrfs_mksubvol+0x5bd/0x690 [btrfs]
    btrfs_mksnapshot+0x102/0x170 [btrfs]
    btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
    btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
    btrfs_ioctl+0x2111/0x3130 [btrfs]
    do_vfs_ioctl+0x7ea/0xa80
    SyS_ioctl+0xa1/0xb0
    entry_SYSCALL_64_fastpath+0x1e/0x8e
  ---[ end trace 910c8f86780ca385 ]---
  BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory

[Cause]
During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
However, atomic allocation is required when processing percpu_counter_init
without GFP_KERNEL due to the unique structure of percpu_counter.
In this situation, allocating memory for initializing fs root may cause
unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.

[Fix]
We allocate memory at the beginning of creating a subvolume/snapshot.
This way, we can ensure the memory is enough when initializing fs root.
Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
the transaction won’t abort since it hasn’t started yet.

Robbie Ko (3):
  btrfs: refactor anon_dev with new_fs_root_args for create
    subvolume/snapshot
  btrfs: change snapshot_lock to dynamic pointer
  btrfs: add snapshot_lock to new_fs_root_args

 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
 fs/btrfs/disk-io.h     |  12 ++++-
 fs/btrfs/file.c        |   8 +--
 fs/btrfs/inode.c       |  12 ++---
 fs/btrfs/ioctl.c       |  38 +++++++--------
 fs/btrfs/transaction.c |   2 +-
 fs/btrfs/transaction.h |   5 +-
 8 files changed, 123 insertions(+), 63 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot
  2022-12-05  9:51 [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
@ 2022-12-05  9:51 ` robbieko
  2022-12-12 17:04   ` Filipe Manana
  2022-12-05  9:51 ` [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-05  9:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

Create a new structure called btrfs_new_fs_root_args that
refactor the type of anon_dev.
With the new structure, btrfs_get_new_fs_root can input
btrfs_new_fs_root_args into btrfs_init_fs_root by order.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/disk-io.c     | 63 ++++++++++++++++++++++++++++++------------
 fs/btrfs/disk-io.h     | 10 ++++++-
 fs/btrfs/ioctl.c       | 34 +++++++++++------------
 fs/btrfs/transaction.c |  2 +-
 fs/btrfs/transaction.h |  5 ++--
 5 files changed, 74 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d99bf7c64611..afe16e1f0b18 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1422,12 +1422,44 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 	return root;
 }
 
+struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)
+{
+	int err;
+	struct btrfs_new_fs_root_args *args;
+
+	args = kzalloc(sizeof(*args), gfp_mask);
+	if (!args) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	err = get_anon_bdev(&args->anon_dev);
+	if (err)
+		goto error;
+
+	return args;
+
+error:
+	btrfs_new_fs_root_args_destroy(args);
+	return ERR_PTR(err);
+}
+
+void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)
+{
+	if (!args)
+		return;
+	if (args->anon_dev)
+		free_anon_bdev(args->anon_dev);
+	kfree(args);
+}
+
 /*
  * Initialize subvolume root in-memory structure
  *
  * @anon_dev:	anonymous device to attach to the root, if zero, allocate new
  */
-static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
+static int btrfs_init_fs_root(struct btrfs_root *root,
+			      struct btrfs_new_fs_root_args *new_fs_root_args)
 {
 	int ret;
 	unsigned int nofs_flag;
@@ -1454,12 +1486,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
 	 */
 	if (is_fstree(root->root_key.objectid) &&
 	    btrfs_root_refs(&root->root_item) > 0) {
-		if (!anon_dev) {
+		if (!new_fs_root_args || !new_fs_root_args->anon_dev) {
 			ret = get_anon_bdev(&root->anon_dev);
 			if (ret)
 				goto fail;
 		} else {
-			root->anon_dev = anon_dev;
+			root->anon_dev = new_fs_root_args->anon_dev;
+			new_fs_root_args->anon_dev = 0;
 		}
 	}
 
@@ -1633,7 +1666,8 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
  *		for orphan roots
  */
 static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
-					     u64 objectid, dev_t anon_dev,
+					     u64 objectid,
+					     struct btrfs_new_fs_root_args *new_fs_root_args,
 					     bool check_ref)
 {
 	struct btrfs_root *root;
@@ -1647,8 +1681,8 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 again:
 	root = btrfs_lookup_fs_root(fs_info, objectid);
 	if (root) {
-		/* Shouldn't get preallocated anon_dev for cached roots */
-		ASSERT(!anon_dev);
+		/* Shouldn't get new_fs_root_args for cached roots */
+		ASSERT(!new_fs_root_args);
 		if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
 			btrfs_put_root(root);
 			return ERR_PTR(-ENOENT);
@@ -1668,7 +1702,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 		goto fail;
 	}
 
-	ret = btrfs_init_fs_root(root, anon_dev);
+	ret = btrfs_init_fs_root(root, new_fs_root_args);
 	if (ret)
 		goto fail;
 
@@ -1698,14 +1732,6 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 	}
 	return root;
 fail:
-	/*
-	 * If our caller provided us an anonymous device, then it's his
-	 * responsibility to free it in case we fail. So we have to set our
-	 * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
-	 * and once again by our caller.
-	 */
-	if (anon_dev)
-		root->anon_dev = 0;
 	btrfs_put_root(root);
 	return ERR_PTR(ret);
 }
@@ -1720,7 +1746,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref)
 {
-	return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
+	return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
 }
 
 /*
@@ -1732,9 +1758,10 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
  *		parameter value
  */
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev)
+					 u64 objectid,
+					 struct btrfs_new_fs_root_args *new_fs_root_args)
 {
-	return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
+	return btrfs_get_root_ref(fs_info, objectid, new_fs_root_args, true);
 }
 
 /*
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 9fa923e005a3..a7c91bfb0c16 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -25,6 +25,11 @@ static inline u64 btrfs_sb_offset(int mirror)
 	return BTRFS_SUPER_INFO_OFFSET;
 }
 
+struct btrfs_new_fs_root_args {
+	/* Preallocated anonymous block device number */
+	dev_t anon_dev;
+};
+
 struct btrfs_device;
 struct btrfs_fs_devices;
 
@@ -58,6 +63,8 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 					struct btrfs_key *key);
+struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask);
+void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args);
 int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
 			 struct btrfs_root *root);
 void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
@@ -65,7 +72,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref);
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev);
+					 u64 objectid,
+					 struct btrfs_new_fs_root_args *new_fs_root_args);
 struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
 						 struct btrfs_path *path,
 						 u64 objectid);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5ba2e810dc6e..5785336ab7cf 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -588,8 +588,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	};
 	unsigned int trans_num_items;
 	int ret;
-	dev_t anon_dev;
 	u64 objectid;
+	struct btrfs_new_fs_root_args *new_fs_root_args = NULL;
 
 	root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
 	if (!root_item)
@@ -608,14 +608,17 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 		goto out_root_item;
 	}
 
-	ret = get_anon_bdev(&anon_dev);
-	if (ret < 0)
+	new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
+	if (IS_ERR(new_fs_root_args)) {
+		ret = PTR_ERR(new_fs_root_args);
+		new_fs_root_args = NULL;
 		goto out_root_item;
+	}
 
 	new_inode_args.inode = btrfs_new_subvol_inode(mnt_userns, dir);
 	if (!new_inode_args.inode) {
 		ret = -ENOMEM;
-		goto out_anon_dev;
+		goto out_root_item;
 	}
 	ret = btrfs_new_inode_prepare(&new_inode_args, &trans_num_items);
 	if (ret)
@@ -706,14 +709,12 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	free_extent_buffer(leaf);
 	leaf = NULL;
 
-	new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
+	new_root = btrfs_get_new_fs_root(fs_info, objectid, new_fs_root_args);
 	if (IS_ERR(new_root)) {
 		ret = PTR_ERR(new_root);
 		btrfs_abort_transaction(trans, ret);
 		goto out;
 	}
-	/* anon_dev is owned by new_root now. */
-	anon_dev = 0;
 	BTRFS_I(new_inode_args.inode)->root = new_root;
 	/* ... and new_root is owned by new_inode_args.inode now. */
 
@@ -752,10 +753,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	btrfs_new_inode_args_destroy(&new_inode_args);
 out_inode:
 	iput(new_inode_args.inode);
-out_anon_dev:
-	if (anon_dev)
-		free_anon_bdev(anon_dev);
 out_root_item:
+	btrfs_new_fs_root_args_destroy(new_fs_root_args);
 	kfree(root_item);
 	return ret;
 }
@@ -791,9 +790,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	if (!pending_snapshot)
 		return -ENOMEM;
 
-	ret = get_anon_bdev(&pending_snapshot->anon_dev);
-	if (ret < 0)
+	pending_snapshot->new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
+	if (IS_ERR(pending_snapshot->new_fs_root_args)) {
+		ret = PTR_ERR(pending_snapshot->new_fs_root_args);
+		pending_snapshot->new_fs_root_args = NULL;
 		goto free_pending;
+	}
+
 	pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
 			GFP_KERNEL);
 	pending_snapshot->path = btrfs_alloc_path();
@@ -850,16 +853,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 
 	d_instantiate(dentry, inode);
 	ret = 0;
-	pending_snapshot->anon_dev = 0;
 fail:
-	/* Prevent double freeing of anon_dev */
-	if (ret && pending_snapshot->snap)
-		pending_snapshot->snap->anon_dev = 0;
 	btrfs_put_root(pending_snapshot->snap);
 	btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
 free_pending:
-	if (pending_snapshot->anon_dev)
-		free_anon_bdev(pending_snapshot->anon_dev);
+	btrfs_new_fs_root_args_destroy(pending_snapshot->new_fs_root_args);
 	kfree(pending_snapshot->root_item);
 	btrfs_free_path(pending_snapshot->path);
 	kfree(pending_snapshot);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d1f1da6820fb..614eb84422e1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1777,7 +1777,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	}
 
 	key.offset = (u64)-1;
-	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
+	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->new_fs_root_args);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
 		pending->snap = NULL;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 970ff316069d..34648a617a2c 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -10,6 +10,7 @@
 #include "btrfs_inode.h"
 #include "delayed-ref.h"
 #include "ctree.h"
+#include "disk-io.h"
 
 enum btrfs_trans_state {
 	TRANS_STATE_RUNNING,
@@ -161,8 +162,8 @@ struct btrfs_pending_snapshot {
 	struct btrfs_block_rsv block_rsv;
 	/* extra metadata reservation for relocation */
 	int error;
-	/* Preallocated anonymous block device number */
-	dev_t anon_dev;
+	/* Preallocated new_fs_root_args */
+	struct btrfs_new_fs_root_args *new_fs_root_args;
 	bool readonly;
 	struct list_head list;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer
  2022-12-05  9:51 [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
  2022-12-05  9:51 ` [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
@ 2022-12-05  9:51 ` robbieko
  2022-12-12 17:05   ` Filipe Manana
  2022-12-05  9:51 ` [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
  2022-12-05 11:15 ` [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Filipe Manana
  3 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-05  9:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

Change snapshot_lock to dynamic pointer to allocate memory
at the beginning of creating a subvolume/snapshot.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 10 ++++++++--
 fs/btrfs/file.c    |  8 ++++----
 fs/btrfs/inode.c   | 12 ++++++------
 fs/btrfs/ioctl.c   |  4 ++--
 5 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9e6d48ff4597..99003b0dd407 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1429,7 +1429,7 @@ struct btrfs_root {
 	 */
 	int dedupe_in_progress;
 	/* For exclusion of snapshot creation and nocow writes */
-	struct btrfs_drew_lock snapshot_lock;
+	struct btrfs_drew_lock *snapshot_lock;
 
 	atomic_t snapshot_force_cow;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index afe16e1f0b18..5760c2b1a100 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1464,12 +1464,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
 	int ret;
 	unsigned int nofs_flag;
 
+	root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
+	if (!root->snapshot_lock) {
+		ret = -ENOMEM;
+		goto fail;
+	}
 	/*
 	 * We might be called under a transaction (e.g. indirect backref
 	 * resolution) which could deadlock if it triggers memory reclaim
 	 */
 	nofs_flag = memalloc_nofs_save();
-	ret = btrfs_drew_lock_init(&root->snapshot_lock);
+	ret = btrfs_drew_lock_init(root->snapshot_lock);
 	memalloc_nofs_restore(nofs_flag);
 	if (ret)
 		goto fail;
@@ -2180,7 +2185,8 @@ void btrfs_put_root(struct btrfs_root *root)
 		WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state));
 		if (root->anon_dev)
 			free_anon_bdev(root->anon_dev);
-		btrfs_drew_lock_destroy(&root->snapshot_lock);
+		if (root->snapshot_lock)
+			btrfs_drew_lock_destroy(root->snapshot_lock);
 		free_root_extent_buffers(root);
 #ifdef CONFIG_BTRFS_DEBUG
 		spin_lock(&root->fs_info->fs_roots_radix_lock);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d01631d47806..a338fbd34472 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1379,7 +1379,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 	if (!(inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return 0;
 
-	if (!btrfs_drew_try_write_lock(&root->snapshot_lock))
+	if (!btrfs_drew_try_write_lock(root->snapshot_lock))
 		return -EAGAIN;
 
 	lockstart = round_down(pos, fs_info->sectorsize);
@@ -1389,7 +1389,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 
 	if (nowait) {
 		if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return -EAGAIN;
 		}
 	} else {
@@ -1398,7 +1398,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
 			NULL, NULL, NULL, nowait, false);
 	if (ret <= 0)
-		btrfs_drew_write_unlock(&root->snapshot_lock);
+		btrfs_drew_write_unlock(root->snapshot_lock);
 	else
 		*write_bytes = min_t(size_t, *write_bytes ,
 				     num_bytes - pos + lockstart);
@@ -1409,7 +1409,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 {
-	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
+	btrfs_drew_write_unlock(inode->root->snapshot_lock);
 }
 
 static void update_time_for_write(struct inode *inode)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0e516aefbf51..8fe4b00d31f4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5167,16 +5167,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		 * truncation, it must capture all writes that happened before
 		 * this truncation.
 		 */
-		btrfs_drew_write_lock(&root->snapshot_lock);
+		btrfs_drew_write_lock(root->snapshot_lock);
 		ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, newsize);
 		if (ret) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return ret;
 		}
 
 		trans = btrfs_start_transaction(root, 1);
 		if (IS_ERR(trans)) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return PTR_ERR(trans);
 		}
 
@@ -5184,7 +5184,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_inode_safe_disk_i_size_write(BTRFS_I(inode), 0);
 		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
-		btrfs_drew_write_unlock(&root->snapshot_lock);
+		btrfs_drew_write_unlock(root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -11026,7 +11026,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	 * completes before the first write into the swap file after it is
 	 * activated, than that write would fallback to COW.
 	 */
-	if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) {
+	if (!btrfs_drew_try_write_lock(root->snapshot_lock)) {
 		btrfs_exclop_finish(fs_info);
 		btrfs_warn(fs_info,
 	   "cannot activate swapfile because snapshot creation is in progress");
@@ -11199,7 +11199,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	if (ret)
 		btrfs_swap_deactivate(file);
 
-	btrfs_drew_write_unlock(&root->snapshot_lock);
+	btrfs_drew_write_unlock(root->snapshot_lock);
 
 	btrfs_exclop_finish(fs_info);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5785336ab7cf..9f1b81ff37a3 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1008,7 +1008,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
 	 * possible. This is to avoid later writeback (running dealloc) to
 	 * fallback to COW mode and unexpectedly fail with ENOSPC.
 	 */
-	btrfs_drew_read_lock(&root->snapshot_lock);
+	btrfs_drew_read_lock(root->snapshot_lock);
 
 	ret = btrfs_start_delalloc_snapshot(root, false);
 	if (ret)
@@ -1029,7 +1029,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
 out:
 	if (snapshot_force_cow)
 		atomic_dec(&root->snapshot_force_cow);
-	btrfs_drew_read_unlock(&root->snapshot_lock);
+	btrfs_drew_read_unlock(root->snapshot_lock);
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args
  2022-12-05  9:51 [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
  2022-12-05  9:51 ` [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
  2022-12-05  9:51 ` [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
@ 2022-12-05  9:51 ` robbieko
  2022-12-12 17:06   ` Filipe Manana
  2022-12-05 11:15 ` [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Filipe Manana
  3 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-05  9:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

Add snapshot_lock into new_fs_root_args to prevent
percpu_counter_init enter unexpected -ENOMEM when
calling btrfs_init_fs_root.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/disk-io.c | 44 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/disk-io.h |  2 ++
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5760c2b1a100..5704c8f5873c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1437,6 +1437,16 @@ struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)
 	if (err)
 		goto error;
 
+	args->snapshot_lock = kzalloc(sizeof(*args->snapshot_lock), gfp_mask);
+	if (!args->snapshot_lock) {
+		err = -ENOMEM;
+		goto error;
+	}
+	ASSERT((gfp_mask & GFP_KERNEL) == GFP_KERNEL);
+	err = btrfs_drew_lock_init(args->snapshot_lock);
+	if (err)
+		goto error;
+
 	return args;
 
 error:
@@ -1448,6 +1458,9 @@ void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)
 {
 	if (!args)
 		return;
+	if (args->snapshot_lock)
+		btrfs_drew_lock_destroy(args->snapshot_lock);
+	kfree(args->snapshot_lock);
 	if (args->anon_dev)
 		free_anon_bdev(args->anon_dev);
 	kfree(args);
@@ -1464,20 +1477,25 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
 	int ret;
 	unsigned int nofs_flag;
 
-	root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
-	if (!root->snapshot_lock) {
-		ret = -ENOMEM;
-		goto fail;
+	if (new_fs_root_args && new_fs_root_args->snapshot_lock) {
+		root->snapshot_lock = new_fs_root_args->snapshot_lock;
+		new_fs_root_args->snapshot_lock = NULL;
+	} else {
+		root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
+		if (!root->snapshot_lock) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+		/*
+		 * We might be called under a transaction (e.g. indirect backref
+		 * resolution) which could deadlock if it triggers memory reclaim
+		 */
+		nofs_flag = memalloc_nofs_save();
+		ret = btrfs_drew_lock_init(root->snapshot_lock);
+		memalloc_nofs_restore(nofs_flag);
+		if (ret)
+			goto fail;
 	}
-	/*
-	 * We might be called under a transaction (e.g. indirect backref
-	 * resolution) which could deadlock if it triggers memory reclaim
-	 */
-	nofs_flag = memalloc_nofs_save();
-	ret = btrfs_drew_lock_init(root->snapshot_lock);
-	memalloc_nofs_restore(nofs_flag);
-	if (ret)
-		goto fail;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
 	    !btrfs_is_data_reloc_root(root)) {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index a7c91bfb0c16..0e84927859ce 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -28,6 +28,8 @@ static inline u64 btrfs_sb_offset(int mirror)
 struct btrfs_new_fs_root_args {
 	/* Preallocated anonymous block device number */
 	dev_t anon_dev;
+	/* Preallocated snapshot lock */
+	struct btrfs_drew_lock *snapshot_lock;
 };
 
 struct btrfs_device;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-05  9:51 [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
                   ` (2 preceding siblings ...)
  2022-12-05  9:51 ` [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
@ 2022-12-05 11:15 ` Filipe Manana
  2022-12-06  8:58   ` robbieko
  3 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2022-12-05 11:15 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> [Issue]
> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
>
>   WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>   CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661

4.4.180...

How often does that happen on a supported kernel? The oldest supported
kernel is 4.9 at the moment.

>   Call Trace:
>     create_pending_snapshot+0xe30/0xe70 [btrfs]
>     create_pending_snapshots+0x89/0xb0 [btrfs]
>     btrfs_commit_transaction+0x469/0xc60 [btrfs]
>     btrfs_mksubvol+0x5bd/0x690 [btrfs]
>     btrfs_mksnapshot+0x102/0x170 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>     btrfs_ioctl+0x2111/0x3130 [btrfs]
>     do_vfs_ioctl+0x7ea/0xa80
>     SyS_ioctl+0xa1/0xb0
>     entry_SYSCALL_64_fastpath+0x1e/0x8e
>   ---[ end trace 910c8f86780ca385 ]---
>   BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
>
> [Cause]
> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
> However, atomic allocation is required when processing percpu_counter_init
> without GFP_KERNEL due to the unique structure of percpu_counter.
> In this situation, allocating memory for initializing fs root may cause
> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.

This sounds familiar, and we had a regression in mm that made
percepu_counter_init fail very often with -ENOMEM.
See:

https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/

The kernel fix was this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78

I'm assuming that you are probably running a Synology kernel based on
4.4 with a lot of backported patches, is that correct?
Maybe you have the patchset that introduced the regression in that
kernel, but that later fix is not in there.

Thanks.


>
> [Fix]
> We allocate memory at the beginning of creating a subvolume/snapshot.
> This way, we can ensure the memory is enough when initializing fs root.
> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> the transaction won’t abort since it hasn’t started yet.
>
> Robbie Ko (3):
>   btrfs: refactor anon_dev with new_fs_root_args for create
>     subvolume/snapshot
>   btrfs: change snapshot_lock to dynamic pointer
>   btrfs: add snapshot_lock to new_fs_root_args
>
>  fs/btrfs/ctree.h       |   2 +-
>  fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
>  fs/btrfs/disk-io.h     |  12 ++++-
>  fs/btrfs/file.c        |   8 +--
>  fs/btrfs/inode.c       |  12 ++---
>  fs/btrfs/ioctl.c       |  38 +++++++--------
>  fs/btrfs/transaction.c |   2 +-
>  fs/btrfs/transaction.h |   5 +-
>  8 files changed, 123 insertions(+), 63 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-05 11:15 ` [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Filipe Manana
@ 2022-12-06  8:58   ` robbieko
  2022-12-08  6:24     ` robbieko
  0 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-06  8:58 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


Filipe Manana 於 2022/12/5 下午7:15 寫道:
> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> [Issue]
>> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
>>
>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> 4.4.180...
>
> How often does that happen on a supported kernel? The oldest supported
> kernel is 4.9 at the moment.

The occurrence of this issue is extremely low, and it cannot be 
reproduced stably.
We have millions of machines out there, and this issue happened 3 times 
in the last two years.


>
>>    Call Trace:
>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
>>      create_pending_snapshots+0x89/0xb0 [btrfs]
>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
>>      do_vfs_ioctl+0x7ea/0xa80
>>      SyS_ioctl+0xa1/0xb0
>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
>>    ---[ end trace 910c8f86780ca385 ]---
>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
>>
>> [Cause]
>> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
>> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
>> However, atomic allocation is required when processing percpu_counter_init
>> without GFP_KERNEL due to the unique structure of percpu_counter.
>> In this situation, allocating memory for initializing fs root may cause
>> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
> This sounds familiar, and we had a regression in mm that made
> percepu_counter_init fail very often with -ENOMEM.
> See:
>
> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
>
> The kernel fix was this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
>
> I'm assuming that you are probably running a Synology kernel based on
> 4.4 with a lot of backported patches, is that correct?
> Maybe you have the patchset that introduced the regression in that
> kernel, but that later fix is not in there.
>
> Thanks.

Yes, We do backport many patches, but we don't backport "mm: 
memcg/percpu: account percpu memory to memory cgroups",
So we think the issue has nothing to do with "percpu: make 
pcpu_nr_empty_pop_pages per chunk type".

This issue is just a corner case, when percpu_counter_init is executed 
with GFP_NOFS, there is a chance to fail.
So we feel that the snapshot_lock should be preallocated.

Thanks.


>
>> [Fix]
>> We allocate memory at the beginning of creating a subvolume/snapshot.
>> This way, we can ensure the memory is enough when initializing fs root.
>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>> the transaction won’t abort since it hasn’t started yet.
>>
>> Robbie Ko (3):
>>    btrfs: refactor anon_dev with new_fs_root_args for create
>>      subvolume/snapshot
>>    btrfs: change snapshot_lock to dynamic pointer
>>    btrfs: add snapshot_lock to new_fs_root_args
>>
>>   fs/btrfs/ctree.h       |   2 +-
>>   fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
>>   fs/btrfs/disk-io.h     |  12 ++++-
>>   fs/btrfs/file.c        |   8 +--
>>   fs/btrfs/inode.c       |  12 ++---
>>   fs/btrfs/ioctl.c       |  38 +++++++--------
>>   fs/btrfs/transaction.c |   2 +-
>>   fs/btrfs/transaction.h |   5 +-
>>   8 files changed, 123 insertions(+), 63 deletions(-)
>>
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-06  8:58   ` robbieko
@ 2022-12-08  6:24     ` robbieko
  2022-12-12 17:04       ` Filipe Manana
  0 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-08  6:24 UTC (permalink / raw)
  To: linux-btrfs, Filipe Manana


robbieko 於 2022/12/6 下午4:58 寫道:
>
> Filipe Manana 於 2022/12/5 下午7:15 寫道:
>> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>>> From: Robbie Ko <robbieko@synology.com>
>>>
>>> [Issue]
>>> When creating subvolume/snapshot, the transaction may be abort due 
>>> to -ENOMEM
>>>
>>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 
>>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>> 4.4.180...
>>
>> How often does that happen on a supported kernel? The oldest supported
>> kernel is 4.9 at the moment.
>
> The occurrence of this issue is extremely low, and it cannot be 
> reproduced stably.
> We have millions of machines out there, and this issue happened 3 
> times in the last two years.
>
>
>>
>>>    Call Trace:
>>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
>>>      create_pending_snapshots+0x89/0xb0 [btrfs]
>>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
>>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
>>>      do_vfs_ioctl+0x7ea/0xa80
>>>      SyS_ioctl+0xa1/0xb0
>>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
>>>    ---[ end trace 910c8f86780ca385 ]---
>>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: 
>>> errno=-12 Out of memory
>>>
>>> [Cause]
>>> During creating a subvolume/snapshot, it is necessary to allocate 
>>> memory for Initializing fs root.
>>> Therefore, it can only use GFP_NOFS to allocate memory to avoid 
>>> deadlock issues.
>>> However, atomic allocation is required when processing 
>>> percpu_counter_init
>>> without GFP_KERNEL due to the unique structure of percpu_counter.
>>> In this situation, allocating memory for initializing fs root may cause
>>> unexpected -ENOMEM when free memory is low and causes btrfs 
>>> transaction to abort.
>> This sounds familiar, and we had a regression in mm that made
>> percepu_counter_init fail very often with -ENOMEM.
>> See:
>>
>> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ 
>>
>>
>> The kernel fix was this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78 
>>
>>
>> I'm assuming that you are probably running a Synology kernel based on
>> 4.4 with a lot of backported patches, is that correct?
>> Maybe you have the patchset that introduced the regression in that
>> kernel, but that later fix is not in there.
>>
>> Thanks.
>
> Yes, We do backport many patches, but we don't backport "mm: 
> memcg/percpu: account percpu memory to memory cgroups",
> So we think the issue has nothing to do with "percpu: make 
> pcpu_nr_empty_pop_pages per chunk type".
>
> This issue is just a corner case, when percpu_counter_init is executed 
> with GFP_NOFS, there is a chance to fail.
> So we feel that the snapshot_lock should be preallocated.
>
> Thanks.
>
>

Anyone have any suggestions ?


>>
>>> [Fix]
>>> We allocate memory at the beginning of creating a subvolume/snapshot.
>>> This way, we can ensure the memory is enough when initializing fs root.
>>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>>> the transaction won’t abort since it hasn’t started yet.
>>>
>>> Robbie Ko (3):
>>>    btrfs: refactor anon_dev with new_fs_root_args for create
>>>      subvolume/snapshot
>>>    btrfs: change snapshot_lock to dynamic pointer
>>>    btrfs: add snapshot_lock to new_fs_root_args
>>>
>>>   fs/btrfs/ctree.h       |   2 +-
>>>   fs/btrfs/disk-io.c     | 107 
>>> ++++++++++++++++++++++++++++++-----------
>>>   fs/btrfs/disk-io.h     |  12 ++++-
>>>   fs/btrfs/file.c        |   8 +--
>>>   fs/btrfs/inode.c       |  12 ++---
>>>   fs/btrfs/ioctl.c       |  38 +++++++--------
>>>   fs/btrfs/transaction.c |   2 +-
>>>   fs/btrfs/transaction.h |   5 +-
>>>   8 files changed, 123 insertions(+), 63 deletions(-)
>>>
>>> -- 
>>> 2.17.1
>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-08  6:24     ` robbieko
@ 2022-12-12 17:04       ` Filipe Manana
  2022-12-13  2:59         ` robbieko
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2022-12-12 17:04 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Thu, Dec 8, 2022 at 6:24 AM robbieko <robbieko@synology.com> wrote:
>
>
> robbieko 於 2022/12/6 下午4:58 寫道:
> >
> > Filipe Manana 於 2022/12/5 下午7:15 寫道:
> >> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
> >>> From: Robbie Ko <robbieko@synology.com>
> >>>
> >>> [Issue]
> >>> When creating subvolume/snapshot, the transaction may be abort due
> >>> to -ENOMEM
> >>>
> >>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937
> >>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
> >>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> >> 4.4.180...
> >>
> >> How often does that happen on a supported kernel? The oldest supported
> >> kernel is 4.9 at the moment.
> >
> > The occurrence of this issue is extremely low, and it cannot be
> > reproduced stably.
> > We have millions of machines out there, and this issue happened 3
> > times in the last two years.
> >
> >
> >>
> >>>    Call Trace:
> >>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
> >>>      create_pending_snapshots+0x89/0xb0 [btrfs]
> >>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
> >>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
> >>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
> >>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
> >>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
> >>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
> >>>      do_vfs_ioctl+0x7ea/0xa80
> >>>      SyS_ioctl+0xa1/0xb0
> >>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
> >>>    ---[ end trace 910c8f86780ca385 ]---
> >>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937:
> >>> errno=-12 Out of memory
> >>>
> >>> [Cause]
> >>> During creating a subvolume/snapshot, it is necessary to allocate
> >>> memory for Initializing fs root.
> >>> Therefore, it can only use GFP_NOFS to allocate memory to avoid
> >>> deadlock issues.
> >>> However, atomic allocation is required when processing
> >>> percpu_counter_init
> >>> without GFP_KERNEL due to the unique structure of percpu_counter.
> >>> In this situation, allocating memory for initializing fs root may cause
> >>> unexpected -ENOMEM when free memory is low and causes btrfs
> >>> transaction to abort.
> >> This sounds familiar, and we had a regression in mm that made
> >> percepu_counter_init fail very often with -ENOMEM.
> >> See:
> >>
> >> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
> >>
> >>
> >> The kernel fix was this:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
> >>
> >>
> >> I'm assuming that you are probably running a Synology kernel based on
> >> 4.4 with a lot of backported patches, is that correct?
> >> Maybe you have the patchset that introduced the regression in that
> >> kernel, but that later fix is not in there.
> >>
> >> Thanks.
> >
> > Yes, We do backport many patches, but we don't backport "mm:
> > memcg/percpu: account percpu memory to memory cgroups",
> > So we think the issue has nothing to do with "percpu: make
> > pcpu_nr_empty_pop_pages per chunk type".
> >
> > This issue is just a corner case, when percpu_counter_init is executed
> > with GFP_NOFS, there is a chance to fail.
> > So we feel that the snapshot_lock should be preallocated.
> >
> > Thanks.
> >
> >
>
> Anyone have any suggestions ?

Sorry, some holidays and other priorities came in.

The idea seems fine to me. I was just wondering if you weren't hitting
that regression in mm.
Patches 1 and 2 don't apply with current misc-next, btw, they require
manual conflict resolution.

I'll leave review comments on each patch, small things, nothing major.

Thanks.

>
>
> >>
> >>> [Fix]
> >>> We allocate memory at the beginning of creating a subvolume/snapshot.
> >>> This way, we can ensure the memory is enough when initializing fs root.
> >>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> >>> the transaction won’t abort since it hasn’t started yet.
> >>>
> >>> Robbie Ko (3):
> >>>    btrfs: refactor anon_dev with new_fs_root_args for create
> >>>      subvolume/snapshot
> >>>    btrfs: change snapshot_lock to dynamic pointer
> >>>    btrfs: add snapshot_lock to new_fs_root_args
> >>>
> >>>   fs/btrfs/ctree.h       |   2 +-
> >>>   fs/btrfs/disk-io.c     | 107
> >>> ++++++++++++++++++++++++++++++-----------
> >>>   fs/btrfs/disk-io.h     |  12 ++++-
> >>>   fs/btrfs/file.c        |   8 +--
> >>>   fs/btrfs/inode.c       |  12 ++---
> >>>   fs/btrfs/ioctl.c       |  38 +++++++--------
> >>>   fs/btrfs/transaction.c |   2 +-
> >>>   fs/btrfs/transaction.h |   5 +-
> >>>   8 files changed, 123 insertions(+), 63 deletions(-)
> >>>
> >>> --
> >>> 2.17.1
> >>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot
  2022-12-05  9:51 ` [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
@ 2022-12-12 17:04   ` Filipe Manana
  2022-12-13  3:29     ` robbieko
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2022-12-12 17:04 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Mon, Dec 5, 2022 at 10:11 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> Create a new structure called btrfs_new_fs_root_args that
> refactor the type of anon_dev.
> With the new structure, btrfs_get_new_fs_root can input
> btrfs_new_fs_root_args into btrfs_init_fs_root by order.
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/disk-io.c     | 63 ++++++++++++++++++++++++++++++------------
>  fs/btrfs/disk-io.h     | 10 ++++++-
>  fs/btrfs/ioctl.c       | 34 +++++++++++------------
>  fs/btrfs/transaction.c |  2 +-
>  fs/btrfs/transaction.h |  5 ++--
>  5 files changed, 74 insertions(+), 40 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d99bf7c64611..afe16e1f0b18 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1422,12 +1422,44 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>         return root;
>  }
>
> +struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)

Can we remove the argument?
Even after the whole patchset, it will always be GFP_KERNEL.

I know patch 3/3 adds an assertion to verify we get GFP_KERNEL, but
lockdep is smart enough to detect an issue and complain.

Also, can you name the function something like btrfs_alloc_new_fs_roots_arg() ?
The naming pattern we use for allocator functions is to use "alloc" in
the name, for e.g. btrfs_alloc_path(), alloc_extent_map(), etc.
So having an allocator named *_prepare() breaks the pattern and it's
also confusing (as it allocates the structure).

> +{
> +       int err;
> +       struct btrfs_new_fs_root_args *args;
> +
> +       args = kzalloc(sizeof(*args), gfp_mask);
> +       if (!args) {
> +               err = -ENOMEM;
> +               goto error;
> +       }

if (!args)
     return ERR_PTR(-ENOMEM);

That's simpler.

> +
> +       err = get_anon_bdev(&args->anon_dev);
> +       if (err)
> +               goto error;
> +
> +       return args;
> +
> +error:
> +       btrfs_new_fs_root_args_destroy(args);
> +       return ERR_PTR(err);
> +}
> +
> +void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)

Same thing about the naming consistency here. Something like:
btrfs_free_new_fs_roots_arg()
We have btrfs_free_path(), free_extent_map(), etc.

Thanks.

> +{
> +       if (!args)
> +               return;
> +       if (args->anon_dev)
> +               free_anon_bdev(args->anon_dev);
> +       kfree(args);
> +}
> +
>  /*
>   * Initialize subvolume root in-memory structure
>   *
>   * @anon_dev:  anonymous device to attach to the root, if zero, allocate new
>   */
> -static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
> +static int btrfs_init_fs_root(struct btrfs_root *root,
> +                             struct btrfs_new_fs_root_args *new_fs_root_args)
>  {
>         int ret;
>         unsigned int nofs_flag;
> @@ -1454,12 +1486,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>          */
>         if (is_fstree(root->root_key.objectid) &&
>             btrfs_root_refs(&root->root_item) > 0) {
> -               if (!anon_dev) {
> +               if (!new_fs_root_args || !new_fs_root_args->anon_dev) {
>                         ret = get_anon_bdev(&root->anon_dev);
>                         if (ret)
>                                 goto fail;
>                 } else {
> -                       root->anon_dev = anon_dev;
> +                       root->anon_dev = new_fs_root_args->anon_dev;
> +                       new_fs_root_args->anon_dev = 0;
>                 }
>         }
>
> @@ -1633,7 +1666,8 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>   *             for orphan roots
>   */
>  static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
> -                                            u64 objectid, dev_t anon_dev,
> +                                            u64 objectid,
> +                                            struct btrfs_new_fs_root_args *new_fs_root_args,
>                                              bool check_ref)
>  {
>         struct btrfs_root *root;
> @@ -1647,8 +1681,8 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>  again:
>         root = btrfs_lookup_fs_root(fs_info, objectid);
>         if (root) {
> -               /* Shouldn't get preallocated anon_dev for cached roots */
> -               ASSERT(!anon_dev);
> +               /* Shouldn't get new_fs_root_args for cached roots */
> +               ASSERT(!new_fs_root_args);
>                 if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
>                         btrfs_put_root(root);
>                         return ERR_PTR(-ENOENT);
> @@ -1668,7 +1702,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>                 goto fail;
>         }
>
> -       ret = btrfs_init_fs_root(root, anon_dev);
> +       ret = btrfs_init_fs_root(root, new_fs_root_args);
>         if (ret)
>                 goto fail;
>
> @@ -1698,14 +1732,6 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>         }
>         return root;
>  fail:
> -       /*
> -        * If our caller provided us an anonymous device, then it's his
> -        * responsibility to free it in case we fail. So we have to set our
> -        * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
> -        * and once again by our caller.
> -        */
> -       if (anon_dev)
> -               root->anon_dev = 0;
>         btrfs_put_root(root);
>         return ERR_PTR(ret);
>  }
> @@ -1720,7 +1746,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>  struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>                                      u64 objectid, bool check_ref)
>  {
> -       return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
> +       return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
>  }
>
>  /*
> @@ -1732,9 +1758,10 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>   *             parameter value
>   */
>  struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> -                                        u64 objectid, dev_t anon_dev)
> +                                        u64 objectid,
> +                                        struct btrfs_new_fs_root_args *new_fs_root_args)
>  {
> -       return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
> +       return btrfs_get_root_ref(fs_info, objectid, new_fs_root_args, true);
>  }
>
>  /*
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 9fa923e005a3..a7c91bfb0c16 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -25,6 +25,11 @@ static inline u64 btrfs_sb_offset(int mirror)
>         return BTRFS_SUPER_INFO_OFFSET;
>  }
>
> +struct btrfs_new_fs_root_args {
> +       /* Preallocated anonymous block device number */
> +       dev_t anon_dev;
> +};
> +
>  struct btrfs_device;
>  struct btrfs_fs_devices;
>
> @@ -58,6 +63,8 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
>  int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>                                         struct btrfs_key *key);
> +struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask);
> +void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args);
>  int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
>                          struct btrfs_root *root);
>  void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
> @@ -65,7 +72,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>                                      u64 objectid, bool check_ref);
>  struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> -                                        u64 objectid, dev_t anon_dev);
> +                                        u64 objectid,
> +                                        struct btrfs_new_fs_root_args *new_fs_root_args);
>  struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
>                                                  struct btrfs_path *path,
>                                                  u64 objectid);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 5ba2e810dc6e..5785336ab7cf 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -588,8 +588,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>         };
>         unsigned int trans_num_items;
>         int ret;
> -       dev_t anon_dev;
>         u64 objectid;
> +       struct btrfs_new_fs_root_args *new_fs_root_args = NULL;
>
>         root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
>         if (!root_item)
> @@ -608,14 +608,17 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>                 goto out_root_item;
>         }
>
> -       ret = get_anon_bdev(&anon_dev);
> -       if (ret < 0)
> +       new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
> +       if (IS_ERR(new_fs_root_args)) {
> +               ret = PTR_ERR(new_fs_root_args);
> +               new_fs_root_args = NULL;
>                 goto out_root_item;
> +       }
>
>         new_inode_args.inode = btrfs_new_subvol_inode(mnt_userns, dir);
>         if (!new_inode_args.inode) {
>                 ret = -ENOMEM;
> -               goto out_anon_dev;
> +               goto out_root_item;
>         }
>         ret = btrfs_new_inode_prepare(&new_inode_args, &trans_num_items);
>         if (ret)
> @@ -706,14 +709,12 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>         free_extent_buffer(leaf);
>         leaf = NULL;
>
> -       new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
> +       new_root = btrfs_get_new_fs_root(fs_info, objectid, new_fs_root_args);
>         if (IS_ERR(new_root)) {
>                 ret = PTR_ERR(new_root);
>                 btrfs_abort_transaction(trans, ret);
>                 goto out;
>         }
> -       /* anon_dev is owned by new_root now. */
> -       anon_dev = 0;
>         BTRFS_I(new_inode_args.inode)->root = new_root;
>         /* ... and new_root is owned by new_inode_args.inode now. */
>
> @@ -752,10 +753,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>         btrfs_new_inode_args_destroy(&new_inode_args);
>  out_inode:
>         iput(new_inode_args.inode);
> -out_anon_dev:
> -       if (anon_dev)
> -               free_anon_bdev(anon_dev);
>  out_root_item:
> +       btrfs_new_fs_root_args_destroy(new_fs_root_args);
>         kfree(root_item);
>         return ret;
>  }
> @@ -791,9 +790,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>         if (!pending_snapshot)
>                 return -ENOMEM;
>
> -       ret = get_anon_bdev(&pending_snapshot->anon_dev);
> -       if (ret < 0)
> +       pending_snapshot->new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
> +       if (IS_ERR(pending_snapshot->new_fs_root_args)) {
> +               ret = PTR_ERR(pending_snapshot->new_fs_root_args);
> +               pending_snapshot->new_fs_root_args = NULL;
>                 goto free_pending;
> +       }
> +
>         pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
>                         GFP_KERNEL);
>         pending_snapshot->path = btrfs_alloc_path();
> @@ -850,16 +853,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>
>         d_instantiate(dentry, inode);
>         ret = 0;
> -       pending_snapshot->anon_dev = 0;
>  fail:
> -       /* Prevent double freeing of anon_dev */
> -       if (ret && pending_snapshot->snap)
> -               pending_snapshot->snap->anon_dev = 0;
>         btrfs_put_root(pending_snapshot->snap);
>         btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
>  free_pending:
> -       if (pending_snapshot->anon_dev)
> -               free_anon_bdev(pending_snapshot->anon_dev);
> +       btrfs_new_fs_root_args_destroy(pending_snapshot->new_fs_root_args);
>         kfree(pending_snapshot->root_item);
>         btrfs_free_path(pending_snapshot->path);
>         kfree(pending_snapshot);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index d1f1da6820fb..614eb84422e1 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1777,7 +1777,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>         }
>
>         key.offset = (u64)-1;
> -       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
> +       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->new_fs_root_args);
>         if (IS_ERR(pending->snap)) {
>                 ret = PTR_ERR(pending->snap);
>                 pending->snap = NULL;
> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
> index 970ff316069d..34648a617a2c 100644
> --- a/fs/btrfs/transaction.h
> +++ b/fs/btrfs/transaction.h
> @@ -10,6 +10,7 @@
>  #include "btrfs_inode.h"
>  #include "delayed-ref.h"
>  #include "ctree.h"
> +#include "disk-io.h"
>
>  enum btrfs_trans_state {
>         TRANS_STATE_RUNNING,
> @@ -161,8 +162,8 @@ struct btrfs_pending_snapshot {
>         struct btrfs_block_rsv block_rsv;
>         /* extra metadata reservation for relocation */
>         int error;
> -       /* Preallocated anonymous block device number */
> -       dev_t anon_dev;
> +       /* Preallocated new_fs_root_args */
> +       struct btrfs_new_fs_root_args *new_fs_root_args;
>         bool readonly;
>         struct list_head list;
>  };
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer
  2022-12-05  9:51 ` [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
@ 2022-12-12 17:05   ` Filipe Manana
  2022-12-13  3:31     ` robbieko
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2022-12-12 17:05 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Mon, Dec 5, 2022 at 10:30 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> Change snapshot_lock to dynamic pointer to allocate memory
> at the beginning of creating a subvolume/snapshot.
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/ctree.h   |  2 +-
>  fs/btrfs/disk-io.c | 10 ++++++++--
>  fs/btrfs/file.c    |  8 ++++----
>  fs/btrfs/inode.c   | 12 ++++++------
>  fs/btrfs/ioctl.c   |  4 ++--
>  5 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 9e6d48ff4597..99003b0dd407 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1429,7 +1429,7 @@ struct btrfs_root {
>          */
>         int dedupe_in_progress;
>         /* For exclusion of snapshot creation and nocow writes */
> -       struct btrfs_drew_lock snapshot_lock;
> +       struct btrfs_drew_lock *snapshot_lock;
>
>         atomic_t snapshot_force_cow;
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index afe16e1f0b18..5760c2b1a100 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1464,12 +1464,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
>         int ret;
>         unsigned int nofs_flag;
>
> +       root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
> +       if (!root->snapshot_lock) {
> +               ret = -ENOMEM;
> +               goto fail;
> +       }
>         /*
>          * We might be called under a transaction (e.g. indirect backref
>          * resolution) which could deadlock if it triggers memory reclaim
>          */
>         nofs_flag = memalloc_nofs_save();
> -       ret = btrfs_drew_lock_init(&root->snapshot_lock);
> +       ret = btrfs_drew_lock_init(root->snapshot_lock);
>         memalloc_nofs_restore(nofs_flag);
>         if (ret)
>                 goto fail;
> @@ -2180,7 +2185,8 @@ void btrfs_put_root(struct btrfs_root *root)
>                 WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state));
>                 if (root->anon_dev)
>                         free_anon_bdev(root->anon_dev);
> -               btrfs_drew_lock_destroy(&root->snapshot_lock);
> +               if (root->snapshot_lock)
> +                       btrfs_drew_lock_destroy(root->snapshot_lock);

This is leaking the memory allocated for the lock, it needs a
kfree(root->snapshot_lock) too.

Thanks.

>                 free_root_extent_buffers(root);
>  #ifdef CONFIG_BTRFS_DEBUG
>                 spin_lock(&root->fs_info->fs_roots_radix_lock);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index d01631d47806..a338fbd34472 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1379,7 +1379,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>         if (!(inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
>                 return 0;
>
> -       if (!btrfs_drew_try_write_lock(&root->snapshot_lock))
> +       if (!btrfs_drew_try_write_lock(root->snapshot_lock))
>                 return -EAGAIN;
>
>         lockstart = round_down(pos, fs_info->sectorsize);
> @@ -1389,7 +1389,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>
>         if (nowait) {
>                 if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>                         return -EAGAIN;
>                 }
>         } else {
> @@ -1398,7 +1398,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>         ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
>                         NULL, NULL, NULL, nowait, false);
>         if (ret <= 0)
> -               btrfs_drew_write_unlock(&root->snapshot_lock);
> +               btrfs_drew_write_unlock(root->snapshot_lock);
>         else
>                 *write_bytes = min_t(size_t, *write_bytes ,
>                                      num_bytes - pos + lockstart);
> @@ -1409,7 +1409,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>
>  void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
>  {
> -       btrfs_drew_write_unlock(&inode->root->snapshot_lock);
> +       btrfs_drew_write_unlock(inode->root->snapshot_lock);
>  }
>
>  static void update_time_for_write(struct inode *inode)
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0e516aefbf51..8fe4b00d31f4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5167,16 +5167,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>                  * truncation, it must capture all writes that happened before
>                  * this truncation.
>                  */
> -               btrfs_drew_write_lock(&root->snapshot_lock);
> +               btrfs_drew_write_lock(root->snapshot_lock);
>                 ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, newsize);
>                 if (ret) {
> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>                         return ret;
>                 }
>
>                 trans = btrfs_start_transaction(root, 1);
>                 if (IS_ERR(trans)) {
> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>                         return PTR_ERR(trans);
>                 }
>
> @@ -5184,7 +5184,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>                 btrfs_inode_safe_disk_i_size_write(BTRFS_I(inode), 0);
>                 pagecache_isize_extended(inode, oldsize, newsize);
>                 ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
> -               btrfs_drew_write_unlock(&root->snapshot_lock);
> +               btrfs_drew_write_unlock(root->snapshot_lock);
>                 btrfs_end_transaction(trans);
>         } else {
>                 struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> @@ -11026,7 +11026,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>          * completes before the first write into the swap file after it is
>          * activated, than that write would fallback to COW.
>          */
> -       if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) {
> +       if (!btrfs_drew_try_write_lock(root->snapshot_lock)) {
>                 btrfs_exclop_finish(fs_info);
>                 btrfs_warn(fs_info,
>            "cannot activate swapfile because snapshot creation is in progress");
> @@ -11199,7 +11199,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>         if (ret)
>                 btrfs_swap_deactivate(file);
>
> -       btrfs_drew_write_unlock(&root->snapshot_lock);
> +       btrfs_drew_write_unlock(root->snapshot_lock);
>
>         btrfs_exclop_finish(fs_info);
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 5785336ab7cf..9f1b81ff37a3 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1008,7 +1008,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
>          * possible. This is to avoid later writeback (running dealloc) to
>          * fallback to COW mode and unexpectedly fail with ENOSPC.
>          */
> -       btrfs_drew_read_lock(&root->snapshot_lock);
> +       btrfs_drew_read_lock(root->snapshot_lock);
>
>         ret = btrfs_start_delalloc_snapshot(root, false);
>         if (ret)
> @@ -1029,7 +1029,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
>  out:
>         if (snapshot_force_cow)
>                 atomic_dec(&root->snapshot_force_cow);
> -       btrfs_drew_read_unlock(&root->snapshot_lock);
> +       btrfs_drew_read_unlock(root->snapshot_lock);
>         return ret;
>  }
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args
  2022-12-05  9:51 ` [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
@ 2022-12-12 17:06   ` Filipe Manana
  2022-12-13  3:34     ` robbieko
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2022-12-12 17:06 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Mon, Dec 5, 2022 at 10:36 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> Add snapshot_lock into new_fs_root_args to prevent
> percpu_counter_init enter unexpected -ENOMEM when
> calling btrfs_init_fs_root.

You could be more detailed here and mention that all this is to
prevent transaction aborts in case percpu_counter_init() fails to
allocate memory.

Interpreting literally what you wrote, it gives the idea that after
this patch the memory allocation can never fail, which isn't true.
The goal is just to allocate the snapshot lock before holding a
transaction handle, so that we can use GFP_KERNEL, which reduces the
chances of memory allocation failing and, above all, avoid aborting a
transaction and turn the fs to RO mode.

>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/disk-io.c | 44 +++++++++++++++++++++++++++++++-------------
>  fs/btrfs/disk-io.h |  2 ++
>  2 files changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5760c2b1a100..5704c8f5873c 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1437,6 +1437,16 @@ struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)
>         if (err)
>                 goto error;
>
> +       args->snapshot_lock = kzalloc(sizeof(*args->snapshot_lock), gfp_mask);
> +       if (!args->snapshot_lock) {
> +               err = -ENOMEM;
> +               goto error;
> +       }
> +       ASSERT((gfp_mask & GFP_KERNEL) == GFP_KERNEL);

As commented for patch 1/3, we could get away with the argument and
directly use GFP_KERNEL above, and let lockdep warn us (which makes
fstests fail).

> +       err = btrfs_drew_lock_init(args->snapshot_lock);
> +       if (err)
> +               goto error;
> +
>         return args;
>
>  error:
> @@ -1448,6 +1458,9 @@ void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)
>  {
>         if (!args)
>                 return;
> +       if (args->snapshot_lock)
> +               btrfs_drew_lock_destroy(args->snapshot_lock);
> +       kfree(args->snapshot_lock);

You could combine the kfree of the lock inside the if statement,
making it more clear imo.

>         if (args->anon_dev)
>                 free_anon_bdev(args->anon_dev);
>         kfree(args);
> @@ -1464,20 +1477,25 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
>         int ret;
>         unsigned int nofs_flag;
>
> -       root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
> -       if (!root->snapshot_lock) {
> -               ret = -ENOMEM;
> -               goto fail;
> +       if (new_fs_root_args && new_fs_root_args->snapshot_lock) {
> +               root->snapshot_lock = new_fs_root_args->snapshot_lock;
> +               new_fs_root_args->snapshot_lock = NULL;
> +       } else {
> +               root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
> +               if (!root->snapshot_lock) {
> +                       ret = -ENOMEM;
> +                       goto fail;
> +               }
> +               /*
> +                * We might be called under a transaction (e.g. indirect backref
> +                * resolution) which could deadlock if it triggers memory reclaim

While here, we could end the sentence with punctuation (.).

Thanks.

> +                */
> +               nofs_flag = memalloc_nofs_save();
> +               ret = btrfs_drew_lock_init(root->snapshot_lock);
> +               memalloc_nofs_restore(nofs_flag);
> +               if (ret)
> +                       goto fail;
>         }
> -       /*
> -        * We might be called under a transaction (e.g. indirect backref
> -        * resolution) which could deadlock if it triggers memory reclaim
> -        */
> -       nofs_flag = memalloc_nofs_save();
> -       ret = btrfs_drew_lock_init(root->snapshot_lock);
> -       memalloc_nofs_restore(nofs_flag);
> -       if (ret)
> -               goto fail;
>
>         if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
>             !btrfs_is_data_reloc_root(root)) {
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index a7c91bfb0c16..0e84927859ce 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -28,6 +28,8 @@ static inline u64 btrfs_sb_offset(int mirror)
>  struct btrfs_new_fs_root_args {
>         /* Preallocated anonymous block device number */
>         dev_t anon_dev;
> +       /* Preallocated snapshot lock */
> +       struct btrfs_drew_lock *snapshot_lock;
>  };
>
>  struct btrfs_device;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-12 17:04       ` Filipe Manana
@ 2022-12-13  2:59         ` robbieko
  0 siblings, 0 replies; 16+ messages in thread
From: robbieko @ 2022-12-13  2:59 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


Filipe Manana 於 2022/12/13 上午1:04 寫道:
> On Thu, Dec 8, 2022 at 6:24 AM robbieko <robbieko@synology.com> wrote:
>>
>> robbieko 於 2022/12/6 下午4:58 寫道:
>>> Filipe Manana 於 2022/12/5 下午7:15 寫道:
>>>> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>>
>>>>> [Issue]
>>>>> When creating subvolume/snapshot, the transaction may be abort due
>>>>> to -ENOMEM
>>>>>
>>>>>     WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937
>>>>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>>>>     CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>>>> 4.4.180...
>>>>
>>>> How often does that happen on a supported kernel? The oldest supported
>>>> kernel is 4.9 at the moment.
>>> The occurrence of this issue is extremely low, and it cannot be
>>> reproduced stably.
>>> We have millions of machines out there, and this issue happened 3
>>> times in the last two years.
>>>
>>>
>>>>>     Call Trace:
>>>>>       create_pending_snapshot+0xe30/0xe70 [btrfs]
>>>>>       create_pending_snapshots+0x89/0xb0 [btrfs]
>>>>>       btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>>>>       btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>>>>       btrfs_mksnapshot+0x102/0x170 [btrfs]
>>>>>       btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>>>>       btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>>>>       btrfs_ioctl+0x2111/0x3130 [btrfs]
>>>>>       do_vfs_ioctl+0x7ea/0xa80
>>>>>       SyS_ioctl+0xa1/0xb0
>>>>>       entry_SYSCALL_64_fastpath+0x1e/0x8e
>>>>>     ---[ end trace 910c8f86780ca385 ]---
>>>>>     BTRFS: error (device dm-2) in create_pending_snapshot:1937:
>>>>> errno=-12 Out of memory
>>>>>
>>>>> [Cause]
>>>>> During creating a subvolume/snapshot, it is necessary to allocate
>>>>> memory for Initializing fs root.
>>>>> Therefore, it can only use GFP_NOFS to allocate memory to avoid
>>>>> deadlock issues.
>>>>> However, atomic allocation is required when processing
>>>>> percpu_counter_init
>>>>> without GFP_KERNEL due to the unique structure of percpu_counter.
>>>>> In this situation, allocating memory for initializing fs root may cause
>>>>> unexpected -ENOMEM when free memory is low and causes btrfs
>>>>> transaction to abort.
>>>> This sounds familiar, and we had a regression in mm that made
>>>> percepu_counter_init fail very often with -ENOMEM.
>>>> See:
>>>>
>>>> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
>>>>
>>>>
>>>> The kernel fix was this:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
>>>>
>>>>
>>>> I'm assuming that you are probably running a Synology kernel based on
>>>> 4.4 with a lot of backported patches, is that correct?
>>>> Maybe you have the patchset that introduced the regression in that
>>>> kernel, but that later fix is not in there.
>>>>
>>>> Thanks.
>>> Yes, We do backport many patches, but we don't backport "mm:
>>> memcg/percpu: account percpu memory to memory cgroups",
>>> So we think the issue has nothing to do with "percpu: make
>>> pcpu_nr_empty_pop_pages per chunk type".
>>>
>>> This issue is just a corner case, when percpu_counter_init is executed
>>> with GFP_NOFS, there is a chance to fail.
>>> So we feel that the snapshot_lock should be preallocated.
>>>
>>> Thanks.
>>>
>>>
>> Anyone have any suggestions ?
> Sorry, some holidays and other priorities came in.
>
> The idea seems fine to me. I was just wondering if you weren't hitting
> that regression in mm.
> Patches 1 and 2 don't apply with current misc-next, btw, they require
> manual conflict resolution.

Ok. I will switch to using misc-next for development and send patch v2

Thanks.

> I'll leave review comments on each patch, small things, nothing major.
>
> Thanks.
>
>>
>>>>> [Fix]
>>>>> We allocate memory at the beginning of creating a subvolume/snapshot.
>>>>> This way, we can ensure the memory is enough when initializing fs root.
>>>>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>>>>> the transaction won’t abort since it hasn’t started yet.
>>>>>
>>>>> Robbie Ko (3):
>>>>>     btrfs: refactor anon_dev with new_fs_root_args for create
>>>>>       subvolume/snapshot
>>>>>     btrfs: change snapshot_lock to dynamic pointer
>>>>>     btrfs: add snapshot_lock to new_fs_root_args
>>>>>
>>>>>    fs/btrfs/ctree.h       |   2 +-
>>>>>    fs/btrfs/disk-io.c     | 107
>>>>> ++++++++++++++++++++++++++++++-----------
>>>>>    fs/btrfs/disk-io.h     |  12 ++++-
>>>>>    fs/btrfs/file.c        |   8 +--
>>>>>    fs/btrfs/inode.c       |  12 ++---
>>>>>    fs/btrfs/ioctl.c       |  38 +++++++--------
>>>>>    fs/btrfs/transaction.c |   2 +-
>>>>>    fs/btrfs/transaction.h |   5 +-
>>>>>    8 files changed, 123 insertions(+), 63 deletions(-)
>>>>>
>>>>> --
>>>>> 2.17.1
>>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot
  2022-12-12 17:04   ` Filipe Manana
@ 2022-12-13  3:29     ` robbieko
  2022-12-13 19:02       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: robbieko @ 2022-12-13  3:29 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


Filipe Manana 於 2022/12/13 上午1:04 寫道:
> On Mon, Dec 5, 2022 at 10:11 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> Create a new structure called btrfs_new_fs_root_args that
>> refactor the type of anon_dev.
>> With the new structure, btrfs_get_new_fs_root can input
>> btrfs_new_fs_root_args into btrfs_init_fs_root by order.
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>   fs/btrfs/disk-io.c     | 63 ++++++++++++++++++++++++++++++------------
>>   fs/btrfs/disk-io.h     | 10 ++++++-
>>   fs/btrfs/ioctl.c       | 34 +++++++++++------------
>>   fs/btrfs/transaction.c |  2 +-
>>   fs/btrfs/transaction.h |  5 ++--
>>   5 files changed, 74 insertions(+), 40 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d99bf7c64611..afe16e1f0b18 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1422,12 +1422,44 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>>          return root;
>>   }
>>
>> +struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)
> Can we remove the argument?
> Even after the whole patchset, it will always be GFP_KERNEL.
>
> I know patch 3/3 adds an assertion to verify we get GFP_KERNEL, but
> lockdep is smart enough to detect an issue and complain.

Okay,  I will remove the argument.

Thanks.

>
> Also, can you name the function something like btrfs_alloc_new_fs_roots_arg() ?
> The naming pattern we use for allocator functions is to use "alloc" in
> the name, for e.g. btrfs_alloc_path(), alloc_extent_map(), etc.
> So having an allocator named *_prepare() breaks the pattern and it's
> also confusing (as it allocates the structure).

Okay, I will modify the function name to comply with the coding style.

Thanks.

>> +{
>> +       int err;
>> +       struct btrfs_new_fs_root_args *args;
>> +
>> +       args = kzalloc(sizeof(*args), gfp_mask);
>> +       if (!args) {
>> +               err = -ENOMEM;
>> +               goto error;
>> +       }
> if (!args)
>       return ERR_PTR(-ENOMEM);
>
> That's simpler.

We hope to satisfy the single-exit rule as much as possible to increase
code readability and uniform resource release.

Thanks.

>> +
>> +       err = get_anon_bdev(&args->anon_dev);
>> +       if (err)
>> +               goto error;
>> +
>> +       return args;
>> +
>> +error:
>> +       btrfs_new_fs_root_args_destroy(args);
>> +       return ERR_PTR(err);
>> +}
>> +
>> +void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)
> Same thing about the naming consistency here. Something like:
> btrfs_free_new_fs_roots_arg()
> We have btrfs_free_path(), free_extent_map(), etc.


Okay, I will modify it.

Thanks.


> Thanks.
>> +{
>> +       if (!args)
>> +               return;
>> +       if (args->anon_dev)
>> +               free_anon_bdev(args->anon_dev);
>> +       kfree(args);
>> +}
>> +
>>   /*
>>    * Initialize subvolume root in-memory structure
>>    *
>>    * @anon_dev:  anonymous device to attach to the root, if zero, allocate new
>>    */
>> -static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>> +static int btrfs_init_fs_root(struct btrfs_root *root,
>> +                             struct btrfs_new_fs_root_args *new_fs_root_args)
>>   {
>>          int ret;
>>          unsigned int nofs_flag;
>> @@ -1454,12 +1486,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
>>           */
>>          if (is_fstree(root->root_key.objectid) &&
>>              btrfs_root_refs(&root->root_item) > 0) {
>> -               if (!anon_dev) {
>> +               if (!new_fs_root_args || !new_fs_root_args->anon_dev) {
>>                          ret = get_anon_bdev(&root->anon_dev);
>>                          if (ret)
>>                                  goto fail;
>>                  } else {
>> -                       root->anon_dev = anon_dev;
>> +                       root->anon_dev = new_fs_root_args->anon_dev;
>> +                       new_fs_root_args->anon_dev = 0;
>>                  }
>>          }
>>
>> @@ -1633,7 +1666,8 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
>>    *             for orphan roots
>>    */
>>   static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>> -                                            u64 objectid, dev_t anon_dev,
>> +                                            u64 objectid,
>> +                                            struct btrfs_new_fs_root_args *new_fs_root_args,
>>                                               bool check_ref)
>>   {
>>          struct btrfs_root *root;
>> @@ -1647,8 +1681,8 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>>   again:
>>          root = btrfs_lookup_fs_root(fs_info, objectid);
>>          if (root) {
>> -               /* Shouldn't get preallocated anon_dev for cached roots */
>> -               ASSERT(!anon_dev);
>> +               /* Shouldn't get new_fs_root_args for cached roots */
>> +               ASSERT(!new_fs_root_args);
>>                  if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
>>                          btrfs_put_root(root);
>>                          return ERR_PTR(-ENOENT);
>> @@ -1668,7 +1702,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>>                  goto fail;
>>          }
>>
>> -       ret = btrfs_init_fs_root(root, anon_dev);
>> +       ret = btrfs_init_fs_root(root, new_fs_root_args);
>>          if (ret)
>>                  goto fail;
>>
>> @@ -1698,14 +1732,6 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>>          }
>>          return root;
>>   fail:
>> -       /*
>> -        * If our caller provided us an anonymous device, then it's his
>> -        * responsibility to free it in case we fail. So we have to set our
>> -        * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
>> -        * and once again by our caller.
>> -        */
>> -       if (anon_dev)
>> -               root->anon_dev = 0;
>>          btrfs_put_root(root);
>>          return ERR_PTR(ret);
>>   }
>> @@ -1720,7 +1746,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
>>   struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>>                                       u64 objectid, bool check_ref)
>>   {
>> -       return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
>> +       return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
>>   }
>>
>>   /*
>> @@ -1732,9 +1758,10 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>>    *             parameter value
>>    */
>>   struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>> -                                        u64 objectid, dev_t anon_dev)
>> +                                        u64 objectid,
>> +                                        struct btrfs_new_fs_root_args *new_fs_root_args)
>>   {
>> -       return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
>> +       return btrfs_get_root_ref(fs_info, objectid, new_fs_root_args, true);
>>   }
>>
>>   /*
>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>> index 9fa923e005a3..a7c91bfb0c16 100644
>> --- a/fs/btrfs/disk-io.h
>> +++ b/fs/btrfs/disk-io.h
>> @@ -25,6 +25,11 @@ static inline u64 btrfs_sb_offset(int mirror)
>>          return BTRFS_SUPER_INFO_OFFSET;
>>   }
>>
>> +struct btrfs_new_fs_root_args {
>> +       /* Preallocated anonymous block device number */
>> +       dev_t anon_dev;
>> +};
>> +
>>   struct btrfs_device;
>>   struct btrfs_fs_devices;
>>
>> @@ -58,6 +63,8 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
>>   int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>>   struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>>                                          struct btrfs_key *key);
>> +struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask);
>> +void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args);
>>   int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
>>                           struct btrfs_root *root);
>>   void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
>> @@ -65,7 +72,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
>>   struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>>                                       u64 objectid, bool check_ref);
>>   struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>> -                                        u64 objectid, dev_t anon_dev);
>> +                                        u64 objectid,
>> +                                        struct btrfs_new_fs_root_args *new_fs_root_args);
>>   struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
>>                                                   struct btrfs_path *path,
>>                                                   u64 objectid);
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 5ba2e810dc6e..5785336ab7cf 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -588,8 +588,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>>          };
>>          unsigned int trans_num_items;
>>          int ret;
>> -       dev_t anon_dev;
>>          u64 objectid;
>> +       struct btrfs_new_fs_root_args *new_fs_root_args = NULL;
>>
>>          root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
>>          if (!root_item)
>> @@ -608,14 +608,17 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>>                  goto out_root_item;
>>          }
>>
>> -       ret = get_anon_bdev(&anon_dev);
>> -       if (ret < 0)
>> +       new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
>> +       if (IS_ERR(new_fs_root_args)) {
>> +               ret = PTR_ERR(new_fs_root_args);
>> +               new_fs_root_args = NULL;
>>                  goto out_root_item;
>> +       }
>>
>>          new_inode_args.inode = btrfs_new_subvol_inode(mnt_userns, dir);
>>          if (!new_inode_args.inode) {
>>                  ret = -ENOMEM;
>> -               goto out_anon_dev;
>> +               goto out_root_item;
>>          }
>>          ret = btrfs_new_inode_prepare(&new_inode_args, &trans_num_items);
>>          if (ret)
>> @@ -706,14 +709,12 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>>          free_extent_buffer(leaf);
>>          leaf = NULL;
>>
>> -       new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
>> +       new_root = btrfs_get_new_fs_root(fs_info, objectid, new_fs_root_args);
>>          if (IS_ERR(new_root)) {
>>                  ret = PTR_ERR(new_root);
>>                  btrfs_abort_transaction(trans, ret);
>>                  goto out;
>>          }
>> -       /* anon_dev is owned by new_root now. */
>> -       anon_dev = 0;
>>          BTRFS_I(new_inode_args.inode)->root = new_root;
>>          /* ... and new_root is owned by new_inode_args.inode now. */
>>
>> @@ -752,10 +753,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
>>          btrfs_new_inode_args_destroy(&new_inode_args);
>>   out_inode:
>>          iput(new_inode_args.inode);
>> -out_anon_dev:
>> -       if (anon_dev)
>> -               free_anon_bdev(anon_dev);
>>   out_root_item:
>> +       btrfs_new_fs_root_args_destroy(new_fs_root_args);
>>          kfree(root_item);
>>          return ret;
>>   }
>> @@ -791,9 +790,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>          if (!pending_snapshot)
>>                  return -ENOMEM;
>>
>> -       ret = get_anon_bdev(&pending_snapshot->anon_dev);
>> -       if (ret < 0)
>> +       pending_snapshot->new_fs_root_args = btrfs_new_fs_root_args_prepare(GFP_KERNEL);
>> +       if (IS_ERR(pending_snapshot->new_fs_root_args)) {
>> +               ret = PTR_ERR(pending_snapshot->new_fs_root_args);
>> +               pending_snapshot->new_fs_root_args = NULL;
>>                  goto free_pending;
>> +       }
>> +
>>          pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
>>                          GFP_KERNEL);
>>          pending_snapshot->path = btrfs_alloc_path();
>> @@ -850,16 +853,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>
>>          d_instantiate(dentry, inode);
>>          ret = 0;
>> -       pending_snapshot->anon_dev = 0;
>>   fail:
>> -       /* Prevent double freeing of anon_dev */
>> -       if (ret && pending_snapshot->snap)
>> -               pending_snapshot->snap->anon_dev = 0;
>>          btrfs_put_root(pending_snapshot->snap);
>>          btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
>>   free_pending:
>> -       if (pending_snapshot->anon_dev)
>> -               free_anon_bdev(pending_snapshot->anon_dev);
>> +       btrfs_new_fs_root_args_destroy(pending_snapshot->new_fs_root_args);
>>          kfree(pending_snapshot->root_item);
>>          btrfs_free_path(pending_snapshot->path);
>>          kfree(pending_snapshot);
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index d1f1da6820fb..614eb84422e1 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -1777,7 +1777,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>>          }
>>
>>          key.offset = (u64)-1;
>> -       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
>> +       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->new_fs_root_args);
>>          if (IS_ERR(pending->snap)) {
>>                  ret = PTR_ERR(pending->snap);
>>                  pending->snap = NULL;
>> diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
>> index 970ff316069d..34648a617a2c 100644
>> --- a/fs/btrfs/transaction.h
>> +++ b/fs/btrfs/transaction.h
>> @@ -10,6 +10,7 @@
>>   #include "btrfs_inode.h"
>>   #include "delayed-ref.h"
>>   #include "ctree.h"
>> +#include "disk-io.h"
>>
>>   enum btrfs_trans_state {
>>          TRANS_STATE_RUNNING,
>> @@ -161,8 +162,8 @@ struct btrfs_pending_snapshot {
>>          struct btrfs_block_rsv block_rsv;
>>          /* extra metadata reservation for relocation */
>>          int error;
>> -       /* Preallocated anonymous block device number */
>> -       dev_t anon_dev;
>> +       /* Preallocated new_fs_root_args */
>> +       struct btrfs_new_fs_root_args *new_fs_root_args;
>>          bool readonly;
>>          struct list_head list;
>>   };
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer
  2022-12-12 17:05   ` Filipe Manana
@ 2022-12-13  3:31     ` robbieko
  0 siblings, 0 replies; 16+ messages in thread
From: robbieko @ 2022-12-13  3:31 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


Filipe Manana 於 2022/12/13 上午1:05 寫道:
> On Mon, Dec 5, 2022 at 10:30 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> Change snapshot_lock to dynamic pointer to allocate memory
>> at the beginning of creating a subvolume/snapshot.
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>   fs/btrfs/ctree.h   |  2 +-
>>   fs/btrfs/disk-io.c | 10 ++++++++--
>>   fs/btrfs/file.c    |  8 ++++----
>>   fs/btrfs/inode.c   | 12 ++++++------
>>   fs/btrfs/ioctl.c   |  4 ++--
>>   5 files changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 9e6d48ff4597..99003b0dd407 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1429,7 +1429,7 @@ struct btrfs_root {
>>           */
>>          int dedupe_in_progress;
>>          /* For exclusion of snapshot creation and nocow writes */
>> -       struct btrfs_drew_lock snapshot_lock;
>> +       struct btrfs_drew_lock *snapshot_lock;
>>
>>          atomic_t snapshot_force_cow;
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index afe16e1f0b18..5760c2b1a100 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1464,12 +1464,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
>>          int ret;
>>          unsigned int nofs_flag;
>>
>> +       root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
>> +       if (!root->snapshot_lock) {
>> +               ret = -ENOMEM;
>> +               goto fail;
>> +       }
>>          /*
>>           * We might be called under a transaction (e.g. indirect backref
>>           * resolution) which could deadlock if it triggers memory reclaim
>>           */
>>          nofs_flag = memalloc_nofs_save();
>> -       ret = btrfs_drew_lock_init(&root->snapshot_lock);
>> +       ret = btrfs_drew_lock_init(root->snapshot_lock);
>>          memalloc_nofs_restore(nofs_flag);
>>          if (ret)
>>                  goto fail;
>> @@ -2180,7 +2185,8 @@ void btrfs_put_root(struct btrfs_root *root)
>>                  WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state));
>>                  if (root->anon_dev)
>>                          free_anon_bdev(root->anon_dev);
>> -               btrfs_drew_lock_destroy(&root->snapshot_lock);
>> +               if (root->snapshot_lock)
>> +                       btrfs_drew_lock_destroy(root->snapshot_lock);
> This is leaking the memory allocated for the lock, it needs a
> kfree(root->snapshot_lock) too.

I missed the release,
I will fix it.

Thanks

> Thanks.
>
>>                  free_root_extent_buffers(root);
>>   #ifdef CONFIG_BTRFS_DEBUG
>>                  spin_lock(&root->fs_info->fs_roots_radix_lock);
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index d01631d47806..a338fbd34472 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1379,7 +1379,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>          if (!(inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
>>                  return 0;
>>
>> -       if (!btrfs_drew_try_write_lock(&root->snapshot_lock))
>> +       if (!btrfs_drew_try_write_lock(root->snapshot_lock))
>>                  return -EAGAIN;
>>
>>          lockstart = round_down(pos, fs_info->sectorsize);
>> @@ -1389,7 +1389,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>
>>          if (nowait) {
>>                  if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
>> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
>> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>>                          return -EAGAIN;
>>                  }
>>          } else {
>> @@ -1398,7 +1398,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>          ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
>>                          NULL, NULL, NULL, nowait, false);
>>          if (ret <= 0)
>> -               btrfs_drew_write_unlock(&root->snapshot_lock);
>> +               btrfs_drew_write_unlock(root->snapshot_lock);
>>          else
>>                  *write_bytes = min_t(size_t, *write_bytes ,
>>                                       num_bytes - pos + lockstart);
>> @@ -1409,7 +1409,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>
>>   void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
>>   {
>> -       btrfs_drew_write_unlock(&inode->root->snapshot_lock);
>> +       btrfs_drew_write_unlock(inode->root->snapshot_lock);
>>   }
>>
>>   static void update_time_for_write(struct inode *inode)
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 0e516aefbf51..8fe4b00d31f4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -5167,16 +5167,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>>                   * truncation, it must capture all writes that happened before
>>                   * this truncation.
>>                   */
>> -               btrfs_drew_write_lock(&root->snapshot_lock);
>> +               btrfs_drew_write_lock(root->snapshot_lock);
>>                  ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, newsize);
>>                  if (ret) {
>> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
>> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>>                          return ret;
>>                  }
>>
>>                  trans = btrfs_start_transaction(root, 1);
>>                  if (IS_ERR(trans)) {
>> -                       btrfs_drew_write_unlock(&root->snapshot_lock);
>> +                       btrfs_drew_write_unlock(root->snapshot_lock);
>>                          return PTR_ERR(trans);
>>                  }
>>
>> @@ -5184,7 +5184,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
>>                  btrfs_inode_safe_disk_i_size_write(BTRFS_I(inode), 0);
>>                  pagecache_isize_extended(inode, oldsize, newsize);
>>                  ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
>> -               btrfs_drew_write_unlock(&root->snapshot_lock);
>> +               btrfs_drew_write_unlock(root->snapshot_lock);
>>                  btrfs_end_transaction(trans);
>>          } else {
>>                  struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>> @@ -11026,7 +11026,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>>           * completes before the first write into the swap file after it is
>>           * activated, than that write would fallback to COW.
>>           */
>> -       if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) {
>> +       if (!btrfs_drew_try_write_lock(root->snapshot_lock)) {
>>                  btrfs_exclop_finish(fs_info);
>>                  btrfs_warn(fs_info,
>>             "cannot activate swapfile because snapshot creation is in progress");
>> @@ -11199,7 +11199,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
>>          if (ret)
>>                  btrfs_swap_deactivate(file);
>>
>> -       btrfs_drew_write_unlock(&root->snapshot_lock);
>> +       btrfs_drew_write_unlock(root->snapshot_lock);
>>
>>          btrfs_exclop_finish(fs_info);
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 5785336ab7cf..9f1b81ff37a3 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1008,7 +1008,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
>>           * possible. This is to avoid later writeback (running dealloc) to
>>           * fallback to COW mode and unexpectedly fail with ENOSPC.
>>           */
>> -       btrfs_drew_read_lock(&root->snapshot_lock);
>> +       btrfs_drew_read_lock(root->snapshot_lock);
>>
>>          ret = btrfs_start_delalloc_snapshot(root, false);
>>          if (ret)
>> @@ -1029,7 +1029,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
>>   out:
>>          if (snapshot_force_cow)
>>                  atomic_dec(&root->snapshot_force_cow);
>> -       btrfs_drew_read_unlock(&root->snapshot_lock);
>> +       btrfs_drew_read_unlock(root->snapshot_lock);
>>          return ret;
>>   }
>>
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args
  2022-12-12 17:06   ` Filipe Manana
@ 2022-12-13  3:34     ` robbieko
  0 siblings, 0 replies; 16+ messages in thread
From: robbieko @ 2022-12-13  3:34 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs


Filipe Manana 於 2022/12/13 上午1:06 寫道:
> On Mon, Dec 5, 2022 at 10:36 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> Add snapshot_lock into new_fs_root_args to prevent
>> percpu_counter_init enter unexpected -ENOMEM when
>> calling btrfs_init_fs_root.
> You could be more detailed here and mention that all this is to
> prevent transaction aborts in case percpu_counter_init() fails to
> allocate memory.
>
> Interpreting literally what you wrote, it gives the idea that after
> this patch the memory allocation can never fail, which isn't true.
> The goal is just to allocate the snapshot lock before holding a
> transaction handle, so that we can use GFP_KERNEL, which reduces the
> chances of memory allocation failing and, above all, avoid aborting a
> transaction and turn the fs to RO mode.

Okay. I will fix the commit message to provide a more detailed description.

Thanks.

>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>   fs/btrfs/disk-io.c | 44 +++++++++++++++++++++++++++++++-------------
>>   fs/btrfs/disk-io.h |  2 ++
>>   2 files changed, 33 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 5760c2b1a100..5704c8f5873c 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1437,6 +1437,16 @@ struct btrfs_new_fs_root_args *btrfs_new_fs_root_args_prepare(gfp_t gfp_mask)
>>          if (err)
>>                  goto error;
>>
>> +       args->snapshot_lock = kzalloc(sizeof(*args->snapshot_lock), gfp_mask);
>> +       if (!args->snapshot_lock) {
>> +               err = -ENOMEM;
>> +               goto error;
>> +       }
>> +       ASSERT((gfp_mask & GFP_KERNEL) == GFP_KERNEL);
> As commented for patch 1/3, we could get away with the argument and
> directly use GFP_KERNEL above, and let lockdep warn us (which makes
> fstests fail).
Okay.
>
>> +       err = btrfs_drew_lock_init(args->snapshot_lock);
>> +       if (err)
>> +               goto error;
>> +
>>          return args;
>>
>>   error:
>> @@ -1448,6 +1458,9 @@ void btrfs_new_fs_root_args_destroy(struct btrfs_new_fs_root_args *args)
>>   {
>>          if (!args)
>>                  return;
>> +       if (args->snapshot_lock)
>> +               btrfs_drew_lock_destroy(args->snapshot_lock);
>> +       kfree(args->snapshot_lock);
> You could combine the kfree of the lock inside the if statement,
> making it more clear imo.
Okay.
>
>>          if (args->anon_dev)
>>                  free_anon_bdev(args->anon_dev);
>>          kfree(args);
>> @@ -1464,20 +1477,25 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
>>          int ret;
>>          unsigned int nofs_flag;
>>
>> -       root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
>> -       if (!root->snapshot_lock) {
>> -               ret = -ENOMEM;
>> -               goto fail;
>> +       if (new_fs_root_args && new_fs_root_args->snapshot_lock) {
>> +               root->snapshot_lock = new_fs_root_args->snapshot_lock;
>> +               new_fs_root_args->snapshot_lock = NULL;
>> +       } else {
>> +               root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
>> +               if (!root->snapshot_lock) {
>> +                       ret = -ENOMEM;
>> +                       goto fail;
>> +               }
>> +               /*
>> +                * We might be called under a transaction (e.g. indirect backref
>> +                * resolution) which could deadlock if it triggers memory reclaim
> While here, we could end the sentence with punctuation (.).
Okay.
> Thanks.
>
>> +                */
>> +               nofs_flag = memalloc_nofs_save();
>> +               ret = btrfs_drew_lock_init(root->snapshot_lock);
>> +               memalloc_nofs_restore(nofs_flag);
>> +               if (ret)
>> +                       goto fail;
>>          }
>> -       /*
>> -        * We might be called under a transaction (e.g. indirect backref
>> -        * resolution) which could deadlock if it triggers memory reclaim
>> -        */
>> -       nofs_flag = memalloc_nofs_save();
>> -       ret = btrfs_drew_lock_init(root->snapshot_lock);
>> -       memalloc_nofs_restore(nofs_flag);
>> -       if (ret)
>> -               goto fail;
>>
>>          if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
>>              !btrfs_is_data_reloc_root(root)) {
>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>> index a7c91bfb0c16..0e84927859ce 100644
>> --- a/fs/btrfs/disk-io.h
>> +++ b/fs/btrfs/disk-io.h
>> @@ -28,6 +28,8 @@ static inline u64 btrfs_sb_offset(int mirror)
>>   struct btrfs_new_fs_root_args {
>>          /* Preallocated anonymous block device number */
>>          dev_t anon_dev;
>> +       /* Preallocated snapshot lock */
>> +       struct btrfs_drew_lock *snapshot_lock;
>>   };
>>
>>   struct btrfs_device;
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot
  2022-12-13  3:29     ` robbieko
@ 2022-12-13 19:02       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2022-12-13 19:02 UTC (permalink / raw)
  To: robbieko; +Cc: Filipe Manana, linux-btrfs

On Tue, Dec 13, 2022 at 11:29:09AM +0800, robbieko wrote:
> Filipe Manana 於 2022/12/13 上午1:04 寫道:
> >> +       int err;
> >> +       struct btrfs_new_fs_root_args *args;
> >> +
> >> +       args = kzalloc(sizeof(*args), gfp_mask);
> >> +       if (!args) {
> >> +               err = -ENOMEM;
> >> +               goto error;
> >> +       }
> > if (!args)
> >       return ERR_PTR(-ENOMEM);
> >
> > That's simpler.
> 
> We hope to satisfy the single-exit rule as much as possible to increase
> code readability and uniform resource release.

If there's no cleanup needed the first few checks can do direct return,
and then the part with single exit and errors handled as goto.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-12-13 19:03 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-05  9:51 [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
2022-12-05  9:51 ` [PATCH 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
2022-12-12 17:04   ` Filipe Manana
2022-12-13  3:29     ` robbieko
2022-12-13 19:02       ` David Sterba
2022-12-05  9:51 ` [PATCH 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
2022-12-12 17:05   ` Filipe Manana
2022-12-13  3:31     ` robbieko
2022-12-05  9:51 ` [PATCH 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
2022-12-12 17:06   ` Filipe Manana
2022-12-13  3:34     ` robbieko
2022-12-05 11:15 ` [PATCH 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Filipe Manana
2022-12-06  8:58   ` robbieko
2022-12-08  6:24     ` robbieko
2022-12-12 17:04       ` Filipe Manana
2022-12-13  2:59         ` robbieko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.