linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
@ 2022-12-14  2:11 robbieko
  2022-12-14  2:11 ` [PATCH v2 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: robbieko @ 2022-12-14  2:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

[Issue]
When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM

  WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
  CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
  Call Trace:
    create_pending_snapshot+0xe30/0xe70 [btrfs]
    create_pending_snapshots+0x89/0xb0 [btrfs]
    btrfs_commit_transaction+0x469/0xc60 [btrfs]
    btrfs_mksubvol+0x5bd/0x690 [btrfs]
    btrfs_mksnapshot+0x102/0x170 [btrfs]
    btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
    btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
    btrfs_ioctl+0x2111/0x3130 [btrfs]
    do_vfs_ioctl+0x7ea/0xa80
    SyS_ioctl+0xa1/0xb0
    entry_SYSCALL_64_fastpath+0x1e/0x8e
  ---[ end trace 910c8f86780ca385 ]---
  BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory

[Cause]
During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
However, atomic allocation is required when processing percpu_counter_init
without GFP_KERNEL due to the unique structure of percpu_counter.
In this situation, allocating memory for initializing fs root may cause
unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.

[Fix]
We allocate memory at the beginning of creating a subvolume/snapshot.
This way, we can ensure the memory is enough when initializing fs root.
Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
the transaction won’t abort since it hasn’t started yet.

Robbie Ko (3):
  btrfs: refactor anon_dev with new_fs_root_args for create
    subvolume/snapshot
  btrfs: change snapshot_lock to dynamic pointer
  btrfs: add snapshot_lock to new_fs_root_args

 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
 fs/btrfs/disk-io.h     |  12 ++++-
 fs/btrfs/file.c        |   8 +--
 fs/btrfs/inode.c       |  12 ++---
 fs/btrfs/ioctl.c       |  38 +++++++--------
 fs/btrfs/transaction.c |   2 +-
 fs/btrfs/transaction.h |   5 +-
 8 files changed, 123 insertions(+), 63 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot
  2022-12-14  2:11 [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
@ 2022-12-14  2:11 ` robbieko
  2022-12-14  2:11 ` [PATCH v2 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: robbieko @ 2022-12-14  2:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

Create a new structure called btrfs_new_fs_root_args that
refactor the type of anon_dev.
With the new structure, btrfs_get_new_fs_root can input
btrfs_new_fs_root_args into btrfs_init_fs_root by order.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/disk-io.c     | 61 +++++++++++++++++++++++++++++-------------
 fs/btrfs/disk-io.h     | 10 ++++++-
 fs/btrfs/ioctl.c       | 34 +++++++++++------------
 fs/btrfs/transaction.c |  2 +-
 fs/btrfs/transaction.h |  5 ++--
 5 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bd6cd8956ec8..e45fd1ef5d11 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1506,12 +1506,42 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 	return root;
 }
 
+struct btrfs_new_fs_root_args *btrfs_alloc_new_fs_root_args(void)
+{
+	int err;
+	struct btrfs_new_fs_root_args *args;
+
+	args = kzalloc(sizeof(*args), GFP_KERNEL);
+	if (!args)
+		return ERR_PTR(-ENOMEM);
+
+	err = get_anon_bdev(&args->anon_dev);
+	if (err)
+		goto error;
+
+	return args;
+
+error:
+	btrfs_free_new_fs_root_args(args);
+	return ERR_PTR(err);
+}
+
+void btrfs_free_new_fs_root_args(struct btrfs_new_fs_root_args *args)
+{
+	if (!args)
+		return;
+	if (args->anon_dev)
+		free_anon_bdev(args->anon_dev);
+	kfree(args);
+}
+
 /*
  * Initialize subvolume root in-memory structure
  *
  * @anon_dev:	anonymous device to attach to the root, if zero, allocate new
  */
-static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
+static int btrfs_init_fs_root(struct btrfs_root *root,
+			      struct btrfs_new_fs_root_args *new_fs_root_args)
 {
 	int ret;
 	unsigned int nofs_flag;
@@ -1538,12 +1568,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
 	 */
 	if (is_fstree(root->root_key.objectid) &&
 	    btrfs_root_refs(&root->root_item) > 0) {
-		if (!anon_dev) {
+		if (!new_fs_root_args || !new_fs_root_args->anon_dev) {
 			ret = get_anon_bdev(&root->anon_dev);
 			if (ret)
 				goto fail;
 		} else {
-			root->anon_dev = anon_dev;
+			root->anon_dev = new_fs_root_args->anon_dev;
+			new_fs_root_args->anon_dev = 0;
 		}
 	}
 
@@ -1717,7 +1748,8 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
  *		for orphan roots
  */
 static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
-					     u64 objectid, dev_t anon_dev,
+					     u64 objectid,
+					     struct btrfs_new_fs_root_args *new_fs_root_args,
 					     bool check_ref)
 {
 	struct btrfs_root *root;
@@ -1731,8 +1763,8 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 again:
 	root = btrfs_lookup_fs_root(fs_info, objectid);
 	if (root) {
-		/* Shouldn't get preallocated anon_dev for cached roots */
-		ASSERT(!anon_dev);
+		/* Shouldn't get new_fs_root_args for cached roots */
+		ASSERT(!new_fs_root_args);
 		if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
 			btrfs_put_root(root);
 			return ERR_PTR(-ENOENT);
@@ -1752,7 +1784,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 		goto fail;
 	}
 
-	ret = btrfs_init_fs_root(root, anon_dev);
+	ret = btrfs_init_fs_root(root, new_fs_root_args);
 	if (ret)
 		goto fail;
 
@@ -1782,14 +1814,6 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 	}
 	return root;
 fail:
-	/*
-	 * If our caller provided us an anonymous device, then it's his
-	 * responsibility to free it in case we fail. So we have to set our
-	 * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
-	 * and once again by our caller.
-	 */
-	if (anon_dev)
-		root->anon_dev = 0;
 	btrfs_put_root(root);
 	return ERR_PTR(ret);
 }
@@ -1804,7 +1828,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref)
 {
-	return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
+	return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
 }
 
 /*
@@ -1816,9 +1840,10 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
  *		parameter value
  */
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev)
+					 u64 objectid,
+					 struct btrfs_new_fs_root_args *new_fs_root_args)
 {
-	return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
+	return btrfs_get_root_ref(fs_info, objectid, new_fs_root_args, true);
 }
 
 /*
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 363935cfc084..67c6625797ca 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -25,6 +25,11 @@ static inline u64 btrfs_sb_offset(int mirror)
 	return BTRFS_SUPER_INFO_OFFSET;
 }
 
+struct btrfs_new_fs_root_args {
+	/* Preallocated anonymous block device number */
+	dev_t anon_dev;
+};
+
 struct btrfs_device;
 struct btrfs_fs_devices;
 struct btrfs_tree_parent_check;
@@ -58,6 +63,8 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 					struct btrfs_key *key);
+struct btrfs_new_fs_root_args *btrfs_alloc_new_fs_root_args(void);
+void btrfs_free_new_fs_root_args(struct btrfs_new_fs_root_args *args);
 int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
 			 struct btrfs_root *root);
 void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
@@ -65,7 +72,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref);
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-					 u64 objectid, dev_t anon_dev);
+					 u64 objectid,
+					 struct btrfs_new_fs_root_args *new_fs_root_args);
 struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
 						 struct btrfs_path *path,
 						 u64 objectid);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7e348bd2ccde..de0cafe9b62b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -599,8 +599,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	};
 	unsigned int trans_num_items;
 	int ret;
-	dev_t anon_dev;
 	u64 objectid;
+	struct btrfs_new_fs_root_args *new_fs_root_args = NULL;
 
 	root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
 	if (!root_item)
@@ -619,14 +619,17 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 		goto out_root_item;
 	}
 
-	ret = get_anon_bdev(&anon_dev);
-	if (ret < 0)
+	new_fs_root_args = btrfs_alloc_new_fs_root_args();
+	if (IS_ERR(new_fs_root_args)) {
+		ret = PTR_ERR(new_fs_root_args);
+		new_fs_root_args = NULL;
 		goto out_root_item;
+	}
 
 	new_inode_args.inode = btrfs_new_subvol_inode(mnt_userns, dir);
 	if (!new_inode_args.inode) {
 		ret = -ENOMEM;
-		goto out_anon_dev;
+		goto out_root_item;
 	}
 	ret = btrfs_new_inode_prepare(&new_inode_args, &trans_num_items);
 	if (ret)
@@ -717,14 +720,12 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	free_extent_buffer(leaf);
 	leaf = NULL;
 
-	new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
+	new_root = btrfs_get_new_fs_root(fs_info, objectid, new_fs_root_args);
 	if (IS_ERR(new_root)) {
 		ret = PTR_ERR(new_root);
 		btrfs_abort_transaction(trans, ret);
 		goto out;
 	}
-	/* anon_dev is owned by new_root now. */
-	anon_dev = 0;
 	BTRFS_I(new_inode_args.inode)->root = new_root;
 	/* ... and new_root is owned by new_inode_args.inode now. */
 
@@ -763,10 +764,8 @@ static noinline int create_subvol(struct user_namespace *mnt_userns,
 	btrfs_new_inode_args_destroy(&new_inode_args);
 out_inode:
 	iput(new_inode_args.inode);
-out_anon_dev:
-	if (anon_dev)
-		free_anon_bdev(anon_dev);
 out_root_item:
+	btrfs_free_new_fs_root_args(new_fs_root_args);
 	kfree(root_item);
 	return ret;
 }
@@ -802,9 +801,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	if (!pending_snapshot)
 		return -ENOMEM;
 
-	ret = get_anon_bdev(&pending_snapshot->anon_dev);
-	if (ret < 0)
+	pending_snapshot->new_fs_root_args = btrfs_alloc_new_fs_root_args();
+	if (IS_ERR(pending_snapshot->new_fs_root_args)) {
+		ret = PTR_ERR(pending_snapshot->new_fs_root_args);
+		pending_snapshot->new_fs_root_args = NULL;
 		goto free_pending;
+	}
+
 	pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item),
 			GFP_KERNEL);
 	pending_snapshot->path = btrfs_alloc_path();
@@ -861,16 +864,11 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 
 	d_instantiate(dentry, inode);
 	ret = 0;
-	pending_snapshot->anon_dev = 0;
 fail:
-	/* Prevent double freeing of anon_dev */
-	if (ret && pending_snapshot->snap)
-		pending_snapshot->snap->anon_dev = 0;
 	btrfs_put_root(pending_snapshot->snap);
 	btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
 free_pending:
-	if (pending_snapshot->anon_dev)
-		free_anon_bdev(pending_snapshot->anon_dev);
+	btrfs_free_new_fs_root_args(pending_snapshot->new_fs_root_args);
 	kfree(pending_snapshot->root_item);
 	btrfs_free_path(pending_snapshot->path);
 	kfree(pending_snapshot);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8c52e89688c..042626fbd3b8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1801,7 +1801,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	}
 
 	key.offset = (u64)-1;
-	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
+	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->new_fs_root_args);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);
 		pending->snap = NULL;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 97f6c39f59c8..60022240c1ea 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -11,6 +11,7 @@
 #include "delayed-ref.h"
 #include "ctree.h"
 #include "misc.h"
+#include "disk-io.h"
 
 enum btrfs_trans_state {
 	TRANS_STATE_RUNNING,
@@ -163,8 +164,8 @@ struct btrfs_pending_snapshot {
 	struct btrfs_block_rsv block_rsv;
 	/* extra metadata reservation for relocation */
 	int error;
-	/* Preallocated anonymous block device number */
-	dev_t anon_dev;
+	/* Preallocated new_fs_root_args */
+	struct btrfs_new_fs_root_args *new_fs_root_args;
 	bool readonly;
 	struct list_head list;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] btrfs: change snapshot_lock to dynamic pointer
  2022-12-14  2:11 [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
  2022-12-14  2:11 ` [PATCH v2 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
@ 2022-12-14  2:11 ` robbieko
  2022-12-14  2:11 ` [PATCH v2 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
  2022-12-14 16:59 ` [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Josef Bacik
  3 siblings, 0 replies; 10+ messages in thread
From: robbieko @ 2022-12-14  2:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

Change snapshot_lock to dynamic pointer to allocate memory
at the beginning of creating a subvolume/snapshot.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12 ++++++++++--
 fs/btrfs/file.c    |  8 ++++----
 fs/btrfs/inode.c   | 12 ++++++------
 fs/btrfs/ioctl.c   |  4 ++--
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6965703a81b6..d0e703ad865f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -307,7 +307,7 @@ struct btrfs_root {
 	 */
 	int dedupe_in_progress;
 	/* For exclusion of snapshot creation and nocow writes */
-	struct btrfs_drew_lock snapshot_lock;
+	struct btrfs_drew_lock *snapshot_lock;
 
 	atomic_t snapshot_force_cow;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e45fd1ef5d11..ed2ce2dfbfcd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1546,12 +1546,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
 	int ret;
 	unsigned int nofs_flag;
 
+	root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
+	if (!root->snapshot_lock) {
+		ret = -ENOMEM;
+		goto fail;
+	}
 	/*
 	 * We might be called under a transaction (e.g. indirect backref
 	 * resolution) which could deadlock if it triggers memory reclaim
 	 */
 	nofs_flag = memalloc_nofs_save();
-	ret = btrfs_drew_lock_init(&root->snapshot_lock);
+	ret = btrfs_drew_lock_init(root->snapshot_lock);
 	memalloc_nofs_restore(nofs_flag);
 	if (ret)
 		goto fail;
@@ -2260,7 +2265,10 @@ void btrfs_put_root(struct btrfs_root *root)
 		WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state));
 		if (root->anon_dev)
 			free_anon_bdev(root->anon_dev);
-		btrfs_drew_lock_destroy(&root->snapshot_lock);
+		if (root->snapshot_lock) {
+			btrfs_drew_lock_destroy(root->snapshot_lock);
+			kfree(root->snapshot_lock);
+		}
 		free_root_extent_buffers(root);
 #ifdef CONFIG_BTRFS_DEBUG
 		spin_lock(&root->fs_info->fs_roots_radix_lock);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 91b00eb2440e..377ed246792c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1071,7 +1071,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 	if (!(inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return 0;
 
-	if (!btrfs_drew_try_write_lock(&root->snapshot_lock))
+	if (!btrfs_drew_try_write_lock(root->snapshot_lock))
 		return -EAGAIN;
 
 	lockstart = round_down(pos, fs_info->sectorsize);
@@ -1082,7 +1082,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 	if (nowait) {
 		if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend,
 						  &cached_state)) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return -EAGAIN;
 		}
 	} else {
@@ -1092,7 +1092,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
 			NULL, NULL, NULL, nowait, false);
 	if (ret <= 0)
-		btrfs_drew_write_unlock(&root->snapshot_lock);
+		btrfs_drew_write_unlock(root->snapshot_lock);
 	else
 		*write_bytes = min_t(size_t, *write_bytes ,
 				     num_bytes - pos + lockstart);
@@ -1103,7 +1103,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
 {
-	btrfs_drew_write_unlock(&inode->root->snapshot_lock);
+	btrfs_drew_write_unlock(inode->root->snapshot_lock);
 }
 
 static void update_time_for_write(struct inode *inode)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8bcad9940154..a0a1cd70f5ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5218,16 +5218,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		 * truncation, it must capture all writes that happened before
 		 * this truncation.
 		 */
-		btrfs_drew_write_lock(&root->snapshot_lock);
+		btrfs_drew_write_lock(root->snapshot_lock);
 		ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, newsize);
 		if (ret) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return ret;
 		}
 
 		trans = btrfs_start_transaction(root, 1);
 		if (IS_ERR(trans)) {
-			btrfs_drew_write_unlock(&root->snapshot_lock);
+			btrfs_drew_write_unlock(root->snapshot_lock);
 			return PTR_ERR(trans);
 		}
 
@@ -5235,7 +5235,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		btrfs_inode_safe_disk_i_size_write(BTRFS_I(inode), 0);
 		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
-		btrfs_drew_write_unlock(&root->snapshot_lock);
+		btrfs_drew_write_unlock(root->snapshot_lock);
 		btrfs_end_transaction(trans);
 	} else {
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -11090,7 +11090,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	 * completes before the first write into the swap file after it is
 	 * activated, than that write would fallback to COW.
 	 */
-	if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) {
+	if (!btrfs_drew_try_write_lock(root->snapshot_lock)) {
 		btrfs_exclop_finish(fs_info);
 		btrfs_warn(fs_info,
 	   "cannot activate swapfile because snapshot creation is in progress");
@@ -11263,7 +11263,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	if (ret)
 		btrfs_swap_deactivate(file);
 
-	btrfs_drew_write_unlock(&root->snapshot_lock);
+	btrfs_drew_write_unlock(root->snapshot_lock);
 
 	btrfs_exclop_finish(fs_info);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index de0cafe9b62b..9adaf3384f58 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1019,7 +1019,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
 	 * possible. This is to avoid later writeback (running dealloc) to
 	 * fallback to COW mode and unexpectedly fail with ENOSPC.
 	 */
-	btrfs_drew_read_lock(&root->snapshot_lock);
+	btrfs_drew_read_lock(root->snapshot_lock);
 
 	ret = btrfs_start_delalloc_snapshot(root, false);
 	if (ret)
@@ -1040,7 +1040,7 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
 out:
 	if (snapshot_force_cow)
 		atomic_dec(&root->snapshot_force_cow);
-	btrfs_drew_read_unlock(&root->snapshot_lock);
+	btrfs_drew_read_unlock(root->snapshot_lock);
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] btrfs: add snapshot_lock to new_fs_root_args
  2022-12-14  2:11 [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
  2022-12-14  2:11 ` [PATCH v2 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
  2022-12-14  2:11 ` [PATCH v2 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
@ 2022-12-14  2:11 ` robbieko
  2022-12-14 16:59 ` [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Josef Bacik
  3 siblings, 0 replies; 10+ messages in thread
From: robbieko @ 2022-12-14  2:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

[Issue]
When creating subvolume/snapshot, the transaction may
be abort due to -ENOMEM.

[Cause]
During creating a subvolume/snapshot, it is necessary
to allocate memory for Initializing fs root.
Therefore, it can only use GFP_NOFS to allocate memory
to avoid deadlock issues.
However, atomic allocation is required when processing
percpu_counter_init without GFP_KERNEL due to the unique
structure of percpu_counter.
In this situation, allocating memory for initializing
fs root may cause unexpected -ENOMEM when free memory
is low and causes btrfs transaction to abort.

[Fix]
So we add snapshot_lock into new_fs_root_args to allocate
the memory before holding a transaction handle.
This way, we can reduce the chances of -ENOMEM when
calling btrfs_init_fs_root.
Furthermore, it can avoid aborting a transaction and
turn the fs to RO mode.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/disk-io.c | 44 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/disk-io.h |  2 ++
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ed2ce2dfbfcd..7ba1a019f5b0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1519,6 +1519,15 @@ struct btrfs_new_fs_root_args *btrfs_alloc_new_fs_root_args(void)
 	if (err)
 		goto error;
 
+	args->snapshot_lock = kzalloc(sizeof(*args->snapshot_lock), GFP_KERNEL);
+	if (!args->snapshot_lock) {
+		err = -ENOMEM;
+		goto error;
+	}
+	err = btrfs_drew_lock_init(args->snapshot_lock);
+	if (err)
+		goto error;
+
 	return args;
 
 error:
@@ -1530,6 +1539,10 @@ void btrfs_free_new_fs_root_args(struct btrfs_new_fs_root_args *args)
 {
 	if (!args)
 		return;
+	if (args->snapshot_lock) {
+		btrfs_drew_lock_destroy(args->snapshot_lock);
+		kfree(args->snapshot_lock);
+	}
 	if (args->anon_dev)
 		free_anon_bdev(args->anon_dev);
 	kfree(args);
@@ -1546,20 +1559,25 @@ static int btrfs_init_fs_root(struct btrfs_root *root,
 	int ret;
 	unsigned int nofs_flag;
 
-	root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
-	if (!root->snapshot_lock) {
-		ret = -ENOMEM;
-		goto fail;
+	if (new_fs_root_args && new_fs_root_args->snapshot_lock) {
+		root->snapshot_lock = new_fs_root_args->snapshot_lock;
+		new_fs_root_args->snapshot_lock = NULL;
+	} else {
+		root->snapshot_lock = kzalloc(sizeof(*root->snapshot_lock), GFP_NOFS);
+		if (!root->snapshot_lock) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+		/*
+		 * We might be called under a transaction (e.g. indirect backref
+		 * resolution) which could deadlock if it triggers memory reclaim.
+		 */
+		nofs_flag = memalloc_nofs_save();
+		ret = btrfs_drew_lock_init(root->snapshot_lock);
+		memalloc_nofs_restore(nofs_flag);
+		if (ret)
+			goto fail;
 	}
-	/*
-	 * We might be called under a transaction (e.g. indirect backref
-	 * resolution) which could deadlock if it triggers memory reclaim
-	 */
-	nofs_flag = memalloc_nofs_save();
-	ret = btrfs_drew_lock_init(root->snapshot_lock);
-	memalloc_nofs_restore(nofs_flag);
-	if (ret)
-		goto fail;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
 	    !btrfs_is_data_reloc_root(root)) {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 67c6625797ca..21c41cf8d115 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -28,6 +28,8 @@ static inline u64 btrfs_sb_offset(int mirror)
 struct btrfs_new_fs_root_args {
 	/* Preallocated anonymous block device number */
 	dev_t anon_dev;
+	/* Preallocated snapshot lock */
+	struct btrfs_drew_lock *snapshot_lock;
 };
 
 struct btrfs_device;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-14  2:11 [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
                   ` (2 preceding siblings ...)
  2022-12-14  2:11 ` [PATCH v2 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
@ 2022-12-14 16:59 ` Josef Bacik
  2022-12-14 18:07   ` David Sterba
  3 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2022-12-14 16:59 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Wed, Dec 14, 2022 at 10:11:22AM +0800, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
> 
> [Issue]
> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
> 
>   WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>   CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>   Call Trace:
>     create_pending_snapshot+0xe30/0xe70 [btrfs]
>     create_pending_snapshots+0x89/0xb0 [btrfs]
>     btrfs_commit_transaction+0x469/0xc60 [btrfs]
>     btrfs_mksubvol+0x5bd/0x690 [btrfs]
>     btrfs_mksnapshot+0x102/0x170 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>     btrfs_ioctl+0x2111/0x3130 [btrfs]
>     do_vfs_ioctl+0x7ea/0xa80
>     SyS_ioctl+0xa1/0xb0
>     entry_SYSCALL_64_fastpath+0x1e/0x8e
>   ---[ end trace 910c8f86780ca385 ]---
>   BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
> 
> [Cause]
> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
> However, atomic allocation is required when processing percpu_counter_init
> without GFP_KERNEL due to the unique structure of percpu_counter.
> In this situation, allocating memory for initializing fs root may cause
> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
> 
> [Fix]
> We allocate memory at the beginning of creating a subvolume/snapshot.
> This way, we can ensure the memory is enough when initializing fs root.
> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> the transaction won’t abort since it hasn’t started yet.
> 

Honestly I'd rather just make the btrfs_drew_lock use an atomic_t for the
writers counter as well.  This is only taken in truncate an nocow writes, and in
nocow writes there are a looooot of slower things that have to be done that
we're not winning a lot with the percpu counter.  Is there any reason not to
just do that and leave all this code alone?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-14 16:59 ` [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Josef Bacik
@ 2022-12-14 18:07   ` David Sterba
  2022-12-19  6:54     ` robbieko
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2022-12-14 18:07 UTC (permalink / raw)
  To: Josef Bacik; +Cc: robbieko, linux-btrfs

On Wed, Dec 14, 2022 at 11:59:10AM -0500, Josef Bacik wrote:
> On Wed, Dec 14, 2022 at 10:11:22AM +0800, robbieko wrote:
> > From: Robbie Ko <robbieko@synology.com>
> > 
> > [Issue]
> > When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
> > 
> >   WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
> >   CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> >   Call Trace:
> >     create_pending_snapshot+0xe30/0xe70 [btrfs]
> >     create_pending_snapshots+0x89/0xb0 [btrfs]
> >     btrfs_commit_transaction+0x469/0xc60 [btrfs]
> >     btrfs_mksubvol+0x5bd/0x690 [btrfs]
> >     btrfs_mksnapshot+0x102/0x170 [btrfs]
> >     btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
> >     btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
> >     btrfs_ioctl+0x2111/0x3130 [btrfs]
> >     do_vfs_ioctl+0x7ea/0xa80
> >     SyS_ioctl+0xa1/0xb0
> >     entry_SYSCALL_64_fastpath+0x1e/0x8e
> >   ---[ end trace 910c8f86780ca385 ]---
> >   BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
> > 
> > [Cause]
> > During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
> > Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
> > However, atomic allocation is required when processing percpu_counter_init
> > without GFP_KERNEL due to the unique structure of percpu_counter.
> > In this situation, allocating memory for initializing fs root may cause
> > unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
> > 
> > [Fix]
> > We allocate memory at the beginning of creating a subvolume/snapshot.
> > This way, we can ensure the memory is enough when initializing fs root.
> > Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> > the transaction won’t abort since it hasn’t started yet.
> 
> Honestly I'd rather just make the btrfs_drew_lock use an atomic_t for the
> writers counter as well.  This is only taken in truncate an nocow writes, and in
> nocow writes there are a looooot of slower things that have to be done that
> we're not winning a lot with the percpu counter.  Is there any reason not to
> just do that and leave all this code alone?  Thanks,

The percpu counter for writers is there since the original commit
8257b2dc3c1a1057 "Btrfs: introduce btrfs_{start, end}_nocow_write() for
each subvolume". The reason could be to avoid hammering the same
cacheline from all the readers but then the writers do that anyway.
This happens in context related to IO or there's some waiting anyway, so
yeah using atomic for writers should be ok as well.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-14 18:07   ` David Sterba
@ 2022-12-19  6:54     ` robbieko
  2023-01-10 15:08       ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: robbieko @ 2022-12-19  6:54 UTC (permalink / raw)
  To: dsterba, Josef Bacik; +Cc: linux-btrfs


David Sterba 於 2022/12/15 上午2:07 寫道:
> On Wed, Dec 14, 2022 at 11:59:10AM -0500, Josef Bacik wrote:
>> On Wed, Dec 14, 2022 at 10:11:22AM +0800, robbieko wrote:
>>> From: Robbie Ko <robbieko@synology.com>
>>>
>>> [Issue]
>>> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
>>>
>>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>>>    Call Trace:
>>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
>>>      create_pending_snapshots+0x89/0xb0 [btrfs]
>>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
>>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
>>>      do_vfs_ioctl+0x7ea/0xa80
>>>      SyS_ioctl+0xa1/0xb0
>>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
>>>    ---[ end trace 910c8f86780ca385 ]---
>>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
>>>
>>> [Cause]
>>> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
>>> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
>>> However, atomic allocation is required when processing percpu_counter_init
>>> without GFP_KERNEL due to the unique structure of percpu_counter.
>>> In this situation, allocating memory for initializing fs root may cause
>>> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
>>>
>>> [Fix]
>>> We allocate memory at the beginning of creating a subvolume/snapshot.
>>> This way, we can ensure the memory is enough when initializing fs root.
>>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>>> the transaction won’t abort since it hasn’t started yet.
>> Honestly I'd rather just make the btrfs_drew_lock use an atomic_t for the
>> writers counter as well.  This is only taken in truncate an nocow writes, and in
>> nocow writes there are a looooot of slower things that have to be done that
>> we're not winning a lot with the percpu counter.  Is there any reason not to
>> just do that and leave all this code alone?  Thanks,
> The percpu counter for writers is there since the original commit
> 8257b2dc3c1a1057 "Btrfs: introduce btrfs_{start, end}_nocow_write() for
> each subvolume". The reason could be to avoid hammering the same
> cacheline from all the readers but then the writers do that anyway.
> This happens in context related to IO or there's some waiting anyway, so
> yeah using atomic for writers should be ok as well.

Sorry for the late reply, I've been busy recently.
This modification will not affect the original btrfs_drew_lock behavior,
and the framework can also provide future scenarios where memory
needs to be allocated in init_fs_root.

Thanks.

Robbie Ko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2022-12-19  6:54     ` robbieko
@ 2023-01-10 15:08       ` David Sterba
  2023-03-09 18:47         ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2023-01-10 15:08 UTC (permalink / raw)
  To: robbieko; +Cc: dsterba, Josef Bacik, linux-btrfs

On Mon, Dec 19, 2022 at 02:54:06PM +0800, robbieko wrote:
> 
> David Sterba 於 2022/12/15 上午2:07 寫道:
> > On Wed, Dec 14, 2022 at 11:59:10AM -0500, Josef Bacik wrote:
> >> On Wed, Dec 14, 2022 at 10:11:22AM +0800, robbieko wrote:
> >>> From: Robbie Ko <robbieko@synology.com>
> >>>
> >>> [Issue]
> >>> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
> >>>
> >>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
> >>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> >>>    Call Trace:
> >>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
> >>>      create_pending_snapshots+0x89/0xb0 [btrfs]
> >>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
> >>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
> >>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
> >>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
> >>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
> >>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
> >>>      do_vfs_ioctl+0x7ea/0xa80
> >>>      SyS_ioctl+0xa1/0xb0
> >>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
> >>>    ---[ end trace 910c8f86780ca385 ]---
> >>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
> >>>
> >>> [Cause]
> >>> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
> >>> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
> >>> However, atomic allocation is required when processing percpu_counter_init
> >>> without GFP_KERNEL due to the unique structure of percpu_counter.
> >>> In this situation, allocating memory for initializing fs root may cause
> >>> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
> >>>
> >>> [Fix]
> >>> We allocate memory at the beginning of creating a subvolume/snapshot.
> >>> This way, we can ensure the memory is enough when initializing fs root.
> >>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> >>> the transaction won’t abort since it hasn’t started yet.
> >> Honestly I'd rather just make the btrfs_drew_lock use an atomic_t for the
> >> writers counter as well.  This is only taken in truncate an nocow writes, and in
> >> nocow writes there are a looooot of slower things that have to be done that
> >> we're not winning a lot with the percpu counter.  Is there any reason not to
> >> just do that and leave all this code alone?  Thanks,
> > The percpu counter for writers is there since the original commit
> > 8257b2dc3c1a1057 "Btrfs: introduce btrfs_{start, end}_nocow_write() for
> > each subvolume". The reason could be to avoid hammering the same
> > cacheline from all the readers but then the writers do that anyway.
> > This happens in context related to IO or there's some waiting anyway, so
> > yeah using atomic for writers should be ok as well.
> 
> Sorry for the late reply, I've been busy recently.
> This modification will not affect the original btrfs_drew_lock behavior,
> and the framework can also provide future scenarios where memory
> needs to be allocated in init_fs_root.

With the conversion to atomic_t the preallocation can remain unchanged
as there would be only the anon bdev preallocated, then there's not much
reason to have the infrastructure.

I'm now testing the patch below, it's relatively short an can be
backported if needed but it can potentially make the performance worse
in some cases.

---
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7d5a3967ba0..7acedc6c2a56 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1523,15 +1523,7 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
 	int ret;
 	unsigned int nofs_flag;
 
-	/*
-	 * We might be called under a transaction (e.g. indirect backref
-	 * resolution) which could deadlock if it triggers memory reclaim
-	 */
-	nofs_flag = memalloc_nofs_save();
-	ret = btrfs_drew_lock_init(&root->snapshot_lock);
-	memalloc_nofs_restore(nofs_flag);
-	if (ret)
-		goto fail;
+	 btrfs_drew_lock_init(&root->snapshot_lock);
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
 	    !btrfs_is_data_reloc_root(root)) {
@@ -2242,7 +2234,6 @@ void btrfs_put_root(struct btrfs_root *root)
 		WARN_ON(test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state));
 		if (root->anon_dev)
 			free_anon_bdev(root->anon_dev);
-		btrfs_drew_lock_destroy(&root->snapshot_lock);
 		free_root_extent_buffers(root);
 #ifdef CONFIG_BTRFS_DEBUG
 		spin_lock(&root->fs_info->fs_roots_radix_lock);
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 870528d87526..3a496b0d3d2b 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -325,24 +325,12 @@ struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
  * acquire the lock.
  */
 
-int btrfs_drew_lock_init(struct btrfs_drew_lock *lock)
+void btrfs_drew_lock_init(struct btrfs_drew_lock *lock)
 {
-	int ret;
-
-	ret = percpu_counter_init(&lock->writers, 0, GFP_KERNEL);
-	if (ret)
-		return ret;
-
 	atomic_set(&lock->readers, 0);
+	atomic_set(&lock->writers, 0);
 	init_waitqueue_head(&lock->pending_readers);
 	init_waitqueue_head(&lock->pending_writers);
-
-	return 0;
-}
-
-void btrfs_drew_lock_destroy(struct btrfs_drew_lock *lock)
-{
-	percpu_counter_destroy(&lock->writers);
 }
 
 /* Return true if acquisition is successful, false otherwise */
@@ -351,10 +339,10 @@ bool btrfs_drew_try_write_lock(struct btrfs_drew_lock *lock)
 	if (atomic_read(&lock->readers))
 		return false;
 
-	percpu_counter_inc(&lock->writers);
+	atomic_inc(&lock->writers);
 
 	/* Ensure writers count is updated before we check for pending readers */
-	smp_mb();
+	smp_mb__after_atomic();
 	if (atomic_read(&lock->readers)) {
 		btrfs_drew_write_unlock(lock);
 		return false;
@@ -374,7 +362,7 @@ void btrfs_drew_write_lock(struct btrfs_drew_lock *lock)
 
 void btrfs_drew_write_unlock(struct btrfs_drew_lock *lock)
 {
-	percpu_counter_dec(&lock->writers);
+	atomic_dec(&lock->writers);
 	cond_wake_up(&lock->pending_readers);
 }
 
@@ -390,8 +378,7 @@ void btrfs_drew_read_lock(struct btrfs_drew_lock *lock)
 	 */
 	smp_mb__after_atomic();
 
-	wait_event(lock->pending_readers,
-		   percpu_counter_sum(&lock->writers) == 0);
+	wait_event(lock->pending_readers, atomic_read(&lock->writers) == 0);
 }
 
 void btrfs_drew_read_unlock(struct btrfs_drew_lock *lock)
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index 11c2269b4b6f..edb9b4a0dba1 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -195,13 +195,12 @@ static inline void btrfs_tree_unlock_rw(struct extent_buffer *eb, int rw)
 
 struct btrfs_drew_lock {
 	atomic_t readers;
-	struct percpu_counter writers;
+	atomic_t writers;
 	wait_queue_head_t pending_writers;
 	wait_queue_head_t pending_readers;
 };
 
-int btrfs_drew_lock_init(struct btrfs_drew_lock *lock);
-void btrfs_drew_lock_destroy(struct btrfs_drew_lock *lock);
+void btrfs_drew_lock_init(struct btrfs_drew_lock *lock);
 void btrfs_drew_write_lock(struct btrfs_drew_lock *lock);
 bool btrfs_drew_try_write_lock(struct btrfs_drew_lock *lock);
 void btrfs_drew_write_unlock(struct btrfs_drew_lock *lock);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2023-01-10 15:08       ` David Sterba
@ 2023-03-09 18:47         ` David Sterba
  2023-03-10  1:07           ` robbieko
  0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2023-03-09 18:47 UTC (permalink / raw)
  To: robbieko; +Cc: David Sterba, Josef Bacik, linux-btrfs

On Tue, Jan 10, 2023 at 04:08:18PM +0100, David Sterba wrote:
> > Sorry for the late reply, I've been busy recently.
> > This modification will not affect the original btrfs_drew_lock behavior,
> > and the framework can also provide future scenarios where memory
> > needs to be allocated in init_fs_root.
> 
> With the conversion to atomic_t the preallocation can remain unchanged
> as there would be only the anon bdev preallocated, then there's not much
> reason to have the infrastructure.
> 
> I'm now testing the patch below, it's relatively short an can be
> backported if needed but it can potentially make the performance worse
> in some cases.

I forgot to CC you, the patch implementing the switch to atomic has been
sent and merged to misc-next.

https://lore.kernel.org/linux-btrfs/20230301204708.25710-1-dsterba@suse.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot
  2023-03-09 18:47         ` David Sterba
@ 2023-03-10  1:07           ` robbieko
  0 siblings, 0 replies; 10+ messages in thread
From: robbieko @ 2023-03-10  1:07 UTC (permalink / raw)
  To: dsterba; +Cc: Josef Bacik, linux-btrfs


David Sterba 於 2023/3/10 上午2:47 寫道:
> On Tue, Jan 10, 2023 at 04:08:18PM +0100, David Sterba wrote:
>>> Sorry for the late reply, I've been busy recently.
>>> This modification will not affect the original btrfs_drew_lock behavior,
>>> and the framework can also provide future scenarios where memory
>>> needs to be allocated in init_fs_root.
>> With the conversion to atomic_t the preallocation can remain unchanged
>> as there would be only the anon bdev preallocated, then there's not much
>> reason to have the infrastructure.
>>
>> I'm now testing the patch below, it's relatively short an can be
>> backported if needed but it can potentially make the performance worse
>> in some cases.
> I forgot to CC you, the patch implementing the switch to atomic has been
> sent and merged to misc-next.
>
> https://lore.kernel.org/linux-btrfs/20230301204708.25710-1-dsterba@suse.com/
OK, Thank you so much.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-03-10  1:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-14  2:11 [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot robbieko
2022-12-14  2:11 ` [PATCH v2 1/3] btrfs: refactor anon_dev with new_fs_root_args for create subvolume/snapshot robbieko
2022-12-14  2:11 ` [PATCH v2 2/3] btrfs: change snapshot_lock to dynamic pointer robbieko
2022-12-14  2:11 ` [PATCH v2 3/3] btrfs: add snapshot_lock to new_fs_root_args robbieko
2022-12-14 16:59 ` [PATCH v2 0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot Josef Bacik
2022-12-14 18:07   ` David Sterba
2022-12-19  6:54     ` robbieko
2023-01-10 15:08       ` David Sterba
2023-03-09 18:47         ` David Sterba
2023-03-10  1:07           ` robbieko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).