linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/6] Qgroup/delayed node related fixes
@ 2021-02-22 16:40 Nikolay Borisov
  2021-02-22 16:40 ` [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This series contains a couple of fixes and code simplifications around qgroup
and delayed node interaction. The first 3 patches fix 2 separate issues: a
possible underflow when freeing qgroup-reserved space, and a deadlock. The
next 3 patches build on those fixes to clean up and simplify qgroup's
flushing code.

Nikolay Borisov (6):
  btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata
  btrfs: Export qgroup_reserve_meta
  btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata
  btrfs: Cleanup try_flush_qgroup
  btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata
  btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata

 fs/btrfs/delayed-inode.c | 32 +++++++-------------------------
 fs/btrfs/inode.c         |  2 +-
 fs/btrfs/qgroup.c        | 39 +++++++++------------------------------
 fs/btrfs/qgroup.h        |  3 +++
 4 files changed, 20 insertions(+), 56 deletions(-)

--
2.25.1



* [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-02-22 23:41   ` Qu Wenruo
  2021-02-22 16:40 ` [PATCH 2/6] btrfs: Export qgroup_reserve_meta Nikolay Borisov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Following commit f218ea6c4792 ("btrfs: delayed-inode: Remove wrong
qgroup meta reservation calls") this function now reserves num_bytes,
rather than the fixed amount of nodesize. As such this requires the
same amount to be freed in case of failure. Fix this by adjusting
the amount we are freeing.

Fixes: f218ea6c4792 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/delayed-inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index ec0b50b8c5d6..ac9966e76a2f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 						      btrfs_ino(inode),
 						      num_bytes, 1);
 		} else {
-			btrfs_qgroup_free_meta_prealloc(root, fs_info->nodesize);
+			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
 		}
 		return ret;
 	}
-- 
2.25.1
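
[Editor's illustration, not part of the patch.] The invariant behind this
fix is that an error path must free exactly what was reserved. The following
is a minimal userspace sketch of that invariant, assuming a toy counter in
place of the real qgroup accounting. All names (model_qgroup, model_reserve,
model_free, model_failed_reserve) are hypothetical and do not exist in btrfs.
The sketch makes no assumption about which of num_bytes and nodesize is
larger; it only shows that any mismatch leaves the counter wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a qgroup's prealloc metadata counter. */
struct model_qgroup {
	uint64_t meta_prealloc;	/* bytes currently reserved */
};

/* Reserve num_bytes; always succeeds in this toy model. */
static void model_reserve(struct model_qgroup *q, uint64_t num_bytes)
{
	assert(q != NULL);
	q->meta_prealloc += num_bytes;
}

/*
 * Free num_bytes. Returns 0 on success, or -1 without changing the counter
 * if the caller asks to free more than is reserved (on an unsigned counter
 * that subtraction would wrap around).
 */
static int model_free(struct model_qgroup *q, uint64_t num_bytes)
{
	assert(q != NULL);
	if (num_bytes > q->meta_prealloc)
		return -1;
	q->meta_prealloc -= num_bytes;
	return 0;
}

/*
 * Model of the error path: reserve num_bytes, pretend the follow-up block
 * reservation failed, then undo the reservation by freeing free_bytes.
 * The counter only returns to its old value when free_bytes == num_bytes.
 */
static int model_failed_reserve(struct model_qgroup *q, uint64_t num_bytes,
				uint64_t free_bytes)
{
	model_reserve(q, num_bytes);
	/* ... the block reservation fails here ... */
	return model_free(q, free_bytes);
}
```

Calling model_failed_reserve() with equal sizes leaves the counter where it
started. Freeing less leaks the difference, and freeing more is rejected by
the model, which is where an unchecked unsigned counter would underflow.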



* [PATCH 2/6] btrfs: Export qgroup_reserve_meta
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
  2021-02-22 16:40 ` [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-02-22 23:42   ` Qu Wenruo
  2021-02-22 16:40 ` [PATCH 3/6] btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/qgroup.c | 4 ++--
 fs/btrfs/qgroup.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 808370ada888..fbef95bc3557 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3841,8 +3841,8 @@ static int sub_root_meta_rsv(struct btrfs_root *root, int num_bytes,
 	return num_bytes;
 }
 
-static int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
-				enum btrfs_qgroup_rsv_type type, bool enforce)
+int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+			enum btrfs_qgroup_rsv_type type, bool enforce)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int ret;
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 50dea9a2d8fb..c1a3cc15dede 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -354,6 +354,9 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 			       u64 rfer, u64 excl);
 #endif
 
+int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+			enum btrfs_qgroup_rsv_type type, bool enforce);
+
 /* New io_tree based accurate qgroup reserve API */
 int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
-- 
2.25.1



* [PATCH 3/6] btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
  2021-02-22 16:40 ` [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
  2021-02-22 16:40 ` [PATCH 2/6] btrfs: Export qgroup_reserve_meta Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-02-22 23:45   ` Qu Wenruo
  2021-02-22 16:40 ` [PATCH 4/6] btrfs: Cleanup try_flush_qgroup Nikolay Borisov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past, multiple commits:

 * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

tried to solve various aspects of this, but it was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

> PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "http-0.0.0.0-62"
> #0 [ffffaedd02a67a00] __schedule at ffffffffa529e07d
> #1 [ffffaedd02a67a90] schedule at ffffffffa529e4ff
> #2 [ffffaedd02a67aa0] schedule_timeout at ffffffffa52a1bdd
> #3 [ffffaedd02a67b18] wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held
> #4 [ffffaedd02a67b68] start_delalloc_inodes at ffffffffc0380db5 [btrfs]
> #5 [ffffaedd02a67be8] btrfs_start_delalloc_snapshot at ffffffffc0393836 [btrfs]
> #6 [ffffaedd02a67bf0] try_flush_qgroup at ffffffffc03f04b2 [btrfs]
> #7 [ffffaedd02a67c40] __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 [btrfs] <-- tries to reserve space and starts delalloc inodes.
> #8 [ffffaedd02a67c68] btrfs_delayed_update_inode at ffffffffc03e31aa [btrfs] <-- Acquires delayed node mutex
> #9 [ffffaedd02a67cc0] btrfs_update_inode at ffffffffc0385ba8 [btrfs]
> #10 [ffffaedd02a67ce8] btrfs_dirty_inode at ffffffffc038627b [btrfs] <-- TRANSACTION OPENED
> #11 [ffffaedd02a67d18] touch_atime at ffffffffa4cf0000
> #12 [ffffaedd02a67d58] generic_file_read_iter at ffffffffa4c1f123
> #13 [ffffaedd02a67e40] new_sync_read at ffffffffa4ccdc8a
> #14 [ffffaedd02a67ec8] vfs_read at ffffffffa4cd0849
> #15 [ffffaedd02a67ef8] ksys_read at ffffffffa4cd0bd1
> #16 [ffffaedd02a67f38] do_syscall_64 at ffffffffa4a052eb
> #17 [ffffaedd02a67f50] entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This queues asynchronous work to flush the delalloc inodes, and that work
can try to acquire the same delayed_node mutex:

> PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
> #0 [ffffaedd009f77b0] __schedule at ffffffffa529e07d
> #1 [ffffaedd009f7840] schedule at ffffffffa529e4ff
> #2 [ffffaedd009f7850] schedule_preempt_disabled at ffffffffa529e80a
> #3 [ffffaedd009f7858] __mutex_lock at ffffffffa529fdcb <--- goes to sleep, never wakes up.
> #4 [ffffaedd009f78f8] btrfs_delayed_update_inode at ffffffffc03e3143 [btrfs] <-- tries to acquire the mutex
> #5 [ffffaedd009f7950] btrfs_update_inode at ffffffffc0385ba8 [btrfs]   <-- This is the same inode that pid 6963 is holding
> #6 [ffffaedd009f7978] cow_file_range_inline.constprop.78 at ffffffffc0386be7 [btrfs]
> #7 [ffffaedd009f7a30] cow_file_range at ffffffffc03879c1 [btrfs]
> #8 [ffffaedd009f7ab8] btrfs_run_delalloc_range at ffffffffc038894c [btrfs]
> #9 [ffffaedd009f7b40] writepage_delalloc at ffffffffc03a3c8f [btrfs]
> #10 [ffffaedd009f7ba0] __extent_writepage at ffffffffc03a4c01 [btrfs]
> #11 [ffffaedd009f7c08] extent_write_cache_pages at ffffffffc03a500b [btrfs]
> #12 [ffffaedd009f7d08] extent_writepages at ffffffffc03a6de2 [btrfs]
> #13 [ffffaedd009f7d38] do_writepages at ffffffffa4c277eb
> #14 [ffffaedd009f7db8] __filemap_fdatawrite_range at ffffffffa4c1e5bb
> #15 [ffffaedd009f7e40] btrfs_run_delalloc_work at ffffffffc0380987 [btrfs] <-- starts running delayed nodes
> #16 [ffffaedd009f7e58] normal_work_helper at ffffffffc03b706c [btrfs]
> #17 [ffffaedd009f7e98] process_one_work at ffffffffa4aba4e4
> #18 [ffffaedd009f7ed8] worker_thread at ffffffffa4aba6fd
> #19 [ffffaedd009f7f10] kthread at ffffffffa4ac0a3d
> #20 [ffffaedd009f7f50] ret_from_fork at ffffffffa54001ff

To fully address those cases, the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves that by calling qgroup_reserve_meta directly, which will
either succeed without flushing or fail and return -EDQUOT. In the
latter case the return value is propagated to btrfs_dirty_inode, which
falls back to starting a new transaction. That's fine, as most of the
time we expect the inode to have the BTRFS_DELAYED_NODE_INODE_DIRTY flag
set, which results in directly copying the in-memory state.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/delayed-inode.c | 3 ++-
 fs/btrfs/inode.c         | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index ac9966e76a2f..6dcf2cd1b39e 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -627,7 +627,8 @@ static int btrfs_delayed_inode_reserve_metadata(
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
 			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
-		ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+		ret = qgroup_reserve_meta(root, num_bytes,
+					  BTRFS_QGROUP_RSV_META_PREALLOC, true);
 		if (ret < 0)
 			return ret;
 		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 547d6c1287d5..bf2d0d3ae7c5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6081,7 +6081,7 @@ static int btrfs_dirty_inode(struct inode *inode)
 		return PTR_ERR(trans);
 
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
-	if (ret && ret == -ENOSPC) {
+	if (ret && (ret == -ENOSPC || ret == -EDQUOT)) {
 		/* whoops, lets try again with the full transaction */
 		btrfs_end_transaction(trans);
 		trans = btrfs_start_transaction(root, 1);
-- 
2.25.1
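
[Editor's illustration, not part of the patch.] The pattern described in the
commit message (reserve without flushing while locks are held, then retry
from a context where flushing is safe) can be sketched in userspace as
follows. This is a toy model under stated assumptions: the struct, the
function names and the "reclaimable" field are all hypothetical, and the
model simply assumes the retry path is allowed to flush because nothing is
held at that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quota state; not a btrfs structure. */
struct model_quota {
	uint64_t limit;		/* quota limit in bytes */
	uint64_t used;		/* bytes currently accounted */
	uint64_t reclaimable;	/* bytes a flush would give back */
	int flushes;		/* number of flushes performed */
};

/* Non-flushing reserve: succeeds or returns -EDQUOT, never blocks. */
static int model_reserve_noflush(struct model_quota *q, uint64_t bytes)
{
	if (q->used + bytes > q->limit)
		return -EDQUOT;
	q->used += bytes;
	return 0;
}

/* Flushing reserve: only safe when no lock or transaction is held. */
static int model_reserve_flush(struct model_quota *q, uint64_t bytes)
{
	uint64_t freed;
	int ret = model_reserve_noflush(q, bytes);

	if (ret != -EDQUOT)
		return ret;
	/* Model the flush: give back what can be reclaimed, then retry. */
	freed = q->reclaimable < q->used ? q->reclaimable : q->used;
	q->used -= freed;
	q->reclaimable = 0;
	q->flushes++;
	return model_reserve_noflush(q, bytes);
}

/* Caller in the style of a dirty-inode path with a fallback. */
static int model_dirty_inode(struct model_quota *q, uint64_t bytes)
{
	bool locks_held = true;	/* joined transaction + node mutex */
	int ret = model_reserve_noflush(q, bytes);

	locks_held = false;	/* everything is dropped before retrying */
	if (ret == -ENOSPC || ret == -EDQUOT) {
		assert(!locks_held);	/* flushing is only legal here */
		ret = model_reserve_flush(q, bytes);
	}
	return ret;
}
```

The design point the sketch tries to capture is that the fast path can only
fail, never wait, so it cannot take part in a lock cycle. The cost of a
failed fast path is one extra attempt from the unlocked context.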



* [PATCH 4/6] btrfs: Cleanup try_flush_qgroup
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
                   ` (2 preceding siblings ...)
  2021-02-22 16:40 ` [PATCH 3/6] btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-02-22 23:46   ` Qu Wenruo
  2021-02-22 16:40 ` [PATCH 5/6] btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This function is no longer expected to be called with an open transaction,
so all the hacks concerning this can be removed. In fact, calling it with
a transaction already held now constitutes a bug, so WARN in this case.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/qgroup.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index fbef95bc3557..c9e82e0c88e3 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3535,37 +3535,19 @@ static int try_flush_qgroup(struct btrfs_root *root)
 {
 	struct btrfs_trans_handle *trans;
 	int ret;
-	bool can_commit = true;
 
-	/*
-	 * If current process holds a transaction, we shouldn't flush, as we
-	 * assume all space reservation happens before a transaction handle is
-	 * held.
-	 *
-	 * But there are cases like btrfs_delayed_item_reserve_metadata() where
-	 * we try to reserve space with one transction handle already held.
-	 * In that case we can't commit transaction, but at least try to end it
-	 * and hope the started data writes can free some space.
-	 */
-	if (current->journal_info &&
-	    current->journal_info != BTRFS_SEND_TRANS_STUB)
-		can_commit = false;
+	/* Can't hold an open transaction or we run the risk of deadlocking. */
+	ASSERT(current->journal_info == NULL ||
+	       current->journal_info == BTRFS_SEND_TRANS_STUB);
+	if (WARN_ON(current->journal_info &&
+		     current->journal_info != BTRFS_SEND_TRANS_STUB))
+		return 0;
 
 	/*
 	 * We don't want to run flush again and again, so if there is a running
 	 * one, we won't try to start a new flush, but exit directly.
 	 */
 	if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
-		/*
-		 * We are already holding a transaction, thus we can block other
-		 * threads from flushing.  So exit right now. This increases
-		 * the chance of EDQUOT for heavy load and near limit cases.
-		 * But we can argue that if we're already near limit, EDQUOT is
-		 * unavoidable anyway.
-		 */
-		if (!can_commit)
-			return 0;
-
 		wait_event(root->qgroup_flush_wait,
 			!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
 		return 0;
@@ -3582,10 +3564,7 @@ static int try_flush_qgroup(struct btrfs_root *root)
 		goto out;
 	}
 
-	if (can_commit)
-		ret = btrfs_commit_transaction(trans);
-	else
-		ret = btrfs_end_transaction(trans);
+	ret = btrfs_commit_transaction(trans);
 out:
 	clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
 	wake_up(&root->qgroup_flush_wait);
-- 
2.25.1
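
[Editor's illustration, not part of the patch.] The shape of the cleaned-up
function (refuse to run with a transaction held, let only one caller flush
at a time) can be modelled in userspace with a C11 atomic_flag standing in
for the BTRFS_ROOT_QGROUP_FLUSHING bit. The names below are hypothetical,
and the model returns immediately where the kernel code waits for the
running flush to finish.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-root flushing state. */
struct model_root {
	atomic_flag flushing;	/* models the "flush in progress" bit */
	int commits;		/* number of flush+commit cycles done */
};

/* Returns true if the caller became the single active flusher. */
static bool model_flush_begin(struct model_root *r)
{
	return !atomic_flag_test_and_set(&r->flushing);
}

/* Ends the flush so that another caller may start one. */
static void model_flush_end(struct model_root *r)
{
	atomic_flag_clear(&r->flushing);
}

/*
 * Model of the simplified flow. holds_transaction is a hypothetical
 * parameter replacing the check of current->journal_info.
 */
static int model_try_flush(struct model_root *r, bool holds_transaction)
{
	assert(r != NULL);

	/* Calling with a transaction held is a bug: do nothing. */
	if (holds_transaction)
		return 0;

	/* Someone else is flushing; the kernel code would wait here. */
	if (!model_flush_begin(r))
		return 0;

	r->commits++;		/* models flush + transaction commit */
	model_flush_end(r);
	return 0;
}
```

With the transaction case rejected up front, the body no longer needs a
can_commit style flag: every caller that gets past the guard is allowed to
commit.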



* [PATCH 5/6] btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
                   ` (3 preceding siblings ...)
  2021-02-22 16:40 ` [PATCH 4/6] btrfs: Cleanup try_flush_qgroup Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-02-22 23:53   ` Qu Wenruo
  2021-02-22 16:40 ` [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
  2021-03-01 18:55 ` [PATCH 0/6] Qgroup/delayed node related fixes David Sterba
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

It's only used by the tracepoint to obtain the ino, but we already have
the ino from btrfs_delayed_node::inode_id.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/delayed-inode.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6dcf2cd1b39e..875daca63d5d 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -602,7 +602,6 @@ static void btrfs_delayed_item_release_metadata(struct btrfs_root *root,
 static int btrfs_delayed_inode_reserve_metadata(
 					struct btrfs_trans_handle *trans,
 					struct btrfs_root *root,
-					struct btrfs_inode *inode,
 					struct btrfs_delayed_node *node)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -647,7 +646,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 			node->bytes_reserved = num_bytes;
 			trace_btrfs_space_reservation(fs_info,
 						      "delayed_inode",
-						      btrfs_ino(inode),
+						      node->inode_id,
 						      num_bytes, 1);
 		} else {
 			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
@@ -658,7 +657,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, true);
 	if (!ret) {
 		trace_btrfs_space_reservation(fs_info, "delayed_inode",
-					      btrfs_ino(inode), num_bytes, 1);
+					      node->inode_id, num_bytes, 1);
 		node->bytes_reserved = num_bytes;
 	}
 
@@ -1833,8 +1832,7 @@ int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
 		goto release_node;
 	}
 
-	ret = btrfs_delayed_inode_reserve_metadata(trans, root, inode,
-						   delayed_node);
+	ret = btrfs_delayed_inode_reserve_metadata(trans, root, delayed_node);
 	if (ret)
 		goto release_node;
 
-- 
2.25.1



* [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
                   ` (4 preceding siblings ...)
  2021-02-22 16:40 ` [PATCH 5/6] btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 16:40 ` Nikolay Borisov
  2021-03-01 16:15   ` David Sterba
  2021-03-01 18:55 ` [PATCH 0/6] Qgroup/delayed node related fixes David Sterba
  6 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-02-22 16:40 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

btrfs_block_rsv_add can return only ENOSPC since it's called with the
NO_FLUSH modifier. Simplify the logic in
btrfs_delayed_inode_reserve_metadata to exploit this invariant.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/delayed-inode.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 875daca63d5d..92843105ebd8 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -632,29 +632,12 @@ static int btrfs_delayed_inode_reserve_metadata(
 			return ret;
 		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
 					  BTRFS_RESERVE_NO_FLUSH);
-		/*
-		 * Since we're under a transaction reserve_metadata_bytes could
-		 * try to commit the transaction which will make it return
-		 * EAGAIN to make us stop the transaction we have, so return
-		 * ENOSPC instead so that btrfs_dirty_inode knows what to do.
-		 */
-		if (ret == -EAGAIN) {
-			ret = -ENOSPC;
-			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
-		}
-		if (!ret) {
-			node->bytes_reserved = num_bytes;
-			trace_btrfs_space_reservation(fs_info,
-						      "delayed_inode",
-						      node->inode_id,
-						      num_bytes, 1);
-		} else {
+		if (ret)
 			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
-		}
-		return ret;
+	} else {
+		ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, true);
 	}
 
-	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, true);
 	if (!ret) {
 		trace_btrfs_space_reservation(fs_info, "delayed_inode",
 					      node->inode_id, num_bytes, 1);
-- 
2.25.1
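
[Editor's illustration, not part of the patch.] The resulting control flow
(two ways of obtaining the reservation, one shared success tail) can be
sketched in userspace like this. Everything here is hypothetical: the
function and struct names are invented, and the return values of the two
reservation steps are passed in as parameters so the flow can be exercised
without any real block reserve.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the delayed node's reservation field. */
struct model_node {
	uint64_t bytes_reserved;
};

/*
 * use_own_reservation: take the "reserve qgroup + add to block rsv" branch
 *                      instead of migrating from a source reserve.
 * add_ret/migrate_ret: injected results of the two reservation steps.
 * qgroup_reserved:     toy qgroup counter, undone when the add step fails.
 */
static int model_reserve_metadata(struct model_node *node,
				  bool use_own_reservation,
				  int add_ret, int migrate_ret,
				  uint64_t num_bytes,
				  uint64_t *qgroup_reserved)
{
	int ret;

	assert(node != NULL && qgroup_reserved != NULL);

	if (use_own_reservation) {
		*qgroup_reserved += num_bytes;
		ret = add_ret;
		/* Any failure here just undoes the qgroup reservation. */
		if (ret)
			*qgroup_reserved -= num_bytes;
	} else {
		ret = migrate_ret;
	}

	/* Shared tail: record the reservation only on success. */
	if (!ret)
		node->bytes_reserved = num_bytes;

	return ret;
}
```

Because both branches only set ret, the success bookkeeping is written once
instead of being duplicated in each branch.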



* Re: [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 ` [PATCH 1/6] btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 23:41   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-02-22 23:41 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> Following commit f218ea6c4792 ("btrfs: delayed-inode: Remove wrong
> qgroup meta reservation calls") this function now reserves num_bytes,
> rather than the fixed amount of nodesize. As such this requires the
> same amount to be freed in case of failure. Fix this by adjusting
> the amount we are freeing.
>
> Fixes: f218ea6c4792 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>   fs/btrfs/delayed-inode.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index ec0b50b8c5d6..ac9966e76a2f 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
>   						      btrfs_ino(inode),
>   						      num_bytes, 1);
>   		} else {
> -			btrfs_qgroup_free_meta_prealloc(root, fs_info->nodesize);
> +			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
>   		}
>   		return ret;
>   	}
>


* Re: [PATCH 2/6] btrfs: Export qgroup_reserve_meta
  2021-02-22 16:40 ` [PATCH 2/6] btrfs: Export qgroup_reserve_meta Nikolay Borisov
@ 2021-02-22 23:42   ` Qu Wenruo
  2021-02-25 16:27     ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-02-22 23:42 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Considering how small the export is, I'd prefer this to be merged with
the next patch, as that makes it much easier to understand why we want to
export the function.

And since it will be exported, maybe it's a good idea to rename it to
btrfs_qgroup_reserve_meta_atomic() or btrfs_qgroup_reserve_meta_noflush()?

Thanks,
Qu
> ---
>   fs/btrfs/qgroup.c | 4 ++--
>   fs/btrfs/qgroup.h | 3 +++
>   2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 808370ada888..fbef95bc3557 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3841,8 +3841,8 @@ static int sub_root_meta_rsv(struct btrfs_root *root, int num_bytes,
>   	return num_bytes;
>   }
>
> -static int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
> -				enum btrfs_qgroup_rsv_type type, bool enforce)
> +int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
> +			enum btrfs_qgroup_rsv_type type, bool enforce)
>   {
>   	struct btrfs_fs_info *fs_info = root->fs_info;
>   	int ret;
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index 50dea9a2d8fb..c1a3cc15dede 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -354,6 +354,9 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
>   			       u64 rfer, u64 excl);
>   #endif
>
> +int qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
> +			enum btrfs_qgroup_rsv_type type, bool enforce);
> +
>   /* New io_tree based accurate qgroup reserve API */
>   int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
>   			struct extent_changeset **reserved, u64 start, u64 len);
>


* Re: [PATCH 3/6] btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 ` [PATCH 3/6] btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 23:45   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-02-22 23:45 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> Calling btrfs_qgroup_reserve_meta_prealloc from
> btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
> while holding a transaction and delayed node locks. This is deadlock
> prone. In the past, multiple commits:
>
>   * ae5e070eaca9 ("btrfs: qgroup: don't try to wait flushing if we're
> already holding a transaction")
>
>   * 6f23277a49e6 ("btrfs: qgroup: don't commit transaction when we already
>   hold the handle")
>
> tried to solve various aspects of this, but it was always a
> whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
> scenario involving btrfs_delayed_node::mutex. Namely, one thread
> can call btrfs_dirty_inode as a result of reading a file and modifying
> its atime:
>
>> PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "http-0.0.0.0-62"
>> #0 [ffffaedd02a67a00] __schedule at ffffffffa529e07d
>> #1 [ffffaedd02a67a90] schedule at ffffffffa529e4ff
>> #2 [ffffaedd02a67aa0] schedule_timeout at ffffffffa52a1bdd
>> #3 [ffffaedd02a67b18] wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held
>> #4 [ffffaedd02a67b68] start_delalloc_inodes at ffffffffc0380db5 [btrfs]
>> #5 [ffffaedd02a67be8] btrfs_start_delalloc_snapshot at ffffffffc0393836 [btrfs]
>> #6 [ffffaedd02a67bf0] try_flush_qgroup at ffffffffc03f04b2 [btrfs]
>> #7 [ffffaedd02a67c40] __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 [btrfs] <-- tries to reserve space and starts delalloc inodes.
>> #8 [ffffaedd02a67c68] btrfs_delayed_update_inode at ffffffffc03e31aa [btrfs] <-- Acquires delayed node mutex
>> #9 [ffffaedd02a67cc0] btrfs_update_inode at ffffffffc0385ba8 [btrfs]
>> #10 [ffffaedd02a67ce8] btrfs_dirty_inode at ffffffffc038627b [btrfs] <-- TRANSACTION OPENED
>> #11 [ffffaedd02a67d18] touch_atime at ffffffffa4cf0000
>> #12 [ffffaedd02a67d58] generic_file_read_iter at ffffffffa4c1f123
>> #13 [ffffaedd02a67e40] new_sync_read at ffffffffa4ccdc8a
>> #14 [ffffaedd02a67ec8] vfs_read at ffffffffa4cd0849
>> #15 [ffffaedd02a67ef8] ksys_read at ffffffffa4cd0bd1
>> #16 [ffffaedd02a67f38] do_syscall_64 at ffffffffa4a052eb
>> #17 [ffffaedd02a67f50] entry_SYSCALL_64_after_hwframe at ffffffffa540008c
>
> This queues asynchronous work to flush the delalloc inodes, and that work
> can try to acquire the same delayed_node mutex:
>
>> PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
>> #0 [ffffaedd009f77b0] __schedule at ffffffffa529e07d
>> #1 [ffffaedd009f7840] schedule at ffffffffa529e4ff
>> #2 [ffffaedd009f7850] schedule_preempt_disabled at ffffffffa529e80a
>> #3 [ffffaedd009f7858] __mutex_lock at ffffffffa529fdcb <--- goes to sleep, never wakes up.
>> #4 [ffffaedd009f78f8] btrfs_delayed_update_inode at ffffffffc03e3143 [btrfs] <-- tries to acquire the mutex
>> #5 [ffffaedd009f7950] btrfs_update_inode at ffffffffc0385ba8 [btrfs]   <-- This is the same inode that pid 6963 is holding
>> #6 [ffffaedd009f7978] cow_file_range_inline.constprop.78 at ffffffffc0386be7 [btrfs]
>> #7 [ffffaedd009f7a30] cow_file_range at ffffffffc03879c1 [btrfs]
>> #8 [ffffaedd009f7ab8] btrfs_run_delalloc_range at ffffffffc038894c [btrfs]
>> #9 [ffffaedd009f7b40] writepage_delalloc at ffffffffc03a3c8f [btrfs]
>> #10 [ffffaedd009f7ba0] __extent_writepage at ffffffffc03a4c01 [btrfs]
>> #11 [ffffaedd009f7c08] extent_write_cache_pages at ffffffffc03a500b [btrfs]
>> #12 [ffffaedd009f7d08] extent_writepages at ffffffffc03a6de2 [btrfs]
>> #13 [ffffaedd009f7d38] do_writepages at ffffffffa4c277eb
>> #14 [ffffaedd009f7db8] __filemap_fdatawrite_range at ffffffffa4c1e5bb
>> #15 [ffffaedd009f7e40] btrfs_run_delalloc_work at ffffffffc0380987 [btrfs] <-- starts running delayed nodes
>> #16 [ffffaedd009f7e58] normal_work_helper at ffffffffc03b706c [btrfs]
>> #17 [ffffaedd009f7e98] process_one_work at ffffffffa4aba4e4
>> #18 [ffffaedd009f7ed8] worker_thread at ffffffffa4aba6fd
>> #19 [ffffaedd009f7f10] kthread at ffffffffa4ac0a3d
>> #20 [ffffaedd009f7f50] ret_from_fork at ffffffffa54001ff
>
> To fully address those cases, the complete fix is to never issue any
> flushing while holding the transaction or the delayed node lock. This
> patch achieves that by calling qgroup_reserve_meta directly, which will
> either succeed without flushing or fail and return -EDQUOT. In the
> latter case the return value is propagated to btrfs_dirty_inode, which
> falls back to starting a new transaction. That's fine, as most of the
> time we expect the inode to have the BTRFS_DELAYED_NODE_INODE_DIRTY flag
> set, which results in directly copying the in-memory state.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

The fix is indeed much better.

It avoids the performance regression from my previous
btrfs_dirty_inode() fix, but still removes the flush in this context.

Once merged with the previous patch, feel free to add:

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/delayed-inode.c | 3 ++-
>   fs/btrfs/inode.c         | 2 +-
>   2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index ac9966e76a2f..6dcf2cd1b39e 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -627,7 +627,8 @@ static int btrfs_delayed_inode_reserve_metadata(
>   	 */
>   	if (!src_rsv || (!trans->bytes_reserved &&
>   			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
> -		ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
> +		ret = qgroup_reserve_meta(root, num_bytes,
> +					  BTRFS_QGROUP_RSV_META_PREALLOC, true);
>   		if (ret < 0)
>   			return ret;
>   		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 547d6c1287d5..bf2d0d3ae7c5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6081,7 +6081,7 @@ static int btrfs_dirty_inode(struct inode *inode)
>   		return PTR_ERR(trans);
>
>   	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
> -	if (ret && ret == -ENOSPC) {
> +	if (ret && (ret == -ENOSPC || ret == -EDQUOT)) {
>   		/* whoops, lets try again with the full transaction */
>   		btrfs_end_transaction(trans);
>   		trans = btrfs_start_transaction(root, 1);
>


* Re: [PATCH 4/6] btrfs: Cleanup try_flush_qgroup
  2021-02-22 16:40 ` [PATCH 4/6] btrfs: Cleanup try_flush_qgroup Nikolay Borisov
@ 2021-02-22 23:46   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-02-22 23:46 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> It's no longer expected to call this function with an open transaction
> so all the hacks concerning this can be removed. In fact it'll
> constitute a bug to call this function with a transaction already held
> so WARN in this case.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/qgroup.c | 35 +++++++----------------------------
>   1 file changed, 7 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index fbef95bc3557..c9e82e0c88e3 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3535,37 +3535,19 @@ static int try_flush_qgroup(struct btrfs_root *root)
>   {
>   	struct btrfs_trans_handle *trans;
>   	int ret;
> -	bool can_commit = true;
>
> -	/*
> -	 * If current process holds a transaction, we shouldn't flush, as we
> -	 * assume all space reservation happens before a transaction handle is
> -	 * held.
> -	 *
> -	 * But there are cases like btrfs_delayed_item_reserve_metadata() where
> -	 * we try to reserve space with one transction handle already held.
> -	 * In that case we can't commit transaction, but at least try to end it
> -	 * and hope the started data writes can free some space.
> -	 */
> -	if (current->journal_info &&
> -	    current->journal_info != BTRFS_SEND_TRANS_STUB)
> -		can_commit = false;
> +	/* Can't hold an open transaction or we run the risk of deadlocking. */
> +	ASSERT(current->journal_info == NULL ||
> +	       current->journal_info == BTRFS_SEND_TRANS_STUB);
> +	if (WARN_ON(current->journal_info &&
> +		     current->journal_info != BTRFS_SEND_TRANS_STUB))
> +		return 0;
>
>   	/*
>   	 * We don't want to run flush again and again, so if there is a running
>   	 * one, we won't try to start a new flush, but exit directly.
>   	 */
>   	if (test_and_set_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state)) {
> -		/*
> -		 * We are already holding a transaction, thus we can block other
> -		 * threads from flushing.  So exit right now. This increases
> -		 * the chance of EDQUOT for heavy load and near limit cases.
> -		 * But we can argue that if we're already near limit, EDQUOT is
> -		 * unavoidable anyway.
> -		 */
> -		if (!can_commit)
> -			return 0;
> -
>   		wait_event(root->qgroup_flush_wait,
>   			!test_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state));
>   		return 0;
> @@ -3582,10 +3564,7 @@ static int try_flush_qgroup(struct btrfs_root *root)
>   		goto out;
>   	}
>
> -	if (can_commit)
> -		ret = btrfs_commit_transaction(trans);
> -	else
> -		ret = btrfs_end_transaction(trans);
> +	ret = btrfs_commit_transaction(trans);
>   out:
>   	clear_bit(BTRFS_ROOT_QGROUP_FLUSHING, &root->state);
>   	wake_up(&root->qgroup_flush_wait);
>
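
The guard added by this patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_journal_info`, `MODEL_SEND_TRANS_STUB`, `model_flushing` and `model_try_flush` are hypothetical stand-ins for `current->journal_info`, `BTRFS_SEND_TRANS_STUB`, the `BTRFS_ROOT_QGROUP_FLUSHING` bit and `try_flush_qgroup()`; none of it is kernel code, and the real function waits on a wait queue where the model simply returns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for current->journal_info. */
static void *model_journal_info = NULL;

/* Hypothetical stand-in for BTRFS_SEND_TRANS_STUB: a unique non-NULL marker. */
static char model_send_stub_storage;
#define MODEL_SEND_TRANS_STUB ((void *)&model_send_stub_storage)

/* Models the BTRFS_ROOT_QGROUP_FLUSHING bit: set while a flush is running. */
static atomic_flag model_flushing = ATOMIC_FLAG_INIT;

/* Counts how many times the model "committed a transaction". */
static int model_commits;

/* True when the caller already holds a real (non-stub) transaction. */
static bool model_holds_transaction(void)
{
	return model_journal_info != NULL &&
	       model_journal_info != MODEL_SEND_TRANS_STUB;
}

/* Model of the flush entry point: refuse to flush with a transaction held. */
static int model_try_flush(void)
{
	/* The patch WARNs here; the model just bails out without flushing. */
	if (model_holds_transaction())
		return 0;

	/* Only one flusher at a time; the kernel waits here, the model returns. */
	if (atomic_flag_test_and_set(&model_flushing))
		return 0;

	model_commits++;	/* stands in for committing the transaction */
	atomic_flag_clear(&model_flushing);
	return 0;
}
```

With no transaction held, or with only the send stub set, a call to `model_try_flush()` bumps `model_commits`; with any other non-NULL `model_journal_info` it returns without committing, which is the case the patch treats as a bug.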


* Re: [PATCH 5/6] btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 ` [PATCH 5/6] btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-02-22 23:53   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-02-22 23:53 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> It's only used for tracepoint to obtain the ino, but we already have
> the ino from btrfs_delayed_node::inode_id.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/delayed-inode.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 6dcf2cd1b39e..875daca63d5d 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -602,7 +602,6 @@ static void btrfs_delayed_item_release_metadata(struct btrfs_root *root,
>   static int btrfs_delayed_inode_reserve_metadata(
>   					struct btrfs_trans_handle *trans,
>   					struct btrfs_root *root,
> -					struct btrfs_inode *inode,
>   					struct btrfs_delayed_node *node)
>   {
>   	struct btrfs_fs_info *fs_info = root->fs_info;
> @@ -647,7 +646,7 @@ static int btrfs_delayed_inode_reserve_metadata(
>   			node->bytes_reserved = num_bytes;
>   			trace_btrfs_space_reservation(fs_info,
>   						      "delayed_inode",
> -						      btrfs_ino(inode),
> +						      node->inode_id,
>   						      num_bytes, 1);
>   		} else {
>   			btrfs_qgroup_free_meta_prealloc(root, num_bytes);
> @@ -658,7 +657,7 @@ static int btrfs_delayed_inode_reserve_metadata(
>   	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, true);
>   	if (!ret) {
>   		trace_btrfs_space_reservation(fs_info, "delayed_inode",
> -					      btrfs_ino(inode), num_bytes, 1);
> +					      node->inode_id, num_bytes, 1);
>   		node->bytes_reserved = num_bytes;
>   	}
>
> @@ -1833,8 +1832,7 @@ int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
>   		goto release_node;
>   	}
>
> -	ret = btrfs_delayed_inode_reserve_metadata(trans, root, inode,
> -						   delayed_node);
> +	ret = btrfs_delayed_inode_reserve_metadata(trans, root, delayed_node);
>   	if (ret)
>   		goto release_node;
>
>
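
Why the inode parameter is redundant can be shown with a tiny userspace model. This is a sketch only: `struct model_delayed_node`, `model_trace_space_reservation` and `model_reserve` are hypothetical names, and the only thing taken from the patch is that the delayed node already stores the inode number in `inode_id`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a delayed node: it already records the inode number. */
struct model_delayed_node {
	uint64_t inode_id;
	uint64_t bytes_reserved;
};

/* Records the last inode number passed to the model tracepoint. */
static uint64_t model_last_traced_ino;

/* Stand-in for a tracepoint that only needs the inode number and a size. */
static void model_trace_space_reservation(uint64_t ino, uint64_t bytes)
{
	(void)bytes;
	model_last_traced_ino = ino;
}

/* Reserve using only the node: no separate inode argument is required. */
static void model_reserve(struct model_delayed_node *node, uint64_t num_bytes)
{
	node->bytes_reserved = num_bytes;
	model_trace_space_reservation(node->inode_id, num_bytes);
}
```

Since the tracepoint was the only consumer of the inode, dropping the argument shortens the signature without losing any information.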


* Re: [PATCH 2/6] btrfs: Export qgroup_reserve_meta
  2021-02-22 23:42   ` Qu Wenruo
@ 2021-02-25 16:27     ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2021-02-25 16:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Nikolay Borisov, linux-btrfs

On Tue, Feb 23, 2021 at 07:42:48AM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/2/23 12:40 AM, Nikolay Borisov wrote:
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> 
> Considering how small the export is, I prefer this to be merged with
> next patch, as it's much easier to understand why we want to export the
> function.
> 
> And since it will be exported, maybe it's a good idea to rename it as
> btrfs_qgroup_reserve_meta_atomic() or btrfs_qgroup_reserve_meta_noflush()?

Yes the exported functions should have the btrfs_ prefix and because
that needs changing all callers it's usually a good idea to do it in a
separate patch.

About the rename, using _atomic could be confusing as it already has two
other meanings in Linux.  There's already __btrfs_qgroup_reserve_meta and,
looking at all the other reserve_meta helpers, I think we can keep it as
btrfs_qgroup_reserve_meta, but the _noflush suffix also makes sense.


* Re: [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata
  2021-02-22 16:40 ` [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-03-01 16:15   ` David Sterba
  2021-03-01 16:20     ` Nikolay Borisov
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2021-03-01 16:15 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 22, 2021 at 06:40:47PM +0200, Nikolay Borisov wrote:
> btrfs_block_rsv_add can fail only with ENOSPC since it's called with
> the NO_FLUSH modifier. So simplify the logic in
> btrfs_delayed_inode_reserve_metadata to exploit this invariant.

This seems quite fragile, it's not straightforward to see from the
context that the NO_FLUSH code will always return ENOSPC. I followed a
few calls down from btrfs_block_rsv_add and it's well hidden inside
__reserve_bytes. So in case it's an invariant I'd rather add an
assertion, i.e. ASSERT(ret == 0 || ret == -ENOSPC), so at least we know
when this gets broken. Otherwise looks ok.
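
The suggested assertion can be sketched in userspace as follows. This is a minimal model under stated assumptions: `struct model_rsv`, `model_rsv_add_noflush` and `model_reserve_metadata` are hypothetical names, and the standard `assert()` stands in for the kernel's `ASSERT()`. It does not reproduce btrfs_block_rsv_add or __reserve_bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a block reserve; not the kernel structure. */
struct model_rsv {
	uint64_t free_bytes;
	uint64_t reserved;
};

/* Models a no-flush reservation: succeed, or fail at once with -ENOSPC. */
static int model_rsv_add_noflush(struct model_rsv *rsv, uint64_t num_bytes)
{
	if (rsv->free_bytes < num_bytes)
		return -ENOSPC;
	rsv->free_bytes -= num_bytes;
	rsv->reserved += num_bytes;
	return 0;
}

/* Caller that relies on the invariant and documents it with an assertion. */
static int model_reserve_metadata(struct model_rsv *rsv, uint64_t num_bytes)
{
	int ret = model_rsv_add_noflush(rsv, num_bytes);

	/* Trips if a later rework makes the helper return anything else. */
	assert(ret == 0 || ret == -ENOSPC);
	return ret;
}
```

The assertion costs nothing on the expected paths and turns a silent change of return codes into an immediate, easy to locate failure.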


* Re: [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata
  2021-03-01 16:15   ` David Sterba
@ 2021-03-01 16:20     ` Nikolay Borisov
  2021-03-01 18:54       ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2021-03-01 16:20 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 1.03.21 18:15, David Sterba wrote:
> On Mon, Feb 22, 2021 at 06:40:47PM +0200, Nikolay Borisov wrote:
>> btrfs_block_rsv_add can fail only with ENOSPC since it's called with
>> the NO_FLUSH modifier. So simplify the logic in
>> btrfs_delayed_inode_reserve_metadata to exploit this invariant.
> 
> This seems quite fragile, it's not straightforward to see from the
> context that the NO_FLUSH code will always return ENOSPC. I followed a
> few calls down from btrfs_block_rsv_add and it's well hidden inside
> __reserve_bytes. So in case it's an invariant I'd rather add an
> assertion, i.e. ASSERT(ret == 0 || ret == -ENOSPC), so at least we know
> when this gets broken. Otherwise looks ok.
> 


Fair enough, I'm fine with it. In any case we no longer return EAGAIN
when reserving. Either we succeed or we return ENOSPC, either because
we don't have space and we can't flush, or because even after the
flushing machinery did its work a ticket still couldn't be satisfied, in
which case we failed it and hence ENOSPC got returned.
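
The two ENOSPC paths described above can be modelled roughly like this. It is a sketch under loud assumptions: `enum model_flush`, `struct model_space` and `model_reserve_bytes` are hypothetical, and "flushing" is reduced to moving a reclaimable byte count into the free pool, which is far simpler than the real ticketing machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flush modes: never flush, or flush before giving up. */
enum model_flush { MODEL_NO_FLUSH, MODEL_FLUSH_ALL };

/* Hypothetical space accounting: free now, plus what a flush could reclaim. */
struct model_space {
	uint64_t free_bytes;
	uint64_t reclaimable_bytes;
};

/* Returns 0 on success and -ENOSPC on failure; never any other error. */
static int model_reserve_bytes(struct model_space *s, uint64_t num_bytes,
			       enum model_flush flush)
{
	if (s->free_bytes >= num_bytes) {
		s->free_bytes -= num_bytes;
		return 0;
	}

	/* Path 1: not enough space and the caller does not allow flushing. */
	if (flush == MODEL_NO_FLUSH)
		return -ENOSPC;

	/* Model of flushing: everything reclaimable becomes free. */
	s->free_bytes += s->reclaimable_bytes;
	s->reclaimable_bytes = 0;

	if (s->free_bytes >= num_bytes) {
		s->free_bytes -= num_bytes;
		return 0;
	}

	/* Path 2: flushing ran but the request still cannot be satisfied. */
	return -ENOSPC;
}
```

In both failure cases the caller sees the same -ENOSPC, which is what makes an assertion on the return value practical.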


* Re: [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata
  2021-03-01 16:20     ` Nikolay Borisov
@ 2021-03-01 18:54       ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2021-03-01 18:54 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, linux-btrfs

On Mon, Mar 01, 2021 at 06:20:29PM +0200, Nikolay Borisov wrote:
> 
> 
> On 1.03.21 18:15, David Sterba wrote:
> > On Mon, Feb 22, 2021 at 06:40:47PM +0200, Nikolay Borisov wrote:
> >> btrfs_block_rsv_add can fail only with ENOSPC since it's called with
> >> the NO_FLUSH modifier. So simplify the logic in
> >> btrfs_delayed_inode_reserve_metadata to exploit this invariant.
> > 
> > This seems quite fragile, it's not straightforward to see from the
> > context that the NO_FLUSH code will always return ENOSPC. I followed a
> > few calls down from btrfs_block_rsv_add and it's well hidden inside
> > __reserve_bytes. So in case it's an invariant I'd rather add an
> > assertion, i.e. ASSERT(ret == 0 || ret == -ENOSPC), so at least we know
> > when this gets broken. Otherwise looks ok.
> 
> Fair enough, I'm fine with it. In any case we no longer return EAGAIN
> when reserving. Either we succeed or we return ENOSPC, either because
> we don't have space and we can't flush, or because even after the
> flushing machinery did its work a ticket still couldn't be satisfied, in
> which case we failed it and hence ENOSPC got returned.

Yeah but this is a precaution when somebody reworks the flushing logic
yet another time and suddenly EAGAIN or whatever else can become the
return code again. Hunting such bugs can be quite difficult as it
depends on the runtime state and the space stat on disk etc.


* Re: [PATCH 0/6] Qgroup/delayed node related fixes
  2021-02-22 16:40 [PATCH 0/6] Qgroup/delayed node related fixes Nikolay Borisov
                   ` (5 preceding siblings ...)
  2021-02-22 16:40 ` [PATCH 6/6] btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata Nikolay Borisov
@ 2021-03-01 18:55 ` David Sterba
  6 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2021-03-01 18:55 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 22, 2021 at 06:40:41PM +0200, Nikolay Borisov wrote:
> This series contains a couple of fixes and code simplifications around qgroup
> and delayed node interaction. The first 3 patches fix 2 separate issues - one
> possible underflow when freeing qgroup-reserved space and the other one is a
> deadlock. Next 3 patches build on the fixes to clean up and simplify qgroup's
> flushing code.
> 
> Nikolay Borisov (6):
>   btrfs: Free correct amount of space in btrfs_delayed_inode_reserve_metadata
>   btrfs: Export qgroup_reserve_meta
>   btrfs: Don't flush from btrfs_delayed_inode_reserve_metadata
>   btrfs: Cleanup try_flush_qgroup
>   btrfs: Remove btrfs_inode from btrfs_delayed_inode_reserve_metadata
>   btrfs: Simplify code flow in btrfs_delayed_inode_reserve_metadata

Patchset added to misc-next, thanks.


