* [PATCH v2 0/5] btrfs: Annotate wait events with lockdep
@ 2022-07-19  4:09 Ioannis Angelakopoulos
  2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

With this patch series we annotate wait events in btrfs with lockdep so
that lockdep can catch deadlocks involving these wait events.

Recently the btrfs developers fixed a non-trivial deadlock involving
wait events:
https://lore.kernel.org/linux-btrfs/20220614131413.GJ20633@twin.jikos.cz/

Currently lockdep cannot catch deadlocks of this kind because it does
not track wait events by default.

With these lockdep annotations we teach lockdep to track the wait events
and thus catch more potential deadlocks.

Specifically, we annotate the following wait events in fs/btrfs/transaction.c
and in fs/btrfs/ordered-data.c:

  1) The num_writers wait event
  2) The num_extwriters wait event
  3) The transaction states wait events
  4) The pending_ordered wait event
  5) The ordered extents wait event

Changes from v1:
  1) Added 2 labels in the cleanup code of btrfs_commit_transaction() in
  fs/btrfs/transaction.c so that btrfs_lockdep_release() is not called
  multiple times during the error paths.
  2) Added lockdep annotations for the btrfs transaction states wait
  events.
  3) Added a lockdep annotation for the pending_ordered wait event.
  4) Added a lockdep annotation for the ordered extents wait event.


Ioannis Angelakopoulos (5):
  btrfs: Add a lockdep model for the num_writers wait event
  btrfs: Add a lockdep model for the num_extwriters wait event
  btrfs: Add lockdep models for the transaction states wait events
  btrfs: Add a lockdep model for the pending_ordered wait event
  btrfs: Add a lockdep model for the ordered extents wait event

 fs/btrfs/ctree.h            |  37 ++++++++++++
 fs/btrfs/disk-io.c          |  35 ++++++++++++
 fs/btrfs/free-space-cache.c |  11 ++++
 fs/btrfs/inode.c            |  13 +++++
 fs/btrfs/ordered-data.c     |  21 +++++++
 fs/btrfs/transaction.c      | 108 ++++++++++++++++++++++++++++++++----
 6 files changed, 214 insertions(+), 11 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
@ 2022-07-19  4:09 ` Ioannis Angelakopoulos
  2022-07-20 14:46   ` Josef Bacik
  2022-07-20 14:47   ` Sweet Tea Dorminy
  2022-07-19  4:09 ` [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters " Ioannis Angelakopoulos
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep
in order to catch deadlocks involving this wait event.

Use a read/write lockdep map for the annotation. A thread starting or
joining the transaction acquires the map as a reader when it increments
cur_trans->num_writers; the thread that waits for num_writers to drop to
1 acquires the map as a writer before it blocks on the wait event.

Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h       | 14 ++++++++++++++
 fs/btrfs/disk-io.c     |  6 ++++++
 fs/btrfs/transaction.c | 37 ++++++++++++++++++++++++++++++++-----
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 202496172059..999868734be7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1095,6 +1095,8 @@ struct btrfs_fs_info {
 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
 
+	struct lockdep_map btrfs_trans_num_writers_map;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
@@ -1175,6 +1177,18 @@ enum {
 	BTRFS_ROOT_UNFINISHED_DROP,
 };
 
+#define btrfs_might_wait_for_event(b, lock) \
+	do { \
+		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
+		rwsem_release(&b->lock##_map, _THIS_IP_); \
+	} while (0)
+
+#define btrfs_lockdep_acquire(b, lock) \
+	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
+
+#define btrfs_lockdep_release(b, lock) \
+	rwsem_release(&b->lock##_map, _THIS_IP_)
+
 static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
 {
 	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3fac429cf8a4..01a5a49a3a11 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3046,6 +3046,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 
 void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 {
+	static struct lock_class_key btrfs_trans_num_writers_key;
+
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
 	INIT_LIST_HEAD(&fs_info->trans_list);
@@ -3074,6 +3076,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->zoned_data_reloc_io_lock);
 	seqlock_init(&fs_info->profiles_lock);
 
+	lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
+					 "btrfs_trans_num_writers",
+					 &btrfs_trans_num_writers_key, 0);
+
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
 	INIT_LIST_HEAD(&fs_info->space_info);
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0bec10740ad3..d8287ec890bc 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -313,6 +313,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		atomic_inc(&cur_trans->num_writers);
 		extwriter_counter_inc(cur_trans, type);
 		spin_unlock(&fs_info->trans_lock);
+		btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
 		return 0;
 	}
 	spin_unlock(&fs_info->trans_lock);
@@ -334,16 +335,20 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	if (!cur_trans)
 		return -ENOMEM;
 
+	btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
+
 	spin_lock(&fs_info->trans_lock);
 	if (fs_info->running_transaction) {
 		/*
 		 * someone started a transaction after we unlocked.  Make sure
 		 * to redo the checks above
 		 */
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		goto loop;
 	} else if (BTRFS_FS_ERROR(fs_info)) {
 		spin_unlock(&fs_info->trans_lock);
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		return -EROFS;
 	}
@@ -1022,6 +1027,9 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	extwriter_counter_dec(cur_trans, trans->type);
 
 	cond_wake_up(&cur_trans->writer_wait);
+
+	btrfs_lockdep_release(info, btrfs_trans_num_writers);
+
 	btrfs_put_transaction(cur_trans);
 
 	if (current->journal_info == trans)
@@ -1994,6 +2002,12 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
 	if (cur_trans == fs_info->running_transaction) {
 		cur_trans->state = TRANS_STATE_COMMIT_DOING;
 		spin_unlock(&fs_info->trans_lock);
+
+		/*
+		 * The thread has already released the lockdep map as a reader
+		 * in btrfs_commit_transaction().
+		 */
+		btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
 		wait_event(cur_trans->writer_wait,
 			   atomic_read(&cur_trans->num_writers) == 1);
 
@@ -2222,7 +2236,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 			btrfs_put_transaction(prev_trans);
 			if (ret)
-				goto cleanup_transaction;
+				goto lockdep_release;
 		} else {
 			spin_unlock(&fs_info->trans_lock);
 		}
@@ -2236,7 +2250,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		 */
 		if (BTRFS_FS_ERROR(fs_info)) {
 			ret = -EROFS;
-			goto cleanup_transaction;
+			goto lockdep_release;
 		}
 	}
 
@@ -2250,19 +2264,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	ret = btrfs_start_delalloc_flush(fs_info);
 	if (ret)
-		goto cleanup_transaction;
+		goto lockdep_release;
 
 	ret = btrfs_run_delayed_items(trans);
 	if (ret)
-		goto cleanup_transaction;
+		goto lockdep_release;
 
 	wait_event(cur_trans->writer_wait,
 		   extwriter_counter_read(cur_trans) == 0);
 
 	/* some pending stuffs might be added after the previous flush. */
 	ret = btrfs_run_delayed_items(trans);
-	if (ret)
+	if (ret) {
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		goto cleanup_transaction;
+	}
 
 	btrfs_wait_delalloc_flush(fs_info);
 
@@ -2284,6 +2300,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	add_pending_snapshot(trans);
 	cur_trans->state = TRANS_STATE_COMMIT_DOING;
 	spin_unlock(&fs_info->trans_lock);
+
+	/*
+	 * The thread has started/joined the transaction, thus it holds the
+	 * lockdep map as a reader. It has to release it before acquiring the
+	 * lockdep map as a writer.
+	 */
+	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
+	btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
 	wait_event(cur_trans->writer_wait,
 		   atomic_read(&cur_trans->num_writers) == 1);
 
@@ -2515,6 +2539,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	cleanup_transaction(trans, ret);
 
 	return ret;
+lockdep_release:
+	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
+	goto cleanup_transaction;
 }
 
 /*
-- 
2.30.2



* [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters wait event
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
  2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
@ 2022-07-19  4:09 ` Ioannis Angelakopoulos
  2022-07-20 14:46   ` Josef Bacik
  2022-07-19  4:09 ` [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events Ioannis Angelakopoulos
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Similarly to the num_writers wait event in fs/btrfs/transaction.c add a
lockdep annotation for the num_extwriters wait event.

Use a read/write lockdep map for the annotation. A thread starting or
joining the transaction acquires the map as a reader when it increments
the extwriter counter via extwriter_counter_inc(); the thread that waits
for the counter to reach zero acquires the map as a writer before it
blocks on the wait event.

Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/disk-io.c     |  4 ++++
 fs/btrfs/transaction.c | 13 +++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 999868734be7..586756f831e5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1096,6 +1096,7 @@ struct btrfs_fs_info {
 	struct btrfs_commit_stats commit_stats;
 
 	struct lockdep_map btrfs_trans_num_writers_map;
+	struct lockdep_map btrfs_trans_num_extwriters_map;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 01a5a49a3a11..b1193584ba49 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3047,6 +3047,7 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
 void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 {
 	static struct lock_class_key btrfs_trans_num_writers_key;
+	static struct lock_class_key btrfs_trans_num_extwriters_key;
 
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
@@ -3079,6 +3080,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
 					 "btrfs_trans_num_writers",
 					 &btrfs_trans_num_writers_key, 0);
+	lockdep_init_map(&fs_info->btrfs_trans_num_extwriters_map,
+					 "btrfs_trans_num_extwriters",
+					 &btrfs_trans_num_extwriters_key, 0);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
 	INIT_LIST_HEAD(&fs_info->space_info);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d8287ec890bc..c9751a05c029 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -314,6 +314,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		extwriter_counter_inc(cur_trans, type);
 		spin_unlock(&fs_info->trans_lock);
 		btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
+		btrfs_lockdep_acquire(fs_info, btrfs_trans_num_extwriters);
 		return 0;
 	}
 	spin_unlock(&fs_info->trans_lock);
@@ -336,6 +337,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		return -ENOMEM;
 
 	btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
+	btrfs_lockdep_acquire(fs_info, btrfs_trans_num_extwriters);
 
 	spin_lock(&fs_info->trans_lock);
 	if (fs_info->running_transaction) {
@@ -343,11 +345,13 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		 * someone started a transaction after we unlocked.  Make sure
 		 * to redo the checks above
 		 */
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters);
 		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		goto loop;
 	} else if (BTRFS_FS_ERROR(fs_info)) {
 		spin_unlock(&fs_info->trans_lock);
+		btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters);
 		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 		kfree(cur_trans);
 		return -EROFS;
@@ -1028,6 +1032,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 
 	cond_wake_up(&cur_trans->writer_wait);
 
+	btrfs_lockdep_release(info, btrfs_trans_num_extwriters);
 	btrfs_lockdep_release(info, btrfs_trans_num_writers);
 
 	btrfs_put_transaction(cur_trans);
@@ -2270,6 +2275,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	if (ret)
 		goto lockdep_release;
 
+	/*
+	 * The thread has started/joined the transaction, thus it holds the
+	 * lockdep map as a reader. It has to release it before acquiring the
+	 * lockdep map as a writer.
+	 */
+	btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters);
+	btrfs_might_wait_for_event(fs_info, btrfs_trans_num_extwriters);
 	wait_event(cur_trans->writer_wait,
 		   extwriter_counter_read(cur_trans) == 0);
 
@@ -2540,6 +2552,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	return ret;
 lockdep_release:
+	btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters);
 	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 	goto cleanup_transaction;
 }
-- 
2.30.2



* [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
  2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
  2022-07-19  4:09 ` [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters " Ioannis Angelakopoulos
@ 2022-07-19  4:09 ` Ioannis Angelakopoulos
  2022-07-20 14:47   ` Sweet Tea Dorminy
  2022-07-20 14:48   ` Josef Bacik
  2022-07-19  4:09 ` [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event Ioannis Angelakopoulos
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Add lockdep annotations for the transaction states that have wait
events: TRANS_STATE_COMMIT_START, TRANS_STATE_UNBLOCKED,
TRANS_STATE_SUPER_COMMITTED and TRANS_STATE_COMPLETED in
fs/btrfs/transaction.c.

With the exception of the annotation for TRANS_STATE_COMMIT_START, the
transaction thread has to acquire the lockdep maps for the transaction
states as a reader after the lockdep map for num_writers is released, so
that lockdep does not complain.


Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h       | 20 +++++++++++++++
 fs/btrfs/disk-io.c     | 17 +++++++++++++
 fs/btrfs/transaction.c | 57 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 586756f831e5..e6c7cafcd296 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1097,6 +1097,7 @@ struct btrfs_fs_info {
 
 	struct lockdep_map btrfs_trans_num_writers_map;
 	struct lockdep_map btrfs_trans_num_extwriters_map;
+	struct lockdep_map btrfs_state_change_map[4];
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
@@ -1178,6 +1179,13 @@ enum {
 	BTRFS_ROOT_UNFINISHED_DROP,
 };
 
+enum btrfs_lockdep_trans_states {
+	BTRFS_LOCKDEP_TRANS_COMMIT_START,
+	BTRFS_LOCKDEP_TRANS_UNBLOCKED,
+	BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED,
+	BTRFS_LOCKDEP_TRANS_COMPLETED,
+};
+
 #define btrfs_might_wait_for_event(b, lock) \
 	do { \
 		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
@@ -1190,6 +1198,18 @@ enum {
 #define btrfs_lockdep_release(b, lock) \
 	rwsem_release(&b->lock##_map, _THIS_IP_)
 
+#define btrfs_might_wait_for_state(b, i) \
+	do { \
+		rwsem_acquire(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_); \
+		rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_); \
+	} while (0)
+
+#define btrfs_trans_state_lockdep_acquire(b, i) \
+	rwsem_acquire_read(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_)
+
+#define btrfs_trans_state_lockdep_release(b, i) \
+	rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_)
+
 static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
 {
 	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b1193584ba49..be5cf86fa992 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3048,6 +3048,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 {
 	static struct lock_class_key btrfs_trans_num_writers_key;
 	static struct lock_class_key btrfs_trans_num_extwriters_key;
+	static struct lock_class_key btrfs_trans_commit_start_key;
+	static struct lock_class_key btrfs_trans_unblocked_key;
+	static struct lock_class_key btrfs_trans_sup_committed_key;
+	static struct lock_class_key btrfs_trans_completed_key;
 
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
@@ -3084,6 +3088,19 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 					 "btrfs_trans_num_extwriters",
 					 &btrfs_trans_num_extwriters_key, 0);
 
+	lockdep_init_map(&fs_info->btrfs_state_change_map[0],
+					 "btrfs_trans_commit_start",
+					 &btrfs_trans_commit_start_key, 0);
+	lockdep_init_map(&fs_info->btrfs_state_change_map[1],
+					 "btrfs_trans_unblocked",
+					 &btrfs_trans_unblocked_key, 0);
+	lockdep_init_map(&fs_info->btrfs_state_change_map[2],
+					 "btrfs_trans_sup_committed",
+					 &btrfs_trans_sup_committed_key, 0);
+	lockdep_init_map(&fs_info->btrfs_state_change_map[3],
+					 "btrfs_trans_completed",
+					 &btrfs_trans_completed_key, 0);
+
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
 	INIT_LIST_HEAD(&fs_info->space_info);
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c9751a05c029..e4efaa27ec17 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -550,6 +550,7 @@ static void wait_current_trans(struct btrfs_fs_info *fs_info)
 		refcount_inc(&cur_trans->use_count);
 		spin_unlock(&fs_info->trans_lock);
 
+		btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
 		wait_event(fs_info->transaction_wait,
 			   cur_trans->state >= TRANS_STATE_UNBLOCKED ||
 			   TRANS_ABORTED(cur_trans));
@@ -949,6 +950,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid)
 			goto out;  /* nothing committing|committed */
 	}
 
+	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
 	wait_for_commit(cur_trans, TRANS_STATE_COMPLETED);
 	btrfs_put_transaction(cur_trans);
 out:
@@ -1980,6 +1982,7 @@ void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans)
 	 * Wait for the current transaction commit to start and block
 	 * subsequent transaction joins
 	 */
+	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_START);
 	wait_event(fs_info->transaction_blocked_wait,
 		   cur_trans->state >= TRANS_STATE_COMMIT_START ||
 		   TRANS_ABORTED(cur_trans));
@@ -2137,14 +2140,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	ktime_t interval;
 
 	ASSERT(refcount_read(&trans->use_count) == 1);
+	btrfs_trans_state_lockdep_acquire(fs_info,
+					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
 
 	/* Stop the commit early if ->aborted is set */
 	if (TRANS_ABORTED(cur_trans)) {
 		ret = cur_trans->aborted;
-		btrfs_end_transaction(trans);
-		return ret;
+		goto lockdep_trans_commit_start_release;
 	}
 
+
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
@@ -2160,8 +2165,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		 */
 		ret = btrfs_run_delayed_refs(trans, 0);
 		if (ret) {
-			btrfs_end_transaction(trans);
-			return ret;
+			goto lockdep_trans_commit_start_release;
 		}
 	}
 
@@ -2192,8 +2196,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		if (run_it) {
 			ret = btrfs_start_dirty_block_groups(trans);
 			if (ret) {
-				btrfs_end_transaction(trans);
-				return ret;
+				goto lockdep_trans_commit_start_release;
 			}
 		}
 	}
@@ -2209,7 +2212,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 		if (trans->in_fsync)
 			want_state = TRANS_STATE_SUPER_COMMITTED;
+
+		btrfs_trans_state_lockdep_release(fs_info,
+						  BTRFS_LOCKDEP_TRANS_COMMIT_START);
 		ret = btrfs_end_transaction(trans);
+
+		if (want_state == TRANS_STATE_COMPLETED)
+			btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
+		else
+			btrfs_might_wait_for_state(fs_info,
+						   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
+
 		wait_for_commit(cur_trans, want_state);
 
 		if (TRANS_ABORTED(cur_trans))
@@ -2222,6 +2235,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	cur_trans->state = TRANS_STATE_COMMIT_START;
 	wake_up(&fs_info->transaction_blocked_wait);
+	btrfs_trans_state_lockdep_release(fs_info,
+					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
 
 	if (cur_trans->list.prev != &fs_info->trans_list) {
 		enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED;
@@ -2235,6 +2250,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 			refcount_inc(&prev_trans->use_count);
 			spin_unlock(&fs_info->trans_lock);
 
+			if (want_state == TRANS_STATE_COMPLETED)
+				btrfs_might_wait_for_state(fs_info,
+							   BTRFS_LOCKDEP_TRANS_COMPLETED);
+			else
+				btrfs_might_wait_for_state(fs_info,
+							   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
 			wait_for_commit(prev_trans, want_state);
 
 			ret = READ_ONCE(prev_trans->aborted);
@@ -2323,6 +2344,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	wait_event(cur_trans->writer_wait,
 		   atomic_read(&cur_trans->num_writers) == 1);
 
+	/*
+	 * Make lockdep happy by acquiring the state locks after
+	 * btrfs_trans_num_writers is released.
+	 */
+	btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
+	btrfs_trans_state_lockdep_acquire(fs_info,
+					  BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
+	btrfs_trans_state_lockdep_acquire(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
+
 	/*
 	 * We've started the commit, clear the flag in case we were triggered to
 	 * do an async commit but somebody else started before the transaction
@@ -2332,6 +2362,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	if (TRANS_ABORTED(cur_trans)) {
 		ret = cur_trans->aborted;
+		btrfs_trans_state_lockdep_release(fs_info,
+						  BTRFS_LOCKDEP_TRANS_UNBLOCKED);
 		goto scrub_continue;
 	}
 	/*
@@ -2466,6 +2498,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	mutex_unlock(&fs_info->reloc_mutex);
 
 	wake_up(&fs_info->transaction_wait);
+	btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
 
 	ret = btrfs_write_and_wait_transaction(trans);
 	if (ret) {
@@ -2497,6 +2530,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 */
 	cur_trans->state = TRANS_STATE_SUPER_COMMITTED;
 	wake_up(&cur_trans->commit_wait);
+	btrfs_trans_state_lockdep_release(fs_info,
+					  BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
 
 	btrfs_finish_extent_commit(trans);
 
@@ -2510,6 +2545,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 */
 	cur_trans->state = TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
+	btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
 
 	spin_lock(&fs_info->trans_lock);
 	list_del_init(&cur_trans->list);
@@ -2538,7 +2574,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 unlock_reloc:
 	mutex_unlock(&fs_info->reloc_mutex);
+	btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
 scrub_continue:
+	btrfs_trans_state_lockdep_release(fs_info,
+					  BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
+	btrfs_trans_state_lockdep_release(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
 	btrfs_scrub_continue(fs_info);
 cleanup_transaction:
 	btrfs_trans_release_metadata(trans);
@@ -2555,6 +2595,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_lockdep_release(fs_info, btrfs_trans_num_extwriters);
 	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
 	goto cleanup_transaction;
+lockdep_trans_commit_start_release:
+	btrfs_trans_state_lockdep_release(fs_info,
+					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
+	btrfs_end_transaction(trans);
+	return ret;
 }
 
 /*
-- 
2.30.2



* [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
                   ` (2 preceding siblings ...)
  2022-07-19  4:09 ` [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events Ioannis Angelakopoulos
@ 2022-07-19  4:09 ` Ioannis Angelakopoulos
  2022-07-20 14:48   ` Josef Bacik
  2022-07-19  4:10 ` [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents " Ioannis Angelakopoulos
  2022-07-20 14:47 ` [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Sweet Tea Dorminy
  5 siblings, 1 reply; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

In contrast to the num_writers and num_extwriters wait events, the
condition for the pending_ordered wait event is signaled in a different
context from the wait event itself. The condition is signaled in
btrfs_remove_ordered_extent() in fs/btrfs/ordered-data.c, while the wait
event is implemented in btrfs_commit_transaction() in
fs/btrfs/transaction.c.

Thus the thread signaling the condition has to acquire the lockdep map
as a reader at the start of btrfs_remove_ordered_extent() and release it
after it has signaled the condition. Some dependencies might be missed
due to the placement of the annotation, but that is better than no
annotation at all.

Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h        | 1 +
 fs/btrfs/disk-io.c      | 4 ++++
 fs/btrfs/ordered-data.c | 3 +++
 fs/btrfs/transaction.c  | 1 +
 4 files changed, 9 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e6c7cafcd296..661d19ac41d1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1098,6 +1098,7 @@ struct btrfs_fs_info {
 	struct lockdep_map btrfs_trans_num_writers_map;
 	struct lockdep_map btrfs_trans_num_extwriters_map;
 	struct lockdep_map btrfs_state_change_map[4];
+	struct lockdep_map btrfs_trans_pending_ordered_map;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be5cf86fa992..e9956cbf653f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3052,6 +3052,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	static struct lock_class_key btrfs_trans_unblocked_key;
 	static struct lock_class_key btrfs_trans_sup_committed_key;
 	static struct lock_class_key btrfs_trans_completed_key;
+	static struct lock_class_key btrfs_trans_pending_ordered_key;
 
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
@@ -3087,6 +3088,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	lockdep_init_map(&fs_info->btrfs_trans_num_extwriters_map,
 					 "btrfs_trans_num_extwriters",
 					 &btrfs_trans_num_extwriters_key, 0);
+	lockdep_init_map(&fs_info->btrfs_trans_pending_ordered_map,
+					 "btrfs_trans_pending_ordered",
+					 &btrfs_trans_pending_ordered_key, 0);
 
 	lockdep_init_map(&fs_info->btrfs_state_change_map[0],
 					 "btrfs_trans_commit_start",
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 1952ac85222c..2a4cb6db42d1 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -525,6 +525,7 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 	struct rb_node *node;
 	bool pending;
 
+	btrfs_lockdep_acquire(fs_info, btrfs_trans_pending_ordered);
 	/* This is paired with btrfs_add_ordered_extent. */
 	spin_lock(&btrfs_inode->lock);
 	btrfs_mod_outstanding_extents(btrfs_inode, -1);
@@ -580,6 +581,8 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 		}
 	}
 
+	btrfs_lockdep_release(fs_info, btrfs_trans_pending_ordered);
+
 	spin_lock(&root->ordered_extent_lock);
 	list_del_init(&entry->root_extent_list);
 	root->nr_ordered_extents--;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e4efaa27ec17..3522e8f3381c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2320,6 +2320,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 * transaction. Otherwise if this transaction commits before the ordered
 	 * extents complete we lose logged data after a power failure.
 	 */
+	btrfs_might_wait_for_event(fs_info, btrfs_trans_pending_ordered);
 	wait_event(cur_trans->pending_wait,
 		   atomic_read(&cur_trans->pending_ordered) == 0);
 
-- 
2.30.2



* [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents wait event
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
                   ` (3 preceding siblings ...)
  2022-07-19  4:09 ` [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event Ioannis Angelakopoulos
@ 2022-07-19  4:10 ` Ioannis Angelakopoulos
  2022-07-20 14:50   ` Josef Bacik
  2022-07-20 14:47 ` [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Sweet Tea Dorminy
  5 siblings, 1 reply; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-19  4:10 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This wait event is very similar to the pending_ordered wait event in
that its condition is signaled in a different context from the wait
event itself. The signaling occurs in btrfs_remove_ordered_extent(),
while the wait event is implemented in btrfs_start_ordered_extent() in
fs/btrfs/ordered-data.c.

However, in this case a thread must not acquire the lockdep map for the
ordered extents wait event when the ordered extent is related to a free
space inode. Otherwise lockdep creates dependencies between locks
acquired in execution paths for normal inodes and paths for free space
inodes, leading to false positives.

Also, to prevent false positives between free space inodes and normal
inodes, the lockdep map class for inode->mapping->invalidate_lock is
reinitialized in load_free_space_cache() in fs/btrfs/free-space-cache.c.

Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/ctree.h            |  1 +
 fs/btrfs/disk-io.c          |  4 ++++
 fs/btrfs/free-space-cache.c | 11 +++++++++++
 fs/btrfs/inode.c            | 13 +++++++++++++
 fs/btrfs/ordered-data.c     | 18 ++++++++++++++++++
 5 files changed, 47 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 661d19ac41d1..941987ed4c43 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1099,6 +1099,7 @@ struct btrfs_fs_info {
 	struct lockdep_map btrfs_trans_num_extwriters_map;
 	struct lockdep_map btrfs_state_change_map[4];
 	struct lockdep_map btrfs_trans_pending_ordered_map;
+	struct lockdep_map btrfs_ordered_extent_map;
 
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e9956cbf653f..8f7a07362efb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3053,6 +3053,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	static struct lock_class_key btrfs_trans_sup_committed_key;
 	static struct lock_class_key btrfs_trans_completed_key;
 	static struct lock_class_key btrfs_trans_pending_ordered_key;
+	static struct lock_class_key btrfs_ordered_extent_key;
 
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
@@ -3104,6 +3105,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	lockdep_init_map(&fs_info->btrfs_state_change_map[3],
 					 "btrfs_trans_completed",
 					 &btrfs_trans_completed_key, 0);
+	lockdep_init_map(&fs_info->btrfs_ordered_extent_map,
+					 "btrfs_ordered_extent",
+					 &btrfs_ordered_extent_key, 0);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
 	INIT_LIST_HEAD(&fs_info->space_info);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 996da650ecdc..1e41a01a782b 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -914,6 +914,8 @@ static int copy_free_space_cache(struct btrfs_block_group *block_group,
 	return ret;
 }
 
+static struct lock_class_key btrfs_free_space_inode_key;
+
 int load_free_space_cache(struct btrfs_block_group *block_group)
 {
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
@@ -924,6 +926,7 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 	int ret = 0;
 	bool matched;
 	u64 used = block_group->used;
+	struct address_space *mapping;
 
 	/*
 	 * Because we could potentially discard our loaded free space, we want
@@ -983,6 +986,14 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
 	}
 	spin_unlock(&block_group->lock);
 
+	/*
+	 * Reinitialize the class of the inode->mapping->invalidate_lock for free
+	 * space inodes to prevent false positives related to the ordered extents
+	 * lockdep map.
+	 */
+	mapping = &inode->i_data;
+	lockdep_set_class(&mapping->invalidate_lock, &btrfs_free_space_inode_key);
+
 	ret = __load_free_space_cache(fs_info->tree_root, inode, &tmp_ctl,
 				      path, block_group->start);
 	btrfs_free_path(path);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f20740812e5b..36f973ffbd26 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3223,6 +3223,8 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		clear_bits |= EXTENT_DELALLOC_NEW;
 
 	freespace_inode = btrfs_is_free_space_inode(inode);
+	if (!freespace_inode)
+		btrfs_lockdep_acquire(fs_info, btrfs_ordered_extent);
 
 	if (test_bit(BTRFS_ORDERED_IOERR, &ordered_extent->flags)) {
 		ret = -EIO;
@@ -8952,6 +8954,7 @@ void btrfs_destroy_inode(struct inode *vfs_inode)
 	struct btrfs_ordered_extent *ordered;
 	struct btrfs_inode *inode = BTRFS_I(vfs_inode);
 	struct btrfs_root *root = inode->root;
+	bool freespace_inode;
 
 	WARN_ON(!hlist_empty(&vfs_inode->i_dentry));
 	WARN_ON(vfs_inode->i_data.nrpages);
@@ -8973,6 +8976,12 @@ void btrfs_destroy_inode(struct inode *vfs_inode)
 	if (!root)
 		return;
 
+	/*
+	 * If this is a free space inode do not take the ordered extents lockdep
+	 * map.
+	 */
+	freespace_inode = btrfs_is_free_space_inode(inode);
+
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
 		if (!ordered)
@@ -8981,6 +8990,10 @@ void btrfs_destroy_inode(struct inode *vfs_inode)
 			btrfs_err(root->fs_info,
 				  "found ordered extent %llu %llu on inode cleanup",
 				  ordered->file_offset, ordered->num_bytes);
+
+			if (!freespace_inode)
+				btrfs_lockdep_acquire(root->fs_info, btrfs_ordered_extent);
+
 			btrfs_remove_ordered_extent(inode, ordered);
 			btrfs_put_ordered_extent(ordered);
 			btrfs_put_ordered_extent(ordered);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 2a4cb6db42d1..eb24a6d20ff8 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -524,6 +524,13 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct rb_node *node;
 	bool pending;
+	bool freespace_inode;
+
+	/*
+	 * If this is a free space inode the thread has not acquired the ordered
+	 * extents lockdep map.
+	 */
+	freespace_inode = btrfs_is_free_space_inode(btrfs_inode);
 
 	btrfs_lockdep_acquire(fs_info, btrfs_trans_pending_ordered);
 	/* This is paired with btrfs_add_ordered_extent. */
@@ -597,6 +604,8 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode,
 	}
 	spin_unlock(&root->ordered_extent_lock);
 	wake_up(&entry->wait);
+	if (!freespace_inode)
+		btrfs_lockdep_release(fs_info, btrfs_ordered_extent);
 }
 
 static void btrfs_run_ordered_extent_work(struct btrfs_work *work)
@@ -715,9 +724,16 @@ void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry, int wait)
 	u64 start = entry->file_offset;
 	u64 end = start + entry->num_bytes - 1;
 	struct btrfs_inode *inode = BTRFS_I(entry->inode);
+	bool freespace_inode;
 
 	trace_btrfs_ordered_extent_start(inode, entry);
 
+	/*
+	 * If this is a free space inode do not take the ordered extents lockdep
+	 * map.
+	 */
+	freespace_inode = btrfs_is_free_space_inode(inode);
+
 	/*
 	 * pages in the range can be dirty, clean or writeback.  We
 	 * start IO on any dirty ones so the wait doesn't stall waiting
@@ -726,6 +742,8 @@ void btrfs_start_ordered_extent(struct btrfs_ordered_extent *entry, int wait)
 	if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
 		filemap_fdatawrite_range(inode->vfs_inode.i_mapping, start, end);
 	if (wait) {
+		if (!freespace_inode)
+			btrfs_might_wait_for_event(inode->root->fs_info, btrfs_ordered_extent);
 		wait_event(entry->wait, test_bit(BTRFS_ORDERED_COMPLETE,
 						 &entry->flags));
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event
  2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
@ 2022-07-20 14:46   ` Josef Bacik
  2022-07-20 14:47   ` Sweet Tea Dorminy
  1 sibling, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2022-07-20 14:46 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: linux-btrfs, kernel-team

On Mon, Jul 18, 2022 at 09:09:52PM -0700, Ioannis Angelakopoulos wrote:
> Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep
> in order to catch deadlocks involving this wait event.
> 
> Use a read/write lockdep map for the annotation. A thread starting/joining
> the transaction acquires the map as a reader when it increments
> cur_trans->num_writers and it acquires the map as a writer before it
> blocks on the wait event.
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters wait event
  2022-07-19  4:09 ` [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters " Ioannis Angelakopoulos
@ 2022-07-20 14:46   ` Josef Bacik
  0 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2022-07-20 14:46 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: linux-btrfs, kernel-team

On Mon, Jul 18, 2022 at 09:09:54PM -0700, Ioannis Angelakopoulos wrote:
> Similarly to the num_writers wait event in fs/btrfs/transaction.c add a
> lockdep annotation for the num_extwriters wait event.
> 
> Use a read/write lockdep map for the annotation. A thread starting/joining
> the transaction acquires the map as a reader when it increments
> cur_trans->num_writers and it acquires the map as a writer before it
> blocks on the wait event.
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] btrfs: Annotate wait events with lockdep
  2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
                   ` (4 preceding siblings ...)
  2022-07-19  4:10 ` [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents " Ioannis Angelakopoulos
@ 2022-07-20 14:47 ` Sweet Tea Dorminy
  5 siblings, 0 replies; 17+ messages in thread
From: Sweet Tea Dorminy @ 2022-07-20 14:47 UTC (permalink / raw)
  To: Ioannis Angelakopoulos, linux-btrfs, kernel-team

On 7/19/22 00:09, Ioannis Angelakopoulos wrote:
> Hello,
> 
> With this patch series we annotate wait events in btrfs with lockdep to
> catch deadlocks involving these wait events.
> 
> Recently the btrfs developers fixed a non trivial deadlock involving
> wait events
> https://lore.kernel.org/linux-btrfs/20220614131413.GJ20633@twin.jikos.cz/
> 
> Currently lockdep is unable to catch these deadlocks since it does not
> support wait events by default.
> 
> With our lockdep annotations we train lockdep to track these wait events
> and catch more potential deadlocks.
> 
> Specifically, we annotate the below wait events in fs/btrfs/transaction.c
> and in fs/btrfs/ordered-data.c:
> 
>    1) The num_writers wait event
>    2) The num_extwriters wait event
>    3) The transaction states wait events
>    4) The pending_ordered wait event
>    5) The ordered extents wait event
> 
> Changes from v1:
>    1) Added 2 labels in the cleanup code of btrfs_commit_transaction() in
>    fs/btrfs/transaction.c so that btrfs_lockdep_release() is not called
>    multiple times during the error paths.
>    2) Added lockdep annotations for the btrfs transaction states wait
>    events.
>    3) Added a lockdep annotation for the pending_ordered wait event.
>    4) Added a lockdep annotation for the ordered extents wait event.
> 
Overall this seems really nice and I learned a bit about lockdep 
reviewing it; thank you! I have only organization and change description 
comments, I think.

I don't know much about lockdep and may be systematically confused here, 
but here's a change organization suggestion, and I have a few 
change-specific comments too in re specific changes.

I think it would be a little more elegant to have as change 1 the 
introduction of the various btrfs_lockdep_* macros and a detailed 
explanation of how the notional locks work. (I really like the word 
notional.) All the boilerplate currently in each change description 
could come into this one change description, maybe something like this:

"We use a read/write lockdep map to annotate 'lock'-type counters or 
events in the code. A common pattern among all the notional locks is 
that threads start or join a planned action, representing getting a 
non-exclusive lock on that action. Later, threads move on, relying on 
other threads to perform the planned action, until only one thread 
remains which then performs the action; notionally this is equivalent to 
releasing the lock and then attempting to get an exclusive lock. These 
lock actions are represented in code by the new btrfs_lockdep_*() 
functions [etc etc etc]" (I may be basing this too heavily on the first 
one...)

Then each subsequent change could be described as e.g. "Annotate the 
notional transaction writers lock, representing the threads involved in 
a particular transaction."

I personally don't know the code well enough to really grasp what each 
annotation represents from the current individual change description... 
is it possible to flesh out the changes with a little bit more about 
what each one is?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event
  2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
  2022-07-20 14:46   ` Josef Bacik
@ 2022-07-20 14:47   ` Sweet Tea Dorminy
  2022-07-20 17:12     ` Ioannis Angelakopoulos
  1 sibling, 1 reply; 17+ messages in thread
From: Sweet Tea Dorminy @ 2022-07-20 14:47 UTC (permalink / raw)
  To: Ioannis Angelakopoulos, linux-btrfs, kernel-team



On 7/19/22 00:09, Ioannis Angelakopoulos wrote:
> Annotate the num_writers wait event in fs/btrfs/transaction.c with lockdep
> in order to catch deadlocks involving this wait event.
> 
> Use a read/write lockdep map for the annotation. A thread starting/joining
> the transaction acquires the map as a reader when it increments
> cur_trans->num_writers and it acquires the map as a writer before it
> blocks on the wait events

> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
> ---
>   fs/btrfs/ctree.h       | 14 ++++++++++++++
>   fs/btrfs/disk-io.c     |  6 ++++++
>   fs/btrfs/transaction.c | 37 ++++++++++++++++++++++++++++++++-----
>   3 files changed, 52 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 202496172059..999868734be7 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1095,6 +1095,8 @@ struct btrfs_fs_info {
>   	/* Updates are not protected by any lock */
>   	struct btrfs_commit_stats commit_stats;
>   
> +	struct lockdep_map btrfs_trans_num_writers_map;
> +
>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>   	spinlock_t ref_verify_lock;
>   	struct rb_root block_tree;
> @@ -1175,6 +1177,18 @@ enum {
>   	BTRFS_ROOT_UNFINISHED_DROP,
>   };
>   
> +#define btrfs_might_wait_for_event(b, lock) \
> +	do { \
> +		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
> +		rwsem_release(&b->lock##_map, _THIS_IP_); \
> +	} while (0)
> +
This confuses me a lot. btrfs_might_wait_for_event() doesn't have the 
same lockdep prefix that the other annotations do, so it's not obvious 
reading the callsites that it's lockdep-related. I also don't understand 
what parameters I should pass this function easily, so a comment would 
be nice.

And I think, after reading lockdep code, I understand why it's 
rwsem_acquire() here, vs rwsem_acquire_read() below, but it might be 
nice to have a comment block explaining why there's a difference in 
which one is called.

(Stylistically, it would be nice IMO for all of the slashes to be lined 
up vertically, a la tree-checker.c:102)

> +#define btrfs_lockdep_acquire(b, lock) \
> +	rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
> +
Reading the lockdep stuff, it seems that they draw distinction between 
_acquire and _acquire_read. Maybe worth a comment to note all the 
notional locks start off shared and only become exclusive in 
btrfs_might_wait_for_event()?

> +#define btrfs_lockdep_release(b, lock) \
> +	rwsem_release(&b->lock##_map, _THIS_IP_)
> +
>   static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
>   {
>   	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 3fac429cf8a4..01a5a49a3a11 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3046,6 +3046,8 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
>   
>   void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>   {
> +	static struct lock_class_key btrfs_trans_num_writers_key;
> +
>   	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
>   	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
>   	INIT_LIST_HEAD(&fs_info->trans_list);
> @@ -3074,6 +3076,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>   	mutex_init(&fs_info->zoned_data_reloc_io_lock);
>   	seqlock_init(&fs_info->profiles_lock);
>   
> +	lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
> +					 "btrfs_trans_num_writers",
> +					 &btrfs_trans_num_writers_key, 0);
Sort of a non-standard indentation, as far as I know. Might I suggest 
lining up the second and third lines like so:
+	lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
+			 "btrfs_trans_num_writers",
+			 &btrfs_trans_num_writers_key, 0);

Also, it looks like you call this function with the same sort of pattern 
in all the patches. You could make a macro that does something like 
(untested)
do {
	static struct lock_class_key lockname##_key;
	lockdep_init_map(&fs_info->lockname##_map, #lockname,
			 &lockname##_key, 0);
} while (0)
and not have to separately define the keys.


> +
>   	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
>   	INIT_LIST_HEAD(&fs_info->space_info);
>   	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 0bec10740ad3..d8287ec890bc 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -313,6 +313,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>   		atomic_inc(&cur_trans->num_writers);
>   		extwriter_counter_inc(cur_trans, type);
>   		spin_unlock(&fs_info->trans_lock);
> +		btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
>   		return 0;
>   	}
>   	spin_unlock(&fs_info->trans_lock);
> @@ -334,16 +335,20 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>   	if (!cur_trans)
>   		return -ENOMEM;
>   
> +	btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
> +
>   	spin_lock(&fs_info->trans_lock);
>   	if (fs_info->running_transaction) {
>   		/*
>   		 * someone started a transaction after we unlocked.  Make sure
>   		 * to redo the checks above
>   		 */
> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>   		kfree(cur_trans);
>   		goto loop;
>   	} else if (BTRFS_FS_ERROR(fs_info)) {
>   		spin_unlock(&fs_info->trans_lock);
> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>   		kfree(cur_trans);
>   		return -EROFS;
>   	}
> @@ -1022,6 +1027,9 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>   	extwriter_counter_dec(cur_trans, trans->type);
>   
>   	cond_wake_up(&cur_trans->writer_wait);
> +
> +	btrfs_lockdep_release(info, btrfs_trans_num_writers);
> +
>   	btrfs_put_transaction(cur_trans);
>   
>   	if (current->journal_info == trans)
> @@ -1994,6 +2002,12 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
>   	if (cur_trans == fs_info->running_transaction) {
>   		cur_trans->state = TRANS_STATE_COMMIT_DOING;
>   		spin_unlock(&fs_info->trans_lock);
> +
> +		/*
> +		 * The thread has already released the lockdep map as reader already in
> +		 * btrfs_commit_transaction().
> +		 */
> +		btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
>   		wait_event(cur_trans->writer_wait,
>   			   atomic_read(&cur_trans->num_writers) == 1);
>   
> @@ -2222,7 +2236,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   
>   			btrfs_put_transaction(prev_trans);
>   			if (ret)
> -				goto cleanup_transaction;
> +				goto lockdep_release;
>   		} else {
>   			spin_unlock(&fs_info->trans_lock);
>   		}
> @@ -2236,7 +2250,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   		 */
>   		if (BTRFS_FS_ERROR(fs_info)) {
>   			ret = -EROFS;
> -			goto cleanup_transaction;
> +			goto lockdep_release;
>   		}
>   	}
>   
> @@ -2250,19 +2264,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   
>   	ret = btrfs_start_delalloc_flush(fs_info);
>   	if (ret)
> -		goto cleanup_transaction;
> +		goto lockdep_release;
>   
>   	ret = btrfs_run_delayed_items(trans);
>   	if (ret)
> -		goto cleanup_transaction;
> +		goto lockdep_release;
>   
>   	wait_event(cur_trans->writer_wait,
>   		   extwriter_counter_read(cur_trans) == 0);
>   
>   	/* some pending stuffs might be added after the previous flush. */
>   	ret = btrfs_run_delayed_items(trans);
> -	if (ret)
> +	if (ret) {
> +		btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>   		goto cleanup_transaction;
> +	}
Could this just be goto lockdep_release; like the earlier sites?
>   
>   	btrfs_wait_delalloc_flush(fs_info);
>   
> @@ -2284,6 +2300,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   	add_pending_snapshot(trans);
>   	cur_trans->state = TRANS_STATE_COMMIT_DOING;
>   	spin_unlock(&fs_info->trans_lock);
> +
> +	/*
> +	 * The thread has started/joined the transaction thus it holds the lockdep
> +	 * map as a reader. It has to release it before acquiring the lockdep map
> +	 * as a writer.
> +	 */
> +	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
> +	btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);

I am really unsure if this is more readable, but: you could do here:
lockdep_release:
	btrfs_lockdep_release(...)
	if (ret)
		goto cleanup_transaction;
	btrfs_might_wait_for_event()
which has the merit of keeping the control flow relatively linear for 
cleanup at the end. I find the end a little confusing, with the lockdep 
release at the end and then jumping backwards to other cleanup, but I 
can't find a better way than this proposal to untangle it, and I'm not
sure it's better...

>   	cleanup_transaction(trans, ret);
>   
>   	return ret;
> +lockdep_release:
> +	btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
> +	goto cleanup_transaction;
>   }
>   
>   /*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events
  2022-07-19  4:09 ` [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events Ioannis Angelakopoulos
@ 2022-07-20 14:47   ` Sweet Tea Dorminy
  2022-07-20 17:39     ` Ioannis Angelakopoulos
  2022-07-20 14:48   ` Josef Bacik
  1 sibling, 1 reply; 17+ messages in thread
From: Sweet Tea Dorminy @ 2022-07-20 14:47 UTC (permalink / raw)
  To: Ioannis Angelakopoulos, linux-btrfs, kernel-team



On 7/19/22 00:09, Ioannis Angelakopoulos wrote:
> Add a lockdep annotation for the transaction states that have wait
> events; 1) TRANS_STATE_COMMIT_START, 2) TRANS_STATE_UNBLOCKED, 3)
> TRANS_STATE_SUPER_COMMITTED, and 4) TRANS_STATE_COMPLETED in
> fs/btrfs/transaction.c.
> 
> With the exception of the lockdep annotation for TRANS_STATE_COMMIT_START
> the transaction thread has to acquire the lockdep maps for the transaction
> states as reader after the lockdep map for num_writers is released so that
> lockdep does not complain.

"acquire the lockdep maps for the transaction states as reader ..." took 
a couple readings for me to grasp. Maybe "get a notional read lock on 
the transaction state only after releasing the notional lock on 
num_writers, as lockdep requires locks to be acquired/released in 
[somehow]". (I don't know how lockdep complains, but I think saying why 
in this commit message would help me understand the 'why' better.)

> @@ -2323,6 +2344,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   	wait_event(cur_trans->writer_wait,
>   		   atomic_read(&cur_trans->num_writers) == 1);
>   
> +	/*
> +	 * Make lockdep happy by acquiring the state locks after
> +	 * btrfs_trans_num_writers is released.
> +	 */

Same sort of comment here: elaborating on why it makes lockdep happy 
would help me understand this better.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events
  2022-07-19  4:09 ` [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events Ioannis Angelakopoulos
  2022-07-20 14:47   ` Sweet Tea Dorminy
@ 2022-07-20 14:48   ` Josef Bacik
  2022-07-20 17:49     ` Ioannis Angelakopoulos
  1 sibling, 1 reply; 17+ messages in thread
From: Josef Bacik @ 2022-07-20 14:48 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: linux-btrfs, kernel-team

On Mon, Jul 18, 2022 at 09:09:56PM -0700, Ioannis Angelakopoulos wrote:
> Add a lockdep annotation for the transaction states that have wait
> events; 1) TRANS_STATE_COMMIT_START, 2) TRANS_STATE_UNBLOCKED, 3)
> TRANS_STATE_SUPER_COMMITTED, and 4) TRANS_STATE_COMPLETED in
> fs/btrfs/transaction.c.
> 
> With the exception of the lockdep annotation for TRANS_STATE_COMMIT_START
> the transaction thread has to acquire the lockdep maps for the transaction
> states as reader after the lockdep map for num_writers is released so that
> lockdep does not complain.
> 
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
> ---
>  fs/btrfs/ctree.h       | 20 +++++++++++++++
>  fs/btrfs/disk-io.c     | 17 +++++++++++++
>  fs/btrfs/transaction.c | 57 +++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 88 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 586756f831e5..e6c7cafcd296 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1097,6 +1097,7 @@ struct btrfs_fs_info {
>  
>  	struct lockdep_map btrfs_trans_num_writers_map;
>  	struct lockdep_map btrfs_trans_num_extwriters_map;
> +	struct lockdep_map btrfs_state_change_map[4];
>  
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>  	spinlock_t ref_verify_lock;
> @@ -1178,6 +1179,13 @@ enum {
>  	BTRFS_ROOT_UNFINISHED_DROP,
>  };
>  
> +enum btrfs_lockdep_trans_states {
> +	BTRFS_LOCKDEP_TRANS_COMMIT_START,
> +	BTRFS_LOCKDEP_TRANS_UNBLOCKED,
> +	BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED,
> +	BTRFS_LOCKDEP_TRANS_COMPLETED,
> +};
> +
>  #define btrfs_might_wait_for_event(b, lock) \
>  	do { \
>  		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
> @@ -1190,6 +1198,18 @@ enum {
>  #define btrfs_lockdep_release(b, lock) \
>  	rwsem_release(&b->lock##_map, _THIS_IP_)
>  
> +#define btrfs_might_wait_for_state(b, i) \
> +	do { \
> +		rwsem_acquire(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_); \
> +		rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_); \
> +	} while (0)
> +
> +#define btrfs_trans_state_lockdep_acquire(b, i) \
> +	rwsem_acquire_read(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_)
> +
> +#define btrfs_trans_state_lockdep_release(b, i) \
> +	rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_)
> +
>  static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
>  {
>  	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b1193584ba49..be5cf86fa992 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3048,6 +3048,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>  {
>  	static struct lock_class_key btrfs_trans_num_writers_key;
>  	static struct lock_class_key btrfs_trans_num_extwriters_key;
> +	static struct lock_class_key btrfs_trans_commit_start_key;
> +	static struct lock_class_key btrfs_trans_unblocked_key;
> +	static struct lock_class_key btrfs_trans_sup_committed_key;
> +	static struct lock_class_key btrfs_trans_completed_key;
>  
>  	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
>  	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
> @@ -3084,6 +3088,19 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>  					 "btrfs_trans_num_extwriters",
>  					 &btrfs_trans_num_extwriters_key, 0);
>  
> +	lockdep_init_map(&fs_info->btrfs_state_change_map[0],
> +					 "btrfs_trans_commit_start",
> +					 &btrfs_trans_commit_start_key, 0);
> +	lockdep_init_map(&fs_info->btrfs_state_change_map[1],
> +					 "btrfs_trans_unblocked",
> +					 &btrfs_trans_unblocked_key, 0);
> +	lockdep_init_map(&fs_info->btrfs_state_change_map[2],
> +					 "btrfs_trans_sup_commited",

s/commited/committed/

> +					 &btrfs_trans_sup_committed_key, 0);
> +	lockdep_init_map(&fs_info->btrfs_state_change_map[3],
> +					 "btrfs_trans_completed",
> +					 &btrfs_trans_completed_key, 0);
> +
>  	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
>  	INIT_LIST_HEAD(&fs_info->space_info);
>  	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c9751a05c029..e4efaa27ec17 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -550,6 +550,7 @@ static void wait_current_trans(struct btrfs_fs_info *fs_info)
>  		refcount_inc(&cur_trans->use_count);
>  		spin_unlock(&fs_info->trans_lock);
>  
> +		btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
>  		wait_event(fs_info->transaction_wait,
>  			   cur_trans->state >= TRANS_STATE_UNBLOCKED ||
>  			   TRANS_ABORTED(cur_trans));
> @@ -949,6 +950,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid)
>  			goto out;  /* nothing committing|committed */
>  	}
>  
> +	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
>  	wait_for_commit(cur_trans, TRANS_STATE_COMPLETED);
>  	btrfs_put_transaction(cur_trans);
>  out:
> @@ -1980,6 +1982,7 @@ void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans)
>  	 * Wait for the current transaction commit to start and block
>  	 * subsequent transaction joins
>  	 */
> +	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_START);
>  	wait_event(fs_info->transaction_blocked_wait,
>  		   cur_trans->state >= TRANS_STATE_COMMIT_START ||
>  		   TRANS_ABORTED(cur_trans));
> @@ -2137,14 +2140,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  	ktime_t interval;
>  
>  	ASSERT(refcount_read(&trans->use_count) == 1);
> +	btrfs_trans_state_lockdep_acquire(fs_info,
> +					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>  
>  	/* Stop the commit early if ->aborted is set */
>  	if (TRANS_ABORTED(cur_trans)) {
>  		ret = cur_trans->aborted;
> -		btrfs_end_transaction(trans);
> -		return ret;
> +		goto lockdep_trans_commit_start_release;
>  	}
>  
> +
>  	btrfs_trans_release_metadata(trans);
>  	trans->block_rsv = NULL;
>  
> @@ -2160,8 +2165,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  		 */
>  		ret = btrfs_run_delayed_refs(trans, 0);
>  		if (ret) {
> -			btrfs_end_transaction(trans);
> -			return ret;
> +			goto lockdep_trans_commit_start_release;
>  		}
>  	}
>  
> @@ -2192,8 +2196,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  		if (run_it) {
>  			ret = btrfs_start_dirty_block_groups(trans);
>  			if (ret) {
> -				btrfs_end_transaction(trans);
> -				return ret;
> +				goto lockdep_trans_commit_start_release;
>  			}
>  		}
>  	}
> @@ -2209,7 +2212,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  
>  		if (trans->in_fsync)
>  			want_state = TRANS_STATE_SUPER_COMMITTED;
> +
> +		btrfs_trans_state_lockdep_release(fs_info,
> +						  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>  		ret = btrfs_end_transaction(trans);
> +
> +		if (want_state == TRANS_STATE_COMPLETED)
> +			btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
> +		else
> +			btrfs_might_wait_for_state(fs_info,
> +						   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
> +
>  		wait_for_commit(cur_trans, want_state);
>  
>  		if (TRANS_ABORTED(cur_trans))
> @@ -2222,6 +2235,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  
>  	cur_trans->state = TRANS_STATE_COMMIT_START;
>  	wake_up(&fs_info->transaction_blocked_wait);
> +	btrfs_trans_state_lockdep_release(fs_info,
> +					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>  
>  	if (cur_trans->list.prev != &fs_info->trans_list) {
>  		enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED;
> @@ -2235,6 +2250,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  			refcount_inc(&prev_trans->use_count);
>  			spin_unlock(&fs_info->trans_lock);
>  
> +			if (want_state == TRANS_STATE_COMPLETED)
> +				btrfs_might_wait_for_state(fs_info,
> +							   BTRFS_LOCKDEP_TRANS_COMPLETED);
> +			else
> +				btrfs_might_wait_for_state(fs_info,
> +							   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);

You do this everywhere we call wait_for_commit(), you can push these
btrfs_might_wait_for_state() into wait_for_commit and make everything a bit
cleaner.  Thanks,

Josef
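
For illustration, a hedged sketch of the refactor Josef suggests, with the two might-wait annotations pushed down into wait_for_commit() itself (a kernel-style fragment, not compiled here; the exact body of wait_for_commit() in fs/btrfs/transaction.c may differ from this reconstruction):

```c
static void wait_for_commit(struct btrfs_transaction *commit,
			    const enum btrfs_trans_state min_state)
{
	struct btrfs_fs_info *fs_info = commit->fs_info;

	/*
	 * Annotate the wait once, at the single choke point every caller
	 * already goes through, instead of before each call site.
	 */
	if (min_state == TRANS_STATE_COMPLETED)
		btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
	else if (min_state == TRANS_STATE_SUPER_COMMITTED)
		btrfs_might_wait_for_state(fs_info,
					   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);

	wait_event(commit->commit_wait, commit->state >= min_state);
}
```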

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event
  2022-07-19  4:09 ` [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event Ioannis Angelakopoulos
@ 2022-07-20 14:48   ` Josef Bacik
  0 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2022-07-20 14:48 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: linux-btrfs, kernel-team

On Mon, Jul 18, 2022 at 09:09:58PM -0700, Ioannis Angelakopoulos wrote:
> In contrast to the num_writers and num_extwriters wait events, the
> condition for the pending_ordered wait event is signaled in a different
> context from the wait event itself. The condition signaling occurs in
> btrfs_remove_ordered_extent() in fs/btrfs/ordered-data.c while the wait
> event is implemented in btrfs_commit_transaction() in
> fs/btrfs/transaction.c
> 
> Thus the thread signaling the condition has to acquire the lockdep map as a
> reader at the start of btrfs_remove_ordered_extent() and release it after
> it has signaled the condition. In this case some dependencies might be left
> out due to the placement of the annotation, but it is better than no
> annotation at all.
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents wait event
  2022-07-19  4:10 ` [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents " Ioannis Angelakopoulos
@ 2022-07-20 14:50   ` Josef Bacik
  0 siblings, 0 replies; 17+ messages in thread
From: Josef Bacik @ 2022-07-20 14:50 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: linux-btrfs, kernel-team

On Mon, Jul 18, 2022 at 09:10:00PM -0700, Ioannis Angelakopoulos wrote:
> This wait event is very similar to the pending_ordered wait event in the
> sense that it occurs in a different context than the condition signaling
> for the event. The signaling occurs in btrfs_remove_ordered_extent() while
> the wait event is implemented in btrfs_start_ordered_extent() in
> fs/btrfs/ordered-data.c
> 
> However, in this case a thread must not acquire the lockdep map for the
> ordered extents wait event when the ordered extent is related to a free
> space inode. That is because lockdep creates dependencies between locks
> acquired both in execution paths related to normal inodes and paths related
> to free space inodes, thus leading to false positives.
> 
> Also to prevent false positives related to free space inodes and normal
> inodes, the lockdep map class for the inode->mapping->invalidate_lock is
> reinitialized in load_free_space_cache() in fs/btrfs/free-space-cache.c
> 

Make this bit a separate patch, put it before this one with an explanation of
why so we have a commit per logical change.  Thanks,

Josef


* Re: [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event
  2022-07-20 14:47   ` Sweet Tea Dorminy
@ 2022-07-20 17:12     ` Ioannis Angelakopoulos
  0 siblings, 0 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-20 17:12 UTC (permalink / raw)
  To: Sweet Tea Dorminy, linux-btrfs, Kernel Team

On 7/20/22 7:47 AM, Sweet Tea Dorminy wrote:
> 
> 
> On 7/19/22 00:09, Ioannis Angelakopoulos wrote:
>> Annotate the num_writers wait event in fs/btrfs/transaction.c with 
>> lockdep
>> in order to catch deadlocks involving this wait event.
>>
>> Use a read/write lockdep map for the annotation. A thread 
>> starting/joining
>> the transaction acquires the map as a reader when it increments
>> cur_trans->num_writers and it acquires the map as a writer before it
>> blocks on the wait events
> 
>>
>> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
>> ---
>>   fs/btrfs/ctree.h       | 14 ++++++++++++++
>>   fs/btrfs/disk-io.c     |  6 ++++++
>>   fs/btrfs/transaction.c | 37 ++++++++++++++++++++++++++++++++-----
>>   3 files changed, 52 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 202496172059..999868734be7 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1095,6 +1095,8 @@ struct btrfs_fs_info {
>>       /* Updates are not protected by any lock */
>>       struct btrfs_commit_stats commit_stats;
>> +    struct lockdep_map btrfs_trans_num_writers_map;
>> +
>>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>       spinlock_t ref_verify_lock;
>>       struct rb_root block_tree;
>> @@ -1175,6 +1177,18 @@ enum {
>>       BTRFS_ROOT_UNFINISHED_DROP,
>>   };
>> +#define btrfs_might_wait_for_event(b, lock) \
>> +    do { \
>> +        rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
>> +        rwsem_release(&b->lock##_map, _THIS_IP_); \
>> +    } while (0)
>> +
> This confuses me a lot. btrfs_might_wait_for_event() doesn't have the 
> same lockdep prefix that the other annotations do, so it's not obvious 
> reading the callsites that it's lockdep-related. I also don't understand 
> what parameters I should pass this function easily, so a comment would 
> be nice.
> 
> And I think, after reading lockdep code, I understand why it's 
> rwsem_acquire() here, vs rwsem_acquire_read() below, but it might be 
> nice to have a comment block explaining why there's a difference in 
> which one is called.
> 
rwsem_acquire_read() takes the lockdep map as a shared lock (reader), while 
rwsem_acquire() takes it as an exclusive lock (writer). Threads that modify 
the condition hold the map as readers while the condition is false and 
release it once they have signaled the condition. Threads that might block 
on the wait event take the map as writers, meaning they must first block on 
the lock until all reader threads have released it.

> (Stylistically, it would be nice IMO for all of the slashes to be lined 
> up vertically, a la tree-checker.c:102)
> 
>> +#define btrfs_lockdep_acquire(b, lock) \
>> +    rwsem_acquire_read(&b->lock##_map, 0, 0, _THIS_IP_)
>> +
> Reading the lockdep stuff, it seems that they draw distinction between 
> _acquire and _acquire_read. Maybe worth a comment to note all the 
> notional locks start off shared and only become exclusive in 
> btrfs_might_wait_for_event()?
>
Yes, I can put a comment specifying where each lock is taken as 
explained above.

>> +#define btrfs_lockdep_release(b, lock) \
>> +    rwsem_release(&b->lock##_map, _THIS_IP_)
>> +
>>   static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info 
>> *fs_info)
>>   {
>>       clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 3fac429cf8a4..01a5a49a3a11 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3046,6 +3046,8 @@ static int __cold init_tree_roots(struct 
>> btrfs_fs_info *fs_info)
>>   void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>>   {
>> +    static struct lock_class_key btrfs_trans_num_writers_key;
>> +
>>       INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
>>       INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
>>       INIT_LIST_HEAD(&fs_info->trans_list);
>> @@ -3074,6 +3076,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info 
>> *fs_info)
>>       mutex_init(&fs_info->zoned_data_reloc_io_lock);
>>       seqlock_init(&fs_info->profiles_lock);
>> +    lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
>> +                     "btrfs_trans_num_writers",
>> +                     &btrfs_trans_num_writers_key, 0);
> Sort of a non-standard indentation, as far as I know. Might I suggest 
> lining up the second and third lines like so:

Sorry for this. It might be a problem with my editor and I will address it.
> +    lockdep_init_map(&fs_info->btrfs_trans_num_writers_map,
> +             "btrfs_trans_num_writers",
> +             &btrfs_trans_num_writers_key, 0);
> 
> Also, it looks like you call this function with the same sort of pattern 
> in all the patches. You could make a macro that does something like 
> (untested)
> do {
>      static struct lock_class_key lockname##_key
>      lockdep_init_map(&fs_info->lockname##_map, lockname,
>               &lockname##_key, 0);
> } while (0);
> and not have to separately define the keys.
You are right, this looks cleaner. I will test it and, if it works, include 
it in the next version of the patches.
> 
> 
>> +
>>       INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
>>       INIT_LIST_HEAD(&fs_info->space_info);
>>       INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 0bec10740ad3..d8287ec890bc 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -313,6 +313,7 @@ static noinline int join_transaction(struct 
>> btrfs_fs_info *fs_info,
>>           atomic_inc(&cur_trans->num_writers);
>>           extwriter_counter_inc(cur_trans, type);
>>           spin_unlock(&fs_info->trans_lock);
>> +        btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
>>           return 0;
>>       }
>>       spin_unlock(&fs_info->trans_lock);
>> @@ -334,16 +335,20 @@ static noinline int join_transaction(struct 
>> btrfs_fs_info *fs_info,
>>       if (!cur_trans)
>>           return -ENOMEM;
>> +    btrfs_lockdep_acquire(fs_info, btrfs_trans_num_writers);
>> +
>>       spin_lock(&fs_info->trans_lock);
>>       if (fs_info->running_transaction) {
>>           /*
>>            * someone started a transaction after we unlocked.  Make sure
>>            * to redo the checks above
>>            */
>> +        btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>           kfree(cur_trans);
>>           goto loop;
>>       } else if (BTRFS_FS_ERROR(fs_info)) {
>>           spin_unlock(&fs_info->trans_lock);
>> +        btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>           kfree(cur_trans);
>>           return -EROFS;
>>       }
>> @@ -1022,6 +1027,9 @@ static int __btrfs_end_transaction(struct 
>> btrfs_trans_handle *trans,
>>       extwriter_counter_dec(cur_trans, trans->type);
>>       cond_wake_up(&cur_trans->writer_wait);
>> +
>> +    btrfs_lockdep_release(info, btrfs_trans_num_writers);
>> +
>>       btrfs_put_transaction(cur_trans);
>>       if (current->journal_info == trans)
>> @@ -1994,6 +2002,12 @@ static void cleanup_transaction(struct 
>> btrfs_trans_handle *trans, int err)
>>       if (cur_trans == fs_info->running_transaction) {
>>           cur_trans->state = TRANS_STATE_COMMIT_DOING;
>>           spin_unlock(&fs_info->trans_lock);
>> +
>> +        /*
>> +         * The thread has already released the lockdep map as reader 
>> already in
>> +         * btrfs_commit_transaction().
>> +         */
>> +        btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
>>           wait_event(cur_trans->writer_wait,
>>                  atomic_read(&cur_trans->num_writers) == 1);
>> @@ -2222,7 +2236,7 @@ int btrfs_commit_transaction(struct 
>> btrfs_trans_handle *trans)
>>               btrfs_put_transaction(prev_trans);
>>               if (ret)
>> -                goto cleanup_transaction;
>> +                goto lockdep_release;
>>           } else {
>>               spin_unlock(&fs_info->trans_lock);
>>           }
>> @@ -2236,7 +2250,7 @@ int btrfs_commit_transaction(struct 
>> btrfs_trans_handle *trans)
>>            */
>>           if (BTRFS_FS_ERROR(fs_info)) {
>>               ret = -EROFS;
>> -            goto cleanup_transaction;
>> +            goto lockdep_release;
>>           }
>>       }
>> @@ -2250,19 +2264,21 @@ int btrfs_commit_transaction(struct 
>> btrfs_trans_handle *trans)
>>       ret = btrfs_start_delalloc_flush(fs_info);
>>       if (ret)
>> -        goto cleanup_transaction;
>> +        goto lockdep_release;
>>       ret = btrfs_run_delayed_items(trans);
>>       if (ret)
>> -        goto cleanup_transaction;
>> +        goto lockdep_release;
>>       wait_event(cur_trans->writer_wait,
>>              extwriter_counter_read(cur_trans) == 0);
>>       /* some pending stuffs might be added after the previous flush. */
>>       ret = btrfs_run_delayed_items(trans);
>> -    if (ret)
>> +    if (ret) {
>> +        btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>>           goto cleanup_transaction;
>> +    }
> Could this just be goto lockdep_release; like the earlier sites?
Unfortunately, this is intentional because of the num_extwriters wait event 
in the next patch. It has to do with the points where the thread releases 
the locks as a reader.

>>       btrfs_wait_delalloc_flush(fs_info);
>> @@ -2284,6 +2300,14 @@ int btrfs_commit_transaction(struct 
>> btrfs_trans_handle *trans)
>>       add_pending_snapshot(trans);
>>       cur_trans->state = TRANS_STATE_COMMIT_DOING;
>>       spin_unlock(&fs_info->trans_lock);
>> +
>> +    /*
>> +     * The thread has started/joined the transaction thus it holds 
>> the lockdep
>> +     * map as a reader. It has to release it before acquiring the 
>> lockdep map
>> +     * as a writer.
>> +     */
>> +    btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>> +    btrfs_might_wait_for_event(fs_info, btrfs_trans_num_writers);
> 
> I am really unsure if this is more readable, but: you could do here:
> lockdep_release:
>      btrfs_lockdep_release(...)
>      if (ret)
>          goto cleanup_transaction;

I am not sure I entirely understand this comment, but the purpose of the 
lockdep_release label is to avoid calling btrfs_lockdep_release() multiple 
times in the error paths and thus to keep the code cleaner. Unfortunately I 
could not fold btrfs_lockdep_release() into the cleanup_transaction label 
because of the way the locks are released on the normal code path. Jumping 
back from lockdep_release to cleanup_transaction does not keep the control 
flow linear, but it keeps the code cleaner overall.
>      btrfs_might_wait_for_event()
> which has the merit of keeping the control flow relatively linear for 
> cleanup at the end. I find the end a little confusing, with the lockdep 
> release at the end and then jumping backwards to other cleanup, but I 
> can't find a better way than this proposal to untangle it, and I'm not
> sure it's better...
> 
>>       cleanup_transaction(trans, ret);
>>       return ret;
>> +lockdep_release:
>> +    btrfs_lockdep_release(fs_info, btrfs_trans_num_writers);
>> +    goto cleanup_transaction;
>>   }
>>   /*



* Re: [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events
  2022-07-20 14:47   ` Sweet Tea Dorminy
@ 2022-07-20 17:39     ` Ioannis Angelakopoulos
  0 siblings, 0 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-20 17:39 UTC (permalink / raw)
  To: Sweet Tea Dorminy, linux-btrfs, Kernel Team

On 7/20/22 7:47 AM, Sweet Tea Dorminy wrote:
> 
> 
> On 7/19/22 00:09, Ioannis Angelakopoulos wrote:
>> Add a lockdep annotation for the transaction states that have wait
>> events; 1) TRANS_STATE_COMMIT_START, 2) TRANS_STATE_UNBLOCKED, 3)
>> TRANS_STATE_SUPER_COMMITTED, and 4) TRANS_STATE_COMPLETED in
>> fs/btrfs/transaction.c.
>>
>> With the exception of the lockdep annotation for TRANS_STATE_COMMIT_START
>> the transaction thread has to acquire the lockdep maps for the 
>> transaction
>> states as reader after the lockdep map for num_writers is released so 
>> that
>> lockdep does not complain.
> 
> "acquire the lockdep maps for the transaction states as reader ..." took 
> a couple readings for me to grasp. Maybe "get a notional read lock on 
> the transaction state only after releasing the notional lock on 
> num_writers, as lockdep requires locks to be acquired/released in 
> [somehow]". (I don't know how lockdep complains, but I think saying why 
> in this commit message would help me understand the 'why' better.)
>
I will make the comment here or the one below more detailed. Yes, it has to 
do with the lock ordering. As written, the thread first acquires the 
btrfs_trans_num_writers map. If it acquired the state maps before releasing 
btrfs_trans_num_writers, it would end up releasing btrfs_trans_num_writers 
first and the state maps second, and lockdep would complain about a 
potential deadlock, since locks should be released in the reverse order 
they were acquired.
>> @@ -2323,6 +2344,15 @@ int btrfs_commit_transaction(struct 
>> btrfs_trans_handle *trans)
>>       wait_event(cur_trans->writer_wait,
>>              atomic_read(&cur_trans->num_writers) == 1);
>> +    /*
>> +     * Make lockdep happy by acquiring the state locks after
>> +     * btrfs_trans_num_writers is released.
>> +     */
> 
> Same sort of comment here: elaborating on why it makes lockdep happy 
> would help me understand this better.



* Re: [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events
  2022-07-20 14:48   ` Josef Bacik
@ 2022-07-20 17:49     ` Ioannis Angelakopoulos
  0 siblings, 0 replies; 17+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-20 17:49 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Kernel Team

On 7/20/22 7:48 AM, Josef Bacik wrote:
> On Mon, Jul 18, 2022 at 09:09:56PM -0700, Ioannis Angelakopoulos wrote:
>> Add a lockdep annotation for the transaction states that have wait
>> events; 1) TRANS_STATE_COMMIT_START, 2) TRANS_STATE_UNBLOCKED, 3)
>> TRANS_STATE_SUPER_COMMITTED, and 4) TRANS_STATE_COMPLETED in
>> fs/btrfs/transaction.c.
>>
>> With the exception of the lockdep annotation for TRANS_STATE_COMMIT_START
>> the transaction thread has to acquire the lockdep maps for the transaction
>> states as reader after the lockdep map for num_writers is released so that
>> lockdep does not complain.
>>
>>
>> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
>> ---
>>   fs/btrfs/ctree.h       | 20 +++++++++++++++
>>   fs/btrfs/disk-io.c     | 17 +++++++++++++
>>   fs/btrfs/transaction.c | 57 +++++++++++++++++++++++++++++++++++++-----
>>   3 files changed, 88 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 586756f831e5..e6c7cafcd296 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1097,6 +1097,7 @@ struct btrfs_fs_info {
>>   
>>   	struct lockdep_map btrfs_trans_num_writers_map;
>>   	struct lockdep_map btrfs_trans_num_extwriters_map;
>> +	struct lockdep_map btrfs_state_change_map[4];
>>   
>>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>   	spinlock_t ref_verify_lock;
>> @@ -1178,6 +1179,13 @@ enum {
>>   	BTRFS_ROOT_UNFINISHED_DROP,
>>   };
>>   
>> +enum btrfs_lockdep_trans_states {
>> +	BTRFS_LOCKDEP_TRANS_COMMIT_START,
>> +	BTRFS_LOCKDEP_TRANS_UNBLOCKED,
>> +	BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED,
>> +	BTRFS_LOCKDEP_TRANS_COMPLETED,
>> +};
>> +
>>   #define btrfs_might_wait_for_event(b, lock) \
>>   	do { \
>>   		rwsem_acquire(&b->lock##_map, 0, 0, _THIS_IP_); \
>> @@ -1190,6 +1198,18 @@ enum {
>>   #define btrfs_lockdep_release(b, lock) \
>>   	rwsem_release(&b->lock##_map, _THIS_IP_)
>>   
>> +#define btrfs_might_wait_for_state(b, i) \
>> +	do { \
>> +		rwsem_acquire(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_); \
>> +		rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_); \
>> +	} while (0)
>> +
>> +#define btrfs_trans_state_lockdep_acquire(b, i) \
>> +	rwsem_acquire_read(&b->btrfs_state_change_map[i], 0, 0, _THIS_IP_)
>> +
>> +#define btrfs_trans_state_lockdep_release(b, i) \
>> +	rwsem_release(&b->btrfs_state_change_map[i], _THIS_IP_)
>> +
>>   static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
>>   {
>>   	clear_and_wake_up_bit(BTRFS_FS_UNFINISHED_DROPS, &fs_info->flags);
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b1193584ba49..be5cf86fa992 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3048,6 +3048,10 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>>   {
>>   	static struct lock_class_key btrfs_trans_num_writers_key;
>>   	static struct lock_class_key btrfs_trans_num_extwriters_key;
>> +	static struct lock_class_key btrfs_trans_commit_start_key;
>> +	static struct lock_class_key btrfs_trans_unblocked_key;
>> +	static struct lock_class_key btrfs_trans_sup_committed_key;
>> +	static struct lock_class_key btrfs_trans_completed_key;
>>   
>>   	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
>>   	INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
>> @@ -3084,6 +3088,19 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>>   					 "btrfs_trans_num_extwriters",
>>   					 &btrfs_trans_num_extwriters_key, 0);
>>   
>> +	lockdep_init_map(&fs_info->btrfs_state_change_map[0],
>> +					 "btrfs_trans_commit_start",
>> +					 &btrfs_trans_commit_start_key, 0);
>> +	lockdep_init_map(&fs_info->btrfs_state_change_map[1],
>> +					 "btrfs_trans_unblocked",
>> +					 &btrfs_trans_unblocked_key, 0);
>> +	lockdep_init_map(&fs_info->btrfs_state_change_map[2],
>> +					 "btrfs_trans_sup_commited",
> 
> s/commited/committed/
> 
>> +					 &btrfs_trans_sup_committed_key, 0);
>> +	lockdep_init_map(&fs_info->btrfs_state_change_map[3],
>> +					 "btrfs_trans_completed",
>> +					 &btrfs_trans_completed_key, 0);
>> +
>>   	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
>>   	INIT_LIST_HEAD(&fs_info->space_info);
>>   	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index c9751a05c029..e4efaa27ec17 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -550,6 +550,7 @@ static void wait_current_trans(struct btrfs_fs_info *fs_info)
>>   		refcount_inc(&cur_trans->use_count);
>>   		spin_unlock(&fs_info->trans_lock);
>>   
>> +		btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_UNBLOCKED);
>>   		wait_event(fs_info->transaction_wait,
>>   			   cur_trans->state >= TRANS_STATE_UNBLOCKED ||
>>   			   TRANS_ABORTED(cur_trans));
>> @@ -949,6 +950,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid)
>>   			goto out;  /* nothing committing|committed */
>>   	}
>>   
>> +	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
>>   	wait_for_commit(cur_trans, TRANS_STATE_COMPLETED);
>>   	btrfs_put_transaction(cur_trans);
>>   out:
>> @@ -1980,6 +1982,7 @@ void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans)
>>   	 * Wait for the current transaction commit to start and block
>>   	 * subsequent transaction joins
>>   	 */
>> +	btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMMIT_START);
>>   	wait_event(fs_info->transaction_blocked_wait,
>>   		   cur_trans->state >= TRANS_STATE_COMMIT_START ||
>>   		   TRANS_ABORTED(cur_trans));
>> @@ -2137,14 +2140,16 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   	ktime_t interval;
>>   
>>   	ASSERT(refcount_read(&trans->use_count) == 1);
>> +	btrfs_trans_state_lockdep_acquire(fs_info,
>> +					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>>   
>>   	/* Stop the commit early if ->aborted is set */
>>   	if (TRANS_ABORTED(cur_trans)) {
>>   		ret = cur_trans->aborted;
>> -		btrfs_end_transaction(trans);
>> -		return ret;
>> +		goto lockdep_trans_commit_start_release;
>>   	}
>>   
>> +
>>   	btrfs_trans_release_metadata(trans);
>>   	trans->block_rsv = NULL;
>>   
>> @@ -2160,8 +2165,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   		 */
>>   		ret = btrfs_run_delayed_refs(trans, 0);
>>   		if (ret) {
>> -			btrfs_end_transaction(trans);
>> -			return ret;
>> +			goto lockdep_trans_commit_start_release;
>>   		}
>>   	}
>>   
>> @@ -2192,8 +2196,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   		if (run_it) {
>>   			ret = btrfs_start_dirty_block_groups(trans);
>>   			if (ret) {
>> -				btrfs_end_transaction(trans);
>> -				return ret;
>> +				goto lockdep_trans_commit_start_release;
>>   			}
>>   		}
>>   	}
>> @@ -2209,7 +2212,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   
>>   		if (trans->in_fsync)
>>   			want_state = TRANS_STATE_SUPER_COMMITTED;
>> +
>> +		btrfs_trans_state_lockdep_release(fs_info,
>> +						  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>>   		ret = btrfs_end_transaction(trans);
>> +
>> +		if (want_state == TRANS_STATE_COMPLETED)
>> +			btrfs_might_wait_for_state(fs_info, BTRFS_LOCKDEP_TRANS_COMPLETED);
>> +		else
>> +			btrfs_might_wait_for_state(fs_info,
>> +						   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
>> +
>>   		wait_for_commit(cur_trans, want_state);
>>   
>>   		if (TRANS_ABORTED(cur_trans))
>> @@ -2222,6 +2235,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   
>>   	cur_trans->state = TRANS_STATE_COMMIT_START;
>>   	wake_up(&fs_info->transaction_blocked_wait);
>> +	btrfs_trans_state_lockdep_release(fs_info,
>> +					  BTRFS_LOCKDEP_TRANS_COMMIT_START);
>>   
>>   	if (cur_trans->list.prev != &fs_info->trans_list) {
>>   		enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED;
>> @@ -2235,6 +2250,12 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>   			refcount_inc(&prev_trans->use_count);
>>   			spin_unlock(&fs_info->trans_lock);
>>   
>> +			if (want_state == TRANS_STATE_COMPLETED)
>> +				btrfs_might_wait_for_state(fs_info,
>> +							   BTRFS_LOCKDEP_TRANS_COMPLETED);
>> +			else
>> +				btrfs_might_wait_for_state(fs_info,
>> +							   BTRFS_LOCKDEP_TRANS_SUPER_COMMITTED);
> 
> You do this everywhere we call wait_for_commit(), you can push these
> btrfs_might_wait_for_state() into wait_for_commit and make everything a bit
> cleaner.  Thanks,
> 
> Josef
Will implement the change. Thanks!


end of thread, other threads:[~2022-07-20 17:49 UTC | newest]

Thread overview: 17+ messages
2022-07-19  4:09 [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Ioannis Angelakopoulos
2022-07-19  4:09 ` [PATCH v2 1/5] btrfs: Add a lockdep model for the num_writers wait event Ioannis Angelakopoulos
2022-07-20 14:46   ` Josef Bacik
2022-07-20 14:47   ` Sweet Tea Dorminy
2022-07-20 17:12     ` Ioannis Angelakopoulos
2022-07-19  4:09 ` [PATCH v2 2/5] btrfs: Add a lockdep model for the num_extwriters " Ioannis Angelakopoulos
2022-07-20 14:46   ` Josef Bacik
2022-07-19  4:09 ` [PATCH v2 3/5] btrfs: Add lockdep models for the transaction states wait events Ioannis Angelakopoulos
2022-07-20 14:47   ` Sweet Tea Dorminy
2022-07-20 17:39     ` Ioannis Angelakopoulos
2022-07-20 14:48   ` Josef Bacik
2022-07-20 17:49     ` Ioannis Angelakopoulos
2022-07-19  4:09 ` [PATCH v2 4/5] btrfs: Add a lockdep model for the pending_ordered wait event Ioannis Angelakopoulos
2022-07-20 14:48   ` Josef Bacik
2022-07-19  4:10 ` [PATCH v2 5/5] btrfs: Add a lockdep model for the ordered extents " Ioannis Angelakopoulos
2022-07-20 14:50   ` Josef Bacik
2022-07-20 14:47 ` [PATCH v2 0/5] btrfs: Annotate wait events with lockdep Sweet Tea Dorminy
