All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
@ 2018-10-22  9:09 fdmanana
  2018-10-22  9:53 ` Filipe Manana
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: fdmanana @ 2018-10-22  9:09 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are updating the inode
of a free space cache. This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h       |  3 +++
 fs/btrfs/disk-io.c     |  2 ++
 fs/btrfs/extent-tree.c | 22 +++++++++++++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..d23ee26eb17d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
 	u32 sectorsize;
 	u32 stripesize;
 
+	/* The task currently updating a free space cache inode item. */
+	struct task_struct *space_cache_updater;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 05dc3c17cb62..aa5e9a91e560 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
 	fs_info->sectorsize = 4096;
 	fs_info->stripesize = 4096;
 
+	fs_info->space_cache_updater = NULL;
+
 	ret = btrfs_alloc_stripe_hash_table(fs_info);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..e93040449771 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3364,7 +3364,9 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	 * time.
 	 */
 	BTRFS_I(inode)->generation = 0;
+	fs_info->space_cache_updater = current;
 	ret = btrfs_update_inode(trans, root, inode);
+	fs_info->space_cache_updater = NULL;
 	if (ret) {
 		/*
 		 * So theoretically we could recover from this, simply set the
@@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 
 have_block_group:
 		cached = block_group_cache_done(block_group);
-		if (unlikely(!cached)) {
+		/*
+		 * If we are updating the inode of a free space cache, we can
+		 * not start the caching of any block group because we could
+		 * deadlock on an extent buffer of the root tree.
+		 * At cache_save_setup() we update the inode item of a free
+		 * space cache, so we may need to COW a leaf of the root tree,
+		 * which implies finding a free metadata extent. So if when
+		 * searching for such an extent we find a block group that was
+		 * not yet cached (which is unlikely), we can not start loading
+		 * or building its free space cache because that implies reading
+		 * its inode from disk (load_free_space_cache()) which implies
+		 * searching the root tree for its inode item, which can be
+		 * located in the same leaf that we previously locked at
+		 * cache_save_setup() for updating the inode item of the former
+		 * free space cache, therefore leading to an attempt to lock the
+		 * same leaf twice.
+		 */
+		if (unlikely(!cached) &&
+		    fs_info->space_cache_updater != current) {
 			have_caching_bg = true;
 			ret = cache_block_group(block_group, 0);
 			BUG_ON(ret < 0);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
@ 2018-10-22  9:53 ` Filipe Manana
  2018-10-22  9:54 ` Filipe Manana
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-22  9:53 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Oct 22, 2018 at 10:10 AM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
>
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
>
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
>
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
>
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Andrew Nelson <andrew.s.nelson@gmail.com>


> ---
>  fs/btrfs/ctree.h       |  3 +++
>  fs/btrfs/disk-io.c     |  2 ++
>  fs/btrfs/extent-tree.c | 22 +++++++++++++++++++++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2cddfe7806a4..d23ee26eb17d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
>         u32 sectorsize;
>         u32 stripesize;
>
> +       /* The task currently updating a free space cache inode item. */
> +       struct task_struct *space_cache_updater;
> +
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>         spinlock_t ref_verify_lock;
>         struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 05dc3c17cb62..aa5e9a91e560 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
>         fs_info->sectorsize = 4096;
>         fs_info->stripesize = 4096;
>
> +       fs_info->space_cache_updater = NULL;
> +
>         ret = btrfs_alloc_stripe_hash_table(fs_info);
>         if (ret) {
>                 err = ret;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..e93040449771 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>          * time.
>          */
>         BTRFS_I(inode)->generation = 0;
> +       fs_info->space_cache_updater = current;
>         ret = btrfs_update_inode(trans, root, inode);
> +       fs_info->space_cache_updater = NULL;
>         if (ret) {
>                 /*
>                  * So theoretically we could recover from this, simply set the
> @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>
>  have_block_group:
>                 cached = block_group_cache_done(block_group);
> -               if (unlikely(!cached)) {
> +               /*
> +                * If we are updating the inode of a free space cache, we can
> +                * not start the caching of any block group because we could
> +                * deadlock on an extent buffer of the root tree.
> +                * At cache_save_setup() we update the inode item of a free
> +                * space cache, so we may need to COW a leaf of the root tree,
> +                * which implies finding a free metadata extent. So if when
> +                * searching for such an extent we find a block group that was
> +                * not yet cached (which is unlikely), we can not start loading
> +                * or building its free space cache because that implies reading
> +                * its inode from disk (load_free_space_cache()) which implies
> +                * searching the root tree for its inode item, which can be
> +                * located in the same leaf that we previously locked at
> +                * cache_save_setup() for updating the inode item of the former
> +                * free space cache, therefore leading to an attempt to lock the
> +                * same leaf twice.
> +                */
> +               if (unlikely(!cached) &&
> +                   fs_info->space_cache_updater != current) {
>                         have_caching_bg = true;
>                         ret = cache_block_group(block_group, 0);
>                         BUG_ON(ret < 0);
> --
> 2.11.0
>

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
  2018-10-22  9:53 ` Filipe Manana
@ 2018-10-22  9:54 ` Filipe Manana
  2018-10-22 18:07 ` Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-22  9:54 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Oct 22, 2018 at 10:10 AM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
>
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
>
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
>
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
>
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>

> ---
>  fs/btrfs/ctree.h       |  3 +++
>  fs/btrfs/disk-io.c     |  2 ++
>  fs/btrfs/extent-tree.c | 22 +++++++++++++++++++++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2cddfe7806a4..d23ee26eb17d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
>         u32 sectorsize;
>         u32 stripesize;
>
> +       /* The task currently updating a free space cache inode item. */
> +       struct task_struct *space_cache_updater;
> +
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>         spinlock_t ref_verify_lock;
>         struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 05dc3c17cb62..aa5e9a91e560 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
>         fs_info->sectorsize = 4096;
>         fs_info->stripesize = 4096;
>
> +       fs_info->space_cache_updater = NULL;
> +
>         ret = btrfs_alloc_stripe_hash_table(fs_info);
>         if (ret) {
>                 err = ret;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..e93040449771 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>          * time.
>          */
>         BTRFS_I(inode)->generation = 0;
> +       fs_info->space_cache_updater = current;
>         ret = btrfs_update_inode(trans, root, inode);
> +       fs_info->space_cache_updater = NULL;
>         if (ret) {
>                 /*
>                  * So theoretically we could recover from this, simply set the
> @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
>
>  have_block_group:
>                 cached = block_group_cache_done(block_group);
> -               if (unlikely(!cached)) {
> +               /*
> +                * If we are updating the inode of a free space cache, we can
> +                * not start the caching of any block group because we could
> +                * deadlock on an extent buffer of the root tree.
> +                * At cache_save_setup() we update the inode item of a free
> +                * space cache, so we may need to COW a leaf of the root tree,
> +                * which implies finding a free metadata extent. So if when
> +                * searching for such an extent we find a block group that was
> +                * not yet cached (which is unlikely), we can not start loading
> +                * or building its free space cache because that implies reading
> +                * its inode from disk (load_free_space_cache()) which implies
> +                * searching the root tree for its inode item, which can be
> +                * located in the same leaf that we previously locked at
> +                * cache_save_setup() for updating the inode item of the former
> +                * free space cache, therefore leading to an attempt to lock the
> +                * same leaf twice.
> +                */
> +               if (unlikely(!cached) &&
> +                   fs_info->space_cache_updater != current) {
>                         have_caching_bg = true;
>                         ret = cache_block_group(block_group, 0);
>                         BUG_ON(ret < 0);
> --
> 2.11.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
  2018-10-22  9:53 ` Filipe Manana
  2018-10-22  9:54 ` Filipe Manana
@ 2018-10-22 18:07 ` Josef Bacik
  2018-10-22 18:51   ` Filipe Manana
  2018-10-22 18:48 ` [PATCH v2] " fdmanana
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-22 18:07 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 

Actually isn't this a problem for anything that tries to allocate an extent
while in the tree_root?  Like we snapshot or make a subvolume or anything?  We
should just disallow if root == tree_root.  But even then we only need to do
this if we're using SPACE_CACHE, using the ye-olde caching or the free space
tree are both ok.  Let's just limit it to those cases.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
                   ` (2 preceding siblings ...)
  2018-10-22 18:07 ` Josef Bacik
@ 2018-10-22 18:48 ` fdmanana
  2018-10-22 18:55   ` Josef Bacik
  2018-10-22 19:10 ` [PATCH v3] " fdmanana
  2018-10-24  9:13 ` [PATCH v4] " fdmanana
  5 siblings, 1 reply; 24+ messages in thread
From: fdmanana @ 2018-10-22 18:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are COWing an extent
buffer from the root tree and space caching is enabled (-o space_cache
mount option). This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Made the solution more generic, since the problem could happen in any
    path COWing an extent buffer from the root tree.

    Applies on top of a previous patch titled:

     "Btrfs: fix deadlock when writing out free space caches"

 fs/btrfs/ctree.c       |  4 ++++
 fs/btrfs/ctree.h       |  3 +++
 fs/btrfs/disk-io.c     |  2 ++
 fs/btrfs/extent-tree.c | 15 ++++++++++++++-
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 089b46c4d97f..646aafda55a3 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	    root == fs_info->chunk_root ||
 	    root == fs_info->dev_root)
 		trans->can_flush_pending_bgs = false;
+	else if (root == fs_info->tree_root)
+		atomic_inc(&fs_info->tree_root_cows);
 
 	cow = btrfs_alloc_tree_block(trans, root, parent_start,
 			root->root_key.objectid, &disk_key, level,
 			search_start, empty_size);
+	if (root == fs_info->tree_root)
+		atomic_dec(&fs_info->tree_root_cows);
 	trans->can_flush_pending_bgs = true;
 	if (IS_ERR(cow))
 		return PTR_ERR(cow);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..1b73433c69e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
 	u32 sectorsize;
 	u32 stripesize;
 
+	/* Number of tasks corrently COWing a leaf/node from the tree root. */
+	atomic_t tree_root_cows;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 05dc3c17cb62..08c15bf69fb5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
 	fs_info->sectorsize = 4096;
 	fs_info->stripesize = 4096;
 
+	atomic_set(&fs_info->tree_root_cows, 0);
+
 	ret = btrfs_alloc_stripe_hash_table(fs_info);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..14f35e020050 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7366,7 +7366,20 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 
 have_block_group:
 		cached = block_group_cache_done(block_group);
-		if (unlikely(!cached)) {
+		/*
+		 * If we are COWing a leaf/node from the root tree, we can not
+		 * start caching of a block group because we could deadlock on
+		 * an extent buffer of the root tree.
+		 * Because if we are COWing a leaf from the root tree, we are
+		 * holding a write lock on the respective extent buffer, and
+		 * loading the space cache of a block group requires searching
+		 * for its inode item in the root tree, which can be located
+		 * in the same leaf that we previously write locked, in which
+		 * case we will deadlock.
+		 */
+		if (unlikely(!cached) &&
+		    (atomic_read(&fs_info->tree_root_cows) == 0 ||
+		     !btrfs_test_opt(fs_info, SPACE_CACHE))) {
 			have_caching_bg = true;
 			ret = cache_block_group(block_group, 0);
 			BUG_ON(ret < 0);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 18:07 ` Josef Bacik
@ 2018-10-22 18:51   ` Filipe Manana
  0 siblings, 0 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-22 18:51 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On Mon, Oct 22, 2018 at 7:07 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are updating the inode
> > of a free space cache. This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
>
> Actually isn't this a problem for anything that tries to allocate an extent
> while in the tree_root?  Like we snapshot or make a subvolume or anything?

Indeed. Initially I considered making it more generic (like the recent
fix for deadlock when cowing from extent/chunk/device tree) but I
totally forgot about the other cases like you mentioned.

>  We
> should just disallow if root == tree_root.  But even then we only need to do
> this if we're using SPACE_CACHE, using the ye-olde caching or the free space
> tree are both ok.  Let's just limit it to those cases.  Thanks,

Yep, makes all sense.

Thanks! V2 sent out.

>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 18:48 ` [PATCH v2] " fdmanana
@ 2018-10-22 18:55   ` Josef Bacik
  2018-10-22 19:10     ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-22 18:55 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana

On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are COWing an extent
> buffer from the root tree and space caching is enabled (-o space_cache
> mount option). This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Made the solution more generic, since the problem could happen in any
>     path COWing an extent buffer from the root tree.
> 
>     Applies on top of a previous patch titled:
> 
>      "Btrfs: fix deadlock when writing out free space caches"
> 
>  fs/btrfs/ctree.c       |  4 ++++
>  fs/btrfs/ctree.h       |  3 +++
>  fs/btrfs/disk-io.c     |  2 ++
>  fs/btrfs/extent-tree.c | 15 ++++++++++++++-
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 089b46c4d97f..646aafda55a3 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>  	    root == fs_info->chunk_root ||
>  	    root == fs_info->dev_root)
>  		trans->can_flush_pending_bgs = false;
> +	else if (root == fs_info->tree_root)
> +		atomic_inc(&fs_info->tree_root_cows);
>  
>  	cow = btrfs_alloc_tree_block(trans, root, parent_start,
>  			root->root_key.objectid, &disk_key, level,
>  			search_start, empty_size);
> +	if (root == fs_info->tree_root)
> +		atomic_dec(&fs_info->tree_root_cows);

Do we need this though?  Our root should be the root we're cow'ing the block
for, and it should be passed all the way down to find_free_extent properly, so
we really should be able to just do if (root == fs_info->tree_root) and not add
all this stuff.

Not to mention this will race with anybody else doing stuff, so if another
thread that isn't actually touching the tree_root it would skip caching a block
group when it's completely ok for that thread to do it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 18:55   ` Josef Bacik
@ 2018-10-22 19:10     ` Filipe Manana
  0 siblings, 0 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-22 19:10 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, Oct 22, 2018 at 7:56 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are COWing an extent
> > buffer from the root tree and space caching is enabled (-o space_cache
> > mount option). This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
> > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >
> > V2: Made the solution more generic, since the problem could happen in any
> >     path COWing an extent buffer from the root tree.
> >
> >     Applies on top of a previous patch titled:
> >
> >      "Btrfs: fix deadlock when writing out free space caches"
> >
> >  fs/btrfs/ctree.c       |  4 ++++
> >  fs/btrfs/ctree.h       |  3 +++
> >  fs/btrfs/disk-io.c     |  2 ++
> >  fs/btrfs/extent-tree.c | 15 ++++++++++++++-
> >  4 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 089b46c4d97f..646aafda55a3 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
> >           root == fs_info->chunk_root ||
> >           root == fs_info->dev_root)
> >               trans->can_flush_pending_bgs = false;
> > +     else if (root == fs_info->tree_root)
> > +             atomic_inc(&fs_info->tree_root_cows);
> >
> >       cow = btrfs_alloc_tree_block(trans, root, parent_start,
> >                       root->root_key.objectid, &disk_key, level,
> >                       search_start, empty_size);
> > +     if (root == fs_info->tree_root)
> > +             atomic_dec(&fs_info->tree_root_cows);
>
> Do we need this though?  Our root should be the root we're cow'ing the block
> for, and it should be passed all the way down to find_free_extent properly, so
> we really should be able to just do if (root == fs_info->tree_root) and not add
> all this stuff.

Ups, I missed that we could actually pass the root down to find_free_extent().
That's why made the atomic thing.

Sending v3, thanks.

>
> Not to mention this will race with anybody else doing stuff, so if another
> thread that isn't actually touching the tree_root it would skip caching a block
> group when it's completely ok for that thread to do it.  Thanks,
>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
                   ` (3 preceding siblings ...)
  2018-10-22 18:48 ` [PATCH v2] " fdmanana
@ 2018-10-22 19:10 ` fdmanana
  2018-10-22 19:18   ` Josef Bacik
  2018-10-24  9:13 ` [PATCH v4] " fdmanana
  5 siblings, 1 reply; 24+ messages in thread
From: fdmanana @ 2018-10-22 19:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are COWing an extent
buffer from the root tree and space caching is enabled (-o space_cache
mount option). This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Made the solution more generic, since the problem could happen in any
    path COWing an extent buffer from the root tree.

    Applies on top of a previous patch titled:

     "Btrfs: fix deadlock when writing out free space caches"

V3: Made it more simple by avoiding the atomic from V2 and pass the root
    to find_free_extent().

 fs/btrfs/extent-tree.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..e5fd086799ab 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7218,12 +7218,13 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache,
  * the free space extent currently.
  */
 static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
+				struct btrfs_root *root,
 				u64 ram_bytes, u64 num_bytes, u64 empty_size,
 				u64 hint_byte, struct btrfs_key *ins,
 				u64 flags, int delalloc)
 {
 	int ret = 0;
-	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_root *extent_root = fs_info->extent_root;
 	struct btrfs_free_cluster *last_ptr = NULL;
 	struct btrfs_block_group_cache *block_group = NULL;
 	u64 search_start = 0;
@@ -7366,7 +7367,20 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 
 have_block_group:
 		cached = block_group_cache_done(block_group);
-		if (unlikely(!cached)) {
+		/*
+		 * If we are COWing a leaf/node from the root tree, we can not
+		 * start caching of a block group because we could deadlock on
+		 * an extent buffer of the root tree.
+		 * Because if we are COWing a leaf from the root tree, we are
+		 * holding a write lock on the respective extent buffer, and
+		 * loading the space cache of a block group requires searching
+		 * for its inode item in the root tree, which can be located
+		 * in the same leaf that we previously write locked, in which
+		 * case we will deadlock.
+		 */
+		if (unlikely(!cached) &&
+		    (root != fs_info->tree_root ||
+		     !btrfs_test_opt(fs_info, SPACE_CACHE))) {
 			have_caching_bg = true;
 			ret = cache_block_group(block_group, 0);
 			BUG_ON(ret < 0);
@@ -7633,7 +7647,7 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
 			if (trans)
 				exist = 1;
 			else
-				trans = btrfs_join_transaction(root);
+				trans = btrfs_join_transaction(extent_root);
 
 			if (IS_ERR(trans)) {
 				ret = PTR_ERR(trans);
@@ -7796,7 +7810,7 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 	flags = get_alloc_profile_by_root(root, is_data);
 again:
 	WARN_ON(num_bytes < fs_info->sectorsize);
-	ret = find_free_extent(fs_info, ram_bytes, num_bytes, empty_size,
+	ret = find_free_extent(fs_info, root, ram_bytes, num_bytes, empty_size,
 			       hint_byte, ins, flags, delalloc);
 	if (!ret && !is_data) {
 		btrfs_dec_block_group_reservations(fs_info, ins->objectid);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 19:10 ` [PATCH v3] " fdmanana
@ 2018-10-22 19:18   ` Josef Bacik
  2018-10-22 22:05     ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-22 19:18 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana

On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are COWing an extent
> buffer from the root tree and space caching is enabled (-o space_cache
> mount option). This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Great, thanks,

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 19:18   ` Josef Bacik
@ 2018-10-22 22:05     ` Filipe Manana
  2018-10-24  4:08       ` Josef Bacik
  0 siblings, 1 reply; 24+ messages in thread
From: Filipe Manana @ 2018-10-22 22:05 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are COWing an extent
> > buffer from the root tree and space caching is enabled (-o space_cache
> > mount option). This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
> > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Great, thanks,
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

So this makes many fstests occasionally fail with aborted transaction
due to ENOSPC.
It's late and I haven't verified yet, but I suppose this is because we
always skip loading the cache regardless of currently being COWing an
existing leaf or allocating a new one (growing the tree).
Needs to be fixed.

>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22 22:05     ` Filipe Manana
@ 2018-10-24  4:08       ` Josef Bacik
  2018-10-24  9:07         ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-24  4:08 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote:
> On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > When we are writing out a free space cache, during the transaction commit
> > > phase, we can end up in a deadlock which results in a stack trace like the
> > > following:
> > >
> > >  schedule+0x28/0x80
> > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > >  ? inode_insert5+0x119/0x190
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_iget+0x113/0x690 [btrfs]
> > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  find_free_extent+0x799/0x1010 [btrfs]
> > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > >
> > > At cache_save_setup() we need to update the inode item of a block group's
> > > cache which is located in the tree root (fs_info->tree_root), which means
> > > that it may result in COWing a leaf from that tree. If that happens we
> > > need to find a free metadata extent and while looking for one, if we find
> > > a block group which was not cached yet we attempt to load its cache by
> > > calling cache_block_group(). However this function will try to load the
> > > inode of the free space cache, which requires finding the matching inode
> > > item in the tree root - if that inode item is located in the same leaf as
> > > the inode item of the space cache we are updating at cache_save_setup(),
> > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > extent buffer that we previously write locked.
> > >
> > > So fix this by skipping the loading of free space caches of any block
> > > groups that are not yet cached (rare cases) if we are COWing an extent
> > > buffer from the root tree and space caching is enabled (-o space_cache
> > > mount option). This is a rare case and its downside is failure to
> > > find a free extent (return -ENOSPC) when all the already cached block
> > > groups have no free extents.
> > >
> > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >
> > Great, thanks,
> >
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> So this makes many fstests occasionally fail with aborted transaction
> due to ENOSPC.
> It's late and I haven't verified yet, but I suppose this is because we
> always skip loading the cache regardless of currently being COWing an
> existing leaf or allocating a new one (growing the tree).
> Needs to be fixed.
> 

How about we just use path->search_commit_root?  If we're loading the cache we
just want the last committed version, it's not like we read it after we've
written it.  Then we can go back to business as usual.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24  4:08       ` Josef Bacik
@ 2018-10-24  9:07         ` Filipe Manana
  0 siblings, 0 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-24  9:07 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 5:08 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote:
> > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > When we are writing out a free space cache, during the transaction commit
> > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by skipping the loading of free space caches of any block
> > > > groups that are not yet cached (rare cases) if we are COWing an extent
> > > > buffer from the root tree and space caching is enabled (-o space_cache
> > > > mount option). This is a rare case and its downside is failure to
> > > > find a free extent (return -ENOSPC) when all the already cached block
> > > > groups have no free extents.
> > > >
> > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > >
> > > Great, thanks,
> > >
> > > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> >
> > So this makes many fstests occasionally fail with aborted transaction
> > due to ENOSPC.
> > It's late and I haven't verified yet, but I suppose this is because we
> > always skip loading the cache regardless of currently being COWing an
> > existing leaf or allocating a new one (growing the tree).
> > Needs to be fixed.
> >
>
> How about we just use path->search_commit_root?  If we're loading the cache we
> just want the last committed version, it's not like we read it after we've
> written it.  Then we can go back to business as usual.  Thanks,

Yeah, that works. It was an idea before sending v1 but it felt a bit
dirty at the time.
Left fstests running overnight using the commit root approach and
everything seems fine. Sending a v4.

Another alternative, which would solve similar problems for any tree,
would be to allow getting a read lock on an
already (spin) write locked eb, just like we do for blocking write
locks:  https://friendpaste.com/6XrGXb5p0RSJGixUFZ8lCt

thanks


>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
                   ` (4 preceding siblings ...)
  2018-10-22 19:10 ` [PATCH v3] " fdmanana
@ 2018-10-24  9:13 ` fdmanana
  2018-10-24 11:37   ` Josef Bacik
  5 siblings, 1 reply; 24+ messages in thread
From: fdmanana @ 2018-10-24  9:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by using the tree root's commit root when searching for a
block group's free space cache inode item when we are attempting to load
a free space cache. This is safe since block groups once loaded stay in
memory forever, as well as their caches, so after they are first loaded
we will never need to read their inode items again. For new block groups,
once they are created they get their ->cached field set to
BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.

Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Made the solution more generic, since the problem could happen in any
    path COWing an extent buffer from the root tree.

    Applies on top of a previous patch titled:

     "Btrfs: fix deadlock when writing out free space caches"

V3: Made it more simple by avoiding the atomic from V2 and pass the root
    to find_free_extent().

V4: Changed the whole approach so that we lookup for free space cache inode
    items using the commit root instead.
    The previous approach was causing some transactions to be aborted with
    -ENOSPC during umount because sometimes we were skipping cache loading
    of all metadata block groups.

 fs/btrfs/ctree.h            |  3 +++
 fs/btrfs/free-space-cache.c | 22 +++++++++++++++++++++-
 fs/btrfs/inode.c            | 32 ++++++++++++++++++++++----------
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..2b34b2a05ad6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3177,6 +3177,9 @@ void btrfs_destroy_inode(struct inode *inode);
 int btrfs_drop_inode(struct inode *inode);
 int __init btrfs_init_cachep(void);
 void __cold btrfs_destroy_cachep(void);
+struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location,
+			      struct btrfs_root *root, int *new,
+			      struct btrfs_path *path);
 struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 			 struct btrfs_root *root, int *was_new);
 struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 65b79500e09f..7265f35324f6 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -68,7 +68,8 @@ static struct inode *__lookup_free_space_inode(struct btrfs_root *root,
 	btrfs_disk_key_to_cpu(&location, &disk_key);
 	btrfs_release_path(path);
 
-	inode = btrfs_iget(fs_info->sb, &location, root, NULL);
+	inode = btrfs_iget_path(fs_info->sb, &location, root, NULL, path);
+	btrfs_release_path(path);
 	if (IS_ERR(inode))
 		return inode;
 
@@ -830,6 +831,25 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
 	path->search_commit_root = 1;
 	path->skip_locking = 1;
 
+	/*
+	 * We must pass a path with search_commit_root set to btrfs_iget in
+	 * order to avoid a deadlock when allocating extents for the tree root.
+	 *
+	 * When we are COWing an extent buffer from the tree root, when looking
+	 * for a free extent, at extent-tree.c:find_free_extent(), we can find
+	 * block group without its free space cache loaded. When we find one
+	 * we must load its space cache which requires reading its free space
+	 * cache's inode item from the root tree. If this inode item is located
+	 * in the same leaf that we started COWing before, then we end up in
+	 * deadlock on the extent buffer (trying to read lock it when we
+	 * previously write locked it).
+	 *
+	 * It's safe to read the inode item using the commit root because
+	 * block groups, once loaded, stay in memory forever (until they are
+	 * removed) as well as their space caches once loaded. New block groups
+	 * once created get their ->cached field set to BTRFS_CACHE_FINISHED so
+	 * we will never try to read their inode item while the fs is mounted.
+	 */
 	inode = lookup_free_space_inode(fs_info, block_group, path);
 	if (IS_ERR(inode)) {
 		btrfs_free_path(path);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d6b61b1facdd..64ea749c1ba4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3568,10 +3568,11 @@ static noinline int acls_after_inode_item(struct extent_buffer *leaf,
 /*
  * read an inode from the btree into the in-memory inode
  */
-static int btrfs_read_locked_inode(struct inode *inode)
+static int btrfs_read_locked_inode(struct inode *inode,
+				   struct btrfs_path *in_path)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_path *path;
+	struct btrfs_path *path = in_path;
 	struct extent_buffer *leaf;
 	struct btrfs_inode_item *inode_item;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -3587,15 +3588,18 @@ static int btrfs_read_locked_inode(struct inode *inode)
 	if (!ret)
 		filled = true;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
+	if (!path) {
+		path = btrfs_alloc_path();
+		if (!path)
+			return -ENOMEM;
+	}
 
 	memcpy(&location, &BTRFS_I(inode)->location, sizeof(location));
 
 	ret = btrfs_lookup_inode(NULL, root, path, &location, 0);
 	if (ret) {
-		btrfs_free_path(path);
+		if (path != in_path)
+			btrfs_free_path(path);
 		return ret;
 	}
 
@@ -3720,7 +3724,8 @@ static int btrfs_read_locked_inode(struct inode *inode)
 				  btrfs_ino(BTRFS_I(inode)),
 				  root->root_key.objectid, ret);
 	}
-	btrfs_free_path(path);
+	if (path != in_path)
+		btrfs_free_path(path);
 
 	if (!maybe_acls)
 		cache_no_acl(inode);
@@ -5662,8 +5667,9 @@ static struct inode *btrfs_iget_locked(struct super_block *s,
 /* Get an inode object given its location and corresponding root.
  * Returns in *is_new if the inode was read from disk
  */
-struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
-			 struct btrfs_root *root, int *new)
+struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location,
+			      struct btrfs_root *root, int *new,
+			      struct btrfs_path *path)
 {
 	struct inode *inode;
 
@@ -5674,7 +5680,7 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 	if (inode->i_state & I_NEW) {
 		int ret;
 
-		ret = btrfs_read_locked_inode(inode);
+		ret = btrfs_read_locked_inode(inode, path);
 		if (!ret) {
 			inode_tree_add(inode);
 			unlock_new_inode(inode);
@@ -5696,6 +5702,12 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 	return inode;
 }
 
+struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
+			 struct btrfs_root *root, int *new)
+{
+	return btrfs_iget_path(s, location, root, new, NULL);
+}
+
 static struct inode *new_simple_dir(struct super_block *s,
 				    struct btrfs_key *key,
 				    struct btrfs_root *root)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24  9:13 ` [PATCH v4] " fdmanana
@ 2018-10-24 11:37   ` Josef Bacik
  2018-10-24 11:53     ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-24 11:37 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, josef, Filipe Manana

On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by using the tree root's commit root when searching for a
> block group's free space cache inode item when we are attempting to load
> a free space cache. This is safe since block groups once loaded stay in
> memory forever, as well as their caches, so after they are first loaded
> we will never need to read their inode items again. For new block groups,
> once they are created they get their ->cached field set to
> BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> 
> Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 

Now my goal is to see how many times I can get you to redo this thing.

Why not instead just do 

if (btrfs_is_free_space_inode(inode))
  path->search_commit_root = 1;

in read_locked_inode?  That would be cleaner.  If we don't want to do that for
the inode cache (I'm not sure if it's ok or not) we could just do

if (root == fs_info->tree_root)

instead.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24 11:37   ` Josef Bacik
@ 2018-10-24 11:53     ` Filipe Manana
  2018-10-24 12:40       ` Josef Bacik
  0 siblings, 1 reply; 24+ messages in thread
From: Filipe Manana @ 2018-10-24 11:53 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by using the tree root's commit root when searching for a
> > block group's free space cache inode item when we are attempting to load
> > a free space cache. This is safe since block groups once loaded stay in
> > memory forever, as well as their caches, so after they are first loaded
> > we will never need to read their inode items again. For new block groups,
> > once they are created they get their ->cached field set to
> > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> >
> > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >
>
> Now my goal is to see how many times I can get you to redo this thing.
>
> Why not instead just do
>
> if (btrfs_is_free_space_inode(inode))
>   path->search_commit_root = 1;
>
> in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> the inode cache (I'm not sure if it's ok or not) we could just do
>
> if (root == fs_info->tree_root)

We can't (not just that at least).
Tried something like that, but we get into a BUG_ON when writing out
the space cache for new block groups (created in the current
transaction).
Because at cache_save_setup() we have this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342

Lookup for the inode in normal root, doesn't exist, create it then
repeat - if still not found, BUG_ON.
Could also make create_free_space_inode() return an inode pointer and
make it call btrfs_iget().

>
> instead.  Thanks,
>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24 11:53     ` Filipe Manana
@ 2018-10-24 12:40       ` Josef Bacik
  2018-10-24 12:48         ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: Josef Bacik @ 2018-10-24 12:40 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > When we are writing out a free space cache, during the transaction commit
> > > phase, we can end up in a deadlock which results in a stack trace like the
> > > following:
> > >
> > >  schedule+0x28/0x80
> > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > >  ? inode_insert5+0x119/0x190
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_iget+0x113/0x690 [btrfs]
> > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  find_free_extent+0x799/0x1010 [btrfs]
> > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > >
> > > At cache_save_setup() we need to update the inode item of a block group's
> > > cache which is located in the tree root (fs_info->tree_root), which means
> > > that it may result in COWing a leaf from that tree. If that happens we
> > > need to find a free metadata extent and while looking for one, if we find
> > > a block group which was not cached yet we attempt to load its cache by
> > > calling cache_block_group(). However this function will try to load the
> > > inode of the free space cache, which requires finding the matching inode
> > > item in the tree root - if that inode item is located in the same leaf as
> > > the inode item of the space cache we are updating at cache_save_setup(),
> > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > extent buffer that we previously write locked.
> > >
> > > So fix this by using the tree root's commit root when searching for a
> > > block group's free space cache inode item when we are attempting to load
> > > a free space cache. This is safe since block groups once loaded stay in
> > > memory forever, as well as their caches, so after they are first loaded
> > > we will never need to read their inode items again. For new block groups,
> > > once they are created they get their ->cached field set to
> > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > >
> > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > ---
> > >
> >
> > Now my goal is to see how many times I can get you to redo this thing.
> >
> > Why not instead just do
> >
> > if (btrfs_is_free_space_inode(inode))
> >   path->search_commit_root = 1;
> >
> > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > the inode cache (I'm not sure if it's ok or not) we could just do
> >
> > if (root == fs_info->tree_root)
> 
> We can't (not just that at least).
> Tried something like that, but we get into a BUG_ON when writing out
> the space cache for new block groups (created in the current
> transaction).
> Because at cache_save_setup() we have this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> 
> Lookup for the inode in normal root, doesn't exist, create it then
> repeat - if still not found, BUG_ON.
> Could also make create_free_space_inode() return an inode pointer and
> make it call btrfs_iget().
> 

Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
take a path, and allocate it in btrfs_iget, that'll remove the ugly

if (path != in_path)

stuff.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24 12:40       ` Josef Bacik
@ 2018-10-24 12:48         ` Filipe Manana
  2018-10-24 23:58           ` Josef Bacik
  2018-11-05 16:28           ` David Sterba
  0 siblings, 2 replies; 24+ messages in thread
From: Filipe Manana @ 2018-10-24 12:48 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > When we are writing out a free space cache, during the transaction commit
> > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by using the tree root's commit root when searching for a
> > > > block group's free space cache inode item when we are attempting to load
> > > > a free space cache. This is safe since block groups once loaded stay in
> > > > memory forever, as well as their caches, so after they are first loaded
> > > > we will never need to read their inode items again. For new block groups,
> > > > once they are created they get their ->cached field set to
> > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > >
> > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > > ---
> > > >
> > >
> > > Now my goal is to see how many times I can get you to redo this thing.
> > >
> > > Why not instead just do
> > >
> > > if (btrfs_is_free_space_inode(inode))
> > >   path->search_commit_root = 1;
> > >
> > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > the inode cache (I'm not sure if it's ok or not) we could just do
> > >
> > > if (root == fs_info->tree_root)
> >
> > We can't (not just that at least).
> > Tried something like that, but we get into a BUG_ON when writing out
> > the space cache for new block groups (created in the current
> > transaction).
> > Because at cache_save_setup() we have this:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> >
> > Lookup for the inode in normal root, doesn't exist, create it then
> > repeat - if still not found, BUG_ON.
> > Could also make create_free_space_inode() return an inode pointer and
> > make it call btrfs_iget().
> >
>
> Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> take a path, and allocate it in btrfs_iget, that'll remove the ugly
>
> if (path != in_path)

You mean the following on top of v4:

https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg

Not much different, just saves one such if statement. I'm ok with that.

>
> stuff.  Thanks,
>
> Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24 12:48         ` Filipe Manana
@ 2018-10-24 23:58           ` Josef Bacik
  2018-11-05 16:28           ` David Sterba
  1 sibling, 0 replies; 24+ messages in thread
From: Josef Bacik @ 2018-10-24 23:58 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdmanana@kernel.org wrote:
> > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > >
> > > > > When we are writing out a free space cache, during the transaction commit
> > > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > > following:
> > > > >
> > > > >  schedule+0x28/0x80
> > > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > > >  ? inode_insert5+0x119/0x190
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > > >
> > > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > > need to find a free metadata extent and while looking for one, if we find
> > > > > a block group which was not cached yet we attempt to load its cache by
> > > > > calling cache_block_group(). However this function will try to load the
> > > > > inode of the free space cache, which requires finding the matching inode
> > > > > item in the tree root - if that inode item is located in the same leaf as
> > > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > > extent buffer that we previously write locked.
> > > > >
> > > > > So fix this by using the tree root's commit root when searching for a
> > > > > block group's free space cache inode item when we are attempting to load
> > > > > a free space cache. This is safe since block groups once loaded stay in
> > > > > memory forever, as well as their caches, so after they are first loaded
> > > > > we will never need to read their inode items again. For new block groups,
> > > > > once they are created they get their ->cached field set to
> > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > > >
> > > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > > > ---
> > > > >
> > > >
> > > > Now my goal is to see how many times I can get you to redo this thing.
> > > >
> > > > Why not instead just do
> > > >
> > > > if (btrfs_is_free_space_inode(inode))
> > > >   path->search_commit_root = 1;
> > > >
> > > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > > the inode cache (I'm not sure if it's ok or not) we could just do
> > > >
> > > > if (root == fs_info->tree_root)
> > >
> > > We can't (not just that at least).
> > > Tried something like that, but we get into a BUG_ON when writing out
> > > the space cache for new block groups (created in the current
> > > transaction).
> > > Because at cache_save_setup() we have this:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> > >
> > > Lookup for the inode in normal root, doesn't exist, create it then
> > > repeat - if still not found, BUG_ON.
> > > Could also make create_free_space_inode() return an inode pointer and
> > > make it call btrfs_iget().
> > >
> >
> > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> >
> > if (path != in_path)
> 
> You mean the following on top of v4:
> 
> https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> 
> Not much different, just saves one such if statement. I'm ok with that.
> 

Yeah ok, thanks,

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-10-24 12:48         ` Filipe Manana
  2018-10-24 23:58           ` Josef Bacik
@ 2018-11-05 16:28           ` David Sterba
  2018-11-05 16:30             ` Filipe Manana
  1 sibling, 1 reply; 24+ messages in thread
From: David Sterba @ 2018-11-05 16:28 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> >
> > if (path != in_path)
> 
> You mean the following on top of v4:
> 
> https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> 
> Not much different, just saves one such if statement. I'm ok with that.

Now in misc-next with v4 and the friendpaste incremental as

https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-11-05 16:28           ` David Sterba
@ 2018-11-05 16:30             ` Filipe Manana
  2018-11-05 16:34               ` David Sterba
  0 siblings, 1 reply; 24+ messages in thread
From: Filipe Manana @ 2018-11-05 16:30 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > >
> > > if (path != in_path)
> >
> > You mean the following on top of v4:
> >
> > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> >
> > Not much different, just saves one such if statement. I'm ok with that.
>
> Now in misc-next with v4 and the friendpaste incremental as
>
> https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69

Please don't add the incremental. It's buggy. It was meant to figure
out what Josef was saying. That's why I haven't sent a V5.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-11-05 16:30             ` Filipe Manana
@ 2018-11-05 16:34               ` David Sterba
  2018-11-05 16:36                 ` Filipe Manana
  0 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2018-11-05 16:34 UTC (permalink / raw)
  To: Filipe Manana
  Cc: dsterba, Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > >
> > > > if (path != in_path)
> > >
> > > You mean the following on top of v4:
> > >
> > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > >
> > > Not much different, just saves one such if statement. I'm ok with that.
> >
> > Now in misc-next with v4 and the friendpaste incremental as
> >
> > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> 
> Please don't add the incremental. It's buggy. It was meant to figure
> out what Josef was saying. That's why I haven't sent a V5.

Ok dropped, I'll will wait for a proper patch.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-11-05 16:34               ` David Sterba
@ 2018-11-05 16:36                 ` Filipe Manana
  2018-11-13 14:33                   ` David Sterba
  0 siblings, 1 reply; 24+ messages in thread
From: Filipe Manana @ 2018-11-05 16:36 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Mon, Nov 5, 2018 at 4:34 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote:
> > On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > >
> > > > > if (path != in_path)
> > > >
> > > > You mean the following on top of v4:
> > > >
> > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > >
> > > > Not much different, just saves one such if statement. I'm ok with that.
> > >
> > > Now in misc-next with v4 and the friendpaste incremental as
> > >
> > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> >
> > Please don't add the incremental. It's buggy. It was meant to figure
> > out what Josef was saying. That's why I haven't sent a V5.
>
> Ok dropped, I'll will wait for a proper patch.

It's V4, the last sent version. Just forget the incremental.
Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
  2018-11-05 16:36                 ` Filipe Manana
@ 2018-11-13 14:33                   ` David Sterba
  0 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2018-11-13 14:33 UTC (permalink / raw)
  To: Filipe Manana
  Cc: dsterba, Josef Bacik, linux-btrfs, Filipe David Borba Manana

On Mon, Nov 05, 2018 at 04:36:34PM +0000, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:34 PM David Sterba <dsterba@suse.cz> wrote:
> > On Mon, Nov 05, 2018 at 04:30:35PM +0000, Filipe Manana wrote:
> > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba <dsterba@suse.cz> wrote:
> > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > > >
> > > > > > if (path != in_path)
> > > > >
> > > > > You mean the following on top of v4:
> > > > >
> > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > > >
> > > > > Not much different, just saves one such if statement. I'm ok with that.
> > > >
> > > > Now in misc-next with v4 and the friendpaste incremental as
> > > >
> > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> > >
> > > Please don't add the incremental. It's buggy. It was meant to figure
> > > out what Josef was saying. That's why I haven't sent a V5.
> >
> > Ok dropped, I'll will wait for a proper patch.
> 
> It's V4, the last sent version. Just forget the incremental.
> Thanks.

For the record, V4 has been merged to master in 4.20-rc2.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-11-13 14:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-22  9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
2018-10-22  9:53 ` Filipe Manana
2018-10-22  9:54 ` Filipe Manana
2018-10-22 18:07 ` Josef Bacik
2018-10-22 18:51   ` Filipe Manana
2018-10-22 18:48 ` [PATCH v2] " fdmanana
2018-10-22 18:55   ` Josef Bacik
2018-10-22 19:10     ` Filipe Manana
2018-10-22 19:10 ` [PATCH v3] " fdmanana
2018-10-22 19:18   ` Josef Bacik
2018-10-22 22:05     ` Filipe Manana
2018-10-24  4:08       ` Josef Bacik
2018-10-24  9:07         ` Filipe Manana
2018-10-24  9:13 ` [PATCH v4] " fdmanana
2018-10-24 11:37   ` Josef Bacik
2018-10-24 11:53     ` Filipe Manana
2018-10-24 12:40       ` Josef Bacik
2018-10-24 12:48         ` Filipe Manana
2018-10-24 23:58           ` Josef Bacik
2018-11-05 16:28           ` David Sterba
2018-11-05 16:30             ` Filipe Manana
2018-11-05 16:34               ` David Sterba
2018-11-05 16:36                 ` Filipe Manana
2018-11-13 14:33                   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.