linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement
@ 2019-04-16 10:21 Qu Wenruo
  2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-04-16 10:21 UTC (permalink / raw)
  To: linux-btrfs

This patchset will address the github issue #123 to make btrfs-convert
less possible to report false ENOSPC.

The first patch will try to avoid the nested chunk/extent tree
modification in a more explicit way.

The 2nd patch will address the ENOSPC problem, by adding CSUM tree to
the metadata preallocate list.

The 2nd patch is not as aggressive as 7a12d8470e5f ("btrfs-progs:
Do metadata preallocation as long as we're not modifying extent tree").
Even we have the 1st patch I still prefer to make less modification to
avoid possible bugs.

Qu Wenruo (2):
  btrfs-progs: Avoid nested chunk allocation call
  btrfs-progs: Do metadata preallocation for fs trees and csum tree

 check/main.c  |  2 +-
 extent-tree.c | 20 +++++++++++++++++++-
 transaction.c |  3 ++-
 transaction.h |  3 ++-
 4 files changed, 24 insertions(+), 4 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-04-16 10:21 [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement Qu Wenruo
@ 2019-04-16 10:21 ` Qu Wenruo
  2019-04-16 11:22   ` Filipe Manana
  2019-06-05 16:32   ` David Sterba
  2019-04-16 10:21 ` [PATCH 2/2] btrfs-progs: Do metadata preallocation for fs trees and csum tree Qu Wenruo
  2019-06-05 16:39 ` [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement David Sterba
  2 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-04-16 10:21 UTC (permalink / raw)
  To: linux-btrfs

There is a hidden call loop which can trigger itself:

btrfs_reserve_extent()             <--|
|- do_chunk_alloc()                   |
   |- btrfs_alloc_chunk()             |
      |- btrfs_insert_item()          |
	 |- btrfs_reserve_extent() <--|

Currently, we're using root->ref_cows to determine whether we should do
chunk prealloc to avoid such loop.

But that's still a hidden trap. Instead of solving it using some hidden
tricks, this patch will make chunk/block group allocation exclusive.

Now if do_chunk_alloc() determines to alloc chunk, it will set a special
flag in transaction handler so it new do_chunk_alloc() will refuse to
allocate new chunk until current chunk allocation finishes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/main.c  |  2 +-
 extent-tree.c | 12 ++++++++++++
 transaction.c |  3 ++-
 transaction.h |  3 ++-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/check/main.c b/check/main.c
index 683c322eba6f..121be4b73c4f 100644
--- a/check/main.c
+++ b/check/main.c
@@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
 			goto close_out;
 		}
 
-		trans->reinit_extent_tree = true;
+		trans->reinit_extent_tree = 1;
 		if (init_extent_tree) {
 			printf("Creating a new extent tree\n");
 			ret = reinit_extent_tree(trans, info,
diff --git a/extent-tree.c b/extent-tree.c
index 8c9cdeff3b02..e25ddf02e5cc 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	    (flags & BTRFS_BLOCK_GROUP_SYSTEM))
 		return 0;
 
+	/*
+	 * We're going to allocate new chunk, during the process, we will
+	 * allocate new tree blocks, which can trigger new chunk allocation
+	 * again.
+	 * Avoid such loop call
+	 */
+	if (trans->allocating_chunk)
+		return 0;
+	trans->allocating_chunk = 1;
+
 	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
 	                        space_info->flags);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
+		trans->allocating_chunk = 0;
 		return 0;
 	}
 
@@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
 				     start, num_bytes);
 	BUG_ON(ret);
+	trans->allocating_chunk = 0;
 	return 0;
 }
 
diff --git a/transaction.c b/transaction.c
index 3a63988b0969..21377282f806 100644
--- a/transaction.c
+++ b/transaction.c
@@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
 	fs_info->generation++;
 	h->transid = fs_info->generation;
 	h->blocks_reserved = num_blocks;
-	h->reinit_extent_tree = false;
+	h->reinit_extent_tree = 0;
+	h->allocating_chunk = 0;
 	root->last_trans = h->transid;
 	root->commit_root = root->node;
 	extent_buffer_get(root->node);
diff --git a/transaction.h b/transaction.h
index 34060252dd5c..b426cd112447 100644
--- a/transaction.h
+++ b/transaction.h
@@ -28,7 +28,8 @@ struct btrfs_trans_handle {
 	u64 transid;
 	u64 alloc_exclude_start;
 	u64 alloc_exclude_nr;
-	bool reinit_extent_tree;
+	unsigned int reinit_extent_tree:1;
+	unsigned int allocating_chunk:1;
 	u64 delayed_ref_updates;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] btrfs-progs: Do metadata preallocation for fs trees and csum tree
  2019-04-16 10:21 [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement Qu Wenruo
  2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
@ 2019-04-16 10:21 ` Qu Wenruo
  2019-06-05 16:39 ` [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement David Sterba
  2 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-04-16 10:21 UTC (permalink / raw)
  To: linux-btrfs

In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert.  After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).

In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.

However we only do metadata prealloc if we're reserving extents for fs
trees.  (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).

There is no explaination on why we only do metadata prealloc for file
trees, but from my educated guess, it could be related to avoid nested
extent/chunk tree modication.

At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.

So adding new condition for metadata preallocate to avoid unexpected
ENOSPC problem.

Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/extent-tree.c b/extent-tree.c
index e25ddf02e5cc..cf2e29258dfb 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2518,7 +2518,13 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
 		profile = BTRFS_BLOCK_GROUP_METADATA | alloc_profile;
 	}
 
-	if (root->ref_cows) {
+	/*
+	 * Also preallocate metadata for csum tree and fs trees (root->ref_cows
+	 * already set), as they can consume a lot of metadata space.
+	 * Pre-allocate to avoid unexpected ENOSPC.
+	 */
+	if (root->ref_cows ||
+	    root->root_key.objectid == BTRFS_CSUM_TREE_OBJECTID) {
 		if (!(profile & BTRFS_BLOCK_GROUP_METADATA)) {
 			ret = do_chunk_alloc(trans, info,
 					     num_bytes,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
@ 2019-04-16 11:22   ` Filipe Manana
  2019-04-16 11:32     ` Qu Wenruo
  2019-06-05 16:32   ` David Sterba
  1 sibling, 1 reply; 9+ messages in thread
From: Filipe Manana @ 2019-04-16 11:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 16, 2019 at 11:23 AM Qu Wenruo <wqu@suse.com> wrote:
>
> There is a hidden call loop which can trigger itself:
>
> btrfs_reserve_extent()             <--|
> |- do_chunk_alloc()                   |
>    |- btrfs_alloc_chunk()             |
>       |- btrfs_insert_item()          |
>          |- btrfs_reserve_extent() <--|
>
> Currently, we're using root->ref_cows to determine whether we should do
> chunk prealloc to avoid such loop.
>
> But that's still a hidden trap. Instead of solving it using some hidden
> tricks, this patch will make chunk/block group allocation exclusive.
>
> Now if do_chunk_alloc() determines to alloc chunk, it will set a special
> flag in transaction handler so it new do_chunk_alloc() will refuse to
> allocate new chunk until current chunk allocation finishes.

What if the chunk allocated during the recursion is of a different type?
I.e. we're allocating a data chunk, then when inserting the items in the
extent, chunk, device, etc trees, we run out of metadata data space and
need to allocate a metadata chunk. What you are doing skips the allocation
of the metadata chunk and makes the inserts/updates of the trees fail
with ENOSPC.

thanks



>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  check/main.c  |  2 +-
>  extent-tree.c | 12 ++++++++++++
>  transaction.c |  3 ++-
>  transaction.h |  3 ++-
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/check/main.c b/check/main.c
> index 683c322eba6f..121be4b73c4f 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
>                         goto close_out;
>                 }
>
> -               trans->reinit_extent_tree = true;
> +               trans->reinit_extent_tree = 1;
>                 if (init_extent_tree) {
>                         printf("Creating a new extent tree\n");
>                         ret = reinit_extent_tree(trans, info,
> diff --git a/extent-tree.c b/extent-tree.c
> index 8c9cdeff3b02..e25ddf02e5cc 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>             (flags & BTRFS_BLOCK_GROUP_SYSTEM))
>                 return 0;
>
> +       /*
> +        * We're going to allocate new chunk, during the process, we will
> +        * allocate new tree blocks, which can trigger new chunk allocation
> +        * again.
> +        * Avoid such loop call
> +        */
> +       if (trans->allocating_chunk)
> +               return 0;
> +       trans->allocating_chunk = 1;
> +
>         ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
>                                 space_info->flags);
>         if (ret == -ENOSPC) {
>                 space_info->full = 1;
> +               trans->allocating_chunk = 0;
>                 return 0;
>         }
>
> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>         ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
>                                      start, num_bytes);
>         BUG_ON(ret);
> +       trans->allocating_chunk = 0;
>         return 0;
>  }
>
> diff --git a/transaction.c b/transaction.c
> index 3a63988b0969..21377282f806 100644
> --- a/transaction.c
> +++ b/transaction.c
> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
>         fs_info->generation++;
>         h->transid = fs_info->generation;
>         h->blocks_reserved = num_blocks;
> -       h->reinit_extent_tree = false;
> +       h->reinit_extent_tree = 0;
> +       h->allocating_chunk = 0;
>         root->last_trans = h->transid;
>         root->commit_root = root->node;
>         extent_buffer_get(root->node);
> diff --git a/transaction.h b/transaction.h
> index 34060252dd5c..b426cd112447 100644
> --- a/transaction.h
> +++ b/transaction.h
> @@ -28,7 +28,8 @@ struct btrfs_trans_handle {
>         u64 transid;
>         u64 alloc_exclude_start;
>         u64 alloc_exclude_nr;
> -       bool reinit_extent_tree;
> +       unsigned int reinit_extent_tree:1;
> +       unsigned int allocating_chunk:1;
>         u64 delayed_ref_updates;
>         unsigned long blocks_reserved;
>         unsigned long blocks_used;
> --
> 2.21.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-04-16 11:22   ` Filipe Manana
@ 2019-04-16 11:32     ` Qu Wenruo
  2019-04-16 11:40       ` Filipe Manana
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-04-16 11:32 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5368 bytes --]



On 2019/4/16 下午7:22, Filipe Manana wrote:
> On Tue, Apr 16, 2019 at 11:23 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> There is a hidden call loop which can trigger itself:
>>
>> btrfs_reserve_extent()             <--|
>> |- do_chunk_alloc()                   |
>>    |- btrfs_alloc_chunk()             |
>>       |- btrfs_insert_item()          |
>>          |- btrfs_reserve_extent() <--|
>>
>> Currently, we're using root->ref_cows to determine whether we should do
>> chunk prealloc to avoid such loop.
>>
>> But that's still a hidden trap. Instead of solving it using some hidden
>> tricks, this patch will make chunk/block group allocation exclusive.
>>
>> Now if do_chunk_alloc() determines to alloc chunk, it will set a special
>> flag in transaction handler so it new do_chunk_alloc() will refuse to
>> allocate new chunk until current chunk allocation finishes.
> 
> What if the chunk allocated during the recursion is of a different type?
> I.e. we're allocating a data chunk, then when inserting the items in the
> extent, chunk, device, etc trees, we run out of metadata data space and
> need to allocate a metadata chunk. What you are doing skips the allocation
> of the metadata chunk and makes the inserts/updates of the trees fail
> with ENOSPC.

That's why we do preallocation to avoid such problem.

Just as the old code shows, even when we're allocating data chunks, it
will try to check metadata space first.

And for the profile we're asking, we over-allocate 2M, definitely not
the best solution but should be enough for extent/chunk tree update.

To make it as good as kernel, we need a lot of accounting which is not
here in btrfs-progs.

So in your case, every btrfs_reserve_extent() call should either trigger
 chunk allocation (either metadata or data, or both), or have 2M overhead.

If that 2M can not meet the metadata space requirement for
chunk/extent/root tree update, then we hit the problem you described.

However 2M looks pretty generous to me, so it should just work.

Thanks,
Qu

> 
> thanks
> 
> 
> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  check/main.c  |  2 +-
>>  extent-tree.c | 12 ++++++++++++
>>  transaction.c |  3 ++-
>>  transaction.h |  3 ++-
>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 683c322eba6f..121be4b73c4f 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
>>                         goto close_out;
>>                 }
>>
>> -               trans->reinit_extent_tree = true;
>> +               trans->reinit_extent_tree = 1;
>>                 if (init_extent_tree) {
>>                         printf("Creating a new extent tree\n");
>>                         ret = reinit_extent_tree(trans, info,
>> diff --git a/extent-tree.c b/extent-tree.c
>> index 8c9cdeff3b02..e25ddf02e5cc 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>>             (flags & BTRFS_BLOCK_GROUP_SYSTEM))
>>                 return 0;
>>
>> +       /*
>> +        * We're going to allocate new chunk, during the process, we will
>> +        * allocate new tree blocks, which can trigger new chunk allocation
>> +        * again.
>> +        * Avoid such loop call
>> +        */
>> +       if (trans->allocating_chunk)
>> +               return 0;
>> +       trans->allocating_chunk = 1;
>> +
>>         ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
>>                                 space_info->flags);
>>         if (ret == -ENOSPC) {
>>                 space_info->full = 1;
>> +               trans->allocating_chunk = 0;
>>                 return 0;
>>         }
>>
>> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>>         ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
>>                                      start, num_bytes);
>>         BUG_ON(ret);
>> +       trans->allocating_chunk = 0;
>>         return 0;
>>  }
>>
>> diff --git a/transaction.c b/transaction.c
>> index 3a63988b0969..21377282f806 100644
>> --- a/transaction.c
>> +++ b/transaction.c
>> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
>>         fs_info->generation++;
>>         h->transid = fs_info->generation;
>>         h->blocks_reserved = num_blocks;
>> -       h->reinit_extent_tree = false;
>> +       h->reinit_extent_tree = 0;
>> +       h->allocating_chunk = 0;
>>         root->last_trans = h->transid;
>>         root->commit_root = root->node;
>>         extent_buffer_get(root->node);
>> diff --git a/transaction.h b/transaction.h
>> index 34060252dd5c..b426cd112447 100644
>> --- a/transaction.h
>> +++ b/transaction.h
>> @@ -28,7 +28,8 @@ struct btrfs_trans_handle {
>>         u64 transid;
>>         u64 alloc_exclude_start;
>>         u64 alloc_exclude_nr;
>> -       bool reinit_extent_tree;
>> +       unsigned int reinit_extent_tree:1;
>> +       unsigned int allocating_chunk:1;
>>         u64 delayed_ref_updates;
>>         unsigned long blocks_reserved;
>>         unsigned long blocks_used;
>> --
>> 2.21.0
>>
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-04-16 11:32     ` Qu Wenruo
@ 2019-04-16 11:40       ` Filipe Manana
  0 siblings, 0 replies; 9+ messages in thread
From: Filipe Manana @ 2019-04-16 11:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 16, 2019 at 12:32 PM Qu Wenruo <wqu@suse.de> wrote:
>
>
>
> On 2019/4/16 下午7:22, Filipe Manana wrote:
> > On Tue, Apr 16, 2019 at 11:23 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> There is a hidden call loop which can trigger itself:
> >>
> >> btrfs_reserve_extent()             <--|
> >> |- do_chunk_alloc()                   |
> >>    |- btrfs_alloc_chunk()             |
> >>       |- btrfs_insert_item()          |
> >>          |- btrfs_reserve_extent() <--|
> >>
> >> Currently, we're using root->ref_cows to determine whether we should do
> >> chunk prealloc to avoid such loop.
> >>
> >> But that's still a hidden trap. Instead of solving it using some hidden
> >> tricks, this patch will make chunk/block group allocation exclusive.
> >>
> >> Now if do_chunk_alloc() determines to alloc chunk, it will set a special
> >> flag in transaction handler so it new do_chunk_alloc() will refuse to
> >> allocate new chunk until current chunk allocation finishes.
> >
> > What if the chunk allocated during the recursion is of a different type?
> > I.e. we're allocating a data chunk, then when inserting the items in the
> > extent, chunk, device, etc trees, we run out of metadata data space and
> > need to allocate a metadata chunk. What you are doing skips the allocation
> > of the metadata chunk and makes the inserts/updates of the trees fail
> > with ENOSPC.
>
> That's why we do preallocation to avoid such problem.

Ok, I wasn't aware about such preallocation in btrfs-progs.

>
> Just as the old code shows, even when we're allocating data chunks, it
> will try to check metadata space first.
>
> And for the profile we're asking, we over-allocate 2M, definitely not
> the best solution but should be enough for extent/chunk tree update.
>
> To make it as good as kernel, we need a lot of accounting which is not
> here in btrfs-progs.
>
> So in your case, every btrfs_reserve_extent() call should either trigger
>  chunk allocation (either metadata or data, or both), or have 2M overhead.
>
> If that 2M can not meet the metadata space requirement for
> chunk/extent/root tree update, then we hit the problem you described.
>
> However 2M looks pretty generous to me, so it should just work.

Yes, that should be enough even for 64Kb nodes and trees with 8 (max) levels.

thanks

>
> Thanks,
> Qu
>
> >
> > thanks
> >
> >
> >
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  check/main.c  |  2 +-
> >>  extent-tree.c | 12 ++++++++++++
> >>  transaction.c |  3 ++-
> >>  transaction.h |  3 ++-
> >>  4 files changed, 17 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/check/main.c b/check/main.c
> >> index 683c322eba6f..121be4b73c4f 100644
> >> --- a/check/main.c
> >> +++ b/check/main.c
> >> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
> >>                         goto close_out;
> >>                 }
> >>
> >> -               trans->reinit_extent_tree = true;
> >> +               trans->reinit_extent_tree = 1;
> >>                 if (init_extent_tree) {
> >>                         printf("Creating a new extent tree\n");
> >>                         ret = reinit_extent_tree(trans, info,
> >> diff --git a/extent-tree.c b/extent-tree.c
> >> index 8c9cdeff3b02..e25ddf02e5cc 100644
> >> --- a/extent-tree.c
> >> +++ b/extent-tree.c
> >> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
> >>             (flags & BTRFS_BLOCK_GROUP_SYSTEM))
> >>                 return 0;
> >>
> >> +       /*
> >> +        * We're going to allocate new chunk, during the process, we will
> >> +        * allocate new tree blocks, which can trigger new chunk allocation
> >> +        * again.
> >> +        * Avoid such loop call
> >> +        */
> >> +       if (trans->allocating_chunk)
> >> +               return 0;
> >> +       trans->allocating_chunk = 1;
> >> +
> >>         ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
> >>                                 space_info->flags);
> >>         if (ret == -ENOSPC) {
> >>                 space_info->full = 1;
> >> +               trans->allocating_chunk = 0;
> >>                 return 0;
> >>         }
> >>
> >> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
> >>         ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
> >>                                      start, num_bytes);
> >>         BUG_ON(ret);
> >> +       trans->allocating_chunk = 0;
> >>         return 0;
> >>  }
> >>
> >> diff --git a/transaction.c b/transaction.c
> >> index 3a63988b0969..21377282f806 100644
> >> --- a/transaction.c
> >> +++ b/transaction.c
> >> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
> >>         fs_info->generation++;
> >>         h->transid = fs_info->generation;
> >>         h->blocks_reserved = num_blocks;
> >> -       h->reinit_extent_tree = false;
> >> +       h->reinit_extent_tree = 0;
> >> +       h->allocating_chunk = 0;
> >>         root->last_trans = h->transid;
> >>         root->commit_root = root->node;
> >>         extent_buffer_get(root->node);
> >> diff --git a/transaction.h b/transaction.h
> >> index 34060252dd5c..b426cd112447 100644
> >> --- a/transaction.h
> >> +++ b/transaction.h
> >> @@ -28,7 +28,8 @@ struct btrfs_trans_handle {
> >>         u64 transid;
> >>         u64 alloc_exclude_start;
> >>         u64 alloc_exclude_nr;
> >> -       bool reinit_extent_tree;
> >> +       unsigned int reinit_extent_tree:1;
> >> +       unsigned int allocating_chunk:1;
> >>         u64 delayed_ref_updates;
> >>         unsigned long blocks_reserved;
> >>         unsigned long blocks_used;
> >> --
> >> 2.21.0
> >>
> >
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
  2019-04-16 11:22   ` Filipe Manana
@ 2019-06-05 16:32   ` David Sterba
  2019-06-06  0:29     ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2019-06-05 16:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 16, 2019 at 06:21:43PM +0800, Qu Wenruo wrote:
> There is a hidden call loop which can trigger itself:
> 
> btrfs_reserve_extent()             <--|
> |- do_chunk_alloc()                   |
>    |- btrfs_alloc_chunk()             |
>       |- btrfs_insert_item()          |
> 	 |- btrfs_reserve_extent() <--|
> 
> Currently, we're using root->ref_cows to determine whether we should do
> chunk prealloc to avoid such loop.
> 
> But that's still a hidden trap. Instead of solving it using some hidden
> tricks, this patch will make chunk/block group allocation exclusive.
> 
> Now if do_chunk_alloc() determines to alloc chunk, it will set a special
> flag in transaction handler so it new do_chunk_alloc() will refuse to
> allocate new chunk until current chunk allocation finishes.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  check/main.c  |  2 +-
>  extent-tree.c | 12 ++++++++++++
>  transaction.c |  3 ++-
>  transaction.h |  3 ++-
>  4 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 683c322eba6f..121be4b73c4f 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
>  			goto close_out;
>  		}
>  
> -		trans->reinit_extent_tree = true;
> +		trans->reinit_extent_tree = 1;
>  		if (init_extent_tree) {
>  			printf("Creating a new extent tree\n");
>  			ret = reinit_extent_tree(trans, info,
> diff --git a/extent-tree.c b/extent-tree.c
> index 8c9cdeff3b02..e25ddf02e5cc 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>  	    (flags & BTRFS_BLOCK_GROUP_SYSTEM))
>  		return 0;
>  
> +	/*
> +	 * We're going to allocate new chunk, during the process, we will
> +	 * allocate new tree blocks, which can trigger new chunk allocation
> +	 * again.
> +	 * Avoid such loop call
> +	 */
> +	if (trans->allocating_chunk)
> +		return 0;
> +	trans->allocating_chunk = 1;
> +
>  	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
>  	                        space_info->flags);
>  	if (ret == -ENOSPC) {
>  		space_info->full = 1;
> +		trans->allocating_chunk = 0;
>  		return 0;
>  	}
>  
> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>  	ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
>  				     start, num_bytes);
>  	BUG_ON(ret);
> +	trans->allocating_chunk = 0;
>  	return 0;
>  }
>  
> diff --git a/transaction.c b/transaction.c
> index 3a63988b0969..21377282f806 100644
> --- a/transaction.c
> +++ b/transaction.c
> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
>  	fs_info->generation++;
>  	h->transid = fs_info->generation;
>  	h->blocks_reserved = num_blocks;
> -	h->reinit_extent_tree = false;
> +	h->reinit_extent_tree = 0;
> +	h->allocating_chunk = 0;
>  	root->last_trans = h->transid;
>  	root->commit_root = root->node;
>  	extent_buffer_get(root->node);
> diff --git a/transaction.h b/transaction.h
> index 34060252dd5c..b426cd112447 100644
> --- a/transaction.h
> +++ b/transaction.h
> @@ -28,7 +28,8 @@ struct btrfs_trans_handle {
>  	u64 transid;
>  	u64 alloc_exclude_start;
>  	u64 alloc_exclude_nr;
> -	bool reinit_extent_tree;
> +	unsigned int reinit_extent_tree:1;
> +	unsigned int allocating_chunk:1;

Why do you switch reinit_extent_tree to the bit, this is an unrelated
change. I'll drop it and update changelog with the 2M preallocation
that was discussed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement
  2019-04-16 10:21 [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement Qu Wenruo
  2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
  2019-04-16 10:21 ` [PATCH 2/2] btrfs-progs: Do metadata preallocation for fs trees and csum tree Qu Wenruo
@ 2019-06-05 16:39 ` David Sterba
  2 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2019-06-05 16:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 16, 2019 at 06:21:42PM +0800, Qu Wenruo wrote:
> This patchset will address the github issue #123 to make btrfs-convert
> less possible to report false ENOSPC.
> 
> The first patch will try to avoid the nested chunk/extent tree
> modification in a more explicit way.
> 
> The 2nd patch will address the ENOSPC problem, by adding CSUM tree to
> the metadata preallocate list.
> 
> The 2nd patch is not as aggressive as 7a12d8470e5f ("btrfs-progs:
> Do metadata preallocation as long as we're not modifying extent tree").
> Even we have the 1st patch I still prefer to make less modification to
> avoid possible bugs.
> 
> Qu Wenruo (2):
>   btrfs-progs: Avoid nested chunk allocation call
>   btrfs-progs: Do metadata preallocation for fs trees and csum tree

Added to devel, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call
  2019-06-05 16:32   ` David Sterba
@ 2019-06-06  0:29     ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-06-06  0:29 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3921 bytes --]



On 2019/6/6 上午12:32, David Sterba wrote:
> On Tue, Apr 16, 2019 at 06:21:43PM +0800, Qu Wenruo wrote:
>> There is a hidden call loop which can trigger itself:
>>
>> btrfs_reserve_extent()             <--|
>> |- do_chunk_alloc()                   |
>>    |- btrfs_alloc_chunk()             |
>>       |- btrfs_insert_item()          |
>> 	 |- btrfs_reserve_extent() <--|
>>
>> Currently, we're using root->ref_cows to determine whether we should do
>> chunk prealloc to avoid such loop.
>>
>> But that's still a hidden trap. Instead of solving it using some hidden
>> tricks, this patch will make chunk/block group allocation exclusive.
>>
>> Now if do_chunk_alloc() determines to alloc chunk, it will set a special
>> flag in transaction handler so it new do_chunk_alloc() will refuse to
>> allocate new chunk until current chunk allocation finishes.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  check/main.c  |  2 +-
>>  extent-tree.c | 12 ++++++++++++
>>  transaction.c |  3 ++-
>>  transaction.h |  3 ++-
>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 683c322eba6f..121be4b73c4f 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -10185,7 +10185,7 @@ int cmd_check(int argc, char **argv)
>>  			goto close_out;
>>  		}
>>  
>> -		trans->reinit_extent_tree = true;
>> +		trans->reinit_extent_tree = 1;
>>  		if (init_extent_tree) {
>>  			printf("Creating a new extent tree\n");
>>  			ret = reinit_extent_tree(trans, info,
>> diff --git a/extent-tree.c b/extent-tree.c
>> index 8c9cdeff3b02..e25ddf02e5cc 100644
>> --- a/extent-tree.c
>> +++ b/extent-tree.c
>> @@ -1873,10 +1873,21 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>>  	    (flags & BTRFS_BLOCK_GROUP_SYSTEM))
>>  		return 0;
>>  
>> +	/*
>> +	 * We're going to allocate new chunk, during the process, we will
>> +	 * allocate new tree blocks, which can trigger new chunk allocation
>> +	 * again.
>> +	 * Avoid such loop call
>> +	 */
>> +	if (trans->allocating_chunk)
>> +		return 0;
>> +	trans->allocating_chunk = 1;
>> +
>>  	ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
>>  	                        space_info->flags);
>>  	if (ret == -ENOSPC) {
>>  		space_info->full = 1;
>> +		trans->allocating_chunk = 0;
>>  		return 0;
>>  	}
>>  
>> @@ -1885,6 +1896,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>>  	ret = btrfs_make_block_group(trans, fs_info, 0, space_info->flags,
>>  				     start, num_bytes);
>>  	BUG_ON(ret);
>> +	trans->allocating_chunk = 0;
>>  	return 0;
>>  }
>>  
>> diff --git a/transaction.c b/transaction.c
>> index 3a63988b0969..21377282f806 100644
>> --- a/transaction.c
>> +++ b/transaction.c
>> @@ -46,7 +46,8 @@ struct btrfs_trans_handle* btrfs_start_transaction(struct btrfs_root *root,
>>  	fs_info->generation++;
>>  	h->transid = fs_info->generation;
>>  	h->blocks_reserved = num_blocks;
>> -	h->reinit_extent_tree = false;
>> +	h->reinit_extent_tree = 0;
>> +	h->allocating_chunk = 0;
>>  	root->last_trans = h->transid;
>>  	root->commit_root = root->node;
>>  	extent_buffer_get(root->node);
>> diff --git a/transaction.h b/transaction.h
>> index 34060252dd5c..b426cd112447 100644
>> --- a/transaction.h
>> +++ b/transaction.h
>> @@ -28,7 +28,8 @@ struct btrfs_trans_handle {
>>  	u64 transid;
>>  	u64 alloc_exclude_start;
>>  	u64 alloc_exclude_nr;
>> -	bool reinit_extent_tree;
>> +	unsigned int reinit_extent_tree:1;
>> +	unsigned int allocating_chunk:1;
> 
> Why do you switch reinit_extent_tree to the bit, this is an unrelated
> change. I'll drop it and update changelog with the 2M preallocation
> that was discussed.
> 

Because in this patch, we're introducing new bit, thus to me it's pretty
valid to expanding old bool value into bits.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-06  0:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-16 10:21 [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement Qu Wenruo
2019-04-16 10:21 ` [PATCH 1/2] btrfs-progs: Avoid nested chunk allocation call Qu Wenruo
2019-04-16 11:22   ` Filipe Manana
2019-04-16 11:32     ` Qu Wenruo
2019-04-16 11:40       ` Filipe Manana
2019-06-05 16:32   ` David Sterba
2019-06-06  0:29     ` Qu Wenruo
2019-04-16 10:21 ` [PATCH 2/2] btrfs-progs: Do metadata preallocation for fs trees and csum tree Qu Wenruo
2019-06-05 16:39 ` [PATCH 0/2] btrfs-progs: Metadata preallocation enhancement David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).