All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: simple Chunk allocator like calculation to replace Factor based calculation
       [not found] ` <20201001233649.888B.409509F4@e16-tech.com>
@ 2020-10-01 23:38   ` Qu Wenruo
  2020-10-02  1:30     ` Wang Yugui
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2020-10-01 23:38 UTC (permalink / raw)
  To: Wang Yugui, linux-btrfs



On 2020/10/1 下午11:36, Wang Yugui wrote:
> Hi, Qu Wenruo
> 
> Chunk allocator like calculation will get the right value, but it is
> slow for big file system such as 500T.

Nope, the ballon allocator doesn't have any size limit, thus it will try
to use as much space as possible in a single run.

If the 500T fs only has say 100T used, and the remaining 400T is split
into, say 2 parts, then just two small run would finish.

On the other hand, if the 500T is mostly used, only 100T unallocated and
the 100T is in a big unallocate chunk (under most cases it's true), then
just one allocation run is enough.

So in short, it's unrelated to the fs, but how fragmented the
unallocated space is.
And normally unallocated space is not fragmented at all, thus it's very
speedy.

> 
> we can call this 'Simple Chunk allocator' as 'free_space_by_min_profile',
> it is good enough to prevent over commit, and fast enough too.
> 
> N = .devs_min + .nparity
> 
> (N-th big free space) * (.devs_min) / (.devs_min + .nparity)  / .ncopies.

Nope. Check the basic unbalance case of 10T + 1T raid1, and cases like 6
disk RAID10 with 10T, 10T, 5T, 5T, 1T, 1T.

Last but not the least, please send such feedback to the mail list.
Open-source doesn't only mean source open, but also open discussion.

Without an open environment to discuss, it will be lame open-source
practice just like almost all Chinese companies do, publish a tarball
and call it a day. That's not open at all.

Thanks,
Qu

> 
> Best Regards
> 王玉贵
> 2020/10/01
> 
>> Hi, Qu Wenruo
>>
>> https://patchwork.kernel.org/patch/11810913/
>> is a good job.
>>
>> We have two types of estimation:
>> - Factor based calculation
>> - Chunk allocator like calculation
>>
>> In factor, we can have a simple Chunk allocator like calculation to
>> replace Factor based calculation to prevent overcommit.
>>
>> -Simple Chunk allocator like calculation
>> we just use the top devs_min devices to get the free space.
>> we use this value to prevent overcommit.
>>
>> this Simple Chunk allocator like calculation is always <= Chunk allocator like calculation.
>>
>> for  this example:
>>   devid 1 unallocated:	1T
>>   devid 2 unallocated:  1T
>>   devid 3 unallocated:	10T
>>   devid 4 unallocated:	5T
>>
>> RAID1 of Simple Chunk allocator like calculation :	5T(top 2 device)
>> RAID5 of Simple Chunk allocator like calculation :	2T(top 3 device)
>>
>> Simple Chunk allocator like calculation is fast for big file system just like
>> Factor based calculation.
>>
>> Best Regards
>> 王玉贵
>> 2020/10/01
>>
>> --------------------------------------
>> 北京京垓科技有限公司
>> 王玉贵	wangyugui@e16-tech.com
>> 电话:+86-136-71123776
> 
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵	wangyugui@e16-tech.com
> 电话:+86-136-71123776
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-01 23:38   ` simple Chunk allocator like calculation to replace Factor based calculation Qu Wenruo
@ 2020-10-02  1:30     ` Wang Yugui
  2020-10-02  1:46       ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Wang Yugui @ 2020-10-02  1:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi, Qu Wenruo

> On 2020/10/1 下午11:36, Wang Yugui wrote:
> > Hi, Qu Wenruo
> > 
> > Chunk allocator like calculation will get the right value, but it is
> > slow for big file system such as 500T.
> 
> Nope, the ballon allocator doesn't have any size limit, thus it will try
> to use as much space as possible in a single run.
> 
> If the 500T fs only has say 100T used, and the remaining 400T is split
> into, say 2 parts, then just two small run would finish.
> 
> On the other hand, if the 500T is mostly used, only 100T unallocated and
> the 100T is in a big unallocate chunk (under most cases it's true), then
> just one allocation run is enough.
> 
> So in short, it's unrelated to the fs, but how fragmented the
> unallocated space is.
> And normally unallocated space is not fragmented at all, thus it's very
> speedy.

We need some more complex example for 'Chunk allocator like calculation'
to make it easy to understand.

such as
1) RAID10 with 8T,1T,1T,1T,1T
    the virtal chunk size of 1st iteration:	1T or 0.33T?
    1T    chunk will use 4T at most
    0.33T chunk will use 5.33T at most?

2) RAID10 with 8T,1T,1T,1T,0.5T
    the virtal chunk size of 1st iteration:0.5T or smaller?

Best Regards
王玉贵
2020/10/02


--------------------------------------
北京京垓科技有限公司
王玉贵	wangyugui@e16-tech.com
电话:+86-136-71123776


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  1:30     ` Wang Yugui
@ 2020-10-02  1:46       ` Qu Wenruo
  2020-10-02  1:59         ` Wang Yugui
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2020-10-02  1:46 UTC (permalink / raw)
  To: Wang Yugui, Qu Wenruo; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2128 bytes --]



On 2020/10/2 上午9:30, Wang Yugui wrote:
> Hi, Qu Wenruo
> 
>> On 2020/10/1 下午11:36, Wang Yugui wrote:
>>> Hi, Qu Wenruo
>>>
>>> Chunk allocator like calculation will get the right value, but it is
>>> slow for big file system such as 500T.
>>
>> Nope, the ballon allocator doesn't have any size limit, thus it will try
>> to use as much space as possible in a single run.
>>
>> If the 500T fs only has say 100T used, and the remaining 400T is split
>> into, say 2 parts, then just two small run would finish.
>>
>> On the other hand, if the 500T is mostly used, only 100T unallocated and
>> the 100T is in a big unallocate chunk (under most cases it's true), then
>> just one allocation run is enough.
>>
>> So in short, it's unrelated to the fs, but how fragmented the
>> unallocated space is.
>> And normally unallocated space is not fragmented at all, thus it's very
>> speedy.
> 
> We need some more complex example for 'Chunk allocator like calculation'
> to make it easy to understand.
> 
> such as
> 1) RAID10 with 8T,1T,1T,1T,1T
>     the virtal chunk size of 1st iteration:	1T or 0.33T?
>     1T    chunk will use 4T at most
>     0.33T chunk will use 5.33T at most?

You didn't get the point.
For the each loop we:
- Sort the devices with their free space.
  In this case, it's 8, 1, 1, 1, 1.

- Round down to dev increament
  Then we got 8, 1, 1, 1.

- Then allocate chunk
  Since the biggest unallocated space is 1T, we allocate a RAID10 with
  1T stripe size, which will be a 2T chunk.

  The remaining size is 7, 0, 0, 0, 1.

- We go to next round.
  No way to allocate new chunk.

In this case, we can only get 2T chunk.
Just as the chunk allocator do.

> 
> 2) RAID10 with 8T,1T,1T,1T,0.5T
>     the virtal chunk size of 1st iteration:0.5T or smaller?

Still the same, 2T chunk can be allocated using the largest 4 devices.

Thanks,
Qu

> 
> Best Regards
> 王玉贵
> 2020/10/02
> 
> 
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵	wangyugui@e16-tech.com
> 电话:+86-136-71123776
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  1:46       ` Qu Wenruo
@ 2020-10-02  1:59         ` Wang Yugui
  2020-10-02  3:06           ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Wang Yugui @ 2020-10-02  1:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Hi,


> > such as
> > 1) RAID10 with 8T,1T,1T,1T,1T
> >     the virtal chunk size of 1st iteration:	1T or 0.33T?
> >     1T    chunk will use 4T at most
> >     0.33T chunk will use 5.33T at most?
> 
> You didn't get the point.
> For the each loop we:
> - Sort the devices with their free space.
>   In this case, it's 8, 1, 1, 1, 1.
> 
> - Round down to dev increament
>   Then we got 8, 1, 1, 1.
> 
> - Then allocate chunk
>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
>   1T stripe size, which will be a 2T chunk.
> 
>   The remaining size is 7, 0, 0, 0, 1.

That is the problem.  1T chunk size is too big for this case.

if we use 1/3T chunk size, the result will be same as 1G chunk size.

1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
4th:	8T -4/3T     0T,     0T,   0T,  0T

Best Regards
王玉贵
2020/10/02

> 
> - We go to next round.
>   No way to allocate new chunk.
> 
> In this case, we can only get 2T chunk.
> Just as the chunk allocator do.
> 
> > 
> > 2) RAID10 with 8T,1T,1T,1T,0.5T
> >     the virtal chunk size of 1st iteration:0.5T or smaller?
> 
> Still the same, 2T chunk can be allocated using the largest 4 devices.
> 
> Thanks,
> Qu
> 
> > 
> > Best Regards
> > 王玉贵
> > 2020/10/02
> > 
> > 
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵	wangyugui@e16-tech.com
> > 电话:+86-136-71123776
> > 
> 

--------------------------------------
北京京垓科技有限公司
王玉贵	wangyugui@e16-tech.com
电话:+86-136-71123776


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  1:59         ` Wang Yugui
@ 2020-10-02  3:06           ` Qu Wenruo
  2020-10-02  9:01             ` Wang Yugui
  2020-10-02  9:06             ` Wang Yugui
  0 siblings, 2 replies; 10+ messages in thread
From: Qu Wenruo @ 2020-10-02  3:06 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2387 bytes --]



On 2020/10/2 上午9:59, Wang Yugui wrote:
> Hi,
> 
> 
>>> such as
>>> 1) RAID10 with 8T,1T,1T,1T,1T
>>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
>>>     1T    chunk will use 4T at most
>>>     0.33T chunk will use 5.33T at most?
>>
>> You didn't get the point.
>> For the each loop we:
>> - Sort the devices with their free space.
>>   In this case, it's 8, 1, 1, 1, 1.
>>
>> - Round down to dev increament
>>   Then we got 8, 1, 1, 1.
>>
>> - Then allocate chunk
>>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
>>   1T stripe size, which will be a 2T chunk.
>>
>>   The remaining size is 7, 0, 0, 0, 1.
> 
> That is the problem.  1T chunk size is too big for this case.
> 
> if we use 1/3T chunk size, the result will be same as 1G chunk size.
> 
> 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
> 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
> 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
> 4th:	8T -4/3T     0T,     0T,   0T,  0T

You're right, smaller balloon chunk size would make the allocation more
accurate to real chunk allocator.

However that would slow down the calculation, don't forget that we need
to run that calculation on each chunk allocation.
Changing the balloon chunk allocation size dynamically may improve this.

But please also keep in mind that, with less and less space left, our
predication will be more and more accurate, and under estimate is always
less a problem.

So I'll keep your suggestion for future enhancement.

Thanks for pointing out the pitfall of current calculation,
Qu

> 
> Best Regards
> 王玉贵
> 2020/10/02
> 
>>
>> - We go to next round.
>>   No way to allocate new chunk.
>>
>> In this case, we can only get 2T chunk.
>> Just as the chunk allocator do.
>>
>>>
>>> 2) RAID10 with 8T,1T,1T,1T,0.5T
>>>     the virtal chunk size of 1st iteration:0.5T or smaller?
>>
>> Still the same, 2T chunk can be allocated using the largest 4 devices.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Best Regards
>>> 王玉贵
>>> 2020/10/02
>>>
>>>
>>> --------------------------------------
>>> 北京京垓科技有限公司
>>> 王玉贵	wangyugui@e16-tech.com
>>> 电话:+86-136-71123776
>>>
>>
> 
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵	wangyugui@e16-tech.com
> 电话:+86-136-71123776
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  3:06           ` Qu Wenruo
@ 2020-10-02  9:01             ` Wang Yugui
  2020-10-02  9:15               ` Qu Wenruo
  2020-10-02  9:06             ` Wang Yugui
  1 sibling, 1 reply; 10+ messages in thread
From: Wang Yugui @ 2020-10-02  9:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Hi,

We have another user case difficult to process.

Use case: 
Add 10T disk * 4  to near full  full RAID10 10T *4;
free space maybe be such as 10T,10T,10T,10T,2G,2G,2G,2G.

There maybe a lot of iterations for this case because of 2G chunk size,
and then result in bad performance?

Best Regards
王玉贵
2020/10/02

> 
> 
> On 2020/10/2 上午9:59, Wang Yugui wrote:
> > Hi,
> > 
> > 
> >>> such as
> >>> 1) RAID10 with 8T,1T,1T,1T,1T
> >>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
> >>>     1T    chunk will use 4T at most
> >>>     0.33T chunk will use 5.33T at most?
> >>
> >> You didn't get the point.
> >> For the each loop we:
> >> - Sort the devices with their free space.
> >>   In this case, it's 8, 1, 1, 1, 1.
> >>
> >> - Round down to dev increament
> >>   Then we got 8, 1, 1, 1.
> >>
> >> - Then allocate chunk
> >>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
> >>   1T stripe size, which will be a 2T chunk.
> >>
> >>   The remaining size is 7, 0, 0, 0, 1.
> > 
> > That is the problem.  1T chunk size is too big for this case.
> > 
> > if we use 1/3T chunk size, the result will be same as 1G chunk size.
> > 
> > 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
> > 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
> > 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
> > 4th:	8T -4/3T     0T,     0T,   0T,  0T
> 
> You're right, smaller balloon chunk size would make the allocation more
> accurate to real chunk allocator.
> 
> However that would slow down the calculation, don't forget that we need
> to run that calculation on each chunk allocation.
> Changing the balloon chunk allocation size dynamically may improve this.
> 
> But please also keep in mind that, with less and less space left, our
> predication will be more and more accurate, and under estimate is always
> less a problem.
> 
> So I'll keep your suggestion for future enhancement.
> 
> Thanks for pointing out the pitfall of current calculation,
> Qu
> 
> > 
> > Best Regards
> > 王玉贵
> > 2020/10/02
> > 
> >>
> >> - We go to next round.
> >>   No way to allocate new chunk.
> >>
> >> In this case, we can only get 2T chunk.
> >> Just as the chunk allocator do.
> >>
> >>>
> >>> 2) RAID10 with 8T,1T,1T,1T,0.5T
> >>>     the virtal chunk size of 1st iteration:0.5T or smaller?
> >>
> >> Still the same, 2T chunk can be allocated using the largest 4 devices.
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> Best Regards
> >>> 王玉贵
> >>> 2020/10/02
> >>>
> >>>
> >>> --------------------------------------
> >>> 北京京垓科技有限公司
> >>> 王玉贵	wangyugui@e16-tech.com
> >>> 电话:+86-136-71123776
> >>>
> >>
> > 
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵	wangyugui@e16-tech.com
> > 电话:+86-136-71123776
> > 
> 

--------------------------------------
北京京垓科技有限公司
王玉贵	wangyugui@e16-tech.com
电话:+86-136-71123776


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  3:06           ` Qu Wenruo
  2020-10-02  9:01             ` Wang Yugui
@ 2020-10-02  9:06             ` Wang Yugui
  2020-10-02 10:13               ` Hugo Mills
  1 sibling, 1 reply; 10+ messages in thread
From: Wang Yugui @ 2020-10-02  9:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Hi,

We have another user case difficult to process.

#with a fix of RAID10 to RAID1C4
#for RAID10, the iteration number is not big.
#but for RAID1C4,the iteration number is big.

Use case: 
Add 10T disk * 4  to near full  full RAID1C4 10T *4;
free space maybe be such as 10T,10T,10T,10T,2G,2G,2G,2G.

There maybe a lot of iterations for this case because of 2G chunk size,
and then result in bad performance?


Best Regards
王玉贵
2020/10/02

> 
> 
> On 2020/10/2 上午9:59, Wang Yugui wrote:
> > Hi,
> > 
> > 
> >>> such as
> >>> 1) RAID10 with 8T,1T,1T,1T,1T
> >>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
> >>>     1T    chunk will use 4T at most
> >>>     0.33T chunk will use 5.33T at most?
> >>
> >> You didn't get the point.
> >> For the each loop we:
> >> - Sort the devices with their free space.
> >>   In this case, it's 8, 1, 1, 1, 1.
> >>
> >> - Round down to dev increament
> >>   Then we got 8, 1, 1, 1.
> >>
> >> - Then allocate chunk
> >>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
> >>   1T stripe size, which will be a 2T chunk.
> >>
> >>   The remaining size is 7, 0, 0, 0, 1.
> > 
> > That is the problem.  1T chunk size is too big for this case.
> > 
> > if we use 1/3T chunk size, the result will be same as 1G chunk size.
> > 
> > 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
> > 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
> > 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
> > 4th:	8T -4/3T     0T,     0T,   0T,  0T
> 
> You're right, smaller balloon chunk size would make the allocation more
> accurate to real chunk allocator.
> 
> However that would slow down the calculation, don't forget that we need
> to run that calculation on each chunk allocation.
> Changing the balloon chunk allocation size dynamically may improve this.
> 
> But please also keep in mind that, with less and less space left, our
> predication will be more and more accurate, and under estimate is always
> less a problem.
> 
> So I'll keep your suggestion for future enhancement.
> 
> Thanks for pointing out the pitfall of current calculation,
> Qu
> 
> > 
> > Best Regards
> > 王玉贵
> > 2020/10/02
> > 
> >>
> >> - We go to next round.
> >>   No way to allocate new chunk.
> >>
> >> In this case, we can only get 2T chunk.
> >> Just as the chunk allocator do.
> >>
> >>>
> >>> 2) RAID10 with 8T,1T,1T,1T,0.5T
> >>>     the virtal chunk size of 1st iteration:0.5T or smaller?
> >>
> >> Still the same, 2T chunk can be allocated using the largest 4 devices.
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> Best Regards
> >>> 王玉贵
> >>> 2020/10/02
> >>>
> >>>
> >>> --------------------------------------
> >>> 北京京垓科技有限公司
> >>> 王玉贵	wangyugui@e16-tech.com
> >>> 电话:+86-136-71123776
> >>>
> >>
> > 
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵	wangyugui@e16-tech.com
> > 电话:+86-136-71123776
> > 
> 

--------------------------------------
北京京垓科技有限公司
王玉贵	wangyugui@e16-tech.com
电话:+86-136-71123776


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  9:01             ` Wang Yugui
@ 2020-10-02  9:15               ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2020-10-02  9:15 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3393 bytes --]



On 2020/10/2 下午5:01, Wang Yugui wrote:
> Hi,
> 
> We have another user case difficult to process.
> 
> Use case: 
> Add 10T disk * 4  to near full  full RAID10 10T *4;
> free space maybe be such as 10T,10T,10T,10T,2G,2G,2G,2G.
> 
> There maybe a lot of iterations for this case because of 2G chunk size,
> and then result in bad performance?

For that case, it won't. Just 2 runs.

The first run, we choose to use 2G as stripe length, it results the last
4 devices to be exhausted.

Then in the 2nd run, we use the remaining (10T-2G) as stripe size, and
exhaust all devices.

Thanks,
Qu

> 
> Best Regards
> 王玉贵
> 2020/10/02
> 
>>
>>
>> On 2020/10/2 上午9:59, Wang Yugui wrote:
>>> Hi,
>>>
>>>
>>>>> such as
>>>>> 1) RAID10 with 8T,1T,1T,1T,1T
>>>>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
>>>>>     1T    chunk will use 4T at most
>>>>>     0.33T chunk will use 5.33T at most?
>>>>
>>>> You didn't get the point.
>>>> For the each loop we:
>>>> - Sort the devices with their free space.
>>>>   In this case, it's 8, 1, 1, 1, 1.
>>>>
>>>> - Round down to dev increament
>>>>   Then we got 8, 1, 1, 1.
>>>>
>>>> - Then allocate chunk
>>>>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
>>>>   1T stripe size, which will be a 2T chunk.
>>>>
>>>>   The remaining size is 7, 0, 0, 0, 1.
>>>
>>> That is the problem.  1T chunk size is too big for this case.
>>>
>>> if we use 1/3T chunk size, the result will be same as 1G chunk size.
>>>
>>> 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
>>> 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
>>> 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
>>> 4th:	8T -4/3T     0T,     0T,   0T,  0T
>>
>> You're right, smaller balloon chunk size would make the allocation more
>> accurate to real chunk allocator.
>>
>> However that would slow down the calculation, don't forget that we need
>> to run that calculation on each chunk allocation.
>> Changing the balloon chunk allocation size dynamically may improve this.
>>
>> But please also keep in mind that, with less and less space left, our
>> predication will be more and more accurate, and under estimate is always
>> less a problem.
>>
>> So I'll keep your suggestion for future enhancement.
>>
>> Thanks for pointing out the pitfall of current calculation,
>> Qu
>>
>>>
>>> Best Regards
>>> 王玉贵
>>> 2020/10/02
>>>
>>>>
>>>> - We go to next round.
>>>>   No way to allocate new chunk.
>>>>
>>>> In this case, we can only get 2T chunk.
>>>> Just as the chunk allocator do.
>>>>
>>>>>
>>>>> 2) RAID10 with 8T,1T,1T,1T,0.5T
>>>>>     the virtal chunk size of 1st iteration:0.5T or smaller?
>>>>
>>>> Still the same, 2T chunk can be allocated using the largest 4 devices.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Best Regards
>>>>> 王玉贵
>>>>> 2020/10/02
>>>>>
>>>>>
>>>>> --------------------------------------
>>>>> 北京京垓科技有限公司
>>>>> 王玉贵	wangyugui@e16-tech.com
>>>>> 电话:+86-136-71123776
>>>>>
>>>>
>>>
>>> --------------------------------------
>>> 北京京垓科技有限公司
>>> 王玉贵	wangyugui@e16-tech.com
>>> 电话:+86-136-71123776
>>>
>>
> 
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵	wangyugui@e16-tech.com
> 电话:+86-136-71123776
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02  9:06             ` Wang Yugui
@ 2020-10-02 10:13               ` Hugo Mills
  2020-10-02 10:23                 ` Hugo Mills
  0 siblings, 1 reply; 10+ messages in thread
From: Hugo Mills @ 2020-10-02 10:13 UTC (permalink / raw)
  To: Wang Yugui; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

   We can do this calculation precisely, in at most O(n^2) time, where
n is the number of *devices* in the FS.

Inner step:

   Let s be the number of devices allocated in a single allocation step(*).

   Sort the n devices in decreasing order of size, c[i].

   For each device i, let b[i] = floor(sum c[j] / (s-i)), where the
      sum is taken over all devices smaller than i.

   Throw out values of b[i] where c[i] < b[i].

   Let B = floor(sum c[j] / n), where the sum is taken over all devices.

   Let t_max be the smallest value of B and the remaining b[i].

   t_max is the number of allocations that can be performed on the
   filesystem using an allocation size of s.

Outer step:

   If the RAID level has a fixed s value, run the inner step and stop.

   If the RAID level can vary, run the inner step, setting s to the
   number of devices with free space on at each iteration.


(*) single:       s = 1
    RAID1(c3,c4): s = 2(,3,4)
    RAID10:       s = 4
    RAID0,5,6:    s = the number of devices with free space

   This is the algorithm used in the carfax btrfs-usage calculator,
and has been fairly comprehensively battle-tested over the years,
although I was unable to prove its correctness mathematically.

   Hugo.

On Fri, Oct 02, 2020 at 05:06:14PM +0800, Wang Yugui wrote:
> Hi,
> 
> We have another user case difficult to process.
> 
> #with a fix of RAID10 to RAID1C4
> #for RAID10, the iteration number is not big.
> #but for RAID1C4,the iteration number is big.
> 
> Use case: 
> Add 10T disk * 4  to near full  full RAID1C4 10T *4;
> free space maybe be such as 10T,10T,10T,10T,2G,2G,2G,2G.
> 
> There maybe a lot of iterations for this case because of 2G chunk size,
> and then result in bad performance?
> 
> 
> Best Regards
> 王玉贵
> 2020/10/02
> 
> > 
> > 
> > On 2020/10/2 上午9:59, Wang Yugui wrote:
> > > Hi,
> > > 
> > > 
> > >>> such as
> > >>> 1) RAID10 with 8T,1T,1T,1T,1T
> > >>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
> > >>>     1T    chunk will use 4T at most
> > >>>     0.33T chunk will use 5.33T at most?
> > >>
> > >> You didn't get the point.
> > >> For the each loop we:
> > >> - Sort the devices with their free space.
> > >>   In this case, it's 8, 1, 1, 1, 1.
> > >>
> > >> - Round down to dev increament
> > >>   Then we got 8, 1, 1, 1.
> > >>
> > >> - Then allocate chunk
> > >>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
> > >>   1T stripe size, which will be a 2T chunk.
> > >>
> > >>   The remaining size is 7, 0, 0, 0, 1.
> > > 
> > > That is the problem.  1T chunk size is too big for this case.
> > > 
> > > if we use 1/3T chunk size, the result will be same as 1G chunk size.
> > > 
> > > 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
> > > 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
> > > 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
> > > 4th:	8T -4/3T     0T,     0T,   0T,  0T
> > 
> > You're right, smaller balloon chunk size would make the allocation more
> > accurate to real chunk allocator.
> > 
> > However that would slow down the calculation, don't forget that we need
> > to run that calculation on each chunk allocation.
> > Changing the balloon chunk allocation size dynamically may improve this.
> > 
> > But please also keep in mind that, with less and less space left, our
> > predication will be more and more accurate, and under estimate is always
> > less a problem.
> > 
> > So I'll keep your suggestion for future enhancement.
> > 
> > Thanks for pointing out the pitfall of current calculation,
> > Qu
> > 
> > > 
> > > Best Regards
> > > 王玉贵
> > > 2020/10/02
> > > 
> > >>
> > >> - We go to next round.
> > >>   No way to allocate new chunk.
> > >>
> > >> In this case, we can only get 2T chunk.
> > >> Just as the chunk allocator do.
> > >>
> > >>>
> > >>> 2) RAID10 with 8T,1T,1T,1T,0.5T
> > >>>     the virtal chunk size of 1st iteration:0.5T or smaller?
> > >>
> > >> Still the same, 2T chunk can be allocated using the largest 4 devices.
> > >>
> > >> Thanks,
> > >> Qu
> > >>
> > >>>
> > >>> Best Regards
> > >>> 王玉贵
> > >>> 2020/10/02
> > >>>
> > >>>
> > >>> --------------------------------------
> > >>> 北京京垓科技有限公司
> > >>> 王玉贵	wangyugui@e16-tech.com
> > >>> 电话:+86-136-71123776
> > >>>
> > >>
> > > 
> > > --------------------------------------
> > > 北京京垓科技有限公司
> > > 王玉贵	wangyugui@e16-tech.com
> > > 电话:+86-136-71123776
> > > 
> > 
> 
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵	wangyugui@e16-tech.com
> 电话:+86-136-71123776
> 

-- 
Hugo Mills             | Turning, pages turning in the widening bath,
hugo@... carfax.org.uk | The spine cannot bear the humidity.
http://carfax.org.uk/  | Books fall apart; the binding cannot hold.
PGP: E2AB1DE4          | Page 129 is loosed upon the world.               Zarf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: simple Chunk allocator like calculation to replace Factor based calculation
  2020-10-02 10:13               ` Hugo Mills
@ 2020-10-02 10:23                 ` Hugo Mills
  0 siblings, 0 replies; 10+ messages in thread
From: Hugo Mills @ 2020-10-02 10:23 UTC (permalink / raw)
  To: Wang Yugui, Qu Wenruo, Qu Wenruo, linux-btrfs

   Oh, I forgot: the write-up of this algorithm, sich as it is, is at
https://carfax.org.uk/files/btrfs-usage.pdf, if anyone's interested.
It's incomplete because I was unable to complete a formal proof of its
correctness.

   Hugo.

On Fri, Oct 02, 2020 at 11:13:13AM +0100, Hugo Mills wrote:
>    We can do this calculation precisely, in at most O(n^2) time, where
> n is the number of *devices* in the FS.
> 
> Inner step:
> 
>    Let s be the number of devices allocated in a single allocation step(*).
> 
>    Sort the n devices in decreasing order of size, c[i].
> 
>    For each device i, let b[i] = floor(sum c[j] / (s-i)), where the
>       sum is taken over all devices smaller than i.
> 
>    Throw out values of b[i] where c[i] < b[i].
> 
>    Let B = floor(sum c[j] / n), where the sum is taken over all devices.
> 
>    Let t_max be the smallest value of B and the remaining b[i].
> 
>    t_max is the number of allocations that can be performed on the
>    filesystem using an allocation size of s.
> 
> Outer step:
> 
>    If the RAID level has a fixed s value, run the inner step and stop.
> 
>    If the RAID level can vary, run the inner step, setting s to the
>    number of devices with free space on at each iteration.
> 
> 
> (*) single:       s = 1
>     RAID1(c3,c4): s = 2(,3,4)
>     RAID10:       s = 4
>     RAID0,5,6:    s = the number of devices with free space
> 
>    This is the algorithm used in the carfax btrfs-usage calculator,
> and has been fairly comprehensively battle-tested over the years,
> although I was unable to prove its correctness mathematically.
> 
>    Hugo.
> 
> On Fri, Oct 02, 2020 at 05:06:14PM +0800, Wang Yugui wrote:
> > Hi,
> > 
> > We have another user case difficult to process.
> > 
> > #with a fix of RAID10 to RAID1C4
> > #for RAID10, the iteration number is not big.
> > #but for RAID1C4,the iteration number is big.
> > 
> > Use case: 
> > Add 10T disk * 4  to near full  full RAID1C4 10T *4;
> > free space maybe be such as 10T,10T,10T,10T,2G,2G,2G,2G.
> > 
> > There maybe a lot of iterations for this case because of 2G chunk size,
> > and then result in bad performance?
> > 
> > 
> > Best Regards
> > 王玉贵
> > 2020/10/02
> > 
> > > 
> > > 
> > > On 2020/10/2 上午9:59, Wang Yugui wrote:
> > > > Hi,
> > > > 
> > > > 
> > > >>> such as
> > > >>> 1) RAID10 with 8T,1T,1T,1T,1T
> > > >>>     the virtal chunk size of 1st iteration:	1T or 0.33T?
> > > >>>     1T    chunk will use 4T at most
> > > >>>     0.33T chunk will use 5.33T at most?
> > > >>
> > > >> You didn't get the point.
> > > >> For the each loop we:
> > > >> - Sort the devices with their free space.
> > > >>   In this case, it's 8, 1, 1, 1, 1.
> > > >>
> > > >> - Round down to dev increament
> > > >>   Then we got 8, 1, 1, 1.
> > > >>
> > > >> - Then allocate chunk
> > > >>   Since the biggest unallocated space is 1T, we allocate a RAID10 with
> > > >>   1T stripe size, which will be a 2T chunk.
> > > >>
> > > >>   The remaining size is 7, 0, 0, 0, 1.
> > > > 
> > > > That is the problem.  1T chunk size is too big for this case.
> > > > 
> > > > if we use 1/3T chunk size, the result will be same as 1G chunk size.
> > > > 
> > > > 1st:	8T - 1/3T     2/3T, 2/3T, 2/3T, 1T
> > > > 2nd:	8T -2/3T     2/3T, 1/3T, 1/3T, 2/3T
> > > > 3rd:	8T -1T        1/3T, 1/3T, 0T, 1/3T
> > > > 4th:	8T -4/3T     0T,     0T,   0T,  0T
> > > 
> > > You're right, smaller balloon chunk size would make the allocation more
> > > accurate to real chunk allocator.
> > > 
> > > However that would slow down the calculation, don't forget that we need
> > > to run that calculation on each chunk allocation.
> > > Changing the balloon chunk allocation size dynamically may improve this.
> > > 
> > > But please also keep in mind that, with less and less space left, our
> > > predication will be more and more accurate, and under estimate is always
> > > less a problem.
> > > 
> > > So I'll keep your suggestion for future enhancement.
> > > 
> > > Thanks for pointing out the pitfall of current calculation,
> > > Qu
> > > 
> > > > 
> > > > Best Regards
> > > > 王玉贵
> > > > 2020/10/02
> > > > 
> > > >>
> > > >> - We go to next round.
> > > >>   No way to allocate new chunk.
> > > >>
> > > >> In this case, we can only get 2T chunk.
> > > >> Just as the chunk allocator do.
> > > >>
> > > >>>
> > > >>> 2) RAID10 with 8T,1T,1T,1T,0.5T
> > > >>>     the virtal chunk size of 1st iteration:0.5T or smaller?
> > > >>
> > > >> Still the same, 2T chunk can be allocated using the largest 4 devices.
> > > >>
> > > >> Thanks,
> > > >> Qu
> > > >>
> > > >>>
> > > >>> Best Regards
> > > >>> 王玉贵
> > > >>> 2020/10/02
> > > >>>
> > > >>>
> > > >>> --------------------------------------
> > > >>> 北京京垓科技有限公司
> > > >>> 王玉贵	wangyugui@e16-tech.com
> > > >>> 电话:+86-136-71123776
> > > >>>
> > > >>
> > > > 
> > > > --------------------------------------
> > > > 北京京垓科技有限公司
> > > > 王玉贵	wangyugui@e16-tech.com
> > > > 电话:+86-136-71123776
> > > > 
> > > 
> > 
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵	wangyugui@e16-tech.com
> > 电话:+86-136-71123776
> > 
> 

-- 
Hugo Mills             | Gomez, darling, don't torture yourself.
hugo@... carfax.org.uk | That's my job.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                       Morticia Addams

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-02 10:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20201001212617.82BC.409509F4@e16-tech.com>
     [not found] ` <20201001233649.888B.409509F4@e16-tech.com>
2020-10-01 23:38   ` simple Chunk allocator like calculation to replace Factor based calculation Qu Wenruo
2020-10-02  1:30     ` Wang Yugui
2020-10-02  1:46       ` Qu Wenruo
2020-10-02  1:59         ` Wang Yugui
2020-10-02  3:06           ` Qu Wenruo
2020-10-02  9:01             ` Wang Yugui
2020-10-02  9:15               ` Qu Wenruo
2020-10-02  9:06             ` Wang Yugui
2020-10-02 10:13               ` Hugo Mills
2020-10-02 10:23                 ` Hugo Mills

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.