All of lore.kernel.org
 help / color / mirror / Atom feed
* Regarding random grouop search start for allocation of inode.
@ 2015-12-03  7:44 lokesh jaliminche
  2015-12-03  8:07 ` lokesh jaliminche
  0 siblings, 1 reply; 10+ messages in thread
From: lokesh jaliminche @ 2015-12-03  7:44 UTC (permalink / raw)
  To: linux-ext4

hello folks,
                I am new to ext4 code. I was going through the
ext4-source for allocation of inode.
There is one thing that I did not understand while selection of groups
for inode allocation . I came across this code snippet which is part
of find_group_orlov function. question is, why group search start is
random ?

Code snippet:
==========
···if (qstr) {
»·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
»·······»·······»·······hinfo.seed = sbi->s_hash_seed;
»·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
»·······»·······»·······grp = hinfo.hash;
»·······»·······} else
»·······»·······»·······get_random_bytes(&grp, sizeof(grp));
»·······»·······parent_group = (unsigned)grp % ngroups;
»·······»·······for (i = 0; i < ngroups; i++) {
»·······»·······»·······g = (parent_group + i) % ngroups;
»·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
»·······»·······»·······if (!stats.free_inodes)
»·······»·······»·······»·······continue;
»·······»·······»·······if (stats.used_dirs >= best_ndir)
»·······»·······»·······»·······continue;
»·······»·······»·······if (stats.free_inodes < avefreei)
»·······»·······»·······»·······continue;
»·······»·······»·······if (stats.free_blocks < avefreeb)
»·······»·······»·······»·······continue;
»·······»·······»·······grp = g;
»·······»·······»·······ret = 0;
»·······»·······»·······best_ndir = stats.used_dirs;
»·······»·······}

Thanks & Regards,
  Lokesh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-03  7:44 Regarding random grouop search start for allocation of inode lokesh jaliminche
@ 2015-12-03  8:07 ` lokesh jaliminche
  2015-12-03 17:58   ` Andreas Dilger
  0 siblings, 1 reply; 10+ messages in thread
From: lokesh jaliminche @ 2015-12-03  8:07 UTC (permalink / raw)
  To: linux-ext4

Thought of giving more clarification on my question
why group search start is random ? because we can also start  search
for valid groups for inode allocation from the start . As this group
search is random  inode selection might go to end of groups which
might affect IO performance

On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
<lokesh.jaliminche@gmail.com> wrote:
> hello folks,
>                 I am new to ext4 code. I was going through the
> ext4-source for allocation of inode.
> There is one thing that I did not understand while selection of groups
> for inode allocation . I came across this code snippet which is part
> of find_group_orlov function. question is, why group search start is
> random ?
>
> Code snippet:
> ==========
> ···if (qstr) {
> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
> »·······»·······»·······grp = hinfo.hash;
> »·······»·······} else
> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
> »·······»·······parent_group = (unsigned)grp % ngroups;
> »·······»·······for (i = 0; i < ngroups; i++) {
> »·······»·······»·······g = (parent_group + i) % ngroups;
> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
> »·······»·······»·······if (!stats.free_inodes)
> »·······»·······»·······»·······continue;
> »·······»·······»·······if (stats.used_dirs >= best_ndir)
> »·······»·······»·······»·······continue;
> »·······»·······»·······if (stats.free_inodes < avefreei)
> »·······»·······»·······»·······continue;
> »·······»·······»·······if (stats.free_blocks < avefreeb)
> »·······»·······»·······»·······continue;
> »·······»·······»·······grp = g;
> »·······»·······»·······ret = 0;
> »·······»·······»·······best_ndir = stats.used_dirs;
> »·······»·······}
>
> Thanks & Regards,
>   Lokesh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-03  8:07 ` lokesh jaliminche
@ 2015-12-03 17:58   ` Andreas Dilger
  2015-12-03 19:13     ` lokesh jaliminche
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2015-12-03 17:58 UTC (permalink / raw)
  To: lokesh jaliminche; +Cc: linux-ext4

On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> 
> Thought of giving more clarification on my question
> why group search start is random ? because we can also start  search
> for valid groups for inode allocation from the start. As this group
> search is random  inode selection might go to end of groups which
> might affect IO performance

Starting the inode search at the beginning of the disk each time
means that inode allocation will be inefficient because it will search
over groups that are mostly or entirely full already.

Allocating the new directory in a semi-random group, one that is
relatively unused, ensures that new
inode and block allocations are relatively efficient afterward. 

Cheers, Andreas

> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
> <lokesh.jaliminche@gmail.com> wrote:
>> hello folks,
>>                I am new to ext4 code. I was going through the
>> ext4-source for allocation of inode.
>> There is one thing that I did not understand while selection of groups
>> for inode allocation . I came across this code snippet which is part
>> of find_group_orlov function. question is, why group search start is
>> random ?
>> 
>> Code snippet:
>> ==========
>> В·В·В·if (qstr) {
>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>> »·······»·······»·······grp = hinfo.hash;
>> »·······»·······} else
>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>> »·······»·······parent_group = (unsigned)grp % ngroups;
>> »·······»·······for (i = 0; i < ngroups; i++) {
>> »·······»·······»·······g = (parent_group + i) % ngroups;
>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>> »·······»·······»·······if (!stats.free_inodes)
>> »·······»·······»·······»·······continue;
>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>> »·······»·······»·······»·······continue;
>> »·······»·······»·······if (stats.free_inodes < avefreei)
>> »·······»·······»·······»·······continue;
>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>> »·······»·······»·······»·······continue;
>> »·······»·······»·······grp = g;
>> »·······»·······»·······ret = 0;
>> »·······»·······»·······best_ndir = stats.used_dirs;
>> »·······»·······}
>> 
>> Thanks & Regards,
>>  Lokesh
> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{^[xЉ{ayє\x1dК‡Ъ™л,j\a­ўfЈў·hљ‹аz№\x1e®wҐўё\fў·¦j:+v‰ЁЉwиjШm¶џяѕ\a«‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-03 17:58   ` Andreas Dilger
@ 2015-12-03 19:13     ` lokesh jaliminche
  2015-12-04 21:54       ` Andreas Dilger
  0 siblings, 1 reply; 10+ messages in thread
From: lokesh jaliminche @ 2015-12-03 19:13 UTC (permalink / raw)
  To: Andreas Dilger, linux-ext4

Ohh thanks for the clarification. There is one more thing I would like
to point out here.
In the code there  is a loop to scan the groups for inode
alllocation(Inside find_group_orlov function).
There are some policies for group selection . while scanning the
groups, it checks for these
policies to be satisfied.
If a particular group satisfy these properties it should get selected
for inode allocation but instead
it does further lookup in next groups.
I think there is missing breaking condition. I have added break over
there and here is the
patch for that. Any reason for not having break condition over here ?

diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
--- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
01:20:31.000000000 +0530
+++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
21:36:51.805542209 +0530
@@ -529,6 +529,7 @@
             grp = g;
             ret = 0;
             best_ndir = stats.used_dirs;
+            break;
         }
         if (ret)
         goto fallback;

Thanks & Regards,
   Lokesh



On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>
>> Thought of giving more clarification on my question
>> why group search start is random ? because we can also start  search
>> for valid groups for inode allocation from the start. As this group
>> search is random  inode selection might go to end of groups which
>> might affect IO performance
>
> Starting the inode search at the beginning of the disk each time
> means that inode allocation will be inefficient because it will search
> over groups that are mostly or entirely full already.
>
> Allocating the new directory in a semi-random group, one that is
> relatively unused, ensures that new
> inode and block allocations are relatively efficient afterward.
>
> Cheers, Andreas
>
>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>> <lokesh.jaliminche@gmail.com> wrote:
>>> hello folks,
>>>                I am new to ext4 code. I was going through the
>>> ext4-source for allocation of inode.
>>> There is one thing that I did not understand while selection of groups
>>> for inode allocation . I came across this code snippet which is part
>>> of find_group_orlov function. question is, why group search start is
>>> random ?
>>>
>>> Code snippet:
>>> ==========
>>> В·В·В·if (qstr) {
>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>> »·······»·······»·······grp = hinfo.hash;
>>> »·······»·······} else
>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>> »·······»·······»·······if (!stats.free_inodes)
>>> »·······»·······»·······»·······continue;
>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>> »·······»·······»·······»·······continue;
>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>> »·······»·······»·······»·······continue;
>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>> »·······»·······»·······»·······continue;
>>> »·······»·······»·······grp = g;
>>> »·······»·······»·······ret = 0;
>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>> »·······»·······}
>>>
>>> Thanks & Regards,
>>>  Lokesh
>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-03 19:13     ` lokesh jaliminche
@ 2015-12-04 21:54       ` Andreas Dilger
  2015-12-15 10:33         ` lokesh jaliminche
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2015-12-04 21:54 UTC (permalink / raw)
  To: lokesh jaliminche; +Cc: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 6774 bytes --]


> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> 
> Ohh thanks for the clarification. There is one more thing I would like
> to point out here.
> In the code there  is a loop to scan the groups for inode
> alllocation(Inside find_group_orlov function).
> There are some policies for group selection . while scanning the
> groups, it checks for these
> policies to be satisfied.
> If a particular group satisfy these properties it should get selected
> for inode allocation but instead
> it does further lookup in next groups.
> I think there is missing breaking condition. I have added break over
> there and here is the
> patch for that. Any reason for not having break condition over here ?
> 
> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
> 01:20:31.000000000 +0530
> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
> 21:36:51.805542209 +0530
> @@ -529,6 +529,7 @@
>             grp = g;
>             ret = 0;
>             best_ndir = stats.used_dirs;
> +            break;
>         }
>         if (ret)
>         goto fallback;

No, this isn't correct.  This loop is looking for the *best* group it can find,
and your "break" would have it exit the loop as soon as the *first* directory
that matches the conditions is found.  Since those conditions are fairly weak,
for example that the group actually has free inodes, and it has better than
average free inodes and blocks, it makes sense to search beyond just the first
matching group.

That said, it also doesn't make sense to search beyond a "perfect" group that
has no allocated inodes and no allocated blocks, so a break condition could be
added to this loop and make it more efficient, especially for very large
filesystems that have 128k+ groups.

It should be noted that this part of the algorithm only applies to "top level"
directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
set, so searching a bit longer for a good group is not a bad idea in this case.

Cheers, Andreas.

> Thanks & Regards,
>   Lokesh
> 
> 
> 
> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>> 
>>> Thought of giving more clarification on my question
>>> why group search start is random ? because we can also start  search
>>> for valid groups for inode allocation from the start. As this group
>>> search is random  inode selection might go to end of groups which
>>> might affect IO performance
>> 
>> Starting the inode search at the beginning of the disk each time
>> means that inode allocation will be inefficient because it will search
>> over groups that are mostly or entirely full already.
>> 
>> Allocating the new directory in a semi-random group, one that is
>> relatively unused, ensures that new
>> inode and block allocations are relatively efficient afterward.
>> 
>> Cheers, Andreas
>> 
>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>> <lokesh.jaliminche@gmail.com> wrote:
>>>> hello folks,
>>>>               I am new to ext4 code. I was going through the
>>>> ext4-source for allocation of inode.
>>>> There is one thing that I did not understand while selection of groups
>>>> for inode allocation . I came across this code snippet which is part
>>>> of find_group_orlov function. question is, why group search start is
>>>> random ?
>>>> 
>>>> Code snippet:
>>>> ==========
>>>> В·В·В·if (qstr) {
>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>> »·······»·······»·······grp = hinfo.hash;
>>>> »·······»·······} else
>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>>> »·······»·······»·······if (!stats.free_inodes)
>>>> »·······»·······»·······»·······continue;
>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>>> »·······»·······»·······»·······continue;
>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>>> »·······»·······»·······»·······continue;
>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>>> »·······»·······»·······»·······continue;
>>>> »·······»·······»·······grp = g;
>>>> »·······»·······»·······ret = 0;
>>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>>> »·······»·······}
>>>> 
>>>> Thanks & Regards,
>>>> Lokesh
>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-04 21:54       ` Andreas Dilger
@ 2015-12-15 10:33         ` lokesh jaliminche
  2015-12-15 21:32           ` Andreas Dilger
  0 siblings, 1 reply; 10+ messages in thread
From: lokesh jaliminche @ 2015-12-15 10:33 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4

"No, this isn't correct.  This loop is looking for the *best* group it
can find, and your "break" would have it exit the loop as soon as the
*first* group that matches the conditions is found."

But as we are checking  all the groups with the same conditions then
how it guarantees better group selection ? As per my understanding  we
are just wasting time in looping.

On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>
>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>
>> Ohh thanks for the clarification. There is one more thing I would like
>> to point out here.
>> In the code there  is a loop to scan the groups for inode
>> alllocation(Inside find_group_orlov function).
>> There are some policies for group selection . while scanning the
>> groups, it checks for these
>> policies to be satisfied.
>> If a particular group satisfy these properties it should get selected
>> for inode allocation but instead
>> it does further lookup in next groups.
>> I think there is missing breaking condition. I have added break over
>> there and here is the
>> patch for that. Any reason for not having break condition over here ?
>>
>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
>> 01:20:31.000000000 +0530
>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
>> 21:36:51.805542209 +0530
>> @@ -529,6 +529,7 @@
>>             grp = g;
>>             ret = 0;
>>             best_ndir = stats.used_dirs;
>> +            break;
>>         }
>>         if (ret)
>>         goto fallback;
>
> No, this isn't correct.  This loop is looking for the *best* group it can find,
> and your "break" would have it exit the loop as soon as the *first* directory
> that matches the conditions is found.  Since those conditions are fairly weak,
> for example that the group actually has free inodes, and it has better than
> average free inodes and blocks, it makes sense to search beyond just the first
> matching group.
>
> That said, it also doesn't make sense to search beyond a "perfect" group that
> has no allocated inodes and no allocated blocks, so a break condition could be
> added to this loop and make it more efficient, especially for very large
> filesystems that have 128k+ groups.
>
> It should be noted that this part of the algorithm only applies to "top level"
> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
> set, so searching a bit longer for a good group is not a bad idea in this case.
>
> Cheers, Andreas.
>
>> Thanks & Regards,
>>   Lokesh
>>
>>
>>
>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>
>>>> Thought of giving more clarification on my question
>>>> why group search start is random ? because we can also start  search
>>>> for valid groups for inode allocation from the start. As this group
>>>> search is random  inode selection might go to end of groups which
>>>> might affect IO performance
>>>
>>> Starting the inode search at the beginning of the disk each time
>>> means that inode allocation will be inefficient because it will search
>>> over groups that are mostly or entirely full already.
>>>
>>> Allocating the new directory in a semi-random group, one that is
>>> relatively unused, ensures that new
>>> inode and block allocations are relatively efficient afterward.
>>>
>>> Cheers, Andreas
>>>
>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>>> <lokesh.jaliminche@gmail.com> wrote:
>>>>> hello folks,
>>>>>               I am new to ext4 code. I was going through the
>>>>> ext4-source for allocation of inode.
>>>>> There is one thing that I did not understand while selection of groups
>>>>> for inode allocation . I came across this code snippet which is part
>>>>> of find_group_orlov function. question is, why group search start is
>>>>> random ?
>>>>>
>>>>> Code snippet:
>>>>> ==========
>>>>> В·В·В·if (qstr) {
>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>>> »·······»·······»·······grp = hinfo.hash;
>>>>> »·······»·······} else
>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>>>> »·······»·······»·······if (!stats.free_inodes)
>>>>> »·······»·······»·······»·······continue;
>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>>>> »·······»·······»·······»·······continue;
>>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>>>> »·······»·······»·······»·······continue;
>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>>>> »·······»·······»·······»·······continue;
>>>>> »·······»·······»·······grp = g;
>>>>> »·······»·······»·······ret = 0;
>>>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>>>> »·······»·······}
>>>>>
>>>>> Thanks & Regards,
>>>>> Lokesh
>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
>
>
> Cheers, Andreas
>
>
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-15 10:33         ` lokesh jaliminche
@ 2015-12-15 21:32           ` Andreas Dilger
  2015-12-17 14:34             ` lokesh jaliminche
  2015-12-19  5:09             ` lokesh
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Dilger @ 2015-12-15 21:32 UTC (permalink / raw)
  To: lokesh jaliminche; +Cc: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 8885 bytes --]

On Dec 15, 2015, at 3:33 AM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> 
> "No, this isn't correct.  This loop is looking for the *best* group it
> can find, and your "break" would have it exit the loop as soon as the
> *first* group that matches the conditions is found."
> 
> But as we are checking  all the groups with the same conditions then
> how it guarantees better group selection ? As per my understanding  we
> are just wasting time in looping.

The important part of the loop is ensuring that the selected group is always
improving over the previous one:

			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
                                continue;

The "best_ndir" value tracks for the best group found so far the number of
used directories in the group, and if the new directory has fewer directories
than the  previous "best" directory and still meets all the other criteria
(fewer than average inodes allocated, etc) then the new group will be chosen.

That said, you are correct that the loop can spend a lot of time searching
needlessly.  It would be trivial to add a check at the start of the loop:

			/* the group can't get any better than empty */
			if (desc->bg_free_inodes_count == inodes_per_group &&
			    desc->bg_free_blocks_count ==
						EXT4_BLOCKS_PER_GROUP(sb))
				goto found;

This jumps directly to "found" with "desc" and "group" already set, so
there is no need to go through the extra steps of setting "best_desc" and
"best_group" and then break out of the loop just to set "desc" and "group"
again.

Since you are the one to find this optimization, could you please submit a
patch to this effect so you get the credit.

Cheers, Andreas

> On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> 
>>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>> 
>>> Ohh thanks for the clarification. There is one more thing I would like
>>> to point out here.
>>> In the code there  is a loop to scan the groups for inode
>>> alllocation(Inside find_group_orlov function).
>>> There are some policies for group selection . while scanning the
>>> groups, it checks for these
>>> policies to be satisfied.
>>> If a particular group satisfy these properties it should get selected
>>> for inode allocation but instead
>>> it does further lookup in next groups.
>>> I think there is missing breaking condition. I have added break over
>>> there and here is the
>>> patch for that. Any reason for not having break condition over here ?
>>> 
>>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
>>> 01:20:31.000000000 +0530
>>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
>>> 21:36:51.805542209 +0530
>>> @@ -529,6 +529,7 @@
>>>            grp = g;
>>>            ret = 0;
>>>            best_ndir = stats.used_dirs;
>>> +            break;
>>>        }
>>>        if (ret)
>>>        goto fallback;
>> 
>> No, this isn't correct.  This loop is looking for the *best* group it can find,
>> and your "break" would have it exit the loop as soon as the *first* directory
>> that matches the conditions is found.  Since those conditions are fairly weak,
>> for example that the group actually has free inodes, and it has better than
>> average free inodes and blocks, it makes sense to search beyond just the first
>> matching group.
>> 
>> That said, it also doesn't make sense to search beyond a "perfect" group that
>> has no allocated inodes and no allocated blocks, so a break condition could be
>> added to this loop and make it more efficient, especially for very large
>> filesystems that have 128k+ groups.
>> 
>> It should be noted that this part of the algorithm only applies to "top level"
>> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
>> set, so searching a bit longer for a good group is not a bad idea in this case.
>> 
>> Cheers, Andreas.
>> 
>>> Thanks & Regards,
>>>  Lokesh
>>> 
>>> 
>>> 
>>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>> 
>>>>> Thought of giving more clarification on my question
>>>>> why group search start is random ? because we can also start  search
>>>>> for valid groups for inode allocation from the start. As this group
>>>>> search is random  inode selection might go to end of groups which
>>>>> might affect IO performance
>>>> 
>>>> Starting the inode search at the beginning of the disk each time
>>>> means that inode allocation will be inefficient because it will search
>>>> over groups that are mostly or entirely full already.
>>>> 
>>>> Allocating the new directory in a semi-random group, one that is
>>>> relatively unused, ensures that new
>>>> inode and block allocations are relatively efficient afterward.
>>>> 
>>>> Cheers, Andreas
>>>> 
>>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>>>> <lokesh.jaliminche@gmail.com> wrote:
>>>>>> hello folks,
>>>>>>              I am new to ext4 code. I was going through the
>>>>>> ext4-source for allocation of inode.
>>>>>> There is one thing that I did not understand while selection of groups
>>>>>> for inode allocation . I came across this code snippet which is part
>>>>>> of find_group_orlov function. question is, why group search start is
>>>>>> random ?
>>>>>> 
>>>>>> Code snippet:
>>>>>> ==========
>>>>>> В·В·В·if (qstr) {
>>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>>>> »·······»·······»·······grp = hinfo.hash;
>>>>>> »·······»·······} else
>>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>>>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>>>>> »·······»·······»·······if (!stats.free_inodes)
>>>>>> »·······»·······»·······»·······continue;
>>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>>>>> »·······»·······»·······»·······continue;
>>>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>>>>> »·······»·······»·······»·······continue;
>>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>>>>> »·······»·······»·······»·······continue;
>>>>>> »·······»·······»·······grp = g;
>>>>>> »·······»·······»·······ret = 0;
>>>>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>>>>> »·······»·······}
>>>>>> 
>>>>>> Thanks & Regards,
>>>>>> Lokesh
>>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
>> 
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> 


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-15 21:32           ` Andreas Dilger
@ 2015-12-17 14:34             ` lokesh jaliminche
  2015-12-19  5:09             ` lokesh
  1 sibling, 0 replies; 10+ messages in thread
From: lokesh jaliminche @ 2015-12-17 14:34 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4

Hey thanks for that, I will push a patch for this ASAP :-)

Regards,
Lokesh

On Wed, Dec 16, 2015 at 3:02 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> On Dec 15, 2015, at 3:33 AM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>
>> "No, this isn't correct.  This loop is looking for the *best* group it
>> can find, and your "break" would have it exit the loop as soon as the
>> *first* group that matches the conditions is found."
>>
>> But as we are checking  all the groups with the same conditions then
>> how it guarantees better group selection ? As per my understanding  we
>> are just wasting time in looping.
>
> The important part of the loop is ensuring that the selected group is always
> improving over the previous one:
>
>                         if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
>                                 continue;
>
> The "best_ndir" value tracks for the best group found so far the number of
> used directories in the group, and if the new directory has fewer directories
> than the  previous "best" directory and still meets all the other criteria
> (fewer than average inodes allocated, etc) then the new group will be chosen.
>
> That said, you are correct that the loop can spend a lot of time searching
> needlessly.  It would be trivial to add a check at the start of the loop:
>
>                         /* the group can't get any better than empty */
>                         if (desc->bg_free_inodes_count == inodes_per_group &&
>                             desc->bg_free_blocks_count ==
>                                                 EXT4_BLOCKS_PER_GROUP(sb))
>                                 goto found;
>
> This jumps directly to "found" with "desc" and "group" already set, so
> there is no need to go through the extra steps of setting "best_desc" and
> "best_group" and then break out of the loop just to set "desc" and "group"
> again.
>
> Since you are the one to find this optimization, could you please submit a
> patch to this effect so you get the credit.
>
> Cheers, Andreas
>
>> On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>>>
>>>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>
>>>> Ohh thanks for the clarification. There is one more thing I would like
>>>> to point out here.
>>>> In the code there  is a loop to scan the groups for inode
>>>> alllocation(Inside find_group_orlov function).
>>>> There are some policies for group selection . while scanning the
>>>> groups, it checks for these
>>>> policies to be satisfied.
>>>> If a particular group satisfy these properties it should get selected
>>>> for inode allocation but instead
>>>> it does further lookup in next groups.
>>>> I think there is missing breaking condition. I have added break over
>>>> there and here is the
>>>> patch for that. Any reason for not having break condition over here ?
>>>>
>>>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
>>>> 01:20:31.000000000 +0530
>>>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
>>>> 21:36:51.805542209 +0530
>>>> @@ -529,6 +529,7 @@
>>>>            grp = g;
>>>>            ret = 0;
>>>>            best_ndir = stats.used_dirs;
>>>> +            break;
>>>>        }
>>>>        if (ret)
>>>>        goto fallback;
>>>
>>> No, this isn't correct.  This loop is looking for the *best* group it can find,
>>> and your "break" would have it exit the loop as soon as the *first* directory
>>> that matches the conditions is found.  Since those conditions are fairly weak,
>>> for example that the group actually has free inodes, and it has better than
>>> average free inodes and blocks, it makes sense to search beyond just the first
>>> matching group.
>>>
>>> That said, it also doesn't make sense to search beyond a "perfect" group that
>>> has no allocated inodes and no allocated blocks, so a break condition could be
>>> added to this loop and make it more efficient, especially for very large
>>> filesystems that have 128k+ groups.
>>>
>>> It should be noted that this part of the algorithm only applies to "top level"
>>> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
>>> set, so searching a bit longer for a good group is not a bad idea in this case.
>>>
>>> Cheers, Andreas.
>>>
>>>> Thanks & Regards,
>>>>  Lokesh
>>>>
>>>>
>>>>
>>>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>>>
>>>>>> Thought of giving more clarification on my question
>>>>>> why group search start is random ? because we can also start  search
>>>>>> for valid groups for inode allocation from the start. As this group
>>>>>> search is random  inode selection might go to end of groups which
>>>>>> might affect IO performance
>>>>>
>>>>> Starting the inode search at the beginning of the disk each time
>>>>> means that inode allocation will be inefficient because it will search
>>>>> over groups that are mostly or entirely full already.
>>>>>
>>>>> Allocating the new directory in a semi-random group, one that is
>>>>> relatively unused, ensures that new
>>>>> inode and block allocations are relatively efficient afterward.
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>>>>> <lokesh.jaliminche@gmail.com> wrote:
>>>>>>> hello folks,
>>>>>>>              I am new to ext4 code. I was going through the
>>>>>>> ext4-source for allocation of inode.
>>>>>>> There is one thing that I did not understand while selection of groups
>>>>>>> for inode allocation . I came across this code snippet which is part
>>>>>>> of find_group_orlov function. question is, why group search start is
>>>>>>> random ?
>>>>>>>
>>>>>>> Code snippet:
>>>>>>> ==========
>>>>>>> В·В·В·if (qstr) {
>>>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>>>>> »·······»·······»·······grp = hinfo.hash;
>>>>>>> »·······»·······} else
>>>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>>>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>>>>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>>>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>>>>>> »·······»·······»·······if (!stats.free_inodes)
>>>>>>> »·······»·······»·······»·······continue;
>>>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>>>>>> »·······»·······»·······»·······continue;
>>>>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>>>>>> »·······»·······»·······»·······continue;
>>>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>>>>>> »·······»·······»·······»·······continue;
>>>>>>> »·······»·······»·······grp = g;
>>>>>>> »·······»·······»·······ret = 0;
>>>>>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>>>>>> »·······»·······}
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Lokesh
>>>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
>>>
>>>
>>> Cheers, Andreas
>>>
>>>
>>>
>>>
>>>
>
>
> Cheers, Andreas
>
>
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-15 21:32           ` Andreas Dilger
  2015-12-17 14:34             ` lokesh jaliminche
@ 2015-12-19  5:09             ` lokesh
  2015-12-21 18:25               ` Andreas Dilger
  1 sibling, 1 reply; 10+ messages in thread
From: lokesh @ 2015-12-19  5:09 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4

From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
From: Lokesh Nagappa Jaliminche <lokesh.jaliminche@gmail.com>
Date: Sat, 19 Dec 2015 00:33:06 +0530
Subject: [PATCH] ext4: optimizing group serch for inode allocation

Added a check at the start of group search loop to
avoid looping unecessarily in case of empty group.
This also allow group search to jump directly to
"found_flex_bg" with "stats" and "group" already set,
so there is no need to go through the extra steps of
setting "best_desc" and "best_group" and then break
out of the loop just to set "stats" and "group" again.

Signed-off-by: Lokesh Nagappa Jaliminche <lokesh.jaliminche@gmail.com>
---
 fs/ext4/ialloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1b8024d..588bf8e 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_group_t real_ngroups = ext4_get_groups_count(sb);
 	int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
+	unsigned int inodes_per_flex_group;
+	long unsigned int blocks_per_clustre;
 	unsigned int freei, avefreei, grp_free;
 	ext4_fsblk_t freeb, avefreec;
 	unsigned int ndirs;
@@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
 		percpu_counter_read_positive(&sbi->s_freeclusters_counter));
 	avefreec = freeb;
 	do_div(avefreec, ngroups);
+	inodes_per_flex_group = inodes_per_group * flex_size;
+	blocks_per_clustre = sbi->s_blocks_per_group * flex_size;
 	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
·
 	if (S_ISDIR(mode) &&
@@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
 		for (i = 0; i < ngroups; i++) {
 			g = (parent_group + i) % ngroups;
 			get_orlov_stats(sb, g, flex_size, &stats);
+			/* the group can't get any better than empty */
+			if (inodes_per_flex_group == stats.free_inodes &&
+				blocks_per_clustre == stats.free_clusters)
+					goto found_flex_bg;
 			if (!stats.free_inodes)
 				continue;
 			if (stats.used_dirs >= best_ndir)
--
1.7.1
On Tue, Dec 15, 2015 at 02:32:52PM -0700, Andreas Dilger wrote:
> On Dec 15, 2015, at 3:33 AM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> > 
> > "No, this isn't correct.  This loop is looking for the *best* group it
> > can find, and your "break" would have it exit the loop as soon as the
> > *first* group that matches the conditions is found."
> > 
> > But as we are checking  all the groups with the same conditions then
> > how it guarantees better group selection ? As per my understanding  we
> > are just wasting time in looping.
> 
> The important part of the loop is ensuring that the selected group is always
> improving over the previous one:
> 
> 			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
>                                 continue;
> 
> The "best_ndir" value tracks for the best group found so far the number of
> used directories in the group, and if the new directory has fewer directories
> than the  previous "best" directory and still meets all the other criteria
> (fewer than average inodes allocated, etc) then the new group will be chosen.
> 
> That said, you are correct that the loop can spend a lot of time searching
> needlessly.  It would be trivial to add a check at the start of the loop:
> 
> 			/* the group can't get any better than empty */
> 			if (desc->bg_free_inodes_count == inodes_per_group &&
> 			    desc->bg_free_blocks_count ==
> 						EXT4_BLOCKS_PER_GROUP(sb))
> 				goto found;
> 
> This jumps directly to "found" with "desc" and "group" already set, so
> there is no need to go through the extra steps of setting "best_desc" and
> "best_group" and then break out of the loop just to set "desc" and "group"
> again.
> 
> Since you are the one to find this optimization, could you please submit a
> patch to this effect so you get the credit.
> 
> Cheers, Andreas
> 
> > On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> >> 
> >>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> >>> 
> >>> Ohh thanks for the clarification. There is one more thing I would like
> >>> to point out here.
> >>> In the code there  is a loop to scan the groups for inode
> >>> alllocation(Inside find_group_orlov function).
> >>> There are some policies for group selection . while scanning the
> >>> groups, it checks for these
> >>> policies to be satisfied.
> >>> If a particular group satisfy these properties it should get selected
> >>> for inode allocation but instead
> >>> it does further lookup in next groups.
> >>> I think there is missing breaking condition. I have added break over
> >>> there and here is the
> >>> patch for that. Any reason for not having break condition over here ?
> >>> 
> >>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
> >>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
> >>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
> >>> 01:20:31.000000000 +0530
> >>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
> >>> 21:36:51.805542209 +0530
> >>> @@ -529,6 +529,7 @@
> >>>            grp = g;
> >>>            ret = 0;
> >>>            best_ndir = stats.used_dirs;
> >>> +            break;
> >>>        }
> >>>        if (ret)
> >>>        goto fallback;
> >> 
> >> No, this isn't correct.  This loop is looking for the *best* group it can find,
> >> and your "break" would have it exit the loop as soon as the *first* directory
> >> that matches the conditions is found.  Since those conditions are fairly weak,
> >> for example that the group actually has free inodes, and it has better than
> >> average free inodes and blocks, it makes sense to search beyond just the first
> >> matching group.
> >> 
> >> That said, it also doesn't make sense to search beyond a "perfect" group that
> >> has no allocated inodes and no allocated blocks, so a break condition could be
> >> added to this loop and make it more efficient, especially for very large
> >> filesystems that have 128k+ groups.
> >> 
> >> It should be noted that this part of the algorithm only applies to "top level"
> >> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
> >> set, so searching a bit longer for a good group is not a bad idea in this case.
> >> 
> >> Cheers, Andreas.
> >> 
> >>> Thanks & Regards,
> >>>  Lokesh
> >>> 
> >>> 
> >>> 
> >>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> >>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> >>>>> 
> >>>>> Thought of giving more clarification on my question
> >>>>> why group search start is random ? because we can also start  search
> >>>>> for valid groups for inode allocation from the start. As this group
> >>>>> search is random  inode selection might go to end of groups which
> >>>>> might affect IO performance
> >>>> 
> >>>> Starting the inode search at the beginning of the disk each time
> >>>> means that inode allocation will be inefficient because it will search
> >>>> over groups that are mostly or entirely full already.
> >>>> 
> >>>> Allocating the new directory in a semi-random group, one that is
> >>>> relatively unused, ensures that new
> >>>> inode and block allocations are relatively efficient afterward.
> >>>> 
> >>>> Cheers, Andreas
> >>>> 
> >>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
> >>>>> <lokesh.jaliminche@gmail.com> wrote:
> >>>>>> hello folks,
> >>>>>>              I am new to ext4 code. I was going through the
> >>>>>> ext4-source for allocation of inode.
> >>>>>> There is one thing that I did not understand while selection of groups
> >>>>>> for inode allocation . I came across this code snippet which is part
> >>>>>> of find_group_orlov function. question is, why group search start is
> >>>>>> random ?
> >>>>>> 
> >>>>>> Code snippet:
> >>>>>> ==========
> >>>>>> В·В·В·if (qstr) {
> >>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
> >>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
> >>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
> >>>>>> »·······»·······»·······grp = hinfo.hash;
> >>>>>> »·······»·······} else
> >>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
> >>>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
> >>>>>> »·······»·······for (i = 0; i < ngroups; i++) {
> >>>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
> >>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
> >>>>>> »·······»·······»·······if (!stats.free_inodes)
> >>>>>> »·······»·······»·······»·······continue;
> >>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
> >>>>>> »·······»·······»·······»·······continue;
> >>>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
> >>>>>> »·······»·······»·······»·······continue;
> >>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
> >>>>>> »·······»·······»·······»·······continue;
> >>>>>> »·······»·······»·······grp = g;
> >>>>>> »·······»·······»·······ret = 0;
> >>>>>> »·······»·······»·······best_ndir = stats.used_dirs;
> >>>>>> »·······»·······}
> >>>>>> 
> >>>>>> Thanks & Regards,
> >>>>>> Lokesh
> >>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
> >> 
> >> 
> >> Cheers, Andreas
> >> 
> >> 
> >> 
> >> 
> >> 
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Regarding random grouop search start for allocation of inode.
  2015-12-19  5:09             ` lokesh
@ 2015-12-21 18:25               ` Andreas Dilger
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Dilger @ 2015-12-21 18:25 UTC (permalink / raw)
  To: lokesh; +Cc: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 12396 bytes --]

On Dec 18, 2015, at 10:09 PM, lokesh <lokesh.jaliminche@gmail.com> wrote:
> 
> From 9e09fef78b2fa552c883bf8124af873abfde0805 Mon Sep 17 00:00:00 2001
> From: Lokesh Nagappa Jaliminche <lokesh.jaliminche@gmail.com>
> Date: Sat, 19 Dec 2015 00:33:06 +0530
> Subject: [PATCH] ext4: optimizing group serch for inode allocation

You need to send the patch with [PATCH] in the actual email Subject
line, or it will likely be missed by Ted.  In this case, you should
use "[PATCH v3]" for the new patch, since you've already sent out this
patch twice.

> Added a check at the start of group search loop to
> avoid looping unecessarily in case of empty group.
> This also allow group search to jump directly to
> "found_flex_bg" with "stats" and "group" already set,
> so there is no need to go through the extra steps of
> setting "best_desc" and "best_group" and then break
> out of the loop just to set "stats" and "group" again.


> Signed-off-by: Lokesh Nagappa Jaliminche <lokesh.jaliminche@gmail.com>
> ---
> fs/ext4/ialloc.c |    8 ++++++++
> 1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 1b8024d..588bf8e 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -446,6 +446,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> 	struct ext4_sb_info *sbi = EXT4_SB(sb);
> 	ext4_group_t real_ngroups = ext4_get_groups_count(sb);
> 	int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
> +	unsigned int inodes_per_flex_group;
> +	long unsigned int blocks_per_clustre;

s/clustre/cluster/ but it should really be "clusters_per_flex_group"
to match "inodes_per_flex_group".

These declarations should be inside the "if" block, by "int best_ndir"

> 	unsigned int freei, avefreei, grp_free;
> 	ext4_fsblk_t freeb, avefreec;
> 	unsigned int ndirs;
> @@ -470,6 +472,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> 		percpu_counter_read_positive(&sbi->s_freeclusters_counter));
> 	avefreec = freeb;
> 	do_div(avefreec, ngroups);
> +	inodes_per_flex_group = inodes_per_group * flex_size;
> +	blocks_per_clustre = sbi->s_blocks_per_group * flex_size;

This calculation should also be inside the "if (S_ISDIR()" block below.

> 	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
> ·
> 	if (S_ISDIR(mode) &&
> @@ -489,6 +493,10 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> 		for (i = 0; i < ngroups; i++) {
> 			g = (parent_group + i) % ngroups;
> 			get_orlov_stats(sb, g, flex_size, &stats);
> +			/* the group can't get any better than empty */
> +			if (inodes_per_flex_group == stats.free_inodes &&
> +				blocks_per_clustre == stats.free_clusters)

Align continued line after '(' on previous line.

> +					goto found_flex_bg;

This should only have a single level of indent.

Cheers, Andreas

> 			if (!stats.free_inodes)
> 				continue;
> 			if (stats.used_dirs >= best_ndir)
> --
> 1.7.1
> On Tue, Dec 15, 2015 at 02:32:52PM -0700, Andreas Dilger wrote:
>> On Dec 15, 2015, at 3:33 AM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>> 
>>> "No, this isn't correct.  This loop is looking for the *best* group it
>>> can find, and your "break" would have it exit the loop as soon as the
>>> *first* group that matches the conditions is found."
>>> 
>>> But as we are checking  all the groups with the same conditions then
>>> how it guarantees better group selection ? As per my understanding  we
>>> are just wasting time in looping.
>> 
>> The important part of the loop is ensuring that the selected group is always
>> improving over the previous one:
>> 
>> 			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
>>                                continue;
>> 
>> The "best_ndir" value tracks for the best group found so far the number of
>> used directories in the group, and if the new directory has fewer directories
>> than the  previous "best" directory and still meets all the other criteria
>> (fewer than average inodes allocated, etc) then the new group will be chosen.
>> 
>> That said, you are correct that the loop can spend a lot of time searching
>> needlessly.  It would be trivial to add a check at the start of the loop:
>> 
>> 			/* the group can't get any better than empty */
>> 			if (desc->bg_free_inodes_count == inodes_per_group &&
>> 			    desc->bg_free_blocks_count ==
>> 						EXT4_BLOCKS_PER_GROUP(sb))
>> 				goto found;
>> 
>> This jumps directly to "found" with "desc" and "group" already set, so
>> there is no need to go through the extra steps of setting "best_desc" and
>> "best_group" and then break out of the loop just to set "desc" and "group"
>> again.
>> 
>> Since you are the one to find this optimization, could you please submit a
>> patch to this effect so you get the credit.
>> 
>> Cheers, Andreas
>> 
>>> On Sat, Dec 5, 2015 at 3:24 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>>>> 
>>>>> On Dec 3, 2015, at 12:13 PM, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>> 
>>>>> Ohh thanks for the clarification. There is one more thing I would like
>>>>> to point out here.
>>>>> In the code there  is a loop to scan the groups for inode
>>>>> alllocation(Inside find_group_orlov function).
>>>>> There are some policies for group selection . while scanning the
>>>>> groups, it checks for these
>>>>> policies to be satisfied.
>>>>> If a particular group satisfy these properties it should get selected
>>>>> for inode allocation but instead
>>>>> it does further lookup in next groups.
>>>>> I think there is missing breaking condition. I have added break over
>>>>> there and here is the
>>>>> patch for that. Any reason for not having break condition over here ?
>>>>> 
>>>>> diff -Nur linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>>>> linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c
>>>>> --- linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2014-04-12
>>>>> 01:20:31.000000000 +0530
>>>>> +++ linux-2.6.32-431.17.1.el6.x86_64/fs/ext4/ialloc.c    2015-11-29
>>>>> 21:36:51.805542209 +0530
>>>>> @@ -529,6 +529,7 @@
>>>>>           grp = g;
>>>>>           ret = 0;
>>>>>           best_ndir = stats.used_dirs;
>>>>> +            break;
>>>>>       }
>>>>>       if (ret)
>>>>>       goto fallback;
>>>> 
>>>> No, this isn't correct.  This loop is looking for the *best* group it can find,
>>>> and your "break" would have it exit the loop as soon as the *first* directory
>>>> that matches the conditions is found.  Since those conditions are fairly weak,
>>>> for example that the group actually has free inodes, and it has better than
>>>> average free inodes and blocks, it makes sense to search beyond just the first
>>>> matching group.
>>>> 
>>>> That said, it also doesn't make sense to search beyond a "perfect" group that
>>>> has no allocated inodes and no allocated blocks, so a break condition could be
>>>> added to this loop and make it more efficient, especially for very large
>>>> filesystems that have 128k+ groups.
>>>> 
>>>> It should be noted that this part of the algorithm only applies to "top level"
>>>> directories (those below the root inode, or with the EXT4_INODE_TOPDIR flag
>>>> set, so searching a bit longer for a good group is not a bad idea in this case.
>>>> 
>>>> Cheers, Andreas.
>>>> 
>>>>> Thanks & Regards,
>>>>> Lokesh
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Dec 3, 2015 at 11:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>>>>> On Dec 3, 2015, at 01:07, lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
>>>>>>> 
>>>>>>> Thought of giving more clarification on my question
>>>>>>> why group search start is random ? because we can also start  search
>>>>>>> for valid groups for inode allocation from the start. As this group
>>>>>>> search is random  inode selection might go to end of groups which
>>>>>>> might affect IO performance
>>>>>> 
>>>>>> Starting the inode search at the beginning of the disk each time
>>>>>> means that inode allocation will be inefficient because it will search
>>>>>> over groups that are mostly or entirely full already.
>>>>>> 
>>>>>> Allocating the new directory in a semi-random group, one that is
>>>>>> relatively unused, ensures that new
>>>>>> inode and block allocations are relatively efficient afterward.
>>>>>> 
>>>>>> Cheers, Andreas
>>>>>> 
>>>>>>> On Thu, Dec 3, 2015 at 1:14 PM, lokesh jaliminche
>>>>>>> <lokesh.jaliminche@gmail.com> wrote:
>>>>>>>> hello folks,
>>>>>>>>             I am new to ext4 code. I was going through the
>>>>>>>> ext4-source for allocation of inode.
>>>>>>>> There is one thing that I did not understand while selection of groups
>>>>>>>> for inode allocation . I came across this code snippet which is part
>>>>>>>> of find_group_orlov function. question is, why group search start is
>>>>>>>> random ?
>>>>>>>> 
>>>>>>>> Code snippet:
>>>>>>>> ==========
>>>>>>>> В·В·В·if (qstr) {
>>>>>>>> »·······»·······»·······hinfo.hash_version = LDISKFS_DX_HASH_HALF_MD4;
>>>>>>>> »·······»·······»·······hinfo.seed = sbi->s_hash_seed;
>>>>>>>> »·······»·······»·······ldiskfsfs_dirhash(qstr->name, qstr->len, &hinfo);
>>>>>>>> »·······»·······»·······grp = hinfo.hash;
>>>>>>>> »·······»·······} else
>>>>>>>> »·······»·······»·······get_random_bytes(&grp, sizeof(grp));
>>>>>>>> »·······»·······parent_group = (unsigned)grp % ngroups;
>>>>>>>> »·······»·······for (i = 0; i < ngroups; i++) {
>>>>>>>> »·······»·······»·······g = (parent_group + i) % ngroups;
>>>>>>>> »·······»·······»·······get_orlov_stats(sb, g, flex_size, &stats);
>>>>>>>> »·······»·······»·······if (!stats.free_inodes)
>>>>>>>> »·······»·······»·······»·······continue;
>>>>>>>> »·······»·······»·······if (stats.used_dirs >= best_ndir)
>>>>>>>> »·······»·······»·······»·······continue;
>>>>>>>> »·······»·······»·······if (stats.free_inodes < avefreei)
>>>>>>>> »·······»·······»·······»·······continue;
>>>>>>>> »·······»·······»·······if (stats.free_blocks < avefreeb)
>>>>>>>> »·······»·······»·······»·······continue;
>>>>>>>> »·······»·······»·······grp = g;
>>>>>>>> »·······»·······»·······ret = 0;
>>>>>>>> »·······»·······»·······best_ndir = stats.used_dirs;
>>>>>>>> »·······»·······}
>>>>>>>> 
>>>>>>>> Thanks & Regards,
>>>>>>>> Lokesh
>>>>>>> N‹§Іжмrё›yъиљШbІX¬¶З§vШ^–)Ює{.nЗ+‰·ҐЉ{±{ xЉ{ayє К‡Ъ™л,j ­ўfЈў·hљ‹аz№ ®wҐўё ў·¦j:+v‰ЁЉwиjШm¶џяѕ «‘кзzZ+ѓщљЋЉЭўj"ќъ!¶i
>>>> 
>>>> 
>>>> Cheers, Andreas
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> 
> 
> 


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-12-21 18:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03  7:44 Regarding random grouop search start for allocation of inode lokesh jaliminche
2015-12-03  8:07 ` lokesh jaliminche
2015-12-03 17:58   ` Andreas Dilger
2015-12-03 19:13     ` lokesh jaliminche
2015-12-04 21:54       ` Andreas Dilger
2015-12-15 10:33         ` lokesh jaliminche
2015-12-15 21:32           ` Andreas Dilger
2015-12-17 14:34             ` lokesh jaliminche
2015-12-19  5:09             ` lokesh
2015-12-21 18:25               ` Andreas Dilger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.