On May 15, 2020, at 2:56 AM, Alex Zhuravlev <azhuravlev@whamcloud.com> wrote:
> On 14 May 2020, at 13:04, Ritesh Harjani <riteshh@linux.ibm.com> wrote:
>>> +		/* cr=0/1 is a very optimistic search to find large
>>> +		 * good chunks almost for free. if buddy data is
>>> +		 * not ready, then this optimization makes no sense */
>> 
>> I guess it will be also helpful to mention a comment related to the
>> discussion that we had on why this should be ok to skip those groups.
>> Because this could result into we skipping the group which is closer to
>> our inode. I somehow couldn't recollect it completely.
> 
> Please remind where the discussion took place? I must be missing that.

Alex, this discussion was during the ext4 weekly developer concall.
I can send you the details if you want to join for next week's call.

The summary of the discussion is that Ted was wondering what happens if
one thread is scanning the filesystem and triggering prefetch on the
groups, but doesn't find any loaded (so it skips the cr=0/1 passes
completely), and then does an allocation in its preferred group (assuming
there is space available there).  What happens to allocations for other
inodes after this?

Presumably, the first thread's prefetch has started loading a bunch of
groups.  Even if the second thread starts scanning for blocks at its own
preferred group (a different one from the first thread), it will skip all
of the groups that are not loaded yet until it reaches the group(s) used
for the first inode's allocation, which are already in memory, and will
proceed to allocate blocks there.

The question is whether this situation affects only a few inode
allocations for a short time after mount, or whether it persists for a
long time.  I think that it _should_ be only a short time, because these other
threads should all start prefetch on their preferred groups, so even if a
few inodes have their blocks allocated in the "wrong" group, it shouldn't
be a long term problem since the prefetched bitmaps will finish loading
and allow the blocks to be allocated, or skipped if group is fragmented.
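
To make the scenario concrete, here is a rough userspace model of it.
This is only a sketch under my own assumptions, not the mballoc code:
struct group, finish_prefetch() and alloc_blocks() are made-up names, a
"group" is just a loaded flag plus a free count, and prefetch completion
is simulated by an explicit call rather than by IO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group state; NOT the kernel's struct ext4_group_info. */
struct group {
	bool loaded;		/* buddy/bitmap data is in memory */
	bool prefetching;	/* read was issued but has not completed */
	unsigned int free;	/* free blocks in this group */
};

/* Pretend that every outstanding prefetch has completed. */
static void finish_prefetch(struct group *g, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (g[i].prefetching) {
			g[i].prefetching = false;
			g[i].loaded = true;
		}
	}
}

/*
 * Scan all groups starting at the goal group, for cr = 0..3.
 * At cr 0/1 an unloaded group is skipped and only a prefetch is issued.
 * At cr >= 2 an unloaded group is loaded synchronously (assumed here).
 * Returns the index of the group allocated from, or -1 if none fits.
 */
static int alloc_blocks(struct group *g, size_t n, size_t goal,
			unsigned int want)
{
	assert(n > 0 && goal < n);

	for (int cr = 0; cr < 4; cr++) {
		for (size_t i = 0; i < n; i++) {
			size_t idx = (goal + i) % n;

			if (!g[idx].loaded) {
				if (cr < 2) {
					/* optimistic pass: do not wait */
					g[idx].prefetching = true;
					continue;
				}
				/* later passes wait for the group */
				g[idx].loaded = true;
				g[idx].prefetching = false;
			}
			if (g[idx].free >= want) {
				g[idx].free -= want;
				return (int)idx;
			}
		}
	}
	return -1;
}
```

With eight empty-cache groups, a first allocation with goal 2 lands in
group 2 (after skipping cr=0/1 entirely).  A second allocation with goal
5, made before the prefetch completes, also lands in group 2, which is
the "wrong" group case.  Once finish_prefetch() has run, an allocation
with goal 5 lands in group 5 again, which is why I expect the effect to
be short-lived.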


Cheers, Andreas