Re: [PATCH 2/2] ext4: skip non-loaded groups at cr=0/1

From: Alex Zhuravlev <azhuravlev@whamcloud.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Alex Zhuravlev <azhuravlev@whamcloud.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 2/2] ext4: skip non-loaded groups at cr=0/1
Date: Wed, 20 May 2020 19:59:09 +0000
Message-ID: <7F6AF0FC-2E52-4FC5-9663-C8874BA7B98E@whamcloud.com>
In-Reply-To: <DDB9F79B-55A9-4667-AE03-60D575CAD77A@dilger.ca>

> On 20 May 2020, at 22:34, Andreas Dilger <adilger@dilger.ca> wrote:
> 
> On May 20, 2020, at 2:40 AM, Alex Zhuravlev <azhuravlev@whamcloud.com> wrote:
>> 
>>> On 17 May 2020, at 10:55, Andreas Dilger <adilger@dilger.ca> wrote:
>>> 
>>> The question is whether this situation is affecting only a few inode
>>> allocations for a short time after mount, or does this persist for a long
>>> time?  I think that it _should_ be only a short time, because these other
>>> threads should all start prefetch on their preferred groups, so even if a
>>> few inodes have their blocks allocated in the "wrong" group, it shouldn't
>>> be a long term problem since the prefetched bitmaps will finish loading
>>> and allow the blocks to be allocated, or skipped if group is fragmented.
>> 
>> Yes, that’s the idea - there is a short window when buddy data is being
>> populated. And for each “cluster” (not just a single group) prefetching
>> will be initiated by allocation.
>> It’s possible that some number of inodes will get “bad” blocks right
>> after mount.
>> If you think this is a bad scenario I can introduce a couple more things:
>> 1) the prefetching thread that has been discussed a few times
>> 2) let mballoc wait for the goal group to get ready - this is essentially
>>   one more check in ext4_mb_good_group()
> 
> IMHO, this is an acceptable "cache warmup" behavior, not really different
> than mballoc doing limited scanning when looking for any other allocation.
> Since we already separate inode table blocks and data blocks into separate
> groups due to flex_bg, I don't think any group is "better" than another,
> so long as the allocations are avoiding worst-case fragmentation (i.e. a
> series of one-block allocations).

I tend to agree, but refreshed the patch to enable waiting for the goal group
(one more check). Extra waiting for one group during warmup should be fine, IMO.
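
To illustrate the idea, here is a toy userspace model in Python. It is only a
sketch, not the patch: Group, load_buddy, good_group and pick_group are
made-up names, the per-criteria fitness test is reduced to a plain free-block
count, and "waiting" is modeled as a synchronous load.

```python
from dataclasses import dataclass


@dataclass
class Group:
    loaded: bool      # is the buddy data for this group already in memory?
    free_blocks: int  # free blocks the buddy data would report once loaded


def load_buddy(group: Group) -> None:
    """Hypothetical stand-in for reading the bitmap and building buddy data."""
    group.loaded = True


def good_group(groups, idx, cr, goal, needed):
    """Return True if group idx is worth scanning at criteria level cr."""
    group = groups[idx]
    if not group.loaded:
        if cr <= 1 and idx != goal:
            # Cheap passes: skip groups that are not loaded yet.
            return False
        # Goal group (the one extra check) or a later pass: wait for it.
        load_buddy(group)
    return group.free_blocks >= needed


def pick_group(groups, goal, needed):
    """Scan from the goal group at cr=0..3; return (index, cr) or None."""
    count = len(groups)
    for cr in range(4):
        for offset in range(count):
            idx = (goal + offset) % count
            if good_group(groups, idx, cr, goal, needed):
                return idx, cr
    return None
```

With the goal group not yet loaded, pick_group() loads it and uses it if it has
room, while other non-loaded groups stay untouched until cr=2.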

Thanks, Alex