linux-ext4.vger.kernel.org archive mirror
From: Alex Zhuravlev <azhuravlev@whamcloud.com>
To: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] improve malloc for large filesystems
Date: Thu, 21 Nov 2019 08:52:10 +0000
Message-ID: <9114E776-B44E-4CA5-BD49-C432A688C24E@whamcloud.com>
In-Reply-To: <4EB2303A-01A3-49AC-B713-195126DB621B@gmail.com>



> On 21 Nov 2019, at 11:30, Artem Blagodarenko <artem.blagodarenko@gmail.com> wrote:
> 
> Hello Alex,
> 
> Code looks good, but I have objections about the approach.
> 
> A 512TB disk with a 4k block size has 4194304 groups, so 4k groups is only ~0.1% of the whole disk.
> Can we really decide to break off the search and get minimum blocks based on such limited data?
> I am not sure that spending some time to find a good group is worse than allocating blocks without
> optimisation, especially if the disk is mostly free and there are a lot of free block groups.

The exact number isn’t hardcoded and is subject to discussion, but you don’t really want to scan 4M
groups (especially uninitialised ones) to find the “best” chunk.
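
For reference, the group count quoted above can be reproduced with a small
standalone calculation. This is only a sketch: it assumes one bitmap block per
group (so 8 * block_size blocks per group) and no bigalloc, and the helper names
group_count() and scan_share_percent() are made up for illustration, they are
not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of block groups for a filesystem of fs_bytes bytes.
 * Assumption: each group is described by a single bitmap block,
 * i.e. blocks_per_group = 8 * block_size, and bigalloc is not used.
 */
static uint64_t group_count(uint64_t fs_bytes, uint32_t block_size)
{
	uint64_t blocks_per_group, group_bytes;

	assert(block_size > 0);
	blocks_per_group = 8ULL * block_size;
	group_bytes = blocks_per_group * block_size;

	/* round up so that a partial last group is counted */
	return (fs_bytes + group_bytes - 1) / group_bytes;
}

/* Share of all groups covered by a scan limit, in percent. */
static double scan_share_percent(uint64_t limit, uint64_t ngroups)
{
	assert(ngroups > 0);
	return 100.0 * (double)limit / (double)ngroups;
}
```

With fs_bytes = 512ULL << 40 and block_size = 4096 this gives 4194304 groups,
and a limit of 4096 groups covers roughly 0.1% of them.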

This can be optimized further, e.g. “don’t count initialized and/or empty groups”, but some limit
is still required, IMO. Note that this limit doesn’t apply once we have already tried to find the “best”,
i.e. it’s applied only with cr=0 and cr=1.
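
To make the intent concrete, the selection of the limit can be sketched outside
the kernel roughly as follows. The struct scan_limits and groups_to_scan() names
are hypothetical stand-ins for s_mb_toscan0/s_mb_toscan1 and the logic in
ext4_mb_regular_allocator(); the clamp to ngroups is an extra assumption of this
sketch and is not part of the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two tunables added to ext4_sb_info. */
struct scan_limits {
	uint32_t toscan0;	/* max groups to visit when cr == 0 */
	uint32_t toscan1;	/* max groups to visit when cr == 1 */
};

/*
 * How many groups a given round may visit: the first two rounds are
 * capped, later rounds may walk every group.
 */
static uint32_t groups_to_scan(const struct scan_limits *lim, int cr,
			       uint32_t ngroups)
{
	uint32_t toscan = ngroups;

	assert(lim != NULL);
	if (cr == 0)
		toscan = lim->toscan0;
	else if (cr == 1)
		toscan = lim->toscan1;

	/* assumption of this sketch: never visit more groups than exist */
	return toscan < ngroups ? toscan : ngroups;
}
```

With limits of 1024 and 4096 on a filesystem with 4194304 groups, cr=0 visits at
most 1024 groups, cr=1 at most 4096, and cr=2 and above can still visit all of
them.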


Thanks, Alex

> 
> Best regards,
> Artem Blagodarenko.
>> On 21 Nov 2019, at 10:03, Alex Zhuravlev <azhuravlev@whamcloud.com> wrote:
>> 
>> 
>> 
>>> On 20 Nov 2019, at 21:13, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>>> 
>>> Hi Alex,
>>> 
>>> A couple of comments.  First, please separate this patch so that these
>>> two separate pieces of functionality can be reviewed and tested
>>> separately:
>>> 
>> 
>> This is the first patch of the series.
>> 
>> Thanks, Alex
>> 
>> From 81c4b3b5a17d94525bbc6d2d89b20f6618b05bc6 Mon Sep 17 00:00:00 2001
>> From: Alex Zhuravlev <bzzz@whamcloud.com>
>> Date: Thu, 21 Nov 2019 09:53:13 +0300
>> Subject: [PATCH 1/2] ext4: limit scanning for a good group
>> 
>> Limit scanning in the first two rounds (cr=0 and cr=1) to prevent a
>> situation where tens or hundreds of thousands of groups are scanned,
>> especially uninitialized groups.
>> 
>> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
>> ---
>> fs/ext4/ext4.h    |  2 ++
>> fs/ext4/mballoc.c | 14 ++++++++++++--
>> fs/ext4/sysfs.c   |  4 ++++
>> 3 files changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 03db3e71676c..d4e47fdad87c 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1480,6 +1480,8 @@ struct ext4_sb_info {
>> 	/* where last allocation was done - for stream allocation */
>> 	unsigned long s_mb_last_group;
>> 	unsigned long s_mb_last_start;
>> +	unsigned int s_mb_toscan0;
>> +	unsigned int s_mb_toscan1;
>> 
>> 	/* stats for buddy allocator */
>> 	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a3e2767bdf2f..cebd7d8df0b8 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2098,7 +2098,7 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>> static noinline_for_stack int
>> ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> {
>> -	ext4_group_t ngroups, group, i;
>> +	ext4_group_t ngroups, toscan, group, i;
>> 	int cr;
>> 	int err = 0, first_err = 0;
>> 	struct ext4_sb_info *sbi;
>> @@ -2169,7 +2169,15 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> 		 */
>> 		group = ac->ac_g_ex.fe_group;
>> 
>> -		for (i = 0; i < ngroups; group++, i++) {
>> +		/* limit number of groups to scan at the first two rounds
>> +		 * when we hope to find something really good */
>> +		toscan = ngroups;
>> +		if (cr == 0)
>> +			toscan = sbi->s_mb_toscan0;
>> +		else if (cr == 1)
>> +			toscan = sbi->s_mb_toscan1;
>> +
>> +		for (i = 0; i < toscan; group++, i++) {
>> 			int ret = 0;
>> 			cond_resched();
>> 			/*
>> @@ -2872,6 +2880,8 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
>> 			bio_put(discard_bio);
>> 		}
>> 	}
>> +	sbi->s_mb_toscan0 = 1024;
>> +	sbi->s_mb_toscan1 = 4096;
>> 
>> 	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
>> 		ext4_free_data_in_buddy(sb, entry);
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index eb1efad0e20a..c96ee20f5487 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
>> @@ -198,6 +198,8 @@ EXT4_RO_ATTR_ES_UI(errors_count, s_error_count);
>> EXT4_ATTR(first_error_time, 0444, first_error_time);
>> EXT4_ATTR(last_error_time, 0444, last_error_time);
>> EXT4_ATTR(journal_task, 0444, journal_task);
>> +EXT4_RW_ATTR_SBI_UI(mb_toscan0, s_mb_toscan0);
>> +EXT4_RW_ATTR_SBI_UI(mb_toscan1, s_mb_toscan1);
>> 
>> static unsigned int old_bump_val = 128;
>> EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val);
>> @@ -228,6 +230,8 @@ static struct attribute *ext4_attrs[] = {
>> 	ATTR_LIST(first_error_time),
>> 	ATTR_LIST(last_error_time),
>> 	ATTR_LIST(journal_task),
>> +	ATTR_LIST(mb_toscan0),
>> +	ATTR_LIST(mb_toscan1),
>> 	NULL,
>> };
>> ATTRIBUTE_GROUPS(ext4);
>> -- 
>> 2.20.1
>> 
>> 
> 

