From: tm@tao.ma
To: "Lukas Czerner" <lczerner@redhat.com>
Cc: "Tao Ma" <tm@tao.ma>,
	linux-ext4@vger.kernel.org,
	"Andreas Dilger" <adilger.kernel@dilger.ca>,
	"Lukas Czerner" <lczerner@redhat.com>
Subject: Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
Date: Thu, 10 Feb 2011 07:58:13 -0700
Message-ID: <e8bec22699d84951aae0f12f31689bed.squirrel@box585.bluehost.com>
In-Reply-To: <alpine.LFD.2.00.1102101146470.3320@dhcp-27-109.brq.redhat.com>

> On Thu, 10 Feb 2011, Tao Ma wrote:
>
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In ext4, every time FITRIM is called, we iterate over all the
>> groups and trim them one by one. This wastes time if a group
>> has already been trimmed and nothing has changed since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed. The flag is cleared
>> when some blocks are freed (in release_blocks_on_commit). In
>> addition, s_last_trim_minblks is added to ext4_sb_info to record
>> the last minlen used to trim the volume, so that if the caller
>> provides a smaller one, we go on with the trim regardless of
>> bb_state.
>>
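The skip rule described above can be modelled in user space. This is only
a sketch: group_model, sb_model, can_skip_group, mark_trimmed and
blocks_freed are hypothetical stand-ins for bb_state, s_last_trim_minblks
and the kernel code paths in the patch, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_GROUP_INFO_WAS_TRIMMED_BIT. */
#define GROUP_WAS_TRIMMED_BIT 1u

struct group_model {
	unsigned long state;		/* plays the role of bb_state */
};

struct sb_model {
	uint64_t last_trim_minblks;	/* plays the role of s_last_trim_minblks */
};

/* A successful trim of the group sets the flag. */
static void mark_trimmed(struct group_model *grp)
{
	assert(grp != NULL);
	grp->state |= 1ul << GROUP_WAS_TRIMMED_BIT;
}

/* Freeing blocks in the group clears the flag again. */
static void blocks_freed(struct group_model *grp)
{
	assert(grp != NULL);
	grp->state &= ~(1ul << GROUP_WAS_TRIMMED_BIT);
}

/*
 * A group can be skipped only if it is flagged as trimmed and the
 * requested minimum extent is not smaller than the one used last time.
 */
static bool can_skip_group(const struct group_model *grp,
			   const struct sb_model *sbi, uint64_t minblocks)
{
	bool trimmed = (grp->state & (1ul << GROUP_WAS_TRIMMED_BIT)) != 0;

	return trimmed && minblocks >= sbi->last_trim_minblks;
}
```

A smaller minblocks than the recorded one forces a real trim, because
extents that were too short for the previous call may now qualify.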
>> A simple test with my Intel X25-M SSD:
>> df -h shows:
>> /dev/sdb2             108G   35G   68G  34% /mnt/ext4
>> Block size:               4096
>>
>> Run FITRIM with the following parameters:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
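The source of the ./ftrim test program is not included in this mail. A
minimal sketch of what such a caller might look like, assuming it just
issues the FITRIM ioctl with the parameters above, is given here. The
helper names whole_fs_range and trim_path are hypothetical; FITRIM and
struct fstrim_range come from <linux/fs.h>.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/* Build a range covering the whole filesystem with the given minlen. */
static struct fstrim_range whole_fs_range(uint64_t minlen_bytes)
{
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX,
		.minlen = minlen_bytes,
	};

	return range;
}

/*
 * Trim the filesystem that contains 'path'.
 * Returns 0 on success and stores the number of bytes trimmed in
 * *trimmed (the kernel writes it back into range.len), or a negative
 * errno value on failure.
 */
static int trim_path(const char *path, uint64_t minlen_bytes,
		     uint64_t *trimmed)
{
	struct fstrim_range range = whole_fs_range(minlen_bytes);
	int fd, err = 0;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FITRIM, &range) < 0)
		err = -errno;
	close(fd);

	if (!err && trimmed)
		*trimmed = range.len;
	return err;
}
```

Calling trim_path("/mnt/ext4/a", 1048576, &bytes) would correspond to the
runs timed below. It normally needs root and a device that supports
discard.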
>> without the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m4.039s
>> user	0m0.000s
>> sys	0m1.020s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.577s
>> user	0m0.001s
>> sys	0m1.004s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.380s
>> user	0m0.000s
>> sys	0m0.991s
>>
>> with the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.466s
>> user	0m0.000s
>> sys	0m0.966s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.001s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.001s
>> user	0m0.000s
>> sys	0m0.000s
>>
>> This is a big improvement for the 2nd and 3rd runs.
>>
>> After I deleted some big image files and re-ran the trim,
>> it was still much faster than iterating over the whole disk.
>> /dev/sdb2             108G   25G   78G  24% /mnt/ext4
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.513s
>> user	0m0.000s
>> sys	0m0.069s
>
> Great it looks really good.
>
>>
>> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
>> Cc: Lukas Czerner <lczerner@redhat.com>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/ext4.h    |    8 +++++++-
>>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>>  2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..1d59a63 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>  	struct ext4_li_request *s_li_request;
>>  	/* Wait multiplier for lazy initialization thread */
>>  	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 s_last_trim_minblks;
>>  };
>>
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>>  					 * 5 free 8-block regions. */
>>  };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>>
>>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>>
>>  #define EXT4_MAX_CONTENTION		8
>>  #define EXT4_CONTENTION_THRESHOLD	2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..c7aa094 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>  		rb_erase(&entry->node, &(db->bb_free_root));
>>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> +				  &(db->bb_state));
>> +
>>  		if (!db->bb_free_root.rb_node) {
>>  			/* No more items in the per group rb tree
>>  			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>>  	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
>> +	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
>> +		goto out;
>> +
>>  	trace_ext4_trim_all_free(sb, group, start, max);
>>
>>  	while (start < max) {
>> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>>  			break;
>>  	}
>> +
>> +	if (!ret)
>> +		set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> +			&(e4b->bd_info->bb_state));
>> +out:
>>  	ext4_unlock_group(sb, group);
>>
>>  	ext4_debug("trimmed %d blocks in the group %d\n",
>> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>
>> +	if (!ret)
>> +		EXT4_SB(sb)->s_last_trim_minblks = minlen;
>> +
>
> Since this is not protected by any lock, wouldn't it race in case of
> multiple FITRIM calls?
Yeah, I was also thinking about this, but I don't think we need a new
lock just for this. And I guess atomic_t isn't a good fit here because
minlen is a u64.

Do you think we can reuse some other spinlock in ext4? I am not very
familiar with ext4 yet, so do you have any suggestions?
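
One lock-free possibility, offered only as a sketch: the kernel also has
atomic64_t (with atomic64_set/atomic64_read) for 64-bit values. The
user-space model below uses C11 atomics as a stand-in for that. The names
trim_state, record_trim_minblks and read_trim_minblks are hypothetical.
It only guards against torn reads and writes of the 64-bit value; it does
not order the update against the per-group trimmed flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the s_last_trim_minblks field. */
struct trim_state {
	_Atomic uint64_t last_trim_minblks;
};

/* Publish the minlen used by the FITRIM call that just finished. */
static void record_trim_minblks(struct trim_state *ts, uint64_t minblks)
{
	assert(ts != NULL);
	atomic_store(&ts->last_trim_minblks, minblks);
}

/* Read the last recorded minlen as one indivisible 64-bit load. */
static uint64_t read_trim_minblks(struct trim_state *ts)
{
	assert(ts != NULL);
	return atomic_load(&ts->last_trim_minblks);
}
```

With concurrent FITRIM calls the last writer still wins, so whether that
is acceptable depends on how the value is used in ext4_trim_all_free.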

Regards,
Tao


