All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tao Ma <tm@tao.ma>
To: Lukas Czerner <lczerner@redhat.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
Date: Fri, 01 Jul 2011 18:46:49 +0800	[thread overview]
Message-ID: <4E0DA599.9040607@tao.ma> (raw)
In-Reply-To: <alpine.LFD.2.00.1107011206580.4168@dhcp-27-109.brq.redhat.com>

On 07/01/2011 06:09 PM, Lukas Czerner wrote:
> On Thu, 30 Jun 2011, Tao Ma wrote:
> 
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state.
>>
>> A simple test with my intel x25m ssd:
>> df -h shows:
>> /dev/sdb1              40G   21G   17G  56% /mnt/ext4
>> Block size:               4096
>>
>> run the FITRIM with the following parameter:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
>> without the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.505s
>> user	0m0.000s
>> sys	0m1.224s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.359s
>> user	0m0.000s
>> sys	0m1.178s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.228s
>> user	0m0.000s
>> sys	0m1.151s
>>
>> with the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.625s
>> user	0m0.000s
>> sys	0m1.269s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>>
>> A big improvement for the 2nd and 3rd run.
>>
>> Even after I delete some big image files, it is still much
>> faster than iterating the whole disk.
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m1.217s
>> user	0m0.000s
>> sys	0m0.196s
> 
> Thanks for the patch Tao, I have couple of comments bellow.
> 
>>
>> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/ext4.h    |   13 ++++++++++++-
>>  fs/ext4/mballoc.c |   20 ++++++++++++++++++++
>>  fs/ext4/super.c   |    2 ++
>>  3 files changed, 34 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 1921392..5878a22 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1214,6 +1214,9 @@ struct ext4_sb_info {
>>  
>>  	/* Kernel thread for multiple mount protection */
>>  	struct task_struct *s_mmp_tsk;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	atomic_t s_last_trim_minblks;
>>  };
>>  
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -2067,11 +2070,19 @@ struct ext4_group_info {
>>  					 * 5 free 8-block regions. */
>>  };
>>  
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>>  
>>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>>  
>> +#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
>> +	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
>> +	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +
>>  #define EXT4_MAX_CONTENTION		8
>>  #define EXT4_CONTENTION_THRESHOLD	2
>>  
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a822f2a..afdd869 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>  		rb_erase(&entry->node, &(db->bb_free_root));
>>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>  
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			EXT4_MB_GRP_CLEAR_TRIMMED(db);
> 
> If the online discard failed we should probably clear the flag as well
> right ? However I am not sure how helpful it would be since it'll most
> likely fail again. Maybe we do not need to bother with that case at all,
> what do you think ?
yes, in this case, we don't have a minor chance to trim successfully
later and it will increase the time of the next FITRIM ioctl. So I guess
we can leave it as-is.
> 
>> +
>>  		if (!db->bb_free_root.rb_node) {
>>  			/* No more items in the per group rb tree
>>  			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  	bitmap = e4b.bd_bitmap;
>>  
>>  	ext4_lock_group(sb, group);
>> +	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
>> +	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
>> +		goto out;
>> +
>>  	start = (e4b.bd_info->bb_first_free > start) ?
>>  		e4b.bd_info->bb_first_free : start;
>>  
>> @@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>>  			break;
>>  	}
>> +
>> +	if (!ret)
>> +		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
>> +out:
>>  	ext4_unlock_group(sb, group);
>>  	ext4_mb_unload_buddy(&e4b);
>>  
>> @@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>  
>> +	if (!ret)
>> +		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
>> +
>>  out:
>>  	return ret;
>>  }
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 9ea71aa..080e9f9 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>  		sbi->s_sectors_written_start =
>>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
>>  
>> +	atomic_set(&sbi->s_last_trim_minblks, 0);
>> +
> 
> I am not sure that this is needed since sbi is allocated with kzalloc,
> so it should be zero already.
yes, maybe. I can remove it if you like.

Thanks
Tao
> 
>>  	/* Cleanup superblock name */
>>  	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
>>  		*cp = '!';
>>
> 
> Thanks!
> -Lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2011-07-01 10:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
2011-07-01  9:45   ` Lukas Czerner
2011-07-01 10:15     ` Tao Ma
2011-07-01 10:20       ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
2011-06-30 16:35   ` Andreas Dilger
2011-07-01  9:48   ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
2011-07-01  9:54   ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-07-01 10:09   ` Lukas Czerner
2011-07-01 10:46     ` Tao Ma [this message]
2011-07-01 11:03       ` Lukas Czerner
  -- strict thread matches above, loose matches on Subject: below --
2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-02-09 14:01   ` Lukas Czerner
2011-02-09 19:25     ` Andreas Dilger
2011-02-10  1:39       ` Tao Ma
2011-02-10  1:36     ` Tao Ma
2011-02-10  3:56     ` Tao Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E0DA599.9040607@tao.ma \
    --to=tm@tao.ma \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.