All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] EXT4 trim bug fixes and improvement.
@ 2011-02-09  5:52 Tao Ma
  2011-02-09  5:57 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-09  5:52 UTC (permalink / raw)
  To: ext4 development

Hi Ted and the list,
     Here are 4 patches for trim support in ext4.
     Patch 1/4 is a fix for the 'len' overflow. As we have decided that 
we don't adjust start to s_first_data_block, this patch is still needed.
     Patch 2/4 is a speedup for trim in one group.
     Patch 3/4 adds trace points for the functions used in FITRIM.
     Patch 4/4 is a speedup for trim in the whole volume. For more 
details, please check the commit log.

Regards,
Tao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] ext4: fix trim length underflow with small trim length.
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
@ 2011-02-09  5:57 ` Tao Ma
  2011-02-09  5:57 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-09  5:57 UTC (permalink / raw)
  To: linux-ext4

From: Tao Ma <boyu.mt@taobao.com>

In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
it could underflow in case blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len will be underflow to -1ULL.
In the end, although we are safe that last_group check later will limit
the trim to the whole volume, but that isn't what the user really want.

So this patch fix it. It also adds the check for 'start' like ext3 so that
we can break immediately if the start is invalid.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 02cff4a..94e9f23 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4839,6 +4839,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 
 	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
 		return -EINVAL;
+	if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
+	    start + len <= first_data_blk)
+		goto out;
 	if (start < first_data_blk) {
 		len -= first_data_blk - start;
 		start = first_data_blk;
@@ -4883,5 +4886,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+out:
 	return ret;
 }
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] ext4: speed up group trim with the right free block count.
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
  2011-02-09  5:57 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
@ 2011-02-09  5:57 ` Tao Ma
  2011-02-09  5:57 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-09  5:57 UTC (permalink / raw)
  To: linux-ext4

From: Tao Ma <boyu.mt@taobao.com>

When we trim some free blocks in a group of ext4, we should
calculate the free blocks properly and check whether there are
enough freed blocks left for us to trim. Current solution will
only calculate free spaces if they are large for a trim which
isn't appropriate.

Let us see a small example:
a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
And minblocks is 1M. With current solution, we have to iterate
the whole group since these 300k will never be subtracted from
1.5M. But actually we should exit after we find the first 2
free spaces since the left 3 chunks only sum up to 900K if we
subtract the first 600K although they can't be trimed.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 94e9f23..806adb2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4757,7 +4757,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
 {
 	void *bitmap;
-	ext4_grpblk_t next, count = 0;
+	ext4_grpblk_t next, count = 0, free_count = 0;
 	ext4_group_t group;
 	int ret = 0;
 
@@ -4782,6 +4782,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 				break;
 			count += next - start;
 		}
+		free_count += next - start;
 		start = next + 1;
 
 		if (fatal_signal_pending(current)) {
@@ -4795,7 +4796,7 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 			ext4_lock_group(sb, group);
 		}
 
-		if ((e4b->bd_info->bb_free - count) < minblocks)
+		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
 	ext4_unlock_group(sb, group);
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] ext4: Add new ext4 trim tracepoints
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
  2011-02-09  5:57 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
  2011-02-09  5:57 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
@ 2011-02-09  5:57 ` Tao Ma
  2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
  2011-02-22 13:11 ` [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
  4 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-09  5:57 UTC (permalink / raw)
  To: linux-ext4

From: Tao Ma <boyu.mt@taobao.com>

Add ext4_trim_extent and ext4_trim_all_free.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c           |    5 ++++
 include/trace/events/ext4.h |   49 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 806adb2..4eadac8 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4715,6 +4715,8 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
 	struct ext4_free_extent ex;
 	int ret = 0;
 
+	trace_ext4_trim_extent(sb, group, start, count);
+
 	assert_spin_locked(ext4_group_lock_ptr(sb, group));
 
 	ex.fe_start = start;
@@ -4767,8 +4769,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 	group = e4b->bd_group;
 	start = (e4b->bd_info->bb_first_free > start) ?
 		e4b->bd_info->bb_first_free : start;
+
 	ext4_lock_group(sb, group);
 
+	trace_ext4_trim_all_free(sb, group, start, max);
+
 	while (start < max) {
 		start = mb_find_next_zero_bit(bitmap, max, start);
 		if (start >= max)
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index e5e345f..b8e6aeb 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1166,6 +1166,55 @@ DEFINE_EVENT(ext4__bitmap_load, ext4_mb_buddy_bitmap_load,
 	TP_ARGS(sb, group)
 );
 
+DECLARE_EVENT_CLASS(ext4__trim,
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len),
+
+	TP_STRUCT__entry(
+		__field(	int,	dev_major               )
+		__field(	int,	dev_minor               )
+		__field(	__u32, 	group			)
+		__field(	int,	start			)
+		__field(	int,	len			)
+	),
+
+	TP_fast_assign(
+		__entry->dev_major	= MAJOR(sb->s_dev);
+		__entry->dev_minor	= MINOR(sb->s_dev);
+		__entry->group		= group;
+		__entry->start		= start;
+		__entry->len		= len;
+	),
+
+	TP_printk("dev %d,%d group %u, start %d, len %d",
+		  __entry->dev_major, __entry->dev_minor,
+		  __entry->group, __entry->start, __entry->len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_extent,
+
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_all_free,
+
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len)
+);
+
 #endif /* _TRACE_EXT4_H */
 
 /* This part must be outside protection */
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
                   ` (2 preceding siblings ...)
  2011-02-09  5:57 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
@ 2011-02-09  5:57 ` Tao Ma
  2011-02-09 14:01   ` Lukas Czerner
  2011-02-10  7:33   ` [PATCH 4/4 v2] " Tao Ma
  2011-02-22 13:11 ` [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
  4 siblings, 2 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-09  5:57 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andreas Dilger, Lukas Czerner

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state. The idea
is inspired by Andreas  and Lukas Czerner.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |    8 +++++++-
 fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..684f48e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
 	struct ext4_li_request *s_li_request;
 	/* Wait multiplier for lazy initialization thread */
 	unsigned int s_li_wait_mult;
+
+	/* record the last minlen when FITRIM is called. */
+	u64 trim_minlen;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,10 +1973,13 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))
 
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..3f6b809 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
 	mb_check_buddy(e4b);
 
+	if (!ret)
+		EXT4_SB(sb)->trim_minlen = minlen;
+
 	return ret;
 }
 
@@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+				  &(db->bb_state));
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 
 	ext4_lock_group(sb, group);
 
+	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
+	    minblocks >= EXT4_SB(sb)->trim_minlen)
+		goto out;
+
 	trace_ext4_trim_all_free(sb, group, start, max);
 
 	while (start < max) {
@@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+			&(e4b->bd_info->bb_state));
+out:
 	ext4_unlock_group(sb, group);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
@ 2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
                       ` (2 more replies)
  2011-02-10  7:33   ` [PATCH 4/4 v2] " Tao Ma
  1 sibling, 3 replies; 23+ messages in thread
From: Lukas Czerner @ 2011-02-09 14:01 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, Andreas Dilger, Lukas Czerner

On Wed, 9 Feb 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
> 
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state. The idea
> is inspired by Andreas  and Lukas Czerner.

Hi Tao,

This is great, thanks for the patch! Do you have any benchmark numbers
showing how much does it improve FSTRIM time ? It seems it might help a
lot. Couple of comments bellow.

> 
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/ext4.h    |    8 +++++++-
>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 0c8d97b..684f48e 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>  	struct ext4_li_request *s_li_request;
>  	/* Wait multiplier for lazy initialization thread */
>  	unsigned int s_li_wait_mult;
> +
> +	/* record the last minlen when FITRIM is called. */
> +	u64 trim_minlen;

Maybe this is a bit nitpicking, but it is not very clear what is the
"trim_minlen" for. I would rather use something like
"last_trim_minblks", or similar.

>  };
>  
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>  					 * 5 free 8-block regions. */
>  };
>  
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>  
>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
> +	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))
>  
>  #define EXT4_MAX_CONTENTION		8
>  #define EXT4_CONTENTION_THRESHOLD	2
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4eadac8..3f6b809 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>  	mb_check_buddy(e4b);
>  
> +	if (!ret)
> +		EXT4_SB(sb)->trim_minlen = minlen;
> +

Did you even tried to compile it ? "minlen" is not defined in
mb_mark_used(), also it does make sense to set trim_minlen there. Did
not you meant it to be in ext4_trim_fs ?

>  	return ret;
>  }
>  
> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>  		rb_erase(&entry->node, &(db->bb_free_root));
>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>  
> +		/*
> +		 * Clear the trimmed flag for the group so that the next
> +		 * ext4_trim_fs can trim it.
> +		 * If the volume is mounted with -o discard, online discard
> +		 * is supported and the free blocks will be trimmed online.
> +		 */
> +		if (!test_opt(sb, DISCARD))
> +			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> +				  &(db->bb_state));
> +
>  		if (!db->bb_free_root.rb_node) {
>  			/* No more items in the per group rb tree
>  			 * balance refcounts from ext4_mb_free_metadata()
> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  
>  	ext4_lock_group(sb, group);
>  
> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> +	    minblocks >= EXT4_SB(sb)->trim_minlen)
> +		goto out;
> +

I think this can be placed in the ext4_trim_fs() in the for() loop so we can
avoid unnecessary call to ext4_trim_all_free.

>  	trace_ext4_trim_all_free(sb, group, start, max);
>  
>  	while (start < max) {
> @@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
> +
> +	if (!ret)
> +		set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> +			&(e4b->bd_info->bb_state));
> +out:
>  	ext4_unlock_group(sb, group);
>  
>  	ext4_debug("trimmed %d blocks in the group %d\n",
> 

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
@ 2011-02-09 19:25     ` Andreas Dilger
  2011-02-10  1:39       ` Tao Ma
  2011-02-10  1:36     ` Tao Ma
  2011-02-10  3:56     ` Tao Ma
  2 siblings, 1 reply; 23+ messages in thread
From: Andreas Dilger @ 2011-02-09 19:25 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Tao Ma, linux-ext4

On 2011-02-09, at 06:01, Lukas Czerner wrote:
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>> 	struct ext4_li_request *s_li_request;
>> 	/* Wait multiplier for lazy initialization thread */
>> 	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 trim_minlen;
> 
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.

And for good measure, this should use the "s_" prefix to indicate it is in the superblock.

>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1

It would also be good if the names of these fields were consistent, like

+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT	1

>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>> 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>> 	mb_check_buddy(e4b);
>> 
>> +	if (!ret)
>> +		EXT4_SB(sb)->trim_minlen = minlen;
>> +
> 
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?

I was wondering exactly the same thing, but I haven't had time to look at the code.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
@ 2011-02-10  1:36     ` Tao Ma
  2011-02-10  3:56     ` Tao Ma
  2 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-10  1:36 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, Andreas Dilger

On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>    
>> From: Tao Ma<boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas  and Lukas Czerner.
>>      
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>    
I have a intel x25m ssd. I will try to collect some data about it and 
update them in the commit log.
>    
>> Cc: Andreas Dilger<adilger.kernel@dilger.ca>
>> Cc: Lukas Czerner<lczerner@redhat.com>
>> Signed-off-by: Tao Ma<boyu.mt@taobao.com>
>> ---
>>   fs/ext4/ext4.h    |    8 +++++++-
>>   fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>>   2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..684f48e 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>   	struct ext4_li_request *s_li_request;
>>   	/* Wait multiplier for lazy initialization thread */
>>   	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 trim_minlen;
>>      
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.
>    
ok, I will change it in the next round.
>    
>>   };
>>
>>   static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>>   					 * 5 free 8-block regions. */
>>   };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>>
>>   #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>   	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,&((grp)->bb_state)))
>>
>>   #define EXT4_MAX_CONTENTION		8
>>   #define EXT4_CONTENTION_THRESHOLD	2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..3f6b809 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>   	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>>   	mb_check_buddy(e4b);
>>
>> +	if (!ret)
>> +		EXT4_SB(sb)->trim_minlen = minlen;
>> +
>>      
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?
>    
oh, sorry, this line should be in the function ext4_trim_fs. I just 
rebased my branch to the lastest kernel to see whether it conflicts.
I don't know why they are moved here.  So the right position is:
@@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct 
fstrim_range *range)
         }
         range->len = trimmed * sb->s_blocksize;

+       if (!ret)
+               EXT4_SB(sb)->trim_minlen = minlen;
+
  out:
         return ret;
  }

Sorry for the trouble.
>    
>>   	return ret;
>>   }
>>
>> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>   		rb_erase(&entry->node,&(db->bb_free_root));
>>   		mb_free_blocks(NULL,&e4b, entry->start_blk, entry->count);
>>
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
>> +				&(db->bb_state));
>> +
>>   		if (!db->bb_free_root.rb_node) {
>>   			/* No more items in the per group rb tree
>>   			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>>   	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> +	    minblocks>= EXT4_SB(sb)->trim_minlen)
>> +		goto out;
>> +
>>      
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>    
fair enough. I will update it.

Regards,
Tao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 19:25     ` Andreas Dilger
@ 2011-02-10  1:39       ` Tao Ma
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-10  1:39 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Lukas Czerner, linux-ext4

On 02/10/2011 03:25 AM, Andreas Dilger wrote:
> On 2011-02-09, at 06:01, Lukas Czerner wrote:
>    
>>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>> 	struct ext4_li_request *s_li_request;
>>> 	/* Wait multiplier for lazy initialization thread */
>>> 	unsigned int s_li_wait_mult;
>>> +
>>> +	/* record the last minlen when FITRIM is called. */
>>> +	u64 trim_minlen;
>>>        
>> Maybe this is a bit nitpicking, but it is not very clear what is the
>> "trim_minlen" for. I would rather use something like
>> "last_trim_minblks", or similar.
>>      
> And for good measure, this should use the "s_" prefix to indicate it is in the superblock.
>    
ok.
>    
>>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>>>        
> It would also be good if the names of these fields were consistent, like
>
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT	1
>    
thanks for the advice.
>    
>>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>> 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>>> 	mb_check_buddy(e4b);
>>>
>>> +	if (!ret)
>>> +		EXT4_SB(sb)->trim_minlen = minlen;
>>> +
>>>        
>> Did you even tried to compile it ? "minlen" is not defined in
>> mb_mark_used(), also it does make sense to set trim_minlen there. Did
>> not you meant it to be in ext4_trim_fs ?
>>      
> I was wondering exactly the same thing, but I haven't had time to look at the code.
>    
Sorry, it seems to be caused by my rebase. The right place should be in 
the end of ext4_trim_fs.

Regards,
Tao
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
  2011-02-10  1:36     ` Tao Ma
@ 2011-02-10  3:56     ` Tao Ma
  2 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-10  3:56 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, Andreas Dilger

Hi Lukas,
On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>    
>> From: Tao Ma<boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas  and Lukas Czerner.
>>      
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>    
<snip>
>    
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>>   	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> +	    minblocks>= EXT4_SB(sb)->trim_minlen)
>> +		goto out;
>> +
>>      
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>    
I remembered why I put the check here. We have to check this flag with 
the group locked. So in ext4_trim_fs, we don't lock the group and it is 
unsafe to check it there since we may free some blocks and clear the bit 
in the same time in release_blocks_on_commit.

Regards,
Tao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
  2011-02-09 14:01   ` Lukas Czerner
@ 2011-02-10  7:33   ` Tao Ma
  2011-02-10 11:25     ` Lukas Czerner
  2011-02-10 21:50     ` [PATCH 4/4 v2] " Andreas Dilger
  1 sibling, 2 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-10  7:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andreas Dilger, Lukas Czerner

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2             108G   35G   68G  34% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m4.039s
user	0m0.000s
sys	0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.577s
user	0m0.001s
sys	0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.380s
user	0m0.000s
sys	0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.466s
user	0m0.000s
sys	0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.000s

A big improvement for the 2nd and 3rd run.

After I delete some big image files and re-run the trim,
it is still much faster than iterating the whole disk.
/dev/sdb2             108G   25G   78G  24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.513s
user	0m0.000s
sys	0m0.069s

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |    8 +++++++-
 fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..1d59a63 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
 	struct ext4_li_request *s_li_request;
 	/* Wait multiplier for lazy initialization thread */
 	unsigned int s_li_wait_mult;
+
+	/* record the last minlen when FITRIM is called. */
+	u64 s_last_trim_minblks;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,10 +1973,13 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
 
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..c7aa094 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
+				  &(db->bb_state));
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 
 	ext4_lock_group(sb, group);
 
+	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
+	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
+		goto out;
+
 	trace_ext4_trim_all_free(sb, group, start, max);
 
 	while (start < max) {
@@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
+			&(e4b->bd_info->bb_state));
+out:
 	ext4_unlock_group(sb, group);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+	if (!ret)
+		EXT4_SB(sb)->s_last_trim_minblks = minlen;
+
 out:
 	return ret;
 }
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-10  7:33   ` [PATCH 4/4 v2] " Tao Ma
@ 2011-02-10 11:25     ` Lukas Czerner
  2011-02-10 14:58       ` tm
  2011-02-10 21:50     ` [PATCH 4/4 v2] " Andreas Dilger
  1 sibling, 1 reply; 23+ messages in thread
From: Lukas Czerner @ 2011-02-10 11:25 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, Andreas Dilger, Lukas Czerner

On Thu, 10 Feb 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
> 
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state.
> 
> A simple test with my intel x25m ssd:
> df -h shows:
> /dev/sdb2             108G   35G   68G  34% /mnt/ext4
> Block size:               4096
> 
> run the FITRIM with the following parameter:
> range.start = 0;
> range.len = UINT64_MAX;
> range.minlen = 1048576;
> 
> without the patch:
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m4.039s
> user	0m0.000s
> sys	0m1.020s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m3.577s
> user	0m0.001s
> sys	0m1.004s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m3.380s
> user	0m0.000s
> sys	0m0.991s
> 
> with the patch:
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m3.466s
> user	0m0.000s
> sys	0m0.966s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m0.001s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m0.001s
> user	0m0.000s
> sys	0m0.000s
> 
> A big improvement for the 2nd and 3rd run.
> 
> After I delete some big image files and re-run the trim,
> it is still much faster than iterating the whole disk.
> /dev/sdb2             108G   25G   78G  24% /mnt/ext4
> 
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m0.513s
> user	0m0.000s
> sys	0m0.069s

Great it looks really good.

> 
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/ext4.h    |    8 +++++++-
>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 0c8d97b..1d59a63 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>  	struct ext4_li_request *s_li_request;
>  	/* Wait multiplier for lazy initialization thread */
>  	unsigned int s_li_wait_mult;
> +
> +	/* record the last minlen when FITRIM is called. */
> +	u64 s_last_trim_minblks;
>  };
>  
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>  					 * 5 free 8-block regions. */
>  };
>  
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>  
>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>  
>  #define EXT4_MAX_CONTENTION		8
>  #define EXT4_CONTENTION_THRESHOLD	2
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4eadac8..c7aa094 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>  		rb_erase(&entry->node, &(db->bb_free_root));
>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>  
> +		/*
> +		 * Clear the trimmed flag for the group so that the next
> +		 * ext4_trim_fs can trim it.
> +		 * If the volume is mounted with -o discard, online discard
> +		 * is supported and the free blocks will be trimmed online.
> +		 */
> +		if (!test_opt(sb, DISCARD))
> +			clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> +				  &(db->bb_state));
> +
>  		if (!db->bb_free_root.rb_node) {
>  			/* No more items in the per group rb tree
>  			 * balance refcounts from ext4_mb_free_metadata()
> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  
>  	ext4_lock_group(sb, group);
>  
> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> +	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
> +		goto out;
> +
>  	trace_ext4_trim_all_free(sb, group, start, max);
>  
>  	while (start < max) {
> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
> +
> +	if (!ret)
> +		set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> +			&(e4b->bd_info->bb_state));
> +out:
>  	ext4_unlock_group(sb, group);
>  
>  	ext4_debug("trimmed %d blocks in the group %d\n",
> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	}
>  	range->len = trimmed * sb->s_blocksize;
>  
> +	if (!ret)
> +		EXT4_SB(sb)->s_last_trim_minblks = minlen;
> +

Since this is not protected by any lock, would not it race in case of
multiple FITRIM calls ?

>  out:
>  	return ret;
>  }
> 

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-10 11:25     ` Lukas Czerner
@ 2011-02-10 14:58       ` tm
  2011-02-21 16:44         ` Lukas Czerner
  0 siblings, 1 reply; 23+ messages in thread
From: tm @ 2011-02-10 14:58 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Tao Ma, linux-ext4, Andreas Dilger, Lukas Czerner

> On Thu, 10 Feb 2011, Tao Ma wrote:
>
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state.
>>
>> A simple test with my intel x25m ssd:
>> df -h shows:
>> /dev/sdb2             108G   35G   68G  34% /mnt/ext4
>> Block size:               4096
>>
>> run the FITRIM with the following parameter:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
>> without the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m4.039s
>> user	0m0.000s
>> sys	0m1.020s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.577s
>> user	0m0.001s
>> sys	0m1.004s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.380s
>> user	0m0.000s
>> sys	0m0.991s
>>
>> with the patch:
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m3.466s
>> user	0m0.000s
>> sys	0m0.966s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.001s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.001s
>> user	0m0.000s
>> sys	0m0.000s
>>
>> A big improvement for the 2nd and 3rd run.
>>
>> After I delete some big image files and re-run the trim,
>> it is still much faster than iterating the whole disk.
>> /dev/sdb2             108G   25G   78G  24% /mnt/ext4
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m0.513s
>> user	0m0.000s
>> sys	0m0.069s
>
> Great it looks really good.
>
>>
>> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
>> Cc: Lukas Czerner <lczerner@redhat.com>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/ext4.h    |    8 +++++++-
>>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>>  2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..1d59a63 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>  	struct ext4_li_request *s_li_request;
>>  	/* Wait multiplier for lazy initialization thread */
>>  	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 s_last_trim_minblks;
>>  };
>>
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>>  					 * 5 free 8-block regions. */
>>  };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>>
>>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>>
>>  #define EXT4_MAX_CONTENTION		8
>>  #define EXT4_CONTENTION_THRESHOLD	2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..c7aa094 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t
>> *journal, transaction_t *txn)
>>  		rb_erase(&entry->node, &(db->bb_free_root));
>>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> +				  &(db->bb_state));
>> +
>>  		if (!db->bb_free_root.rb_node) {
>>  			/* No more items in the per group rb tree
>>  			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct
>> super_block *sb, struct ext4_buddy *e4b,
>>
>>  	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
>> +	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
>> +		goto out;
>> +
>>  	trace_ext4_trim_all_free(sb, group, start, max);
>>
>>  	while (start < max) {
>> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct
>> super_block *sb, struct ext4_buddy *e4b,
>>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>>  			break;
>>  	}
>> +
>> +	if (!ret)
>> +		set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
>> +			&(e4b->bd_info->bb_state));
>> +out:
>>  	ext4_unlock_group(sb, group);
>>
>>  	ext4_debug("trimmed %d blocks in the group %d\n",
>> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct
>> fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>
>> +	if (!ret)
>> +		EXT4_SB(sb)->s_last_trim_minblks = minlen;
>> +
>
> Since this is not protected by any lock, would not it race in case of
> multiple FITRIM calls ?
yeah, I am also thinking of this, but I don't think we need a new lock
just for this. And I guess atomic_t isn't good here because minlen is a
u64.

Do you think we can use some other spin_lock in ext4 system? I am not
quite familiar with ext4 by now, so do you have any suggestion?

Regards,
Tao



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-10  7:33   ` [PATCH 4/4 v2] " Tao Ma
  2011-02-10 11:25     ` Lukas Czerner
@ 2011-02-10 21:50     ` Andreas Dilger
  2011-02-11  6:29       ` [PATCH 4/4 v3] " Tao Ma
  1 sibling, 1 reply; 23+ messages in thread
From: Andreas Dilger @ 2011-02-10 21:50 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, Andreas Dilger, Lukas Czerner

On 2011-02-09, at 23:33, Tao Ma <tm@tao.ma> wrote:
> After I delete some big image files and re-run the trim,
> it is still much faster than iterating the whole disk.
> /dev/sdb2             108G   25G   78G  24% /mnt/ext4
> 
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real    0m0.513s
> user    0m0.000s
> sys    0m0.069s

Excellent results. 

> +#define EXT4_GROUP_INFO_NEED_INIT_BIT        0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT        1
> 
> #define EXT4_MB_GRP_NEED_INIT(grp)    \
>    (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)    \

For consistency, it would be better to call this:

EXT4_MB_GRP_WAS_TRIMMED(grp)

And also add and use:

EXT4_MB_GRP_SET_TRIMMED(grp)

And 

EXT4_MB_GRP_CLEAR_TRIMMED(grp)

> 

In the code. Then you can add Reviewed-by: on this patch. 

Cheers, Andreas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/4 v3] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-10 21:50     ` [PATCH 4/4 v2] " Andreas Dilger
@ 2011-02-11  6:29       ` Tao Ma
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-11  6:29 UTC (permalink / raw)
  To: linux-ext4; +Cc: Lukas Czerner

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2             108G   35G   68G  34% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m4.039s
user	0m0.000s
sys	0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.577s
user	0m0.001s
sys	0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.380s
user	0m0.000s
sys	0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.466s
user	0m0.000s
sys	0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.000s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.
/dev/sdb2             108G   25G   78G  24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.513s
user	0m0.000s
sys	0m0.069s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |   13 ++++++++++++-
 fs/ext4/mballoc.c |   20 ++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..259887d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
 	struct ext4_li_request *s_li_request;
 	/* Wait multiplier for lazy initialization thread */
 	unsigned int s_li_wait_mult;
+
+	/* record the last minlen when FITRIM is called. */
+	u64 s_last_trim_minblks;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,11 +1973,19 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
 
+#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
+	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
+	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..f1eef35 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4781,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 
 	ext4_lock_group(sb, group);
 
+	if (EXT4_MB_GRP_WAS_TRIMMED(e4b->bd_info) &&
+	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
+		goto out;
+
 	trace_ext4_trim_all_free(sb, group, start, max);
 
 	while (start < max) {
@@ -4804,6 +4817,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+out:
 	ext4_unlock_group(sb, group);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4909,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+	if (!ret)
+		EXT4_SB(sb)->s_last_trim_minblks = minlen;
+
 out:
 	return ret;
 }
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4 v2] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-10 14:58       ` tm
@ 2011-02-21 16:44         ` Lukas Czerner
  2011-02-24 14:18           ` [PATCH 4/4 V3] " Tao Ma
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Czerner @ 2011-02-21 16:44 UTC (permalink / raw)
  To: Tao Ma; +Cc: Lukas Czerner, linux-ext4, Andreas Dilger

On Thu, 10 Feb 2011, tm@tao.ma wrote:

> > On Thu, 10 Feb 2011, Tao Ma wrote:
> >
> >> From: Tao Ma <boyu.mt@taobao.com>
> >>
> >> In ext4, when FITRIM is called every time, we iterate all the
> >> groups and do trim one by one. It is a bit time wasting if the
> >> group has been trimmed and there is no change since the last
> >> trim.
> >>
> >> So this patch adds a new flag in ext4_group_info->bb_state to
> >> indicate that the group has been trimmed, and it will be cleared
> >> if some blocks is freed(in release_blocks_on_commit). Another
> >> trim_minlen is added in ext4_sb_info to record the last minlen
> >> we use to trim the volume, so that if the caller provide a small
> >> one, we will go on the trim regardless of the bb_state.
> >>
> >> A simple test with my intel x25m ssd:
> >> df -h shows:
> >> /dev/sdb2             108G   35G   68G  34% /mnt/ext4
> >> Block size:               4096
> >>
> >> run the FITRIM with the following parameter:
> >> range.start = 0;
> >> range.len = UINT64_MAX;
> >> range.minlen = 1048576;
> >>
> >> without the patch:
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m4.039s
> >> user	0m0.000s
> >> sys	0m1.020s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m3.577s
> >> user	0m0.001s
> >> sys	0m1.004s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m3.380s
> >> user	0m0.000s
> >> sys	0m0.991s
> >>
> >> with the patch:
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m3.466s
> >> user	0m0.000s
> >> sys	0m0.966s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m0.001s
> >> user	0m0.000s
> >> sys	0m0.001s
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m0.001s
> >> user	0m0.000s
> >> sys	0m0.000s
> >>
> >> A big improvement for the 2nd and 3rd run.
> >>
> >> After I delete some big image files and re-run the trim,
> >> it is still much faster than iterating the whole disk.
> >> /dev/sdb2             108G   25G   78G  24% /mnt/ext4
> >>
> >> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> >> real	0m0.513s
> >> user	0m0.000s
> >> sys	0m0.069s
> >
> > Great it looks really good.
> >
> >>
> >> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> >> Cc: Lukas Czerner <lczerner@redhat.com>
> >> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> >> ---
> >>  fs/ext4/ext4.h    |    8 +++++++-
> >>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
> >>  2 files changed, 29 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >> index 0c8d97b..1d59a63 100644
> >> --- a/fs/ext4/ext4.h
> >> +++ b/fs/ext4/ext4.h
> >> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
> >>  	struct ext4_li_request *s_li_request;
> >>  	/* Wait multiplier for lazy initialization thread */
> >>  	unsigned int s_li_wait_mult;
> >> +
> >> +	/* record the last minlen when FITRIM is called. */
> >> +	u64 s_last_trim_minblks;
> >>  };
> >>
> >>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> >> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
> >>  					 * 5 free 8-block regions. */
> >>  };
> >>
> >> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> >> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> >> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
> >>
> >>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
> >>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> >> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
> >> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> >>
> >>  #define EXT4_MAX_CONTENTION		8
> >>  #define EXT4_CONTENTION_THRESHOLD	2
> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> >> index 4eadac8..c7aa094 100644
> >> --- a/fs/ext4/mballoc.c
> >> +++ b/fs/ext4/mballoc.c
> >> @@ -2687,6 +2687,16 @@ static void release_blocks_on_commit(journal_t
> >> *journal, transaction_t *txn)
> >>  		rb_erase(&entry->node, &(db->bb_free_root));
> >>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
> >>
> >> +		/*
> >> +		 * Clear the trimmed flag for the group so that the next
> >> +		 * ext4_trim_fs can trim it.
> >> +		 * If the volume is mounted with -o discard, online discard
> >> +		 * is supported and the free blocks will be trimmed online.
> >> +		 */
> >> +		if (!test_opt(sb, DISCARD))
> >> +			clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> >> +				  &(db->bb_state));
> >> +
> >>  		if (!db->bb_free_root.rb_node) {
> >>  			/* No more items in the per group rb tree
> >>  			 * balance refcounts from ext4_mb_free_metadata()
> >> @@ -4772,6 +4782,10 @@ ext4_grpblk_t ext4_trim_all_free(struct
> >> super_block *sb, struct ext4_buddy *e4b,
> >>
> >>  	ext4_lock_group(sb, group);
> >>
> >> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> >> +	    minblocks >= EXT4_SB(sb)->s_last_trim_minblks)
> >> +		goto out;
> >> +
> >>  	trace_ext4_trim_all_free(sb, group, start, max);
> >>
> >>  	while (start < max) {
> >> @@ -4804,6 +4818,11 @@ ext4_grpblk_t ext4_trim_all_free(struct
> >> super_block *sb, struct ext4_buddy *e4b,
> >>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
> >>  			break;
> >>  	}
> >> +
> >> +	if (!ret)
> >> +		set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT,
> >> +			&(e4b->bd_info->bb_state));
> >> +out:
> >>  	ext4_unlock_group(sb, group);
> >>
> >>  	ext4_debug("trimmed %d blocks in the group %d\n",
> >> @@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct
> >> fstrim_range *range)
> >>  	}
> >>  	range->len = trimmed * sb->s_blocksize;
> >>
> >> +	if (!ret)
> >> +		EXT4_SB(sb)->s_last_trim_minblks = minlen;
> >> +
> >
> > Since this is not protected by any lock, would not it race in case of
> > multiple FITRIM calls ?
> yeah, I am also thinking of this, but I don't think we need a new lock
> just for this. And I guess atomic_t isn't good here because minlen is a
> u64.
> 
> Do you think we can use some other spin_lock in ext4 system? I am not
> quite familiar with ext4 by now, so do you have any suggestion?
> 
> Regards,
> Tao

I am sorry for late answer. Even though minlen is 64-bit long, it can
not be that long in ext4, because it can not be bigger than size of the
allocation group (see mballoc.c:4840). I hope it helps.

I would be very happy to see this upstream, because it can help a lot!

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4] EXT4 trim bug fixes and improvement.
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
                   ` (3 preceding siblings ...)
  2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
@ 2011-02-22 13:11 ` Tao Ma
  2011-02-22 13:51   ` Lukas Czerner
  4 siblings, 1 reply; 23+ messages in thread
From: Tao Ma @ 2011-02-22 13:11 UTC (permalink / raw)
  To: ext4 development, Ted Ts'o

Hi Ted,
     any comments for this patch set? Are there ready for the next merge 
window?
     The most complicate patch [4/4] has already reviewed by Andreas.

Regards,
Tao
On 02/09/2011 01:52 PM, Tao Ma wrote:
> Hi Ted and the list,
>     Here are 4 patches for trim support in ext4.
>     Patch 1/4 is a fix for the 'len' overflow. As we have decided that 
> we don't adjust start to s_first_data_block, this patch is still needed.
>     Patch 2/4 is a speedup for trim in one group.
>     Patch 3/4 adds trace points for the functions used in FITRIM.
>     Patch 4/4 is a speedup for trim in the whole volume. For more 
> details, please check the commit log.
>
> Regards,
> Tao
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4] EXT4 trim bug fixes and improvement.
  2011-02-22 13:11 ` [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
@ 2011-02-22 13:51   ` Lukas Czerner
  0 siblings, 0 replies; 23+ messages in thread
From: Lukas Czerner @ 2011-02-22 13:51 UTC (permalink / raw)
  To: Tao Ma; +Cc: ext4 development, Ted Ts'o

On Tue, 22 Feb 2011, Tao Ma wrote:

> Hi Ted,
>     any comments for this patch set? Are there ready for the next merge
> window?
>     The most complicate patch [4/4] has already reviewed by Andreas.

But still, I think there is a race -- see my previous comments in the
thread where I suggested using atomic_t for s_last_trim_minblks.

Thanks!
-Lukas

> 
> Regards,
> Tao
> On 02/09/2011 01:52 PM, Tao Ma wrote:
> > Hi Ted and the list,
> >     Here are 4 patches for trim support in ext4.
> >     Patch 1/4 is a fix for the 'len' overflow. As we have decided that we
> > don't adjust start to s_first_data_block, this patch is still needed.
> >     Patch 2/4 is a speedup for trim in one group.
> >     Patch 3/4 adds trace points for the functions used in FITRIM.
> >     Patch 4/4 is a speedup for trim in the whole volume. For more details,
> > please check the commit log.
> > 
> > Regards,
> > Tao
> > -- 
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/4 V3] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-21 16:44         ` Lukas Czerner
@ 2011-02-24 14:18           ` Tao Ma
  0 siblings, 0 replies; 23+ messages in thread
From: Tao Ma @ 2011-02-24 14:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: lczerner

changlog from v2 -> v3:
change s_last_trim_minblks from u64 to atomic_t to avoid a race.

Regards,
Tao

>From 3875493dea3cc25be5460c175267f08a7ed540bc Mon Sep 17 00:00:00 2001
From: Tao Ma <boyu.mt@taobao.com>
Date: Fri, 25 Feb 2011 06:10:26 +0800
Subject: [PATCH 4/4 V3] ext4: Speed up FITRIM by recording flags in ext4_group_info.

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb2             108G   35G   68G  34% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m4.039s
user	0m0.000s
sys	0m1.020s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.577s
user	0m0.001s
sys	0m1.004s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.380s
user	0m0.000s
sys	0m0.991s

with the patch:
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m3.466s
user	0m0.000s
sys	0m0.966s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.001s
user	0m0.000s
sys	0m0.000s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.
/dev/sdb2             108G   25G   78G  24% /mnt/ext4

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m0.513s
user	0m0.000s
sys	0m0.069s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |   13 ++++++++++++-
 fs/ext4/mballoc.c |   20 ++++++++++++++++++++
 fs/ext4/super.c   |    2 ++
 3 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..715c0a4 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
 	struct ext4_li_request *s_li_request;
 	/* Wait multiplier for lazy initialization thread */
 	unsigned int s_li_wait_mult;
+
+	/* record the last minlen when FITRIM is called. */
+	atomic_t s_last_trim_minblks;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,11 +1973,19 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
 
+#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
+	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
+	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..29d7d17 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2687,6 +2687,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4781,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 
 	ext4_lock_group(sb, group);
 
+	if (EXT4_MB_GRP_WAS_TRIMMED(e4b->bd_info) &&
+	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
+		goto out;
+
 	trace_ext4_trim_all_free(sb, group, start, max);
 
 	while (start < max) {
@@ -4804,6 +4817,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
+out:
 	ext4_unlock_group(sb, group);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
@@ -4892,6 +4909,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+	if (!ret)
+		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
+
 out:
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4898cb1..0fa3ed6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3045,6 +3045,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_sectors_written_start =
 			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
 
+	atomic_set(&sbi->s_last_trim_minblks, 0);
+
 	/* Cleanup superblock name */
 	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
 		*cp = '!';
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-07-01 10:46     ` Tao Ma
@ 2011-07-01 11:03       ` Lukas Czerner
  0 siblings, 0 replies; 23+ messages in thread
From: Lukas Czerner @ 2011-07-01 11:03 UTC (permalink / raw)
  To: Tao Ma; +Cc: Lukas Czerner, linux-ext4

On Fri, 1 Jul 2011, Tao Ma wrote:

> >> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >> index 9ea71aa..080e9f9 100644
> >> --- a/fs/ext4/super.c
> >> +++ b/fs/ext4/super.c
> >> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> >>  		sbi->s_sectors_written_start =
> >>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
> >>  
> >> +	atomic_set(&sbi->s_last_trim_minblks, 0);
> >> +
> > 
> > I am not sure that this is needed since sbi is allocated with kzalloc,
> > so it should be zero already.
> yes, maybe. I can remove it if you like.

yes, please do.

Thanks!
> 
> Thanks
> Tao

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-07-01 10:09   ` Lukas Czerner
@ 2011-07-01 10:46     ` Tao Ma
  2011-07-01 11:03       ` Lukas Czerner
  0 siblings, 1 reply; 23+ messages in thread
From: Tao Ma @ 2011-07-01 10:46 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4

On 07/01/2011 06:09 PM, Lukas Czerner wrote:
> On Thu, 30 Jun 2011, Tao Ma wrote:
> 
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state.
>>
>> A simple test with my intel x25m ssd:
>> df -h shows:
>> /dev/sdb1              40G   21G   17G  56% /mnt/ext4
>> Block size:               4096
>>
>> run the FITRIM with the following parameter:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
>> without the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.505s
>> user	0m0.000s
>> sys	0m1.224s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.359s
>> user	0m0.000s
>> sys	0m1.178s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.228s
>> user	0m0.000s
>> sys	0m1.151s
>>
>> with the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.625s
>> user	0m0.000s
>> sys	0m1.269s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>>
>> A big improvement for the 2nd and 3rd run.
>>
>> Even after I delete some big image files, it is still much
>> faster than iterating the whole disk.
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m1.217s
>> user	0m0.000s
>> sys	0m0.196s
> 
> Thanks for the patch Tao, I have couple of comments bellow.
> 
>>
>> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/ext4.h    |   13 ++++++++++++-
>>  fs/ext4/mballoc.c |   20 ++++++++++++++++++++
>>  fs/ext4/super.c   |    2 ++
>>  3 files changed, 34 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 1921392..5878a22 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1214,6 +1214,9 @@ struct ext4_sb_info {
>>  
>>  	/* Kernel thread for multiple mount protection */
>>  	struct task_struct *s_mmp_tsk;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	atomic_t s_last_trim_minblks;
>>  };
>>  
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -2067,11 +2070,19 @@ struct ext4_group_info {
>>  					 * 5 free 8-block regions. */
>>  };
>>  
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>>  
>>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>>  
>> +#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
>> +	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
>> +	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +
>>  #define EXT4_MAX_CONTENTION		8
>>  #define EXT4_CONTENTION_THRESHOLD	2
>>  
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a822f2a..afdd869 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>  		rb_erase(&entry->node, &(db->bb_free_root));
>>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>  
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			EXT4_MB_GRP_CLEAR_TRIMMED(db);
> 
> If the online discard failed we should probably clear the flag as well
> right ? However I am not sure how helpful it would be since it'll most
> likely fail again. Maybe we do not need to bother with that case at all,
> what do you think ?
yes, in this case, we don't have a minor chance to trim successfully
later and it will increase the time of the next FITRIM ioctl. So I guess
we can leave it as-is.
> 
>> +
>>  		if (!db->bb_free_root.rb_node) {
>>  			/* No more items in the per group rb tree
>>  			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  	bitmap = e4b.bd_bitmap;
>>  
>>  	ext4_lock_group(sb, group);
>> +	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
>> +	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
>> +		goto out;
>> +
>>  	start = (e4b.bd_info->bb_first_free > start) ?
>>  		e4b.bd_info->bb_first_free : start;
>>  
>> @@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>>  			break;
>>  	}
>> +
>> +	if (!ret)
>> +		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
>> +out:
>>  	ext4_unlock_group(sb, group);
>>  	ext4_mb_unload_buddy(&e4b);
>>  
>> @@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>  
>> +	if (!ret)
>> +		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
>> +
>>  out:
>>  	return ret;
>>  }
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 9ea71aa..080e9f9 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>  		sbi->s_sectors_written_start =
>>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
>>  
>> +	atomic_set(&sbi->s_last_trim_minblks, 0);
>> +
> 
> I am not sure that this is needed since sbi is allocated with kzalloc,
> so it should be zero already.
yes, maybe. I can remove it if you like.

Thanks
Tao
> 
>>  	/* Cleanup superblock name */
>>  	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
>>  		*cp = '!';
>>
> 
> Thanks!
> -Lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
@ 2011-07-01 10:09   ` Lukas Czerner
  2011-07-01 10:46     ` Tao Ma
  0 siblings, 1 reply; 23+ messages in thread
From: Lukas Czerner @ 2011-07-01 10:09 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On Thu, 30 Jun 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
> 
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state.
> 
> A simple test with my intel x25m ssd:
> df -h shows:
> /dev/sdb1              40G   21G   17G  56% /mnt/ext4
> Block size:               4096
> 
> run the FITRIM with the following parameter:
> range.start = 0;
> range.len = UINT64_MAX;
> range.minlen = 1048576;
> 
> without the patch:
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.505s
> user	0m0.000s
> sys	0m1.224s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.359s
> user	0m0.000s
> sys	0m1.178s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.228s
> user	0m0.000s
> sys	0m1.151s
> 
> with the patch:
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.625s
> user	0m0.000s
> sys	0m1.269s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> 
> A big improvement for the 2nd and 3rd run.
> 
> Even after I delete some big image files, it is still much
> faster than iterating the whole disk.
> 
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m1.217s
> user	0m0.000s
> sys	0m0.196s

Thanks for the patch Tao, I have couple of comments bellow.

> 
> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/ext4.h    |   13 ++++++++++++-
>  fs/ext4/mballoc.c |   20 ++++++++++++++++++++
>  fs/ext4/super.c   |    2 ++
>  3 files changed, 34 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 1921392..5878a22 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1214,6 +1214,9 @@ struct ext4_sb_info {
>  
>  	/* Kernel thread for multiple mount protection */
>  	struct task_struct *s_mmp_tsk;
> +
> +	/* record the last minlen when FITRIM is called. */
> +	atomic_t s_last_trim_minblks;
>  };
>  
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -2067,11 +2070,19 @@ struct ext4_group_info {
>  					 * 5 free 8-block regions. */
>  };
>  
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>  
>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>  
> +#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
> +	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
> +	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +
>  #define EXT4_MAX_CONTENTION		8
>  #define EXT4_CONTENTION_THRESHOLD	2
>  
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a822f2a..afdd869 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>  		rb_erase(&entry->node, &(db->bb_free_root));
>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>  
> +		/*
> +		 * Clear the trimmed flag for the group so that the next
> +		 * ext4_trim_fs can trim it.
> +		 * If the volume is mounted with -o discard, online discard
> +		 * is supported and the free blocks will be trimmed online.
> +		 */
> +		if (!test_opt(sb, DISCARD))
> +			EXT4_MB_GRP_CLEAR_TRIMMED(db);

If the online discard failed we should probably clear the flag as well
right ? However I am not sure how helpful it would be since it'll most
likely fail again. Maybe we do not need to bother with that case at all,
what do you think ?

> +
>  		if (!db->bb_free_root.rb_node) {
>  			/* No more items in the per group rb tree
>  			 * balance refcounts from ext4_mb_free_metadata()
> @@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  	bitmap = e4b.bd_bitmap;
>  
>  	ext4_lock_group(sb, group);
> +	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
> +	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
> +		goto out;
> +
>  	start = (e4b.bd_info->bb_first_free > start) ?
>  		e4b.bd_info->bb_first_free : start;
>  
> @@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
> +
> +	if (!ret)
> +		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> +out:
>  	ext4_unlock_group(sb, group);
>  	ext4_mb_unload_buddy(&e4b);
>  
> @@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	}
>  	range->len = trimmed * sb->s_blocksize;
>  
> +	if (!ret)
> +		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
> +
>  out:
>  	return ret;
>  }
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..080e9f9 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		sbi->s_sectors_written_start =
>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
>  
> +	atomic_set(&sbi->s_last_trim_minblks, 0);
> +

I am not sure that this is needed since sbi is allocated with kzalloc,
so it should be zero already.

>  	/* Cleanup superblock name */
>  	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
>  		*cp = '!';
> 

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 " Tao Ma
@ 2011-06-30 14:50 ` Tao Ma
  2011-07-01 10:09   ` Lukas Czerner
  0 siblings, 1 reply; 23+ messages in thread
From: Tao Ma @ 2011-06-30 14:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |   13 ++++++++++++-
 fs/ext4/mballoc.c |   20 ++++++++++++++++++++
 fs/ext4/super.c   |    2 ++
 3 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1921392..5878a22 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1214,6 +1214,9 @@ struct ext4_sb_info {
 
 	/* Kernel thread for multiple mount protection */
 	struct task_struct *s_mmp_tsk;
+
+	/* record the last minlen when FITRIM is called. */
+	atomic_t s_last_trim_minblks;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -2067,11 +2070,19 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
 
+#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
+	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
+	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a822f2a..afdd869 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 	bitmap = e4b.bd_bitmap;
 
 	ext4_lock_group(sb, group);
+	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
+	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
+		goto out;
+
 	start = (e4b.bd_info->bb_first_free > start) ?
 		e4b.bd_info->bb_first_free : start;
 
@@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		if ((e4b.bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+out:
 	ext4_unlock_group(sb, group);
 	ext4_mb_unload_buddy(&e4b);
 
@@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+	if (!ret)
+		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
+
 out:
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..080e9f9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_sectors_written_start =
 			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
 
+	atomic_set(&sbi->s_last_trim_minblks, 0);
+
 	/* Cleanup superblock name */
 	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
 		*cp = '!';
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-07-01 11:03 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
2011-02-09  5:57 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
2011-02-09  5:57 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
2011-02-09  5:57 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-02-09 14:01   ` Lukas Czerner
2011-02-09 19:25     ` Andreas Dilger
2011-02-10  1:39       ` Tao Ma
2011-02-10  1:36     ` Tao Ma
2011-02-10  3:56     ` Tao Ma
2011-02-10  7:33   ` [PATCH 4/4 v2] " Tao Ma
2011-02-10 11:25     ` Lukas Czerner
2011-02-10 14:58       ` tm
2011-02-21 16:44         ` Lukas Czerner
2011-02-24 14:18           ` [PATCH 4/4 V3] " Tao Ma
2011-02-10 21:50     ` [PATCH 4/4 v2] " Andreas Dilger
2011-02-11  6:29       ` [PATCH 4/4 v3] " Tao Ma
2011-02-22 13:11 ` [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
2011-02-22 13:51   ` Lukas Czerner
2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 " Tao Ma
2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-07-01 10:09   ` Lukas Czerner
2011-07-01 10:46     ` Tao Ma
2011-07-01 11:03       ` Lukas Czerner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.