All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 RESEND]  ext4 trim bug fixes and improvement.
@ 2011-06-30 14:42 Tao Ma
  2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Tao Ma @ 2011-06-30 14:42 UTC (permalink / raw)
  To: Theodore Ts'o, linux-ext4; +Cc: tm

Hi Ted,
	I just found these patches haven't been included yet, so resend them
and hope they can be included in 3.1.

    Patch 1/4 is a fix for the 'len' overflow. As we have decided that
we don't adjust start to s_first_data_block, this patch is still needed.
    Patch 2/4 is a speedup for trim in one group.
    Patch 3/4 adds trace points for the functions used in FITRIM.
    Patch 4/4 is a speedup for trim for the whole volume.

Thanks,
Tao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] ext4: fix trim length underflow with small trim length.
  2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
@ 2011-06-30 14:50 ` Tao Ma
  2011-07-01  9:45   ` Lukas Czerner
  2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-06-30 14:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso

From: Tao Ma <boyu.mt@taobao.com>

In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
it could underflow in case blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len will be underflow to -1ULL.
In the end, although we are safe that last_group check later will limit
the trim to the whole volume, but that isn't what the user really want.

So this patch fix it. It also adds the check for 'start' like ext3 so that
we can break immediately if the start is invalid.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 6ed859d..2336424 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4904,6 +4904,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 
 	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
 		return -EINVAL;
+	if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
+	    start + len <= first_data_blk)
+		goto out;
 	if (start < first_data_blk) {
 		len -= first_data_blk - start;
 		start = first_data_blk;
@@ -4952,5 +4955,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+out:
 	return ret;
 }
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/4] ext4: speed up group trim with the right free block count.
  2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
  2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
@ 2011-06-30 14:50 ` Tao Ma
  2011-06-30 16:35   ` Andreas Dilger
  2011-07-01  9:48   ` Lukas Czerner
  2011-06-30 14:50 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
  2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
  3 siblings, 2 replies; 20+ messages in thread
From: Tao Ma @ 2011-06-30 14:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso

From: Tao Ma <boyu.mt@taobao.com>

When we trim some free blocks in a group of ext4, we should
calculate the free blocks properly and check whether there are
enough freed blocks left for us to trim. Current solution will
only calculate free spaces if they are large for a trim which
isn't appropriate.

Let us see a small example:
a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
And minblocks is 1M. With current solution, we have to iterate
the whole group since these 300k will never be subtracted from
1.5M. But actually we should exit after we find the first 2
free spaces since the left 3 chunks only sum up to 900K if we
subtract the first 600K although they can't be trimed.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2336424..83fb04b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4823,7 +4823,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		   ext4_grpblk_t minblocks)
 {
 	void *bitmap;
-	ext4_grpblk_t next, count = 0;
+	ext4_grpblk_t next, count = 0, free_count = 0;
 	struct ext4_buddy e4b;
 	int ret;
 
@@ -4850,6 +4850,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 					 next - start, group, &e4b);
 			count += next - start;
 		}
+		free_count += next - start;
 		start = next + 1;
 
 		if (fatal_signal_pending(current)) {
@@ -4863,7 +4864,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 			ext4_lock_group(sb, group);
 		}
 
-		if ((e4b.bd_info->bb_free - count) < minblocks)
+		if ((e4b.bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
 	ext4_unlock_group(sb, group);
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/4] ext4: Add new ext4 trim tracepoints
  2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
  2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
  2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
@ 2011-06-30 14:50 ` Tao Ma
  2011-07-01  9:54   ` Lukas Czerner
  2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
  3 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-06-30 14:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso

From: Tao Ma <boyu.mt@taobao.com>

Add ext4_trim_extent and ext4_trim_all_free.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/mballoc.c           |    4 +++
 include/trace/events/ext4.h |   49 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 83fb04b..a822f2a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4782,6 +4782,8 @@ static void ext4_trim_extent(struct super_block *sb, int start, int count,
 {
 	struct ext4_free_extent ex;
 
+	trace_ext4_trim_extent(sb, group, start, count);
+
 	assert_spin_locked(ext4_group_lock_ptr(sb, group));
 
 	ex.fe_start = start;
@@ -4827,6 +4829,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 	struct ext4_buddy e4b;
 	int ret;
 
+	trace_ext4_trim_all_free(sb, group, start, max);
+
 	ret = ext4_mb_load_buddy(sb, group, &e4b);
 	if (ret) {
 		ext4_error(sb, "Error in loading buddy "
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 5ce2b2f..405ea6b 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1520,6 +1520,55 @@ TRACE_EVENT(ext4_load_inode,
 		  (unsigned long) __entry->ino)
 );
 
+DECLARE_EVENT_CLASS(ext4__trim,
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len),
+
+	TP_STRUCT__entry(
+		__field(	int,	dev_major               )
+		__field(	int,	dev_minor               )
+		__field(	__u32, 	group			)
+		__field(	int,	start			)
+		__field(	int,	len			)
+	),
+
+	TP_fast_assign(
+		__entry->dev_major	= MAJOR(sb->s_dev);
+		__entry->dev_minor	= MINOR(sb->s_dev);
+		__entry->group		= group;
+		__entry->start		= start;
+		__entry->len		= len;
+	),
+
+	TP_printk("dev %d,%d group %u, start %d, len %d",
+		  __entry->dev_major, __entry->dev_minor,
+		  __entry->group, __entry->start, __entry->len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_extent,
+
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len)
+);
+
+DEFINE_EVENT(ext4__trim, ext4_trim_all_free,
+
+	TP_PROTO(struct super_block *sb,
+		 ext4_group_t group,
+		 ext4_grpblk_t start,
+		 ext4_grpblk_t len),
+
+	TP_ARGS(sb, group, start, len)
+);
+
 #endif /* _TRACE_EXT4_H */
 
 /* This part must be outside protection */
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
                   ` (2 preceding siblings ...)
  2011-06-30 14:50 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
@ 2011-06-30 14:50 ` Tao Ma
  2011-07-01 10:09   ` Lukas Czerner
  3 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-06-30 14:50 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state.

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.505s
user	0m0.000s
sys	0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.359s
user	0m0.000s
sys	0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.228s
user	0m0.000s
sys	0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m5.625s
user	0m0.000s
sys	0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real	0m0.002s
user	0m0.000s
sys	0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real	0m1.217s
user	0m0.000s
sys	0m0.196s

Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |   13 ++++++++++++-
 fs/ext4/mballoc.c |   20 ++++++++++++++++++++
 fs/ext4/super.c   |    2 ++
 3 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1921392..5878a22 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1214,6 +1214,9 @@ struct ext4_sb_info {
 
 	/* Kernel thread for multiple mount protection */
 	struct task_struct *s_mmp_tsk;
+
+	/* record the last minlen when FITRIM is called. */
+	atomic_t s_last_trim_minblks;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -2067,11 +2070,19 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
 
+#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
+	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
+	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
 
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a822f2a..afdd869 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			EXT4_MB_GRP_CLEAR_TRIMMED(db);
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 	bitmap = e4b.bd_bitmap;
 
 	ext4_lock_group(sb, group);
+	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
+	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
+		goto out;
+
 	start = (e4b.bd_info->bb_first_free > start) ?
 		e4b.bd_info->bb_first_free : start;
 
@@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		if ((e4b.bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+out:
 	ext4_unlock_group(sb, group);
 	ext4_mb_unload_buddy(&e4b);
 
@@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	}
 	range->len = trimmed * sb->s_blocksize;
 
+	if (!ret)
+		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
+
 out:
 	return ret;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..080e9f9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_sectors_written_start =
 			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
 
+	atomic_set(&sbi->s_last_trim_minblks, 0);
+
 	/* Cleanup superblock name */
 	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
 		*cp = '!';
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] ext4: speed up group trim with the right free block count.
  2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
@ 2011-06-30 16:35   ` Andreas Dilger
  2011-07-01  9:48   ` Lukas Czerner
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Dilger @ 2011-06-30 16:35 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On 2011-06-30, at 8:50 AM, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
> 
> When we trim some free blocks in a group of ext4, we should
> calculate the free blocks properly and check whether there are
> enough freed blocks left for us to trim. Current solution will
> only calculate free spaces if they are large for a trim which
> isn't appropriate.
> 
> Let us see a small example:
> a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
> And minblocks is 1M. With current solution, we have to iterate
> the whole group since these 300k will never be subtracted from
> 1.5M. But actually we should exit after we find the first 2
> free spaces since the left 3 chunks only sum up to 900K if we
> subtract the first 600K although they can't be trimed.
> 
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>

> ---
> fs/ext4/mballoc.c |    5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2336424..83fb04b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4823,7 +4823,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 		   ext4_grpblk_t minblocks)
> {
> 	void *bitmap;
> -	ext4_grpblk_t next, count = 0;
> +	ext4_grpblk_t next, count = 0, free_count = 0;
> 	struct ext4_buddy e4b;
> 	int ret;
> 
> @@ -4850,6 +4850,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 					 next - start, group, &e4b);
> 			count += next - start;
> 		}
> +		free_count += next - start;
> 		start = next + 1;
> 
> 		if (fatal_signal_pending(current)) {
> @@ -4863,7 +4864,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 			ext4_lock_group(sb, group);
> 		}
> 
> -		if ((e4b.bd_info->bb_free - count) < minblocks)
> +		if ((e4b.bd_info->bb_free - free_count) < minblocks)
> 			break;
> 	}
> 	ext4_unlock_group(sb, group);
> -- 
> 1.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] ext4: fix trim length underflow with small trim length.
  2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
@ 2011-07-01  9:45   ` Lukas Czerner
  2011-07-01 10:15     ` Tao Ma
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01  9:45 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On Thu, 30 Jun 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
> it could underflow in case blocksize=1K, fstrim_range.len=512 and
> fstrim_range.start = 0. In this case, when we run the code:
> len -= first_data_blk - start; len will be underflow to -1ULL.
> In the end, although we are safe that last_group check later will limit
> the trim to the whole volume, but that isn't what the user really want.
> 
> So this patch fix it. It also adds the check for 'start' like ext3 so that
> we can break immediately if the start is invalid.

Hi Tao,

thanks for the resend!

> 
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/mballoc.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 6ed859d..2336424 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4904,6 +4904,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  
>  	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
>  		return -EINVAL;
> +	if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
> +	    start + len <= first_data_blk)
> +		goto out;

We should really return -EINVAL in case that start is beyond the
filesystem. However we can not return -EINVAL in case that start+len is
before the first data block, because it would require user to know fs
internals.

So simply doing this, should be enough:

	if (start + len <= first_data_blk)
		goto out;

and the code later

	if (first_group > last_group)
		return -EINVAL;

will handle the rest.

Thanks!
-Lukas


>  	if (start < first_data_blk) {
>  		len -= first_data_blk - start;
>  		start = first_data_blk;
> @@ -4952,5 +4955,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	}
>  	range->len = trimmed * sb->s_blocksize;
>  
> +out:
>  	return ret;
>  }
> 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] ext4: speed up group trim with the right free block count.
  2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
  2011-06-30 16:35   ` Andreas Dilger
@ 2011-07-01  9:48   ` Lukas Czerner
  1 sibling, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01  9:48 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On Thu, 30 Jun 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> When we trim some free blocks in a group of ext4, we should
> calculate the free blocks properly and check whether there are
> enough freed blocks left for us to trim. Current solution will
> only calculate free spaces if they are large for a trim which
> isn't appropriate.
> 
> Let us see a small example:
> a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k.
> And minblocks is 1M. With current solution, we have to iterate
> the whole group since these 300k will never be subtracted from
> 1.5M. But actually we should exit after we find the first 2
> free spaces since the left 3 chunks only sum up to 900K if we
> subtract the first 600K although they can't be trimed.

Looks good, thanks!

-Lukas

> 
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/mballoc.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2336424..83fb04b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4823,7 +4823,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  		   ext4_grpblk_t minblocks)
>  {
>  	void *bitmap;
> -	ext4_grpblk_t next, count = 0;
> +	ext4_grpblk_t next, count = 0, free_count = 0;
>  	struct ext4_buddy e4b;
>  	int ret;
>  
> @@ -4850,6 +4850,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  					 next - start, group, &e4b);
>  			count += next - start;
>  		}
> +		free_count += next - start;
>  		start = next + 1;
>  
>  		if (fatal_signal_pending(current)) {
> @@ -4863,7 +4864,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  			ext4_lock_group(sb, group);
>  		}
>  
> -		if ((e4b.bd_info->bb_free - count) < minblocks)
> +		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
>  	ext4_unlock_group(sb, group);
> 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] ext4: Add new ext4 trim tracepoints
  2011-06-30 14:50 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
@ 2011-07-01  9:54   ` Lukas Czerner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01  9:54 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On Thu, 30 Jun 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> Add ext4_trim_extent and ext4_trim_all_free.
> 

I have some nitpicky comment bellow, but other than that it
looks good, you can add:

Reviewed-by: Lukas Czerner <lczerner@redhat.com>

Thanks!
-Lukas

> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/mballoc.c           |    4 +++
>  include/trace/events/ext4.h |   49 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 83fb04b..a822f2a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4782,6 +4782,8 @@ static void ext4_trim_extent(struct super_block *sb, int start, int count,
>  {
>  	struct ext4_free_extent ex;
>  
> +	trace_ext4_trim_extent(sb, group, start, count);
> +
>  	assert_spin_locked(ext4_group_lock_ptr(sb, group));
>  
>  	ex.fe_start = start;
> @@ -4827,6 +4829,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  	struct ext4_buddy e4b;
>  	int ret;
>  
> +	trace_ext4_trim_all_free(sb, group, start, max);
> +
>  	ret = ext4_mb_load_buddy(sb, group, &e4b);
>  	if (ret) {
>  		ext4_error(sb, "Error in loading buddy "
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 5ce2b2f..405ea6b 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -1520,6 +1520,55 @@ TRACE_EVENT(ext4_load_inode,
>  		  (unsigned long) __entry->ino)
>  );
>  
> +DECLARE_EVENT_CLASS(ext4__trim,
> +	TP_PROTO(struct super_block *sb,
> +		 ext4_group_t group,
> +		 ext4_grpblk_t start,
> +		 ext4_grpblk_t len),
> +
> +	TP_ARGS(sb, group, start, len),
> +
> +	TP_STRUCT__entry(
> +		__field(	int,	dev_major               )
> +		__field(	int,	dev_minor               )

You can rather use tabs for indent.              ^^^^^^^^^^^^^^^

> +		__field(	__u32, 	group			)
> +		__field(	int,	start			)
> +		__field(	int,	len			)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->dev_major	= MAJOR(sb->s_dev);
> +		__entry->dev_minor	= MINOR(sb->s_dev);
> +		__entry->group		= group;
> +		__entry->start		= start;
> +		__entry->len		= len;
> +	),
> +
> +	TP_printk("dev %d,%d group %u, start %d, len %d",
> +		  __entry->dev_major, __entry->dev_minor,
> +		  __entry->group, __entry->start, __entry->len)
> +);
> +
> +DEFINE_EVENT(ext4__trim, ext4_trim_extent,
> +
> +	TP_PROTO(struct super_block *sb,
> +		 ext4_group_t group,
> +		 ext4_grpblk_t start,
> +		 ext4_grpblk_t len),
> +
> +	TP_ARGS(sb, group, start, len)
> +);
> +
> +DEFINE_EVENT(ext4__trim, ext4_trim_all_free,
> +
> +	TP_PROTO(struct super_block *sb,
> +		 ext4_group_t group,
> +		 ext4_grpblk_t start,
> +		 ext4_grpblk_t len),
> +
> +	TP_ARGS(sb, group, start, len)
> +);
> +
>  #endif /* _TRACE_EXT4_H */
>  
>  /* This part must be outside protection */
> 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
@ 2011-07-01 10:09   ` Lukas Czerner
  2011-07-01 10:46     ` Tao Ma
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01 10:09 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, tytso

On Thu, 30 Jun 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
> 
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state.
> 
> A simple test with my intel x25m ssd:
> df -h shows:
> /dev/sdb1              40G   21G   17G  56% /mnt/ext4
> Block size:               4096
> 
> run the FITRIM with the following parameter:
> range.start = 0;
> range.len = UINT64_MAX;
> range.minlen = 1048576;
> 
> without the patch:
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.505s
> user	0m0.000s
> sys	0m1.224s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.359s
> user	0m0.000s
> sys	0m1.178s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.228s
> user	0m0.000s
> sys	0m1.151s
> 
> with the patch:
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m5.625s
> user	0m0.000s
> sys	0m1.269s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
> real	0m0.002s
> user	0m0.000s
> sys	0m0.001s
> 
> A big improvement for the 2nd and 3rd run.
> 
> Even after I delete some big image files, it is still much
> faster than iterating the whole disk.
> 
> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
> real	0m1.217s
> user	0m0.000s
> sys	0m0.196s

Thanks for the patch Tao, I have couple of comments bellow.

> 
> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/ext4.h    |   13 ++++++++++++-
>  fs/ext4/mballoc.c |   20 ++++++++++++++++++++
>  fs/ext4/super.c   |    2 ++
>  3 files changed, 34 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 1921392..5878a22 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1214,6 +1214,9 @@ struct ext4_sb_info {
>  
>  	/* Kernel thread for multiple mount protection */
>  	struct task_struct *s_mmp_tsk;
> +
> +	/* record the last minlen when FITRIM is called. */
> +	atomic_t s_last_trim_minblks;
>  };
>  
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -2067,11 +2070,19 @@ struct ext4_group_info {
>  					 * 5 free 8-block regions. */
>  };
>  
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>  
>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>  
> +#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
> +	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
> +	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
> +
>  #define EXT4_MAX_CONTENTION		8
>  #define EXT4_CONTENTION_THRESHOLD	2
>  
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a822f2a..afdd869 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>  		rb_erase(&entry->node, &(db->bb_free_root));
>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>  
> +		/*
> +		 * Clear the trimmed flag for the group so that the next
> +		 * ext4_trim_fs can trim it.
> +		 * If the volume is mounted with -o discard, online discard
> +		 * is supported and the free blocks will be trimmed online.
> +		 */
> +		if (!test_opt(sb, DISCARD))
> +			EXT4_MB_GRP_CLEAR_TRIMMED(db);

If the online discard failed we should probably clear the flag as well
right ? However I am not sure how helpful it would be since it'll most
likely fail again. Maybe we do not need to bother with that case at all,
what do you think ?

> +
>  		if (!db->bb_free_root.rb_node) {
>  			/* No more items in the per group rb tree
>  			 * balance refcounts from ext4_mb_free_metadata()
> @@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  	bitmap = e4b.bd_bitmap;
>  
>  	ext4_lock_group(sb, group);
> +	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
> +	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
> +		goto out;
> +
>  	start = (e4b.bd_info->bb_first_free > start) ?
>  		e4b.bd_info->bb_first_free : start;
>  
> @@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
> +
> +	if (!ret)
> +		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> +out:
>  	ext4_unlock_group(sb, group);
>  	ext4_mb_unload_buddy(&e4b);
>  
> @@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	}
>  	range->len = trimmed * sb->s_blocksize;
>  
> +	if (!ret)
> +		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
> +
>  out:
>  	return ret;
>  }
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..080e9f9 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		sbi->s_sectors_written_start =
>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
>  
> +	atomic_set(&sbi->s_last_trim_minblks, 0);
> +

I am not sure that this is needed since sbi is allocated with kzalloc,
so it should be zero already.

>  	/* Cleanup superblock name */
>  	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
>  		*cp = '!';
> 

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] ext4: fix trim length underflow with small trim length.
  2011-07-01  9:45   ` Lukas Czerner
@ 2011-07-01 10:15     ` Tao Ma
  2011-07-01 10:20       ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-07-01 10:15 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4

Hi Lukas,
On 07/01/2011 05:45 PM, Lukas Czerner wrote:
> On Thu, 30 Jun 2011, Tao Ma wrote:
> 
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
>> it could underflow in case blocksize=1K, fstrim_range.len=512 and
>> fstrim_range.start = 0. In this case, when we run the code:
>> len -= first_data_blk - start; len will be underflow to -1ULL.
>> In the end, although we are safe that last_group check later will limit
>> the trim to the whole volume, but that isn't what the user really want.
>>
>> So this patch fix it. It also adds the check for 'start' like ext3 so that
>> we can break immediately if the start is invalid.
> 
> Hi Tao,
> 
> thanks for the resend!
> 
>>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/mballoc.c |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 6ed859d..2336424 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4904,6 +4904,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  
>>  	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
>>  		return -EINVAL;
>> +	if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
>> +	    start + len <= first_data_blk)
>> +		goto out;
> 
> We should really return -EINVAL in case that start is beyond the
> filesystem. However we can not return -EINVAL in case that start+len is
> before the first data block, because it would require user to know fs
> internals.
uh, actually I have checked what ext3 does and in case of start >
block_count, ext3 returns 0, not EINVAL and I made it to work like ext3.
So we should change it also?

Thanks
Tao
> 
> So simply doing this, should be enough:
> 
> 	if (start + len <= first_data_blk)
> 		goto out;
> 
> and the code later
> 
> 	if (first_group > last_group)
> 		return -EINVAL;
> 
> will handle the rest.
> 
> Thanks!
> -Lukas
> 
> 
>>  	if (start < first_data_blk) {
>>  		len -= first_data_blk - start;
>>  		start = first_data_blk;
>> @@ -4952,5 +4955,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>  
>> +out:
>>  	return ret;
>>  }
>>
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] ext4: fix trim length underflow with small trim length.
  2011-07-01 10:15     ` Tao Ma
@ 2011-07-01 10:20       ` Lukas Czerner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01 10:20 UTC (permalink / raw)
  To: Tao Ma; +Cc: Lukas Czerner, linux-ext4

On Fri, 1 Jul 2011, Tao Ma wrote:

> Hi Lukas,
> On 07/01/2011 05:45 PM, Lukas Czerner wrote:
> > On Thu, 30 Jun 2011, Tao Ma wrote:
> > 
> >> From: Tao Ma <boyu.mt@taobao.com>
> >>
> >> In 0f0a25b, we adjust 'len' with s_first_data_block - start, but
> >> it could underflow in case blocksize=1K, fstrim_range.len=512 and
> >> fstrim_range.start = 0. In this case, when we run the code:
> >> len -= first_data_blk - start; len will be underflow to -1ULL.
> >> In the end, although we are safe that last_group check later will limit
> >> the trim to the whole volume, but that isn't what the user really want.
> >>
> >> So this patch fix it. It also adds the check for 'start' like ext3 so that
> >> we can break immediately if the start is invalid.
> > 
> > Hi Tao,
> > 
> > thanks for the resend!
> > 
> >>
> >> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> >> ---
> >>  fs/ext4/mballoc.c |    4 ++++
> >>  1 files changed, 4 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> >> index 6ed859d..2336424 100644
> >> --- a/fs/ext4/mballoc.c
> >> +++ b/fs/ext4/mballoc.c
> >> @@ -4904,6 +4904,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
> >>  
> >>  	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
> >>  		return -EINVAL;
> >> +	if (start >= ext4_blocks_count(EXT4_SB(sb)->s_es) ||
> >> +	    start + len <= first_data_blk)
> >> +		goto out;
> > 
> > We should really return -EINVAL in case that start is beyond the
> > filesystem. However we can not return -EINVAL in case that start+len is
> > before the first data block, because it would require user to know fs
> > internals.
> uh, actually I have checked what ext3 does and in case of start >
> block_count, ext3 returns 0, not EINVAL and I made it to work like ext3.
> So we should change it also?

I have already sent a patch a while ago. It should be in Jan's queue.

-Lukas

> 
> Thanks
> Tao
> > 
> > So simply doing this, should be enough:
> > 
> > 	if (start + len <= first_data_blk)
> > 		goto out;
> > 
> > and the code later
> > 
> > 	if (first_group > last_group)
> > 		return -EINVAL;
> > 
> > will handle the rest.
> > 
> > Thanks!
> > -Lukas
> > 
> > 
> >>  	if (start < first_data_blk) {
> >>  		len -= first_data_blk - start;
> >>  		start = first_data_blk;
> >> @@ -4952,5 +4955,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
> >>  	}
> >>  	range->len = trimmed * sb->s_blocksize;
> >>  
> >> +out:
> >>  	return ret;
> >>  }
> >>
> > 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-07-01 10:09   ` Lukas Czerner
@ 2011-07-01 10:46     ` Tao Ma
  2011-07-01 11:03       ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-07-01 10:46 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4

On 07/01/2011 06:09 PM, Lukas Czerner wrote:
> On Thu, 30 Jun 2011, Tao Ma wrote:
> 
>> From: Tao Ma <boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state.
>>
>> A simple test with my intel x25m ssd:
>> df -h shows:
>> /dev/sdb1              40G   21G   17G  56% /mnt/ext4
>> Block size:               4096
>>
>> run the FITRIM with the following parameter:
>> range.start = 0;
>> range.len = UINT64_MAX;
>> range.minlen = 1048576;
>>
>> without the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.505s
>> user	0m0.000s
>> sys	0m1.224s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.359s
>> user	0m0.000s
>> sys	0m1.178s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.228s
>> user	0m0.000s
>> sys	0m1.151s
>>
>> with the patch:
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m5.625s
>> user	0m0.000s
>> sys	0m1.269s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>> [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
>> real	0m0.002s
>> user	0m0.000s
>> sys	0m0.001s
>>
>> A big improvement for the 2nd and 3rd run.
>>
>> Even after I delete some big image files, it is still much
>> faster than iterating the whole disk.
>>
>> [root@boyu-tm test]# time ./ftrim /mnt/ext4/a
>> real	0m1.217s
>> user	0m0.000s
>> sys	0m0.196s
> 
> Thanks for the patch Tao, I have couple of comments bellow.
> 
>>
>> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
>> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
>> ---
>>  fs/ext4/ext4.h    |   13 ++++++++++++-
>>  fs/ext4/mballoc.c |   20 ++++++++++++++++++++
>>  fs/ext4/super.c   |    2 ++
>>  3 files changed, 34 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 1921392..5878a22 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1214,6 +1214,9 @@ struct ext4_sb_info {
>>  
>>  	/* Kernel thread for multiple mount protection */
>>  	struct task_struct *s_mmp_tsk;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	atomic_t s_last_trim_minblks;
>>  };
>>  
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -2067,11 +2070,19 @@ struct ext4_group_info {
>>  					 * 5 free 8-block regions. */
>>  };
>>  
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
>>  
>>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
>>  
>> +#define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_SET_TRIMMED(grp)	\
>> +	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +#define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
>> +	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
>> +
>>  #define EXT4_MAX_CONTENTION		8
>>  #define EXT4_CONTENTION_THRESHOLD	2
>>  
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a822f2a..afdd869 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2628,6 +2628,15 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>  		rb_erase(&entry->node, &(db->bb_free_root));
>>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>>  
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			EXT4_MB_GRP_CLEAR_TRIMMED(db);
> 
> If the online discard failed we should probably clear the flag as well
> right ? However I am not sure how helpful it would be since it'll most
> likely fail again. Maybe we do not need to bother with that case at all,
> what do you think ?
yes, in this case, we don't have a minor chance to trim successfully
later and it will increase the time of the next FITRIM ioctl. So I guess
we can leave it as-is.
> 
>> +
>>  		if (!db->bb_free_root.rb_node) {
>>  			/* No more items in the per group rb tree
>>  			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4840,6 +4849,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  	bitmap = e4b.bd_bitmap;
>>  
>>  	ext4_lock_group(sb, group);
>> +	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
>> +	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
>> +		goto out;
>> +
>>  	start = (e4b.bd_info->bb_first_free > start) ?
>>  		e4b.bd_info->bb_first_free : start;
>>  
>> @@ -4871,6 +4884,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>>  		if ((e4b.bd_info->bb_free - free_count) < minblocks)
>>  			break;
>>  	}
>> +
>> +	if (!ret)
>> +		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
>> +out:
>>  	ext4_unlock_group(sb, group);
>>  	ext4_mb_unload_buddy(&e4b);
>>  
>> @@ -4960,6 +4977,9 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>>  	}
>>  	range->len = trimmed * sb->s_blocksize;
>>  
>> +	if (!ret)
>> +		atomic_set(&EXT4_SB(sb)->s_last_trim_minblks, minlen);
>> +
>>  out:
>>  	return ret;
>>  }
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 9ea71aa..080e9f9 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>  		sbi->s_sectors_written_start =
>>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
>>  
>> +	atomic_set(&sbi->s_last_trim_minblks, 0);
>> +
> 
> I am not sure that this is needed since sbi is allocated with kzalloc,
> so it should be zero already.
yes, maybe. I can remove it if you like.

Thanks
Tao
> 
>>  	/* Cleanup superblock name */
>>  	for (cp = sb->s_id; (cp = strchr(cp, '/'));)
>>  		*cp = '!';
>>
> 
> Thanks!
> -Lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-07-01 10:46     ` Tao Ma
@ 2011-07-01 11:03       ` Lukas Czerner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2011-07-01 11:03 UTC (permalink / raw)
  To: Tao Ma; +Cc: Lukas Czerner, linux-ext4

On Fri, 1 Jul 2011, Tao Ma wrote:

> >> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >> index 9ea71aa..080e9f9 100644
> >> --- a/fs/ext4/super.c
> >> +++ b/fs/ext4/super.c
> >> @@ -3086,6 +3086,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> >>  		sbi->s_sectors_written_start =
> >>  			part_stat_read(sb->s_bdev->bd_part, sectors[1]);
> >>  
> >> +	atomic_set(&sbi->s_last_trim_minblks, 0);
> >> +
> > 
> > I am not sure that this is needed since sbi is allocated with kzalloc,
> > so it should be zero already.
> yes, maybe. I can remove it if you like.

yes, please do.

Thanks!
> 
> Thanks
> Tao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
  2011-02-10  1:36     ` Tao Ma
@ 2011-02-10  3:56     ` Tao Ma
  2 siblings, 0 replies; 20+ messages in thread
From: Tao Ma @ 2011-02-10  3:56 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, Andreas Dilger

Hi Lukas,
On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>    
>> From: Tao Ma<boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas  and Lukas Czerner.
>>      
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>    
<snip>
>    
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>>   	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> +	    minblocks>= EXT4_SB(sb)->trim_minlen)
>> +		goto out;
>> +
>>      
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>    
I remembered why I put the check here. We have to check this flag with 
the group locked. So in ext4_trim_fs, we don't lock the group and it is 
unsafe to check it there since we may free some blocks and clear the bit 
in the same time in release_blocks_on_commit.

Regards,
Tao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 19:25     ` Andreas Dilger
@ 2011-02-10  1:39       ` Tao Ma
  0 siblings, 0 replies; 20+ messages in thread
From: Tao Ma @ 2011-02-10  1:39 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Lukas Czerner, linux-ext4

On 02/10/2011 03:25 AM, Andreas Dilger wrote:
> On 2011-02-09, at 06:01, Lukas Czerner wrote:
>    
>>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>> 	struct ext4_li_request *s_li_request;
>>> 	/* Wait multiplier for lazy initialization thread */
>>> 	unsigned int s_li_wait_mult;
>>> +
>>> +	/* record the last minlen when FITRIM is called. */
>>> +	u64 trim_minlen;
>>>        
>> Maybe this is a bit nitpicking, but it is not very clear what is the
>> "trim_minlen" for. I would rather use something like
>> "last_trim_minblks", or similar.
>>      
> And for good measure, this should use the "s_" prefix to indicate it is in the superblock.
>    
ok.
>    
>>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>>>        
> It would also be good if the names of these fields were consistent, like
>
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT	1
>    
thanks for the advice.
>    
>>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>> 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>>> 	mb_check_buddy(e4b);
>>>
>>> +	if (!ret)
>>> +		EXT4_SB(sb)->trim_minlen = minlen;
>>> +
>>>        
>> Did you even tried to compile it ? "minlen" is not defined in
>> mb_mark_used(), also it does make sense to set trim_minlen there. Did
>> not you meant it to be in ext4_trim_fs ?
>>      
> I was wondering exactly the same thing, but I haven't had time to look at the code.
>    
Sorry, it seems to be caused by my rebase. The right place should be in 
the end of ext4_trim_fs.

Regards,
Tao
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
@ 2011-02-10  1:36     ` Tao Ma
  2011-02-10  3:56     ` Tao Ma
  2 siblings, 0 replies; 20+ messages in thread
From: Tao Ma @ 2011-02-10  1:36 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, Andreas Dilger

On 02/09/2011 10:01 PM, Lukas Czerner wrote:
> On Wed, 9 Feb 2011, Tao Ma wrote:
>
>    
>> From: Tao Ma<boyu.mt@taobao.com>
>>
>> In ext4, when FITRIM is called every time, we iterate all the
>> groups and do trim one by one. It is a bit time wasting if the
>> group has been trimmed and there is no change since the last
>> trim.
>>
>> So this patch adds a new flag in ext4_group_info->bb_state to
>> indicate that the group has been trimmed, and it will be cleared
>> if some blocks is freed(in release_blocks_on_commit). Another
>> trim_minlen is added in ext4_sb_info to record the last minlen
>> we use to trim the volume, so that if the caller provide a small
>> one, we will go on the trim regardless of the bb_state. The idea
>> is inspired by Andreas  and Lukas Czerner.
>>      
> Hi Tao,
>
> This is great, thanks for the patch! Do you have any benchmark numbers
> showing how much does it improve FSTRIM time ? It seems it might help a
> lot. Couple of comments bellow.
>    
I have a intel x25m ssd. I will try to collect some data about it and 
update them in the commit log.
>    
>> Cc: Andreas Dilger<adilger.kernel@dilger.ca>
>> Cc: Lukas Czerner<lczerner@redhat.com>
>> Signed-off-by: Tao Ma<boyu.mt@taobao.com>
>> ---
>>   fs/ext4/ext4.h    |    8 +++++++-
>>   fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>>   2 files changed, 29 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 0c8d97b..684f48e 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>>   	struct ext4_li_request *s_li_request;
>>   	/* Wait multiplier for lazy initialization thread */
>>   	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 trim_minlen;
>>      
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.
>    
ok, I will change it in the next round.
>    
>>   };
>>
>>   static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>>   					 * 5 free 8-block regions. */
>>   };
>>
>> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>>
>>   #define EXT4_MB_GRP_NEED_INIT(grp)	\
>>   	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,&((grp)->bb_state)))
>> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
>> +	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,&((grp)->bb_state)))
>>
>>   #define EXT4_MAX_CONTENTION		8
>>   #define EXT4_CONTENTION_THRESHOLD	2
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4eadac8..3f6b809 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>   	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>>   	mb_check_buddy(e4b);
>>
>> +	if (!ret)
>> +		EXT4_SB(sb)->trim_minlen = minlen;
>> +
>>      
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?
>    
oh, sorry, this line should be in the function ext4_trim_fs. I just 
rebased my branch to the lastest kernel to see whether it conflicts.
I don't know why they are moved here.  So the right position is:
@@ -4892,6 +4911,9 @@ int ext4_trim_fs(struct super_block *sb, struct 
fstrim_range *range)
         }
         range->len = trimmed * sb->s_blocksize;

+       if (!ret)
+               EXT4_SB(sb)->trim_minlen = minlen;
+
  out:
         return ret;
  }

Sorry for the trouble.
>    
>>   	return ret;
>>   }
>>
>> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>>   		rb_erase(&entry->node,&(db->bb_free_root));
>>   		mb_free_blocks(NULL,&e4b, entry->start_blk, entry->count);
>>
>> +		/*
>> +		 * Clear the trimmed flag for the group so that the next
>> +		 * ext4_trim_fs can trim it.
>> +		 * If the volume is mounted with -o discard, online discard
>> +		 * is supported and the free blocks will be trimmed online.
>> +		 */
>> +		if (!test_opt(sb, DISCARD))
>> +			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
>> +				&(db->bb_state));
>> +
>>   		if (!db->bb_free_root.rb_node) {
>>   			/* No more items in the per group rb tree
>>   			 * balance refcounts from ext4_mb_free_metadata()
>> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>>
>>   	ext4_lock_group(sb, group);
>>
>> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info)&&
>> +	    minblocks>= EXT4_SB(sb)->trim_minlen)
>> +		goto out;
>> +
>>      
> I think this can be placed in the ext4_trim_fs() in the for() loop so we can
> avoid unnecessary call to ext4_trim_all_free.
>    
fair enough. I will update it.

Regards,
Tao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09 14:01   ` Lukas Czerner
@ 2011-02-09 19:25     ` Andreas Dilger
  2011-02-10  1:39       ` Tao Ma
  2011-02-10  1:36     ` Tao Ma
  2011-02-10  3:56     ` Tao Ma
  2 siblings, 1 reply; 20+ messages in thread
From: Andreas Dilger @ 2011-02-09 19:25 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Tao Ma, linux-ext4

On 2011-02-09, at 06:01, Lukas Czerner wrote:
>> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>> 	struct ext4_li_request *s_li_request;
>> 	/* Wait multiplier for lazy initialization thread */
>> 	unsigned int s_li_wait_mult;
>> +
>> +	/* record the last minlen when FITRIM is called. */
>> +	u64 trim_minlen;
> 
> Maybe this is a bit nitpicking, but it is not very clear what is the
> "trim_minlen" for. I would rather use something like
> "last_trim_minblks", or similar.

And for good measure, this should use the "s_" prefix to indicate it is in the superblock.

>> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
>> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1

It would also be good if the names of these fields were consistent, like

+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_WAS_TRIMMED_BIT	1

>> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>> 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>> 	mb_check_buddy(e4b);
>> 
>> +	if (!ret)
>> +		EXT4_SB(sb)->trim_minlen = minlen;
>> +
> 
> Did you even tried to compile it ? "minlen" is not defined in
> mb_mark_used(), also it does make sense to set trim_minlen there. Did
> not you meant it to be in ext4_trim_fs ?

I was wondering exactly the same thing, but I haven't had time to look at the code.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
@ 2011-02-09 14:01   ` Lukas Czerner
  2011-02-09 19:25     ` Andreas Dilger
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Lukas Czerner @ 2011-02-09 14:01 UTC (permalink / raw)
  To: Tao Ma; +Cc: linux-ext4, Andreas Dilger, Lukas Czerner

On Wed, 9 Feb 2011, Tao Ma wrote:

> From: Tao Ma <boyu.mt@taobao.com>
> 
> In ext4, when FITRIM is called every time, we iterate all the
> groups and do trim one by one. It is a bit time wasting if the
> group has been trimmed and there is no change since the last
> trim.
> 
> So this patch adds a new flag in ext4_group_info->bb_state to
> indicate that the group has been trimmed, and it will be cleared
> if some blocks is freed(in release_blocks_on_commit). Another
> trim_minlen is added in ext4_sb_info to record the last minlen
> we use to trim the volume, so that if the caller provide a small
> one, we will go on the trim regardless of the bb_state. The idea
> is inspired by Andreas  and Lukas Czerner.

Hi Tao,

This is great, thanks for the patch! Do you have any benchmark numbers
showing how much does it improve FSTRIM time ? It seems it might help a
lot. Couple of comments bellow.

> 
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/ext4.h    |    8 +++++++-
>  fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 0c8d97b..684f48e 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1200,6 +1200,9 @@ struct ext4_sb_info {
>  	struct ext4_li_request *s_li_request;
>  	/* Wait multiplier for lazy initialization thread */
>  	unsigned int s_li_wait_mult;
> +
> +	/* record the last minlen when FITRIM is called. */
> +	u64 trim_minlen;

Maybe this is a bit nitpicking, but it is not very clear what is the
"trim_minlen" for. I would rather use something like
"last_trim_minblks", or similar.

>  };
>  
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> @@ -1970,10 +1973,13 @@ struct ext4_group_info {
>  					 * 5 free 8-block regions. */
>  };
>  
> -#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
> +#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
> +#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
>  
>  #define EXT4_MB_GRP_NEED_INIT(grp)	\
>  	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
> +#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
> +	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))
>  
>  #define EXT4_MAX_CONTENTION		8
>  #define EXT4_CONTENTION_THRESHOLD	2
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4eadac8..3f6b809 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
>  	mb_check_buddy(e4b);
>  
> +	if (!ret)
> +		EXT4_SB(sb)->trim_minlen = minlen;
> +

Did you even tried to compile it ? "minlen" is not defined in
mb_mark_used(), also it does make sense to set trim_minlen there. Did
not you meant it to be in ext4_trim_fs ?

>  	return ret;
>  }
>  
> @@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
>  		rb_erase(&entry->node, &(db->bb_free_root));
>  		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
>  
> +		/*
> +		 * Clear the trimmed flag for the group so that the next
> +		 * ext4_trim_fs can trim it.
> +		 * If the volume is mounted with -o discard, online discard
> +		 * is supported and the free blocks will be trimmed online.
> +		 */
> +		if (!test_opt(sb, DISCARD))
> +			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> +				  &(db->bb_state));
> +
>  		if (!db->bb_free_root.rb_node) {
>  			/* No more items in the per group rb tree
>  			 * balance refcounts from ext4_mb_free_metadata()
> @@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  
>  	ext4_lock_group(sb, group);
>  
> +	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
> +	    minblocks >= EXT4_SB(sb)->trim_minlen)
> +		goto out;
> +

I think this can be placed in the ext4_trim_fs() in the for() loop so we can
avoid unnecessary call to ext4_trim_all_free.

>  	trace_ext4_trim_all_free(sb, group, start, max);
>  
>  	while (start < max) {
> @@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  		if ((e4b->bd_info->bb_free - free_count) < minblocks)
>  			break;
>  	}
> +
> +	if (!ret)
> +		set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
> +			&(e4b->bd_info->bb_state));
> +out:
>  	ext4_unlock_group(sb, group);
>  
>  	ext4_debug("trimmed %d blocks in the group %d\n",
> 

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info.
  2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
@ 2011-02-09  5:57 ` Tao Ma
  2011-02-09 14:01   ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Tao Ma @ 2011-02-09  5:57 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andreas Dilger, Lukas Czerner

From: Tao Ma <boyu.mt@taobao.com>

In ext4, when FITRIM is called every time, we iterate all the
groups and do trim one by one. It is a bit time wasting if the
group has been trimmed and there is no change since the last
trim.

So this patch adds a new flag in ext4_group_info->bb_state to
indicate that the group has been trimmed, and it will be cleared
if some blocks is freed(in release_blocks_on_commit). Another
trim_minlen is added in ext4_sb_info to record the last minlen
we use to trim the volume, so that if the caller provide a small
one, we will go on the trim regardless of the bb_state. The idea
is inspired by Andreas  and Lukas Czerner.

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/ext4.h    |    8 +++++++-
 fs/ext4/mballoc.c |   22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c8d97b..684f48e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1200,6 +1200,9 @@ struct ext4_sb_info {
 	struct ext4_li_request *s_li_request;
 	/* Wait multiplier for lazy initialization thread */
 	unsigned int s_li_wait_mult;
+
+	/* record the last minlen when FITRIM is called. */
+	u64 trim_minlen;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1970,10 +1973,13 @@ struct ext4_group_info {
 					 * 5 free 8-block regions. */
 };
 
-#define EXT4_GROUP_INFO_NEED_INIT_BIT	0
+#define EXT4_GROUP_INFO_NEED_INIT_BIT		0
+#define EXT4_GROUP_INFO_HAS_BEEN_TRIMMED	1
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_HAS_BEEN_TRIMMED(grp)	\
+	(test_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED, &((grp)->bb_state)))
 
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4eadac8..3f6b809 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1563,6 +1563,9 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 	mb_set_bits(EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
 	mb_check_buddy(e4b);
 
+	if (!ret)
+		EXT4_SB(sb)->trim_minlen = minlen;
+
 	return ret;
 }
 
@@ -2687,6 +2690,16 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn)
 		rb_erase(&entry->node, &(db->bb_free_root));
 		mb_free_blocks(NULL, &e4b, entry->start_blk, entry->count);
 
+		/*
+		 * Clear the trimmed flag for the group so that the next
+		 * ext4_trim_fs can trim it.
+		 * If the volume is mounted with -o discard, online discard
+		 * is supported and the free blocks will be trimmed online.
+		 */
+		if (!test_opt(sb, DISCARD))
+			clear_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+				  &(db->bb_state));
+
 		if (!db->bb_free_root.rb_node) {
 			/* No more items in the per group rb tree
 			 * balance refcounts from ext4_mb_free_metadata()
@@ -4772,6 +4785,10 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 
 	ext4_lock_group(sb, group);
 
+	if (EXT4_MB_GRP_HAS_BEEN_TRIMMED(e4b->bd_info) &&
+	    minblocks >= EXT4_SB(sb)->trim_minlen)
+		goto out;
+
 	trace_ext4_trim_all_free(sb, group, start, max);
 
 	while (start < max) {
@@ -4804,6 +4821,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		if ((e4b->bd_info->bb_free - free_count) < minblocks)
 			break;
 	}
+
+	if (!ret)
+		set_bit(EXT4_GROUP_INFO_HAS_BEEN_TRIMMED,
+			&(e4b->bd_info->bb_state));
+out:
 	ext4_unlock_group(sb, group);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
-- 
1.6.3.GIT


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-07-01 11:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-30 14:42 [PATCH 0/4 RESEND] ext4 trim bug fixes and improvement Tao Ma
2011-06-30 14:50 ` [PATCH 1/4] ext4: fix trim length underflow with small trim length Tao Ma
2011-07-01  9:45   ` Lukas Czerner
2011-07-01 10:15     ` Tao Ma
2011-07-01 10:20       ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 2/4] ext4: speed up group trim with the right free block count Tao Ma
2011-06-30 16:35   ` Andreas Dilger
2011-07-01  9:48   ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 3/4] ext4: Add new ext4 trim tracepoints Tao Ma
2011-07-01  9:54   ` Lukas Czerner
2011-06-30 14:50 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-07-01 10:09   ` Lukas Czerner
2011-07-01 10:46     ` Tao Ma
2011-07-01 11:03       ` Lukas Czerner
  -- strict thread matches above, loose matches on Subject: below --
2011-02-09  5:52 [PATCH 0/4] EXT4 trim bug fixes and improvement Tao Ma
2011-02-09  5:57 ` [PATCH 4/4] ext4: Speed up FITRIM by recording flags in ext4_group_info Tao Ma
2011-02-09 14:01   ` Lukas Czerner
2011-02-09 19:25     ` Andreas Dilger
2011-02-10  1:39       ` Tao Ma
2011-02-10  1:36     ` Tao Ma
2011-02-10  3:56     ` Tao Ma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.