All of lore.kernel.org
 help / color / mirror / Atom feed
* Ext3: batched discard support
@ 2010-07-07 13:18 Lukas Czerner
  2010-07-07 13:18 ` [PATCH 1/2] Add discard/nodiscard mount option for ext3 Lukas Czerner
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Lukas Czerner @ 2010-07-07 13:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: lczerner, jmoyer, rwheeler, eshishki, sandeen


Hello again,

I have composed a set of patches to add batched discard support for Ext3 file
system. Functionality is basically the same as in "batched discard support
for Ext4" (see http://www.spinics.net/lists/linux-ext4/msg19827.html ).

There are two patches:

[PATCH 1/2] Add discard/nodiscard mount option for ext3
 fs/ext3/super.c         |   14 +++++++++++++-
 include/linux/ext3_fs.h |    1 +
 2 files changed, 14 insertions(+), 1 deletions(-)

[PATCH 2/2] Add batched discard support for ext3
 fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 include/linux/ext3_fs.h |    1 +
 3 files changed, 147 insertions(+), 0 deletions(-)

The first one adds mount option to specify whether or not the discard should
be performed (default is nodiscard), and the second one adds discard support
itself.

Note that you will need one more patch for this to work "Add ioctl FITRIM" see
(http://www.spinics.net/lists/linux-ext4/msg19828.html) which adds ioctl to
issue discard itself.

Regards.

-Lukas

Signed-off-by: Lukas Czerner <lczerner@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-07 13:18 Ext3: batched discard support Lukas Czerner
@ 2010-07-07 13:18 ` Lukas Czerner
  2010-07-12 15:19   ` Jan Kara
  2010-07-07 13:18 ` [PATCH 2/2] Add batched discard support " Lukas Czerner
  2010-07-07 19:14 ` Ext3: batched discard support Greg Freemyer
  2 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-07 13:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: lczerner, jmoyer, rwheeler, eshishki, sandeen

Those mount option has the same meaning as in ext4 file system. It
provide a way to enable/disable file system's trim support. The trim
support is off by default, thus nodiscard option is not actually
necessary.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext3/super.c         |   14 +++++++++++++-
 include/linux/ext3_fs.h |    1 +
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 1bee604..6baf7ef 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -662,6 +662,9 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, DATA_ERR_ABORT))
 		seq_puts(seq, ",data_err=abort");
 
+	if (test_opt(sb, DISCARD))
+		seq_puts(seq, ",discard");
+
 	if (test_opt(sb, NOLOAD))
 		seq_puts(seq, ",norecovery");
 
@@ -811,7 +814,8 @@ enum {
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
 	Opt_noquota, Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
-	Opt_usrquota, Opt_grpquota
+	Opt_usrquota, Opt_grpquota,
+	Opt_discard, Opt_nodiscard,
 };
 
 static const match_table_t tokens = {
@@ -866,6 +870,8 @@ static const match_table_t tokens = {
 	{Opt_usrquota, "usrquota"},
 	{Opt_barrier, "barrier=%u"},
 	{Opt_resize, "resize"},
+	{Opt_discard, "discard"},
+	{Opt_nodiscard, "nodiscard"},
 	{Opt_err, NULL},
 };
 
@@ -1242,6 +1248,12 @@ set_qf_format:
 		case Opt_bh:
 			clear_opt(sbi->s_mount_opt, NOBH);
 			break;
+		case Opt_discard:
+			set_opt(sbi->s_mount_opt, DISCARD);
+			break;
+		case Opt_nodiscard:
+			clear_opt(sbi->s_mount_opt, DISCARD);
+			break;
 		default:
 			ext3_msg(sb, KERN_ERR,
 				"error: unrecognized mount option \"%s\" "
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 5f494b4..f3fdd94 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -406,6 +406,7 @@ struct ext3_inode {
 #define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
 #define EXT3_MOUNT_DATA_ERR_ABORT	0x400000 /* Abort on file data write
 						  * error in ordered mode */
+#define EXT3_MOUNT_DISCARD		0x800000 /* Issue DISCARD requests */
 
 /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] Add batched discard support for ext3
  2010-07-07 13:18 Ext3: batched discard support Lukas Czerner
  2010-07-07 13:18 ` [PATCH 1/2] Add discard/nodiscard mount option for ext3 Lukas Czerner
@ 2010-07-07 13:18 ` Lukas Czerner
  2010-07-12 15:28   ` Jan Kara
  2010-07-07 19:14 ` Ext3: batched discard support Greg Freemyer
  2 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-07 13:18 UTC (permalink / raw)
  To: linux-ext4; +Cc: lczerner, jmoyer, rwheeler, eshishki, sandeen

Walk through each allocation group and trim all free extents. It can be
invoked through TRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).

It search for free extents in each allocation group. When the free
extent is found, blocks are marked as used and then trimmed. Afterwards
these blocks are marked as free in per-group bitmap.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 include/linux/ext3_fs.h |    1 +
 3 files changed, 147 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index a177122..bcee525 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -20,6 +20,7 @@
 #include <linux/ext3_jbd.h>
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
+#include <linux/blkdev.h>
 
 /*
  * balloc.c contains the blocks allocation and deallocation routines
@@ -1876,3 +1877,147 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
 	return ext3_bg_num_gdb_meta(sb,group);
 
 }
+
+/**
+ * ext3_trim_all_free -- function to trim all free space in alloc. group
+ * @sb:			super block for file system
+ * @group:		allocation group to trim
+ * @gdp:		allocation group description structure
+ * @minblocks:		minimum extent block count
+ *
+ * ext3_trim_all_free walks through group's block bitmap searching for free
+ * blocks. When the free block is found, it tries to allocate this block and
+ * consequent free block to get the biggest free extent possible, until it
+ * reaches any used block. Then issue a TRIM command on this extent and free
+ * the extent in the block bitmap. This is done until whole group is scanned.
+ */
+ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
+			struct ext3_group_desc *gdp, ext3_grpblk_t minblocks)
+{
+	ext3_grpblk_t max = EXT3_BLOCKS_PER_GROUP(sb);
+	ext3_grpblk_t next, tmp, count = 0, start = 0;
+	struct ext3_sb_info *sbi;
+	ext3_fsblk_t discard_block;
+	struct buffer_head *bh = NULL;
+	ext3_grpblk_t free_blocks;
+
+	BUG_ON(gdp == NULL);
+
+	bh = read_block_bitmap(sb, group);
+	if (!bh) {
+		brelse(bh);
+		return 0;
+	}
+
+	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+	sbi = EXT3_SB(sb);
+
+	/**
+	 * Walk through the whole group
+	 */
+	while (start < max) {
+		start = ext3_find_next_zero_bit(bh->b_data, max, start);
+		next = start;
+
+		/**
+		 * Allocate contiguous free extents by setting bits in the
+		 * block bitmap
+		 */
+		while (next < max
+			&& !ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
+					next, bh->b_data)) {
+			next++;
+		}
+
+		if (next == start)
+			continue;
+
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
+
+		/* Do not issue a TRIM on extents smaller than minblocks */
+		if ((next - start) < minblocks)
+			goto free_extent;
+
+		count += (next - start);
+
+		discard_block = (ext3_fsblk_t)start +
+				ext3_group_first_block_no(sb, group);
+		sb_issue_discard(sb, discard_block, next - start);
+
+free_extent:
+		/**
+		 * Free previously allocated extent by clearing bits in the
+		 * blocks bitmap
+		 */
+		tmp = next - 1;
+		while (tmp >= start) {
+			ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
+						tmp, bh->b_data);
+			tmp--;
+		}
+
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, next - start);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
+
+		start = next;
+
+		/* No more suitable extents */
+		if ((free_blocks - count) < minblocks)
+			break;
+	}
+
+	brelse(bh);
+
+	ext3_debug("trimmed %d blocks in the group %d\n",
+		count, group);
+
+	return count;
+}
+
+/**
+ * ext3_trim_fs() -- trim ioctl handle function
+ * @minlen:		minimum extent length in Bytes
+ * @sb:			superblock for filesystem
+ *
+ * ext3_trim_fs goes through all allocation group searching for groups with more
+ * free space than minlen. For such a group ext3_trim_all_free function is
+ * invoked.
+ */
+int ext3_trim_fs(unsigned int minlen, struct super_block *sb)
+{
+	ext3_grpblk_t minblocks;
+	unsigned long ngroups;
+	unsigned int group;
+	struct ext3_group_desc *gdp;
+	ext3_grpblk_t free_blocks;
+
+	if (!test_opt(sb, DISCARD))
+		return 0;
+
+	minblocks = DIV_ROUND_UP(minlen, sb->s_blocksize);
+	if (unlikely(minblocks > EXT3_BLOCKS_PER_GROUP(sb)))
+		return -EINVAL;
+
+	ngroups = EXT3_SB(sb)->s_groups_count;
+	smp_rmb();
+
+	for (group = 0; group < ngroups; group++) {
+
+		gdp = ext3_get_group_desc(sb, group, NULL);
+		if (!gdp)
+			continue;
+
+		free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+		if (free_blocks < minblocks)
+			continue;
+
+		ext3_trim_all_free(sb, group, gdp, minblocks);
+	}
+
+	return 0;
+}
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 6baf7ef..5b639b3 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -794,6 +794,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.trim_fs	= ext3_trim_fs,
 };
 
 static const struct export_operations ext3_export_ops = {
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index f3fdd94..b7337e9 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -858,6 +858,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
 extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
 extern void ext3_init_block_alloc_info(struct inode *);
 extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
+extern int ext3_trim_fs(unsigned int minlen, struct super_block *sb);
 
 /* dir.c */
 extern int ext3_check_dir_entry(const char *, struct inode *,
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Ext3: batched discard support
  2010-07-07 13:18 Ext3: batched discard support Lukas Czerner
  2010-07-07 13:18 ` [PATCH 1/2] Add discard/nodiscard mount option for ext3 Lukas Czerner
  2010-07-07 13:18 ` [PATCH 2/2] Add batched discard support " Lukas Czerner
@ 2010-07-07 19:14 ` Greg Freemyer
  2010-07-09  8:53   ` Lukas Czerner
  2 siblings, 1 reply; 20+ messages in thread
From: Greg Freemyer @ 2010-07-07 19:14 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Wed, Jul 7, 2010 at 9:18 AM, Lukas Czerner <lczerner@redhat.com> wrote:
>
> Hello again,
>
> I have composed a set of patches to add batched discard support for Ext3 file
> system. Functionality is basically the same as in "batched discard support
> for Ext4" (see http://www.spinics.net/lists/linux-ext4/msg19827.html ).
>
> There are two patches:

I'm inclined to review these, but before I do, I thought ext3 was
closed to patches like this.

Or is it still accepting more new functionality than I thought?

Greg

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ext3: batched discard support
  2010-07-07 19:14 ` Ext3: batched discard support Greg Freemyer
@ 2010-07-09  8:53   ` Lukas Czerner
  2010-07-09 10:18     ` Ric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-09  8:53 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Wed, 7 Jul 2010, Greg Freemyer wrote:

> On Wed, Jul 7, 2010 at 9:18 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> >
> > Hello again,
> >
> > I have composed a set of patches to add batched discard support for Ext3 file
> > system. Functionality is basically the same as in "batched discard support
> > for Ext4" (see http://www.spinics.net/lists/linux-ext4/msg19827.html ).
> >
> > There are two patches:
> 
> I'm inclined to review these, but before I do, I thought ext3 was
> closed to patches like this.
> 
> Or is it still accepting more new functionality than I thought?
> 
> Greg
> 

Hi, Greg

thanks for your interest ! Unfortunately it seems like you are right,
I was suspecting something like that. Anyway I am not the one who make
those decisions...so anyone ?

Thanks

-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ext3: batched discard support
  2010-07-09  8:53   ` Lukas Czerner
@ 2010-07-09 10:18     ` Ric Wheeler
  2010-07-12 15:09       ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Ric Wheeler @ 2010-07-09 10:18 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Greg Freemyer, linux-ext4, jmoyer, eshishki, sandeen, Jan Kara

On 07/09/2010 04:53 AM, Lukas Czerner wrote:
> On Wed, 7 Jul 2010, Greg Freemyer wrote:
>
>> On Wed, Jul 7, 2010 at 9:18 AM, Lukas Czerner<lczerner@redhat.com>  wrote:
>>>
>>> Hello again,
>>>
>>> I have composed a set of patches to add batched discard support for Ext3 file
>>> system. Functionality is basically the same as in "batched discard support
>>> for Ext4" (see http://www.spinics.net/lists/linux-ext4/msg19827.html ).
>>>
>>> There are two patches:
>>
>> I'm inclined to review these, but before I do, I thought ext3 was
>> closed to patches like this.
>>
>> Or is it still accepting more new functionality than I thought?
>>
>> Greg
>>
>
> Hi, Greg
>
> thanks for your interest ! Unfortunately it seems like you are right,
> I was suspecting something like that. Anyway I am not the one who make
> those decisions...so anyone ?
>
> Thanks
>
> -Lukas

Not sure how much we want to move into ext3, but I think that Jan is the person 
to make that call.

Discard support got stuck into several other file systems and would be useful 
for ext3.

Jan, Ted, what do you both think about these patches?

Thanks!

Ric



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Ext3: batched discard support
  2010-07-09 10:18     ` Ric Wheeler
@ 2010-07-12 15:09       ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2010-07-12 15:09 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Lukas Czerner, Greg Freemyer, linux-ext4, jmoyer, eshishki,
	sandeen, Jan Kara

On Fri 09-07-10 06:18:15, Ric Wheeler wrote:
> On 07/09/2010 04:53 AM, Lukas Czerner wrote:
> >On Wed, 7 Jul 2010, Greg Freemyer wrote:
> >
> >>On Wed, Jul 7, 2010 at 9:18 AM, Lukas Czerner<lczerner@redhat.com>  wrote:
> >>>
> >>>Hello again,
> >>>
> >>>I have composed a set of patches to add batched discard support for Ext3 file
> >>>system. Functionality is basically the same as in "batched discard support
> >>>for Ext4" (see http://www.spinics.net/lists/linux-ext4/msg19827.html ).
> >>>
> >>>There are two patches:
> >>
> >>I'm inclined to review these, but before I do, I thought ext3 was
> >>closed to patches like this.
> >>
> >>Or is it still accepting more new functionality than I thought?
> >>
> >>Greg
> >>
> >
> >Hi, Greg
> >
> >thanks for your interest ! Unfortunately it seems like you are right,
> >I was suspecting something like that. Anyway I am not the one who make
> >those decisions...so anyone ?
> >
> >Thanks
> >
> >-Lukas
> 
> Not sure how much we want to move into ext3, but I think that Jan is
> the person to make that call.
> 
> Discard support got stuck into several other file systems and would
> be useful for ext3.
> 
> Jan, Ted, what do you both think about these patches?
  I'm fine with taking a feature like this one into ext3 because it's
separated from the rest of the code. So if people want this and Ted is
OK with taking it for ext4, I'll merge the ext3 patches (I would like to
avoid having ext3 feature not supported by ext4...).

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-07 13:18 ` [PATCH 1/2] Add discard/nodiscard mount option for ext3 Lukas Czerner
@ 2010-07-12 15:19   ` Jan Kara
  2010-07-12 15:26     ` Lukas Czerner
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jan Kara @ 2010-07-12 15:19 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen

> Those mount option has the same meaning as in ext4 file system. It
> provide a way to enable/disable file system's trim support. The trim
> support is off by default, thus nodiscard option is not actually
> necessary.
  I kind of miss why ext3 should have a 'discard' mount option. When
user calls DISCARD ioctl on the filesystem, then he probably wants
discard to be performed.

								Honza
 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  fs/ext3/super.c         |   14 +++++++++++++-
>  include/linux/ext3_fs.h |    1 +
>  2 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index 1bee604..6baf7ef 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -662,6 +662,9 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
>  	if (test_opt(sb, DATA_ERR_ABORT))
>  		seq_puts(seq, ",data_err=abort");
>  
> +	if (test_opt(sb, DISCARD))
> +		seq_puts(seq, ",discard");
> +
>  	if (test_opt(sb, NOLOAD))
>  		seq_puts(seq, ",norecovery");
>  
> @@ -811,7 +814,8 @@ enum {
>  	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
>  	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
>  	Opt_noquota, Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
> -	Opt_usrquota, Opt_grpquota
> +	Opt_usrquota, Opt_grpquota,
> +	Opt_discard, Opt_nodiscard,
>  };
>  
>  static const match_table_t tokens = {
> @@ -866,6 +870,8 @@ static const match_table_t tokens = {
>  	{Opt_usrquota, "usrquota"},
>  	{Opt_barrier, "barrier=%u"},
>  	{Opt_resize, "resize"},
> +	{Opt_discard, "discard"},
> +	{Opt_nodiscard, "nodiscard"},
>  	{Opt_err, NULL},
>  };
>  
> @@ -1242,6 +1248,12 @@ set_qf_format:
>  		case Opt_bh:
>  			clear_opt(sbi->s_mount_opt, NOBH);
>  			break;
> +		case Opt_discard:
> +			set_opt(sbi->s_mount_opt, DISCARD);
> +			break;
> +		case Opt_nodiscard:
> +			clear_opt(sbi->s_mount_opt, DISCARD);
> +			break;
>  		default:
>  			ext3_msg(sb, KERN_ERR,
>  				"error: unrecognized mount option \"%s\" "
> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> index 5f494b4..f3fdd94 100644
> --- a/include/linux/ext3_fs.h
> +++ b/include/linux/ext3_fs.h
> @@ -406,6 +406,7 @@ struct ext3_inode {
>  #define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
>  #define EXT3_MOUNT_DATA_ERR_ABORT	0x400000 /* Abort on file data write
>  						  * error in ordered mode */
> +#define EXT3_MOUNT_DISCARD		0x800000 /* Issue DISCARD requests */
>  
>  /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
>  #ifndef _LINUX_EXT2_FS_H
> -- 
> 1.6.6.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 15:19   ` Jan Kara
@ 2010-07-12 15:26     ` Lukas Czerner
  2010-07-12 15:50       ` Jan Kara
  2010-07-12 15:27     ` Ric Wheeler
  2010-07-12 16:03     ` Ric Wheeler
  2 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-12 15:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon, 12 Jul 2010, Jan Kara wrote:

> > Those mount option has the same meaning as in ext4 file system. It
> > provide a way to enable/disable file system's trim support. The trim
> > support is off by default, thus nodiscard option is not actually
> > necessary.
>   I kind of miss why ext3 should have a 'discard' mount option. When
> user calls DISCARD ioctl on the filesystem, then he probably wants
> discard to be performed.
> 
> 								Honza

You're right that it is not necessarily needed, but it is the same as in
ext4. If you want to be really sure that no unwanted trim will be send
to the device, 'nodiscard' mount option becomes handy. But I do not
insist on it and I can easily get rid of it.

-Lukas

>  
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> >  fs/ext3/super.c         |   14 +++++++++++++-
> >  include/linux/ext3_fs.h |    1 +
> >  2 files changed, 14 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> > index 1bee604..6baf7ef 100644
> > --- a/fs/ext3/super.c
> > +++ b/fs/ext3/super.c
> > @@ -662,6 +662,9 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
> >  	if (test_opt(sb, DATA_ERR_ABORT))
> >  		seq_puts(seq, ",data_err=abort");
> >  
> > +	if (test_opt(sb, DISCARD))
> > +		seq_puts(seq, ",discard");
> > +
> >  	if (test_opt(sb, NOLOAD))
> >  		seq_puts(seq, ",norecovery");
> >  
> > @@ -811,7 +814,8 @@ enum {
> >  	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
> >  	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
> >  	Opt_noquota, Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
> > -	Opt_usrquota, Opt_grpquota
> > +	Opt_usrquota, Opt_grpquota,
> > +	Opt_discard, Opt_nodiscard,
> >  };
> >  
> >  static const match_table_t tokens = {
> > @@ -866,6 +870,8 @@ static const match_table_t tokens = {
> >  	{Opt_usrquota, "usrquota"},
> >  	{Opt_barrier, "barrier=%u"},
> >  	{Opt_resize, "resize"},
> > +	{Opt_discard, "discard"},
> > +	{Opt_nodiscard, "nodiscard"},
> >  	{Opt_err, NULL},
> >  };
> >  
> > @@ -1242,6 +1248,12 @@ set_qf_format:
> >  		case Opt_bh:
> >  			clear_opt(sbi->s_mount_opt, NOBH);
> >  			break;
> > +		case Opt_discard:
> > +			set_opt(sbi->s_mount_opt, DISCARD);
> > +			break;
> > +		case Opt_nodiscard:
> > +			clear_opt(sbi->s_mount_opt, DISCARD);
> > +			break;
> >  		default:
> >  			ext3_msg(sb, KERN_ERR,
> >  				"error: unrecognized mount option \"%s\" "
> > diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> > index 5f494b4..f3fdd94 100644
> > --- a/include/linux/ext3_fs.h
> > +++ b/include/linux/ext3_fs.h
> > @@ -406,6 +406,7 @@ struct ext3_inode {
> >  #define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
> >  #define EXT3_MOUNT_DATA_ERR_ABORT	0x400000 /* Abort on file data write
> >  						  * error in ordered mode */
> > +#define EXT3_MOUNT_DISCARD		0x800000 /* Issue DISCARD requests */
> >  
> >  /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
> >  #ifndef _LINUX_EXT2_FS_H
> > -- 
> > 1.6.6.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 15:19   ` Jan Kara
  2010-07-12 15:26     ` Lukas Czerner
@ 2010-07-12 15:27     ` Ric Wheeler
  2010-07-12 16:03     ` Ric Wheeler
  2 siblings, 0 replies; 20+ messages in thread
From: Ric Wheeler @ 2010-07-12 15:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, eshishki, sandeen

On 07/12/2010 11:19 AM, Jan Kara wrote:
>> Those mount option has the same meaning as in ext4 file system. It
>> provide a way to enable/disable file system's trim support. The trim
>> support is off by default, thus nodiscard option is not actually
>> necessary.
>    I kind of miss why ext3 should have a 'discard' mount option. When
> user calls DISCARD ioctl on the filesystem, then he probably wants
> discard to be performed.
>
> 								Honza


Did you see Lukas' second patch for batched discards in ext3?

Ric

>
>> Signed-off-by: Lukas Czerner<lczerner@redhat.com>
>> ---
>>   fs/ext3/super.c         |   14 +++++++++++++-
>>   include/linux/ext3_fs.h |    1 +
>>   2 files changed, 14 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
>> index 1bee604..6baf7ef 100644
>> --- a/fs/ext3/super.c
>> +++ b/fs/ext3/super.c
>> @@ -662,6 +662,9 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
>>   	if (test_opt(sb, DATA_ERR_ABORT))
>>   		seq_puts(seq, ",data_err=abort");
>>
>> +	if (test_opt(sb, DISCARD))
>> +		seq_puts(seq, ",discard");
>> +
>>   	if (test_opt(sb, NOLOAD))
>>   		seq_puts(seq, ",norecovery");
>>
>> @@ -811,7 +814,8 @@ enum {
>>   	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
>>   	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
>>   	Opt_noquota, Opt_ignore, Opt_barrier, Opt_err, Opt_resize,
>> -	Opt_usrquota, Opt_grpquota
>> +	Opt_usrquota, Opt_grpquota,
>> +	Opt_discard, Opt_nodiscard,
>>   };
>>
>>   static const match_table_t tokens = {
>> @@ -866,6 +870,8 @@ static const match_table_t tokens = {
>>   	{Opt_usrquota, "usrquota"},
>>   	{Opt_barrier, "barrier=%u"},
>>   	{Opt_resize, "resize"},
>> +	{Opt_discard, "discard"},
>> +	{Opt_nodiscard, "nodiscard"},
>>   	{Opt_err, NULL},
>>   };
>>
>> @@ -1242,6 +1248,12 @@ set_qf_format:
>>   		case Opt_bh:
>>   			clear_opt(sbi->s_mount_opt, NOBH);
>>   			break;
>> +		case Opt_discard:
>> +			set_opt(sbi->s_mount_opt, DISCARD);
>> +			break;
>> +		case Opt_nodiscard:
>> +			clear_opt(sbi->s_mount_opt, DISCARD);
>> +			break;
>>   		default:
>>   			ext3_msg(sb, KERN_ERR,
>>   				"error: unrecognized mount option \"%s\" "
>> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
>> index 5f494b4..f3fdd94 100644
>> --- a/include/linux/ext3_fs.h
>> +++ b/include/linux/ext3_fs.h
>> @@ -406,6 +406,7 @@ struct ext3_inode {
>>   #define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
>>   #define EXT3_MOUNT_DATA_ERR_ABORT	0x400000 /* Abort on file data write
>>   						  * error in ordered mode */
>> +#define EXT3_MOUNT_DISCARD		0x800000 /* Issue DISCARD requests */
>>
>>   /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
>>   #ifndef _LINUX_EXT2_FS_H
>> --
>> 1.6.6.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Add batched discard support for ext3
  2010-07-07 13:18 ` [PATCH 2/2] Add batched discard support " Lukas Czerner
@ 2010-07-12 15:28   ` Jan Kara
  2010-07-12 15:58     ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2010-07-12 15:28 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen

> Walk through each allocation group and trim all free extents. It can be
> invoked through TRIM ioctl on the file system. The main idea is to
> provide a way to trim the whole file system if needed, since some SSD's
> may suffer from performance loss after the whole device was filled (it
> does not mean that fs is full!).
> 
> It search for free extents in each allocation group. When the free
> extent is found, blocks are marked as used and then trimmed. Afterwards
> these blocks are marked as free in per-group bitmap.
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext3/super.c         |    1 +
>  include/linux/ext3_fs.h |    1 +
>  3 files changed, 147 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> index a177122..bcee525 100644
> --- a/fs/ext3/balloc.c
> +++ b/fs/ext3/balloc.c
...
> +		/**
> +		 * Allocate contiguous free extents by setting bits in the
> +		 * block bitmap
> +		 */
> +		while (next < max
> +			&& !ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
> +					next, bh->b_data)) {
> +			next++;
> +		}
  This is actually wrong. You completely ignore journalling here. You can't
just go and modify metadata buffer - other process can be modifying it as well
and writing it to disk and thus your changes will also get written. And if
a crash happens afterwards before the bitmap is written again, you'll get an
inconsistent filesystem.
  Also you have to check whether the block isn't actually still used by a
running/committing transaction - look at fs/ext3/balloc.c:claim_block() to see
how you have to allocate free blocks.
  All-in-all won't it be good enough to just freeze the fs, do the trimming,
and unfreeze it?

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 15:26     ` Lukas Czerner
@ 2010-07-12 15:50       ` Jan Kara
  2010-07-12 16:01         ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2010-07-12 15:50 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Jan Kara, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon 12-07-10 17:26:49, Lukas Czerner wrote:
> On Mon, 12 Jul 2010, Jan Kara wrote:
> 
> > > Those mount option has the same meaning as in ext4 file system. It
> > > provide a way to enable/disable file system's trim support. The trim
> > > support is off by default, thus nodiscard option is not actually
> > > necessary.
> >   I kind of miss why ext3 should have a 'discard' mount option. When
> > user calls DISCARD ioctl on the filesystem, then he probably wants
> > discard to be performed.
> > 
> > 								Honza
> 
> You're right that it is not necessarily needed, but it is the same as in
> ext4.
  For ext4 it's a bit different matter as it automatically sends discard
requests from mballoc when a block is freed. It makes a good sense to have
an option to enable / disable this. But even for ext4 it would make sense
to me to be able to allow this ioctl but still disable the logic for
automatic trimming... Thus my suggestion would be to make 'discard' mount
option only influence automatic trimming in ext4 and consequently it does
not make sense to have such an option for ext3...

> If you want to be really sure that no unwanted trim will be send to the
> device, 'nodiscard' mount option becomes handy. But I do not insist on it
> and I can easily get rid of it.
  Yeah, but the ioctl can be unsafe only if there are HW bugs or the trim
support is buggy. Of course, both can happen but I don't think it's serious
enough to warrant a new mount option (as that costs us something as well -
too much options => user confusion ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Add batched discard support for ext3
  2010-07-12 15:28   ` Jan Kara
@ 2010-07-12 15:58     ` Lukas Czerner
  2010-07-12 19:57       ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-12 15:58 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon, 12 Jul 2010, Jan Kara wrote:

> > Walk through each allocation group and trim all free extents. It can be
> > invoked through TRIM ioctl on the file system. The main idea is to
> > provide a way to trim the whole file system if needed, since some SSD's
> > may suffer from performance loss after the whole device was filled (it
> > does not mean that fs is full!).
> > 
> > It search for free extents in each allocation group. When the free
> > extent is found, blocks are marked as used and then trimmed. Afterwards
> > these blocks are marked as free in per-group bitmap.
> > 
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> >  fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ext3/super.c         |    1 +
> >  include/linux/ext3_fs.h |    1 +
> >  3 files changed, 147 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> > index a177122..bcee525 100644
> > --- a/fs/ext3/balloc.c
> > +++ b/fs/ext3/balloc.c
> ...
> > +		/**
> > +		 * Allocate contiguous free extents by setting bits in the
> > +		 * block bitmap
> > +		 */
> > +		while (next < max
> > +			&& !ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
> > +					next, bh->b_data)) {
> > +			next++;
> > +		}
>   This is actually wrong. You completely ignore journalling here. You can't
> just go and modify metadata buffer - other process can be modifying it as well
> and writing it to disk and thus your changes will also get written. And if
> a crash happens afterwards before the bitmap is written again, you'll get an
> inconsistent filesystem.
>   Also you have to check whether the block isn't actually still used by a
> running/committing transaction - look at fs/ext3/balloc.c:claim_block() to see
> how you have to allocate free blocks.

I may be wrong, but I thought that since the trim command ensures that
every operation in queue completes before the trim proceed, I do not
need to care much about the journaling and running transaction. But I
will took at it once more..

>   All-in-all won't it be good enough to just freeze the fs, do the trimming,
> and unfreeze it?
> 
> 									Honza
> 

Yes, it can be done that way. But it is king of rough solution.

Thank.
-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 15:50       ` Jan Kara
@ 2010-07-12 16:01         ` Lukas Czerner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2010-07-12 16:01 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon, 12 Jul 2010, Jan Kara wrote:

> On Mon 12-07-10 17:26:49, Lukas Czerner wrote:
> > On Mon, 12 Jul 2010, Jan Kara wrote:
> > 
> > > > Those mount option has the same meaning as in ext4 file system. It
> > > > provide a way to enable/disable file system's trim support. The trim
> > > > support is off by default, thus nodiscard option is not actually
> > > > necessary.
> > >   I kind of miss why ext3 should have a 'discard' mount option. When
> > > user calls DISCARD ioctl on the filesystem, then he probably wants
> > > discard to be performed.
> > > 
> > > 								Honza
> > 
> > You're right that it is not necessarily needed, but it is the same as in
> > ext4.
>   For ext4 it's a bit different matter as it automatically sends discard
> requests from mballoc when a block is freed. It makes a good sense to have
> an option to enable / disable this. But even for ext4 it would make sense
> to me to be able to allow this ioctl but still disable the logic for
> automatic trimming... Thus my suggestion would be to make 'discard' mount
> option only influence automatic trimming in ext4 and consequently it does
> not make sense to have such an option for ext3...
> 
> > If you want to be really sure that no unwanted trim will be send to the
> > device, 'nodiscard' mount option becomes handy. But I do not insist on it
> > and I can easily get rid of it.
>   Yeah, but the ioctl can be unsafe only if there are HW bugs or the trim
> support is buggy. Of course, both can happen but I don't think it's serious
> enough to warrant a new mount option (as that costs us something as well -
> too much options => user confusion ;).
> 
> 								Honza

You're probably right, it is not needed for ext3. And since my patch for
ext4 removes the old 'send trim when block is freed' implementation, I guess
it is not needed there either.

-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 15:19   ` Jan Kara
  2010-07-12 15:26     ` Lukas Czerner
  2010-07-12 15:27     ` Ric Wheeler
@ 2010-07-12 16:03     ` Ric Wheeler
  2010-07-12 16:05       ` Lukas Czerner
  2 siblings, 1 reply; 20+ messages in thread
From: Ric Wheeler @ 2010-07-12 16:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, eshishki, sandeen

On 07/12/2010 11:19 AM, Jan Kara wrote:
>> Those mount option has the same meaning as in ext4 file system. It
>> provide a way to enable/disable file system's trim support. The trim
>> support is off by default, thus nodiscard option is not actually
>> necessary.
>    I kind of miss why ext3 should have a 'discard' mount option. When
> user calls DISCARD ioctl on the filesystem, then he probably wants
> discard to be performed.
>
> 								Honza
>

Sorry I misunderstood your original question.

One reason that you might want to have a "discard" option is to allow a system 
admin to mount without barriers to protect flaky hardware (we have had some 
mixed results for example). As you say, the user probably wants to have the 
ioctl do the discard and should be reasonable for doing it only on solid devices,

Regards,

Ric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 16:03     ` Ric Wheeler
@ 2010-07-12 16:05       ` Lukas Czerner
  2010-07-12 16:15         ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-12 16:05 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Jan Kara, Lukas Czerner, linux-ext4, jmoyer, eshishki, sandeen

On Mon, 12 Jul 2010, Ric Wheeler wrote:

> On 07/12/2010 11:19 AM, Jan Kara wrote:
> > > Those mount option has the same meaning as in ext4 file system. It
> > > provide a way to enable/disable file system's trim support. The trim
> > > support is off by default, thus nodiscard option is not actually
> > > necessary.
> >    I kind of miss why ext3 should have a 'discard' mount option. When
> > user calls DISCARD ioctl on the filesystem, then he probably wants
> > discard to be performed.
> > 
> > 								Honza
> > 
> 
> Sorry I misunderstood your original question.
> 
> One reason that you might want to have a "discard" option is to allow a system
> admin to mount without barriers to protect flaky hardware (we have had some
> mixed results for example). As you say, the user probably wants to have the
> ioctl do the discard and should be reasonable for doing it only on solid
> devices,

The question is what in does on device other than SSD. I know it does
not harm the deivce, but is there some kernel logic preventing the trim
command to be send to device that does not support it ? I hope so.

-Lukas

> 
> Regards,
> 
> Ric
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 16:05       ` Lukas Czerner
@ 2010-07-12 16:15         ` Lukas Czerner
  2010-07-12 18:07           ` Eric Sandeen
  0 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2010-07-12 16:15 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Ric Wheeler, Jan Kara, linux-ext4, jmoyer, eshishki, sandeen

On Mon, 12 Jul 2010, Lukas Czerner wrote:

> On Mon, 12 Jul 2010, Ric Wheeler wrote:
> 
> > On 07/12/2010 11:19 AM, Jan Kara wrote:
> > > > Those mount option has the same meaning as in ext4 file system. It
> > > > provide a way to enable/disable file system's trim support. The trim
> > > > support is off by default, thus nodiscard option is not actually
> > > > necessary.
> > >    I kind of miss why ext3 should have a 'discard' mount option. When
> > > user calls DISCARD ioctl on the filesystem, then he probably wants
> > > discard to be performed.
> > > 
> > > 								Honza
> > > 
> > 
> > Sorry I misunderstood your original question.
> > 
> > One reason that you might want to have a "discard" option is to allow a system
> > admin to mount without barriers to protect flaky hardware (we have had some
> > mixed results for example). As you say, the user probably wants to have the
> > ioctl do the discard and should be reasonable for doing it only on solid
> > devices,
> 
> The question is what in does on device other than SSD. I know it does
> not harm the deivce, but is there some kernel logic preventing the trim
> command to be send to device that does not support it ? I hope so.
> 

Yes, there is a check whether device support trim in blkdev_issue_discard
code.

-Lukas

> 
> > 
> > Regards,
> > 
> > Ric
> > 
> 

-- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] Add discard/nodiscard mount option for ext3
  2010-07-12 16:15         ` Lukas Czerner
@ 2010-07-12 18:07           ` Eric Sandeen
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Sandeen @ 2010-07-12 18:07 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Ric Wheeler, Jan Kara, linux-ext4, jmoyer, eshishki

On 07/12/2010 11:15 AM, Lukas Czerner wrote:
> On Mon, 12 Jul 2010, Lukas Czerner wrote:
> 
>> On Mon, 12 Jul 2010, Ric Wheeler wrote:
>>
>>> On 07/12/2010 11:19 AM, Jan Kara wrote:
>>>>> Those mount option has the same meaning as in ext4 file system. It
>>>>> provide a way to enable/disable file system's trim support. The trim
>>>>> support is off by default, thus nodiscard option is not actually
>>>>> necessary.
>>>>    I kind of miss why ext3 should have a 'discard' mount option. When
>>>> user calls DISCARD ioctl on the filesystem, then he probably wants
>>>> discard to be performed.
>>>>
>>>> 								Honza
>>>>
>>>
>>> Sorry I misunderstood your original question.
>>>
>>> One reason that you might want to have a "discard" option is to allow a system
>>> admin to mount without barriers to protect flaky hardware (we have had some
>>> mixed results for example). As you say, the user probably wants to have the
>>> ioctl do the discard and should be reasonable for doing it only on solid
>>> devices,
>>
>> The question is what in does on device other than SSD. I know it does
>> not harm the deivce, but is there some kernel logic preventing the trim
>> command to be send to device that does not support it ? I hope so.
>>
> 
> Yes, there is a check whether device support trim in blkdev_issue_discard
> code.

I also sent an ext4 patch so that a failed discard clears the flag so we
don't keep attempting more.

However, the LVM guys aren't too happy with that because they point to
cases like:

a) pvmove off of, and back onto, a trim-capable device
b) composite devices of trim-capable and -incapable devices

I don't care much either way, I don't think we do much harm in
continuing to send discards after a failure; the original motivation for
the patch was so that if a user asked for discard and the (simple)
device didn't support it, they'd get a message, and we'd stop trying.

-Eric

> -Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Add batched discard support for ext3
  2010-07-12 15:58     ` Lukas Czerner
@ 2010-07-12 19:57       ` Jan Kara
  2010-07-13 15:55         ` Lukas Czerner
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2010-07-12 19:57 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Jan Kara, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon 12-07-10 17:58:46, Lukas Czerner wrote:
> On Mon, 12 Jul 2010, Jan Kara wrote:
> 
> > > Walk through each allocation group and trim all free extents. It can be
> > > invoked through TRIM ioctl on the file system. The main idea is to
> > > provide a way to trim the whole file system if needed, since some SSD's
> > > may suffer from performance loss after the whole device was filled (it
> > > does not mean that fs is full!).
> > > 
> > > It search for free extents in each allocation group. When the free
> > > extent is found, blocks are marked as used and then trimmed. Afterwards
> > > these blocks are marked as free in per-group bitmap.
> > > 
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > ---
> > >  fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/ext3/super.c         |    1 +
> > >  include/linux/ext3_fs.h |    1 +
> > >  3 files changed, 147 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> > > index a177122..bcee525 100644
> > > --- a/fs/ext3/balloc.c
> > > +++ b/fs/ext3/balloc.c
> > ...
> > > +		/**
> > > +		 * Allocate contiguous free extents by setting bits in the
> > > +		 * block bitmap
> > > +		 */
> > > +		while (next < max
> > > +			&& !ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
> > > +					next, bh->b_data)) {
> > > +			next++;
> > > +		}
> >   This is actually wrong. You completely ignore journalling here. You can't
> > just go and modify metadata buffer - other process can be modifying it as well
> > and writing it to disk and thus your changes will also get written. And if
> > a crash happens afterwards before the bitmap is written again, you'll get an
> > inconsistent filesystem.
> >   Also you have to check whether the block isn't actually still used by a
> > running/committing transaction - look at fs/ext3/balloc.c:claim_block() to see
> > how you have to allocate free blocks.
> 
> I may be wrong, but I thought that since the trim command ensures that
> every operation in queue completes before the trim proceed, I do not
> need to care much about the journaling and running transaction. But I
> will took at it once more..
  Consider just a simple race:

  thread A:			thread B:

  allocate blocks in group G
  				set bits for free blocks in group G
  transaction with allocation
    commits - bitmap has bits
    from thread B set
----------------------------------------------- crash
  After a journal replay we have just leaked blocks set in the bitmap
by thread B...
  And there are probably races with worse consequences. This is just the
simplest one.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] Add batched discard support for ext3
  2010-07-12 19:57       ` Jan Kara
@ 2010-07-13 15:55         ` Lukas Czerner
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2010-07-13 15:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen

On Mon, 12 Jul 2010, Jan Kara wrote:

> On Mon 12-07-10 17:58:46, Lukas Czerner wrote:
> > On Mon, 12 Jul 2010, Jan Kara wrote:
> > 
> > > > Walk through each allocation group and trim all free extents. It can be
> > > > invoked through TRIM ioctl on the file system. The main idea is to
> > > > provide a way to trim the whole file system if needed, since some SSD's
> > > > may suffer from performance loss after the whole device was filled (it
> > > > does not mean that fs is full!).
> > > > 
> > > > It search for free extents in each allocation group. When the free
> > > > extent is found, blocks are marked as used and then trimmed. Afterwards
> > > > these blocks are marked as free in per-group bitmap.
> > > > 
> > > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > > ---
> > > >  fs/ext3/balloc.c        |  145 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >  fs/ext3/super.c         |    1 +
> > > >  include/linux/ext3_fs.h |    1 +
> > > >  3 files changed, 147 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> > > > index a177122..bcee525 100644
> > > > --- a/fs/ext3/balloc.c
> > > > +++ b/fs/ext3/balloc.c
> > > ...
> > > > +		/**
> > > > +		 * Allocate contiguous free extents by setting bits in the
> > > > +		 * block bitmap
> > > > +		 */
> > > > +		while (next < max
> > > > +			&& !ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
> > > > +					next, bh->b_data)) {
> > > > +			next++;
> > > > +		}
> > >   This is actually wrong. You completely ignore journalling here. You can't
> > > just go and modify metadata buffer - other process can be modifying it as well
> > > and writing it to disk and thus your changes will also get written. And if
> > > a crash happens afterwards before the bitmap is written again, you'll get an
> > > inconsistent filesystem.
> > >   Also you have to check whether the block isn't actually still used by a
> > > running/committing transaction - look at fs/ext3/balloc.c:claim_block() to see
> > > how you have to allocate free blocks.
> > 
> > I may be wrong, but I thought that since the trim command ensures that
> > every operation in queue completes before the trim proceed, I do not
> > need to care much about the journaling and running transaction. But I
> > will took at it once more..
>   Consider just a simple race:
> 
>   thread A:			thread B:
> 
>   allocate blocks in group G
>   				set bits for free blocks in group G
>   transaction with allocation
>     commits - bitmap has bits
>     from thread B set
> ----------------------------------------------- crash
>   After a journal replay we have just leaked blocks set in the bitmap
> by thread B...
>   And there are probably races with worse consequences. This is just the
> simplest one.
> 
> 								Honza
> 

Ok, I was terribly wrong! I am going to fix it, as well as ext4 patch.

Thanks for clarifying that!

-Lukas

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-07-13 15:55 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-07 13:18 Ext3: batched discard support Lukas Czerner
2010-07-07 13:18 ` [PATCH 1/2] Add discard/nodiscard mount option for ext3 Lukas Czerner
2010-07-12 15:19   ` Jan Kara
2010-07-12 15:26     ` Lukas Czerner
2010-07-12 15:50       ` Jan Kara
2010-07-12 16:01         ` Lukas Czerner
2010-07-12 15:27     ` Ric Wheeler
2010-07-12 16:03     ` Ric Wheeler
2010-07-12 16:05       ` Lukas Czerner
2010-07-12 16:15         ` Lukas Czerner
2010-07-12 18:07           ` Eric Sandeen
2010-07-07 13:18 ` [PATCH 2/2] Add batched discard support " Lukas Czerner
2010-07-12 15:28   ` Jan Kara
2010-07-12 15:58     ` Lukas Czerner
2010-07-12 19:57       ` Jan Kara
2010-07-13 15:55         ` Lukas Czerner
2010-07-07 19:14 ` Ext3: batched discard support Greg Freemyer
2010-07-09  8:53   ` Lukas Czerner
2010-07-09 10:18     ` Ric Wheeler
2010-07-12 15:09       ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.