All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Batched discard support
@ 2010-08-06 11:31 Lukas Czerner
  2010-08-06 11:31 ` [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 11:31 UTC (permalink / raw)
  To: linux-ext4
  Cc: dmonakhov, jmoyer, rwheeler, eshishki, sandeen, jack, tytso, lczerner

Hi, all

because people were worried about possibly long stalls appearing
when FITRIM ioctl is working, I have changed the FITRIM interface
as Dimitry suggested. Now you can choose whether to trim whole
file system or just a part of it, resp. you can specify the range
of Bytes to trim.

To be specific you can create something like this:

int main(int argc, char **argv)
{
	int fd;
	uint64_t range[3];

	range[0] = 40960;
	range[1] = 134217728;
	range[2] = 4096;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ioctl(fd, FITRIM, range)) {
		if (errno == EOPNOTSUPP)
			fprintf(stderr, "FITRIM not supported\n");
		else
			perror("FITRIM");
		return 1;
	}

	return 0;
}

Range items have following meaning:

range[0] - (start) first Byte to trim
range[1] - (len) number of Bytes to trim from start
range[2] - (minlen) minimum extent length to trim, free extents shorter
than this number of Bytes will be ignored. This number will be rounded
up to the block size.

So in my example it will trim all free extents from block 10 of first
alloc. group to block 10 of second alloc. group, assuming we have
block_size = 4096.

Also, when you want to trim the whole fs, you can simply pass NULL
instead of range into the ioctl, or you can specify the range correctly
to cover the whole fs.

Regards
-Lukas

[PATCH 1/3] Add ioctl FITRIM.
[PATCH 2/3] Add batched discard support for ext3
[PATCH 3/3] Add batched discard support for ext4

 fs/ext3/balloc.c        |  249 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 fs/ext4/ext4.h          |    2 +
 fs/ext4/mballoc.c       |  194 ++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c         |    1 +
 fs/ioctl.c              |   34 +++++++
 include/linux/ext3_fs.h |    1 +
 include/linux/fs.h      |    2 +
 8 files changed, 484 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] Add ioctl FITRIM.
  2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
@ 2010-08-06 11:31 ` Lukas Czerner
  2010-08-06 11:31 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 11:31 UTC (permalink / raw)
  To: linux-ext4
  Cc: dmonakhov, jmoyer, rwheeler, eshishki, sandeen, jack, tytso, lczerner

Adds an filesystem independent ioctl to allow implementation of file
system batched discard support. It takes an array of three uint64_t
as an argument, the meaning of each item is as follows:

array[0] - (start) first Byte to trim
array[1] - (len) number of Bytes to trim from start
array[2] - (minlen) minimum extent length to trim, free extents shorter
than this number of Bytes will be ignored. This number will be rounded
up to the block size.

It is also possible to specify NULL as an argument. In this case the
arguments will set itself as follows:

args[0] = 0;
args[1] = LLONG_MAX;
args[2] = 0;

So it will trim the whole file system at one run.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ioctl.c         |   34 ++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    2 ++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 2d140a7..a0e243c 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -540,6 +540,36 @@ static int ioctl_fsthaw(struct file *filp)
 	return thaw_super(sb);
 }
 
+static int ioctl_fstrim(struct file *filp, void __user *argp)
+{
+	struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
+	uint64_t args[3];
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* If filesystem doesn't support trim feature, return. */
+	if (sb->s_op->trim_fs == NULL)
+		return -EOPNOTSUPP;
+
+	/* If a blockdevice-backed filesystem isn't specified, return EINVAL. */
+	if (sb->s_bdev == NULL)
+		return -EINVAL;
+
+	if (argp == NULL) {
+		args[0] = 0;
+		args[1] = ULLONG_MAX;
+		args[2] = 0;
+	} else if (copy_from_user(args, argp, sizeof(args)))
+		return -EFAULT;
+
+	err = sb->s_op->trim_fs(sb, args[0], args[1], args[2]);
+	if (err)
+		return err;
+	return 0;
+}
+
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
@@ -590,6 +620,10 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 		error = ioctl_fsthaw(filp);
 		break;
 
+	case FITRIM:
+		error = ioctl_fstrim(filp, argp);
+		break;
+
 	case FS_IOC_FIEMAP:
 		return ioctl_fiemap(filp, arg);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e5106e4..0cc196f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -316,6 +316,7 @@ struct inodes_stat_t {
 #define FIGETBSZ   _IO(0x00,2)	/* get the block size used for bmap */
 #define FIFREEZE	_IOWR('X', 119, int)	/* Freeze */
 #define FITHAW		_IOWR('X', 120, int)	/* Thaw */
+#define FITRIM		_IOW('X', 121, uint64_t)/* Trim */
 
 #define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
 #define	FS_IOC_SETFLAGS			_IOW('f', 2, long)
@@ -1582,6 +1583,7 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	int (*trim_fs) (struct super_block *, uint64_t, uint64_t, uint64_t);
 };
 
 /*
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] Add batched discard support for ext3
  2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
  2010-08-06 11:31 ` [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
@ 2010-08-06 11:31 ` Lukas Czerner
  2010-08-06 11:31 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
  2010-08-06 13:05 ` [PATCH 0/3] Batched discard support Dmitry Monakhov
  3 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 11:31 UTC (permalink / raw)
  To: linux-ext4
  Cc: dmonakhov, jmoyer, rwheeler, eshishki, sandeen, jack, tytso, lczerner

Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).

It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks are
marked as used and then trimmed. Afterwards these blocks are marked as
free in per-group bitmap.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/balloc.c        |  249 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext3/super.c         |    1 +
 include/linux/ext3_fs.h |    1 +
 3 files changed, 251 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index 4a32511..72a9242 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -20,6 +20,7 @@
 #include <linux/ext3_jbd.h>
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
+#include <linux/blkdev.h>
 
 /*
  * balloc.c contains the blocks allocation and deallocation routines
@@ -1882,3 +1883,251 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
 	return ext3_bg_num_gdb_meta(sb,group);
 
 }
+
+/**
+ * ext3_trim_all_free -- function to trim all free space in alloc. group
+ * @sb:			super block for file system
+ * @group:		allocation group to trim
+ * @start:		first group block to examine
+ * @max:		last group block to examine
+ * @gdp:		allocation group description structure
+ * @minblocks:		minimum extent block count
+ *
+ * ext3_trim_all_free walks through group's block bitmap searching for free
+ * blocks. When the free block is found, it tries to allocate this block and
+ * consequent free block to get the biggest free extent possible, until it
+ * reaches any used block. Then issue a TRIM command on this extent and free
+ * the extent in the block bitmap. This is done until whole group is scanned.
+ */
+ext3_grpblk_t ext3_trim_all_free(struct super_block *sb, unsigned int group,
+				ext3_grpblk_t start, ext3_grpblk_t max,
+				ext3_grpblk_t minblocks)
+{
+	handle_t *handle;
+	ext3_grpblk_t next, count = 0, bit;
+	struct ext3_sb_info *sbi;
+	ext3_fsblk_t discard_block;
+	struct buffer_head *bitmap_bh = NULL;
+	struct buffer_head *gdp_bh;
+	ext3_grpblk_t free_blocks;
+	struct ext3_group_desc *gdp;
+	int err = 0, ret = 0;
+	ext3_grpblk_t freed;
+
+	/*
+	 * We will update one block bitmap, and one group descriptor
+	 */
+	handle = ext3_journal_start_sb(sb, 2);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		return err;
+	}
+
+	bitmap_bh = read_block_bitmap(sb, group);
+	if (!bitmap_bh)
+		goto err_out;
+
+	BUFFER_TRACE(bitmap_bh, "getting undo access");
+	err = ext3_journal_get_undo_access(handle, bitmap_bh);
+	if (err)
+		goto err_out;
+
+	gdp = ext3_get_group_desc(sb, group, &gdp_bh);
+	if (!gdp)
+		goto err_out;
+
+	BUFFER_TRACE(gdp_bh, "get_write_access");
+	err = ext3_journal_get_write_access(handle, gdp_bh);
+	if (err)
+		goto err_out;
+
+	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+	sbi = EXT3_SB(sb);
+
+	 /* Walk through the whole group */
+	while (start < max) {
+
+		start = bitmap_search_next_usable_block(start, bitmap_bh, max);
+		if (start < 0)
+			break;
+		next = start;
+
+		/*
+		 * Allocate contiguous free extents by setting bits in the
+		 * block bitmap
+		 */
+		while (next < max
+			&& claim_block(sb_bgl_lock(sbi, group),
+					next, bitmap_bh)) {
+			next++;
+		}
+
+		 /* We did not claim any blocks */
+		if (next == start)
+			continue;
+
+		discard_block = (ext3_fsblk_t)start +
+				ext3_group_first_block_no(sb, group);
+
+		/* Update counters */
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
+
+		/* Do not issue a TRIM on extents smaller than minblocks */
+		if ((next - start) < minblocks)
+			goto free_extent;
+
+		 /* Send the TRIM command down to the device */
+		ret = sb_issue_discard(sb, discard_block, next - start);
+		count += (next - start);
+
+free_extent:
+		freed = 0;
+
+		/*
+		 * Clear bits in the bitmap
+		 */
+		for (bit = start; bit < next; bit++) {
+			BUFFER_TRACE(bitmap_bh, "clear bit");
+			if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
+						bit, bitmap_bh->b_data)) {
+				ext3_error(sb, __func__,
+					"bit already cleared for block "E3FSBLK,
+					 (unsigned long)bit);
+				BUFFER_TRACE(bitmap_bh, "bit already cleared");
+			} else {
+				freed++;
+			}
+		}
+
+		/* Update couters */
+		spin_lock(sb_bgl_lock(sbi, group));
+		le16_add_cpu(&gdp->bg_free_blocks_count, freed);
+		spin_unlock(sb_bgl_lock(sbi, group));
+		percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
+
+		start = next;
+
+		if (ret < 0) {
+			if (ret == -EOPNOTSUPP) {
+				ext3_warning(sb, __func__,
+					"discard not supported!");
+				count = ret;
+				break;
+			}
+			err = ret;
+			break;
+		}
+
+		if (signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		cond_resched();
+
+		/* No more suitable extents */
+		if ((free_blocks - count) < minblocks)
+			break;
+	}
+
+	/* We dirtied the bitmap block */
+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
+
+	/* And the group descriptor block */
+	BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
+	ret = ext3_journal_dirty_metadata(handle, gdp_bh);
+	if (!err)
+		err = ret;
+
+	ext3_debug("trimmed %d blocks in the group %d\n",
+		count, group);
+
+err_out:
+	if (err) {
+		ext3_std_error(sb, err);
+		count = err;
+	}
+
+	ext3_journal_stop(handle);
+	brelse(bitmap_bh);
+
+	return count;
+}
+
+/**
+ * ext3_trim_fs() -- trim ioctl handle function
+ * @sb:			superblock for filesystem
+ * @start:		First Byte to trim
+ * @len:		number of Bytes to trim from start
+ * @minlen:		minimum extent length in Bytes
+ *
+ * ext3_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext3_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext3_trim_fs(struct super_block *sb, uint64_t start, uint64_t len,
+		uint64_t minlen)
+{
+	ext3_grpblk_t last_block, first_block, free_blocks;
+	unsigned long long first_group, last_group;
+	unsigned long group, ngroups;
+	struct ext3_group_desc *gdp;
+	struct ext3_super_block *es;
+	int ret = 0;
+
+	start >>= sb->s_blocksize_bits;
+	len >>= sb->s_blocksize_bits;
+	minlen >>= sb->s_blocksize_bits;
+
+	if (unlikely(minlen > EXT3_BLOCKS_PER_GROUP(sb)))
+		return -EINVAL;
+
+	es = EXT3_SB(sb)->s_es;
+	ngroups = EXT3_SB(sb)->s_groups_count;
+	smp_rmb();
+
+	/* Determine first and last group to examine based on start and len */
+	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
+		      EXT3_BLOCKS_PER_GROUP(sb);
+	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
+		     EXT3_BLOCKS_PER_GROUP(sb);
+	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+
+	if (first_group > last_group)
+		return -EINVAL;
+
+	first_block = start % EXT3_BLOCKS_PER_GROUP(sb);
+	last_block = EXT3_BLOCKS_PER_GROUP(sb);
+
+	for (group = first_group; group <= last_group; group++) {
+
+		gdp = ext3_get_group_desc(sb, group, NULL);
+		if (!gdp)
+			break;
+
+		free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
+		if (free_blocks < minlen)
+			continue;
+
+		if (len >= EXT3_BLOCKS_PER_GROUP(sb))
+			len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
+		else
+			last_block = len;
+
+		ret = ext3_trim_all_free(sb, group, first_block,
+					last_block, minlen);
+		if (ret < 0)
+			break;
+
+		first_block = 0;
+	}
+
+	if (ret >= 0)
+		ret = 0;
+
+	return ret;
+}
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 6c953bb..b172c88 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -797,6 +797,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.trim_fs	= ext3_trim_fs,
 };
 
 static const struct export_operations ext3_export_ops = {
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 7fc62d4..9afacd4 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -857,6 +857,7 @@ extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
 extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
 extern void ext3_init_block_alloc_info(struct inode *);
 extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
+extern int ext3_trim_fs(struct super_block *, uint64_t, uint64_t, uint64_t);
 
 /* dir.c */
 extern int ext3_check_dir_entry(const char *, struct inode *,
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] Add batched discard support for ext4
  2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
  2010-08-06 11:31 ` [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
  2010-08-06 11:31 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
@ 2010-08-06 11:31 ` Lukas Czerner
  2010-08-06 13:03   ` Dmitry Monakhov
  2010-08-07 22:25   ` Jan Kara
  2010-08-06 13:05 ` [PATCH 0/3] Batched discard support Dmitry Monakhov
  3 siblings, 2 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 11:31 UTC (permalink / raw)
  To: linux-ext4
  Cc: dmonakhov, jmoyer, rwheeler, eshishki, sandeen, jack, tytso, lczerner

Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).

It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks
are marked as used and then trimmed. Afterwards these blocks are marked
as free in per-group bitmap.

Since fstrim is a long operation it is good to have an ability to
interrupt it by a signal. This was added by Dmitry Monakhov.
Thanks Dimitry.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h    |    2 +
 fs/ext4/mballoc.c |  194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c   |    1 +
 3 files changed, 197 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..2f96dc5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1558,6 +1558,8 @@ extern int ext4_mb_add_groupinfo(struct super_block *sb,
 extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
 extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
 						ext4_group_t, int);
+extern int ext4_trim_fs(struct super_block *, uint64_t, uint64_t, uint64_t);
+
 /* inode.c */
 struct buffer_head *ext4_getblk(handle_t *, struct inode *,
 						ext4_lblk_t, int, int *);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0e83dfd..5210676 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4690,3 +4690,197 @@ error_return:
 		kmem_cache_free(ext4_ac_cachep, ac);
 	return;
 }
+
+/**
+ * ext4_trim_extent -- function to TRIM one single free extent in the group
+ * @sb:		super block for the file system
+ * @start:	starting block of the free extent in the alloc. group
+ * @count:	number of blocks to TRIM
+ * @group:	alloc. group we are working with
+ * @e4b:	ext4 buddy for the group
+ *
+ * Trim "count" blocks starting at "start" in the "group". To assure that no
+ * one will allocate those blocks, mark it as used in buddy bitmap. This must
+ * be called with under the group lock.
+ */
+static int ext4_trim_extent(struct super_block *sb, int start, int count,
+		ext4_group_t group, struct ext4_buddy *e4b)
+{
+	ext4_fsblk_t discard_block;
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+	struct ext4_free_extent ex;
+	int ret = 0;
+
+	assert_spin_locked(ext4_group_lock_ptr(sb, group));
+
+	ex.fe_start = start;
+	ex.fe_group = group;
+	ex.fe_len = count;
+
+	/*
+	 * Mark blocks used, so no one can reuse them while
+	 * being trimmed.
+	 */
+	mb_mark_used(e4b, &ex);
+	ext4_unlock_group(sb, group);
+
+	discard_block = (ext4_fsblk_t)group *
+			EXT4_BLOCKS_PER_GROUP(sb)
+			+ start
+			+ le32_to_cpu(es->s_first_data_block);
+	trace_ext4_discard_blocks(sb,
+			(unsigned long long)discard_block,
+			count);
+	ret = sb_issue_discard(sb, discard_block, count);
+	if (ret == -EOPNOTSUPP) {
+		ext4_warning(sb,
+			"discard not supported!");
+	} else if (ret < 0) {
+		ext4_std_error(sb, ret);
+	}
+	cond_resched();
+
+	ext4_lock_group(sb, group);
+	mb_free_blocks(NULL, e4b, start, ex.fe_len);
+	return ret;
+}
+
+/**
+ * ext4_trim_all_free -- function to trim all free space in alloc. group
+ * @sb:			super block for file system
+ * @e4b:		ext4 buddy
+ * @start:		first group block to examine
+ * @max:		last group block to examine
+ * @minblocks:		minimum extent block count
+ *
+ * ext4_trim_all_free walks through group's buddy bitmap searching for free
+ * extents. When the free block is found, ext4_trim_extent is called to TRIM
+ * the extent.
+ *
+ *
+ * ext4_trim_all_free walks through group's block bitmap searching for free
+ * extents. When the free extent is found, mark it as used in group buddy
+ * bitmap. Then issue a TRIM command on this extent and free the extent in
+ * the group buddy bitmap. This is done until whole group is scanned.
+ */
+ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
+		ext4_grpblk_t start, ext4_grpblk_t max, ext4_grpblk_t minblocks)
+{
+	void *bitmap;
+	ext4_grpblk_t next, count = 0;
+	ext4_group_t group;
+	int ret = 0;
+
+	BUG_ON(e4b == NULL);
+
+	bitmap = e4b->bd_bitmap;
+	group = e4b->bd_group;
+	start = (e4b->bd_info->bb_first_free > start) ?
+		e4b->bd_info->bb_first_free : start;
+	ext4_lock_group(sb, group);
+
+	while (start < max) {
+
+		start = mb_find_next_zero_bit(bitmap, max, start);
+		if (start >= max)
+			break;
+		next = mb_find_next_bit(bitmap, max, start);
+
+		if ((next - start) >= minblocks) {
+			ret = ext4_trim_extent(sb, start,
+				next - start, group, e4b);
+			if (ret < 0)
+				break;
+			count += next - start;
+		}
+		start = next + 1;
+
+		if (signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		if ((e4b->bd_info->bb_free - count) < minblocks)
+			break;
+	}
+	ext4_unlock_group(sb, group);
+
+	ext4_debug("trimmed %d blocks in the group %d\n",
+		count, group);
+
+	if (ret < 0)
+		count = ret;
+
+	return count;
+}
+
+/**
+ * ext4_trim_fs() -- trim ioctl handle function
+ * @sb:			superblock for filesystem
+ * @start:		First Byte to trim
+ * @len:		number of Bytes to trim from start
+ * @minlen:		minimum extent length in Bytes
+ *
+ * ext4_trim_fs goes through all allocation groups containing Bytes from
+ * start to start+len. For each such a group ext4_trim_all_free function
+ * is invoked to trim all free space.
+ */
+int ext4_trim_fs(struct super_block *sb, uint64_t start, uint64_t len,
+		uint64_t minlen)
+{
+	struct ext4_buddy e4b;
+	ext4_fsblk_t first_group, last_group;
+	ext4_group_t group, ngroups = ext4_get_groups_count(sb);
+	ext4_grpblk_t cnt, first_block, last_block;
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+	int ret = 0;
+
+	start >>= sb->s_blocksize_bits;
+	len >>= sb->s_blocksize_bits;
+	minlen >>= sb->s_blocksize_bits;
+
+	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
+		return -EINVAL;
+
+	/* Determine first and last group to examine based on start and len */
+	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
+		      EXT4_BLOCKS_PER_GROUP(sb);
+	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
+		     EXT4_BLOCKS_PER_GROUP(sb);
+	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
+
+	if (first_group > last_group)
+		return -EINVAL;
+
+	first_block = start % EXT4_BLOCKS_PER_GROUP(sb);
+	last_block = EXT4_BLOCKS_PER_GROUP(sb);
+
+	for (group = first_group; group <= last_group; group++) {
+
+		ret = ext4_mb_load_buddy(sb, group, &e4b);
+		if (ret) {
+			ext4_error(sb, "Error in loading buddy "
+					"information for %u", group);
+			break;
+		}
+
+		if (len >= EXT4_BLOCKS_PER_GROUP(sb))
+			len -= (EXT4_BLOCKS_PER_GROUP(sb) - first_block);
+		else
+			last_block = len;
+
+		if (e4b.bd_info->bb_free >= minlen) {
+			cnt = ext4_trim_all_free(sb, &e4b, first_block,
+						last_block, minlen);
+			if (cnt < 0) {
+				ret = cnt;
+				ext4_mb_unload_buddy(&e4b);
+				break;
+			}
+		}
+		ext4_mb_unload_buddy(&e4b);
+
+		first_block = 0;
+	}
+	return ret;
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e72d323..3b05b3a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1114,6 +1114,7 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.trim_fs	= ext4_trim_fs
 };
 
 static const struct super_operations ext4_nojournal_sops = {
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Add batched discard support for ext4
  2010-08-06 11:31 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
@ 2010-08-06 13:03   ` Dmitry Monakhov
  2010-08-06 13:23     ` Lukas Czerner
  2010-08-07 22:25   ` Jan Kara
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-08-06 13:03 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen, jack, tytso

Lukas Czerner <lczerner@redhat.com> writes:

> Walk through allocation groups and trim all free extents. It can be
> invoked through FITRIM ioctl on the file system. The main idea is to
> provide a way to trim the whole file system if needed, since some SSD's
> may suffer from performance loss after the whole device was filled (it
> does not mean that fs is full!).
>
> It search for free extents in allocation groups specified by Byte range
> start -> start+len. When the free extent is within this range, blocks
> are marked as used and then trimmed. Afterwards these blocks are marked
> as free in per-group bitmap.
>
> Since fstrim is a long operation it is good to have an ability to
> interrupt it by a signal. This was added by Dmitry Monakhov.
> Thanks Dimitry.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ext4.h    |    2 +
>  fs/ext4/mballoc.c |  194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/super.c   |    1 +
>  3 files changed, 197 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 19a4de5..2f96dc5 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1558,6 +1558,8 @@ extern int ext4_mb_add_groupinfo(struct super_block *sb,
>  extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
>  extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
>  						ext4_group_t, int);
> +extern int ext4_trim_fs(struct super_block *, uint64_t, uint64_t, uint64_t);
> +
>  /* inode.c */
>  struct buffer_head *ext4_getblk(handle_t *, struct inode *,
>  						ext4_lblk_t, int, int *);
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 0e83dfd..5210676 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4690,3 +4690,197 @@ error_return:
>  		kmem_cache_free(ext4_ac_cachep, ac);
>  	return;
>  }
> +
> +/**
> + * ext4_trim_extent -- function to TRIM one single free extent in the group
> + * @sb:		super block for the file system
> + * @start:	starting block of the free extent in the alloc. group
> + * @count:	number of blocks to TRIM
> + * @group:	alloc. group we are working with
> + * @e4b:	ext4 buddy for the group
> + *
> + * Trim "count" blocks starting at "start" in the "group". To assure that no
> + * one will allocate those blocks, mark it as used in buddy bitmap. This must
> + * be called with under the group lock.
> + */
> +static int ext4_trim_extent(struct super_block *sb, int start, int count,
> +		ext4_group_t group, struct ext4_buddy *e4b)
> +{
> +	ext4_fsblk_t discard_block;
> +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> +	struct ext4_free_extent ex;
> +	int ret = 0;
> +
> +	assert_spin_locked(ext4_group_lock_ptr(sb, group));
> +
> +	ex.fe_start = start;
> +	ex.fe_group = group;
> +	ex.fe_len = count;
> +
> +	/*
> +	 * Mark blocks used, so no one can reuse them while
> +	 * being trimmed.
> +	 */
> +	mb_mark_used(e4b, &ex);
> +	ext4_unlock_group(sb, group);
> +
> +	discard_block = (ext4_fsblk_t)group *
> +			EXT4_BLOCKS_PER_GROUP(sb)
> +			+ start
> +			+ le32_to_cpu(es->s_first_data_block);
> +	trace_ext4_discard_blocks(sb,
> +			(unsigned long long)discard_block,
> +			count);
> +	ret = sb_issue_discard(sb, discard_block, count);
> +	if (ret == -EOPNOTSUPP) {
> +		ext4_warning(sb,
> +			"discard not supported!");
> +	} else if (ret < 0) {
> +		ext4_std_error(sb, ret);
> +	}
> +	cond_resched();
OOps seems i'm was wrong when suggested to put cond_resched() here.
It is useless because we probably already slept inside sb_issue_discard()
But it is reasonable to move it to ext4_trim_all_free()
because fs size may be huge so simple bitmap traverse will take long
enough, but almost fully populated so there are no extents longer than
minlen so discard would not being called.
BTW: Since Jan already noted about that in ext3's patch so ext3 version
is good.

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d3a0763..00e6147 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4688,7 +4688,6 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
 	} else if (ret < 0) {
 		ext4_std_error(sb, ret);
 	}
-	cond_resched();
 
 	ext4_lock_group(sb, group);
 	mb_free_blocks(NULL, e4b, start, ex.fe_len);
@@ -4745,6 +4744,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
 		}
 		start = next + 1;
 
+		if (need_resched()) {
+			ext4_unlock_group(sb, group);
+			cond_resched();
+			ext4_lock_group(sb, group);
+		}
 		if (signal_pending(current)) {
 			count = -ERESTARTSYS;
 			break;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
                   ` (2 preceding siblings ...)
  2010-08-06 11:31 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
@ 2010-08-06 13:05 ` Dmitry Monakhov
  2010-08-06 13:49   ` Lukas Czerner
  3 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-08-06 13:05 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen, jack, tytso

Lukas Czerner <lczerner@redhat.com> writes:

> Hi, all
>
> because people were worried about possibly long stalls appearing
> when FITRIM ioctl is working, I have changed the FITRIM interface
> as Dimitry suggested. Now you can choose whether to trim whole
> file system or just a part of it, resp. you can specify the range
> of Bytes to trim.
Agree with whole patch-set, except minor note for ext4'th path.
Please feel free to add
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series

The only thing what is still not obvious for me is that, there are
several types of discard request possible
1) Simple discard 
2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
Should we specify which type should be used in ioctl flags?
But i hope that we can just stick maximum security scenario
Use secure discard if possible.
>
> To be specific you can create something like this:
>
> int main(int argc, char **argv)
> {
> 	int fd;
> 	uint64_t range[3];
>
> 	range[0] = 40960;
> 	range[1] = 134217728;
> 	range[2] = 4096;
>
> 	fd = open(argv[1], O_RDONLY);
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
>
> 	if (ioctl(fd, FITRIM, range)) {
> 		if (errno == EOPNOTSUPP)
> 			fprintf(stderr, "FITRIM not supported\n");
> 		else
> 			perror("FITRIM");
> 		return 1;
> 	}
>
> 	return 0;
> }
>
> Range items have following meaning:
>
> range[0] - (start) first Byte to trim
> range[1] - (len) number of Bytes to trim from start
> range[2] - (minlen) minimum extent length to trim, free extents shorter
> than this number of Bytes will be ignored. This number will be rounded
> up to the block size.
>
> So in my example it will trim all free extents from block 10 of first
> alloc. group to block 10 of second alloc. group, assuming we have
> block_size = 4096.
>
> Also, when you want to trim the whole fs, you can simply pass NULL
> instead of range into the ioctl, or you can specify the range correctly
> to cover the whole fs.
>
> Regards
> -Lukas
>
> [PATCH 1/3] Add ioctl FITRIM.
> [PATCH 2/3] Add batched discard support for ext3
> [PATCH 3/3] Add batched discard support for ext4
>
>  fs/ext3/balloc.c        |  249 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext3/super.c         |    1 +
>  fs/ext4/ext4.h          |    2 +
>  fs/ext4/mballoc.c       |  194 ++++++++++++++++++++++++++++++++++++
>  fs/ext4/super.c         |    1 +
>  fs/ioctl.c              |   34 +++++++
>  include/linux/ext3_fs.h |    1 +
>  include/linux/fs.h      |    2 +
>  8 files changed, 484 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Add batched discard support for ext4
  2010-08-06 13:03   ` Dmitry Monakhov
@ 2010-08-06 13:23     ` Lukas Czerner
  0 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 13:23 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen,
	jack, tytso

On Fri, 6 Aug 2010, Dmitry Monakhov wrote:

> Lukas Czerner <lczerner@redhat.com> writes:
> 
> > Walk through allocation groups and trim all free extents. It can be
> > invoked through FITRIM ioctl on the file system. The main idea is to
> > provide a way to trim the whole file system if needed, since some SSD's
> > may suffer from performance loss after the whole device was filled (it
> > does not mean that fs is full!).
> >
> > It search for free extents in allocation groups specified by Byte range
> > start -> start+len. When the free extent is within this range, blocks
> > are marked as used and then trimmed. Afterwards these blocks are marked
> > as free in per-group bitmap.
> >
> > Since fstrim is a long operation it is good to have an ability to
> > interrupt it by a signal. This was added by Dmitry Monakhov.
> > Thanks Dimitry.
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/ext4.h    |    2 +
> >  fs/ext4/mballoc.c |  194 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ext4/super.c   |    1 +
> >  3 files changed, 197 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 19a4de5..2f96dc5 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1558,6 +1558,8 @@ extern int ext4_mb_add_groupinfo(struct super_block *sb,
> >  extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
> >  extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
> >  						ext4_group_t, int);
> > +extern int ext4_trim_fs(struct super_block *, uint64_t, uint64_t, uint64_t);
> > +
> >  /* inode.c */
> >  struct buffer_head *ext4_getblk(handle_t *, struct inode *,
> >  						ext4_lblk_t, int, int *);
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 0e83dfd..5210676 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -4690,3 +4690,197 @@ error_return:
> >  		kmem_cache_free(ext4_ac_cachep, ac);
> >  	return;
> >  }
> > +
> > +/**
> > + * ext4_trim_extent -- function to TRIM one single free extent in the group
> > + * @sb:		super block for the file system
> > + * @start:	starting block of the free extent in the alloc. group
> > + * @count:	number of blocks to TRIM
> > + * @group:	alloc. group we are working with
> > + * @e4b:	ext4 buddy for the group
> > + *
> > + * Trim "count" blocks starting at "start" in the "group". To assure that no
> > + * one will allocate those blocks, mark it as used in buddy bitmap. This must
> > + * be called with under the group lock.
> > + */
> > +static int ext4_trim_extent(struct super_block *sb, int start, int count,
> > +		ext4_group_t group, struct ext4_buddy *e4b)
> > +{
> > +	ext4_fsblk_t discard_block;
> > +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > +	struct ext4_free_extent ex;
> > +	int ret = 0;
> > +
> > +	assert_spin_locked(ext4_group_lock_ptr(sb, group));
> > +
> > +	ex.fe_start = start;
> > +	ex.fe_group = group;
> > +	ex.fe_len = count;
> > +
> > +	/*
> > +	 * Mark blocks used, so no one can reuse them while
> > +	 * being trimmed.
> > +	 */
> > +	mb_mark_used(e4b, &ex);
> > +	ext4_unlock_group(sb, group);
> > +
> > +	discard_block = (ext4_fsblk_t)group *
> > +			EXT4_BLOCKS_PER_GROUP(sb)
> > +			+ start
> > +			+ le32_to_cpu(es->s_first_data_block);
> > +	trace_ext4_discard_blocks(sb,
> > +			(unsigned long long)discard_block,
> > +			count);
> > +	ret = sb_issue_discard(sb, discard_block, count);
> > +	if (ret == -EOPNOTSUPP) {
> > +		ext4_warning(sb,
> > +			"discard not supported!");
> > +	} else if (ret < 0) {
> > +		ext4_std_error(sb, ret);
> > +	}
> > +	cond_resched();
> OOps seems i'm was wrong when suggested to put cond_resched() here.
> It is useless because we probably already slept inside sb_issue_discard()
> But it is reasonable to move it to ext4_trim_all_free()
> because fs size may be huge so simple bitmap traverse will take long
> enough, but almost fully populated so there are no extents longer than
> minlen so discard would not being called.
> BTW: Since Jan already noted about that in ext3's patch so ext3 version
> is good.
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d3a0763..00e6147 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4688,7 +4688,6 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
>  	} else if (ret < 0) {
>  		ext4_std_error(sb, ret);
>  	}
> -	cond_resched();
>  
>  	ext4_lock_group(sb, group);
>  	mb_free_blocks(NULL, e4b, start, ex.fe_len);
> @@ -4745,6 +4744,11 @@ ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, struct ext4_buddy *e4b,
>  		}
>  		start = next + 1;
>  
> +		if (need_resched()) {
> +			ext4_unlock_group(sb, group);
> +			cond_resched();
> +			ext4_lock_group(sb, group);
> +		}
>  		if (signal_pending(current)) {
>  			count = -ERESTARTSYS;
>  			break;
> 
> 

Hmm, are you sure about this ? Wouldn't make more sense to put just
cond_resched() into the ext4_trim_fs loop, since there is no need to
check for need_resched() and lock/unlock the group. Do you expect the
group traversal to take that long ?

There are other examples of traversing whole group, but none of it is
trying to reschedule in the middle of the group.

Regards
-Lukas


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 13:05 ` [PATCH 0/3] Batched discard support Dmitry Monakhov
@ 2010-08-06 13:49   ` Lukas Czerner
  2010-08-06 14:24     ` Dmitry Monakhov
  2010-08-06 14:32     ` Lukas Czerner
  0 siblings, 2 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 13:49 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen,
	jack, tytso

On Fri, 6 Aug 2010, Dmitry Monakhov wrote:

> Lukas Czerner <lczerner@redhat.com> writes:
> 
> > Hi, all
> >
> > because people were worried about possibly long stalls appearing
> > when FITRIM ioctl is working, I have changed the FITRIM interface
> > as Dimitry suggested. Now you can choose whether to trim whole
> > file system or just a part of it, resp. you can specify the range
> > of Bytes to trim.
> Agree with whole patch-set, except minor note for ext4'th path.
> Please feel free to add
> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series
> 
> The only thing what is still not obvious for me is that, there are
> several types of discard request possible
> 1) Simple discard 
> 2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
> Should we specify which type should be used in ioctl flags?
> But i hope that we can just stick maximum security scenario
> Use secure discard if possible.

First of all, thanks for you review Dimitry. And second, to be honest I
am not entirely familiar with the Secure discard implementation. Right
now it just doing the simple discard like "send TRIM command", so it
does work just for devices which supports it. I suppose we can just
check blk_queue_discard() at some level and then decide whether to do
simple discard (TRIM), or secure discard "Write zeroes", when the device
does not support TRIM - if it is what you mean by secure discard.

Regards
-Lukas

> >
> > To be specific you can create something like this:
> >
> > int main(int argc, char **argv)
> > {
> > 	int fd;
> > 	uint64_t range[3];
> >
> > 	range[0] = 40960;
> > 	range[1] = 134217728;
> > 	range[2] = 4096;
> >
> > 	fd = open(argv[1], O_RDONLY);
> > 	if (fd < 0) {
> > 		perror("open");
> > 		return 1;
> > 	}
> >
> > 	if (ioctl(fd, FITRIM, range)) {
> > 		if (errno == EOPNOTSUPP)
> > 			fprintf(stderr, "FITRIM not supported\n");
> > 		else
> > 			perror("FITRIM");
> > 		return 1;
> > 	}
> >
> > 	return 0;
> > }
> >
> > Range items have following meaning:
> >
> > range[0] - (start) first Byte to trim
> > range[1] - (len) number of Bytes to trim from start
> > range[2] - (minlen) minimum extent length to trim, free extents shorter
> > than this number of Bytes will be ignored. This number will be rounded
> > up to the block size.
> >
> > So in my example it will trim all free extents from block 10 of first
> > alloc. group to block 10 of second alloc. group, assuming we have
> > block_size = 4096.
> >
> > Also, when you want to trim the whole fs, you can simply pass NULL
> > instead of range into the ioctl, or you can specify the range correctly
> > to cover the whole fs.
> >
> > Regards
> > -Lukas
> >
> > [PATCH 1/3] Add ioctl FITRIM.
> > [PATCH 2/3] Add batched discard support for ext3
> > [PATCH 3/3] Add batched discard support for ext4
> >
> >  fs/ext3/balloc.c        |  249 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ext3/super.c         |    1 +
> >  fs/ext4/ext4.h          |    2 +
> >  fs/ext4/mballoc.c       |  194 ++++++++++++++++++++++++++++++++++++
> >  fs/ext4/super.c         |    1 +
> >  fs/ioctl.c              |   34 +++++++
> >  include/linux/ext3_fs.h |    1 +
> >  include/linux/fs.h      |    2 +
> >  8 files changed, 484 insertions(+), 0 deletions(-)
> 

-- 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 13:49   ` Lukas Czerner
@ 2010-08-06 14:24     ` Dmitry Monakhov
  2010-08-06 14:32     ` Lukas Czerner
  1 sibling, 0 replies; 14+ messages in thread
From: Dmitry Monakhov @ 2010-08-06 14:24 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen, jack, tytso

Lukas Czerner <lczerner@redhat.com> writes:

> On Fri, 6 Aug 2010, Dmitry Monakhov wrote:
>
>> Lukas Czerner <lczerner@redhat.com> writes:
>> 
>> > Hi, all
>> >
>> > because people were worried about possibly long stalls appearing
>> > when FITRIM ioctl is working, I have changed the FITRIM interface
>> > as Dimitry suggested. Now you can choose whether to trim whole
>> > file system or just a part of it, resp. you can specify the range
>> > of Bytes to trim.
>> Agree with whole patch-set, except minor note for ext4'th path.
>> Please feel free to add
>> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series
>> 
>> The only thing what is still not obvious for me is that, there are
>> several types of discard request possible
>> 1) Simple discard 
>> 2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
>> Should we specify which type should be used in ioctl flags?
>> But i hope that we can just stick maximum security scenario
>> Use secure discard if possible.
>
> First of all, thanks for you review Dimitry. And second, to be honest I
> am not entirely familiar with the Secure discard implementation. Right
> now it just doing the simple discard like "send TRIM command", so it
> does work just for devices which supports it. I suppose we can just
> check blk_queue_discard() at some level and then decide whether to do
> simple discard (TRIM), or secure discard "Write zeroes", when the device
> does not support TRIM - if it is what you mean by secure discard.
I'm also not quite familiar with SECDISCARD feature also :)
BUT as soon as i understand MMC cards has ability to discard
with zeroes, and destroy backup blocks in one discard request.
At the same time secure discard emulation with writing zeroes will
take too long (hours and days) so i don't see any reason for emulation.
We have SECRM flag for this purpose.

>
> Regards
> -Lukas
>
>> >
>> > To be specific you can create something like this:
>> >
>> > int main(int argc, char **argv)
>> > {
>> > 	int fd;
>> > 	uint64_t range[3];
>> >
>> > 	range[0] = 40960;
>> > 	range[1] = 134217728;
>> > 	range[2] = 4096;
>> >
>> > 	fd = open(argv[1], O_RDONLY);
>> > 	if (fd < 0) {
>> > 		perror("open");
>> > 		return 1;
>> > 	}
>> >
>> > 	if (ioctl(fd, FITRIM, range)) {
>> > 		if (errno == EOPNOTSUPP)
>> > 			fprintf(stderr, "FITRIM not supported\n");
>> > 		else
>> > 			perror("FITRIM");
>> > 		return 1;
>> > 	}
>> >
>> > 	return 0;
>> > }
>> >
>> > Range items have following meaning:
>> >
>> > range[0] - (start) first Byte to trim
>> > range[1] - (len) number of Bytes to trim from start
>> > range[2] - (minlen) minimum extent length to trim, free extents shorter
>> > than this number of Bytes will be ignored. This number will be rounded
>> > up to the block size.
>> >
>> > So in my example it will trim all free extents from block 10 of first
>> > alloc. group to block 10 of second alloc. group, assuming we have
>> > block_size = 4096.
>> >
>> > Also, when you want to trim the whole fs, you can simply pass NULL
>> > instead of range into the ioctl, or you can specify the range correctly
>> > to cover the whole fs.
>> >
>> > Regards
>> > -Lukas
>> >
>> > [PATCH 1/3] Add ioctl FITRIM.
>> > [PATCH 2/3] Add batched discard support for ext3
>> > [PATCH 3/3] Add batched discard support for ext4
>> >
>> >  fs/ext3/balloc.c        |  249 +++++++++++++++++++++++++++++++++++++++++++++++
>> >  fs/ext3/super.c         |    1 +
>> >  fs/ext4/ext4.h          |    2 +
>> >  fs/ext4/mballoc.c       |  194 ++++++++++++++++++++++++++++++++++++
>> >  fs/ext4/super.c         |    1 +
>> >  fs/ioctl.c              |   34 +++++++
>> >  include/linux/ext3_fs.h |    1 +
>> >  include/linux/fs.h      |    2 +
>> >  8 files changed, 484 insertions(+), 0 deletions(-)
>> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 13:49   ` Lukas Czerner
  2010-08-06 14:24     ` Dmitry Monakhov
@ 2010-08-06 14:32     ` Lukas Czerner
  2010-08-06 15:02       ` Dmitry Monakhov
  1 sibling, 1 reply; 14+ messages in thread
From: Lukas Czerner @ 2010-08-06 14:32 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Dmitry Monakhov, linux-ext4, jmoyer, rwheeler, eshishki, sandeen,
	jack, tytso

On Fri, 6 Aug 2010, Lukas Czerner wrote:

> On Fri, 6 Aug 2010, Dmitry Monakhov wrote:
> 
> > Lukas Czerner <lczerner@redhat.com> writes:
> > 
> > > Hi, all
> > >
> > > because people were worried about possibly long stalls appearing
> > > when FITRIM ioctl is working, I have changed the FITRIM interface
> > > as Dimitry suggested. Now you can choose whether to trim whole
> > > file system or just a part of it, resp. you can specify the range
> > > of Bytes to trim.
> > Agree with whole patch-set, except minor note for ext4'th path.
> > Please feel free to add
> > Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series
> > 
> > The only thing what is still not obvious for me is that, there are
> > several types of discard request possible
> > 1) Simple discard 
> > 2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
> > Should we specify which type should be used in ioctl flags?
> > But i hope that we can just stick maximum security scenario
> > Use secure discard if possible.
> 
> First of all, thanks for you review Dimitry. And second, to be honest I
> am not entirely familiar with the Secure discard implementation. Right
> now it just doing the simple discard like "send TRIM command", so it
> does work just for devices which supports it. I suppose we can just
> check blk_queue_discard() at some level and then decide whether to do
> simple discard (TRIM), or secure discard "Write zeroes", when the device
> does not support TRIM - if it is what you mean by secure discard.
> 
> Regards
> -Lukas
> 

When I am thinking about this, it may not be a bad idea to create a
completely new ioctl for this purpose of "zeroing all free space". We
do the trimming for completely different reasons, and the "secure"
thing is just an side effect, so we probably should not mix it together.

The new ioclt (FISECER ?) and FITRIM can use the same infrastructure in
ext3/4, but we should add a flag to distinguish what we need to do -
TRIM or secure erase. What do you think ?

Regards
-lukas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 14:32     ` Lukas Czerner
@ 2010-08-06 15:02       ` Dmitry Monakhov
  2010-08-10 11:31         ` Lukas Czerner
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2010-08-06 15:02 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jmoyer, rwheeler, eshishki, sandeen, jack, tytso

Lukas Czerner <lczerner@redhat.com> writes:

> On Fri, 6 Aug 2010, Lukas Czerner wrote:
>
>> On Fri, 6 Aug 2010, Dmitry Monakhov wrote:
>> 
>> > Lukas Czerner <lczerner@redhat.com> writes:
>> > 
>> > > Hi, all
>> > >
>> > > because people were worried about possibly long stalls appearing
>> > > when FITRIM ioctl is working, I have changed the FITRIM interface
>> > > as Dimitry suggested. Now you can choose whether to trim whole
>> > > file system or just a part of it, resp. you can specify the range
>> > > of Bytes to trim.
>> > Agree with whole patch-set, except minor note for ext4'th path.
>> > Please feel free to add
>> > Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series
>> > 
>> > The only thing what is still not obvious for me is that, there are
>> > several types of discard request possible
>> > 1) Simple discard 
>> > 2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
>> > Should we specify which type should be used in ioctl flags?
>> > But i hope that we can just stick maximum security scenario
>> > Use secure discard if possible.
>> 
>> First of all, thanks for you review Dimitry. And second, to be honest I
>> am not entirely familiar with the Secure discard implementation. Right
>> now it just doing the simple discard like "send TRIM command", so it
>> does work just for devices which supports it. I suppose we can just
>> check blk_queue_discard() at some level and then decide whether to do
>> simple discard (TRIM), or secure discard "Write zeroes", when the device
>> does not support TRIM - if it is what you mean by secure discard.
>> 
>> Regards
>> -Lukas
>> 
>
> When I am thinking about this, it may not be a bad idea to create a
> completely new ioctl for this purpose of "zeroing all free space". We
> do the trimming for completely different reasons, and the "secure"
Actually you may be right here.
For example it is usual to give some one an usb stick, 
and always assumes what USB stick is WhatYouSeeIsWhatYouGet storage.
but this is obviously not true, a man now has full access to that
device, so  stale data is almost transparently available.
Off course i can use SECRM but it has runtime overhead.
So i can easily call SECDISCARD (even in emulation mode) before umount
in order to be on safe side and then share my USB stick without any
fears.

> thing is just an side effect, so we probably should not mix it together.
>
> The new ioclt (FISECER ?) and FITRIM can use the same infrastructure in
> ext3/4, but we should add a flag to distinguish what we need to do -
> TRIM or secure erase. What do you think ?
Or we can just add a behavior flags filed
DISCARD    :will works only for SDD and return ENOTSUPP for others
SECURE_DEL :will guarantee that data will be zeroed on success.

DISCARD ->    (simple discard) send discard requests
SECURE_DEL -> (simple emulation) write free space with zeroes
(DISCARD|SECURE_DEL) -> send discards request with secure flag enabled.
>
> Regards
> -lukas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Add batched discard support for ext4
  2010-08-06 11:31 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
  2010-08-06 13:03   ` Dmitry Monakhov
@ 2010-08-07 22:25   ` Jan Kara
  2010-08-10 11:32     ` Lukas Czerner
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Kara @ 2010-08-07 22:25 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, dmonakhov, jmoyer, rwheeler, eshishki, sandeen, jack, tytso

On Fri 06-08-10 13:31:16, Lukas Czerner wrote:
...
> +/**
> + * ext4_trim_fs() -- trim ioctl handle function
> + * @sb:			superblock for filesystem
> + * @start:		First Byte to trim
> + * @len:		number of Bytes to trim from start
> + * @minlen:		minimum extent length in Bytes
> + *
> + * ext4_trim_fs goes through all allocation groups containing Bytes from
> + * start to start+len. For each such a group ext4_trim_all_free function
> + * is invoked to trim all free space.
> + */
> +int ext4_trim_fs(struct super_block *sb, uint64_t start, uint64_t len,
> +		uint64_t minlen)
> +{
> +	struct ext4_buddy e4b;
> +	ext4_fsblk_t first_group, last_group;
> +	ext4_group_t group, ngroups = ext4_get_groups_count(sb);
> +	ext4_grpblk_t cnt, first_block, last_block;
> +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> +	int ret = 0;
> +
> +	start >>= sb->s_blocksize_bits;
> +	len >>= sb->s_blocksize_bits;
> +	minlen >>= sb->s_blocksize_bits;
> +
> +	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
> +		return -EINVAL;
> +
> +	/* Determine first and last group to examine based on start and len */
> +	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
> +		      EXT4_BLOCKS_PER_GROUP(sb);
> +	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
> +		     EXT4_BLOCKS_PER_GROUP(sb);
> +	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
> +
> +	if (first_group > last_group)
> +		return -EINVAL;
> +
> +	first_block = start % EXT4_BLOCKS_PER_GROUP(sb);
  I think you should substract "le32_to_cpu(es->s_first_data_block)" from
the start before doing modulo.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Batched discard support
  2010-08-06 15:02       ` Dmitry Monakhov
@ 2010-08-10 11:31         ` Lukas Czerner
  0 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-10 11:31 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Lukas Czerner, linux-ext4, jmoyer, rwheeler, eshishki, sandeen,
	jack, tytso

On Fri, 6 Aug 2010, Dmitry Monakhov wrote:

> Lukas Czerner <lczerner@redhat.com> writes:
> 
> > On Fri, 6 Aug 2010, Lukas Czerner wrote:
> >
> >> On Fri, 6 Aug 2010, Dmitry Monakhov wrote:
> >> 
> >> > Lukas Czerner <lczerner@redhat.com> writes:
> >> > 
> >> > > Hi, all
> >> > >
> >> > > because people were worried about possibly long stalls appearing
> >> > > when FITRIM ioctl is working, I have changed the FITRIM interface
> >> > > as Dimitry suggested. Now you can choose whether to trim whole
> >> > > file system or just a part of it, resp. you can specify the range
> >> > > of Bytes to trim.
> >> > Agree with whole patch-set, except minor note for ext4'th path.
> >> > Please feel free to add
> >> > Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> to the series
> >> > 
> >> > The only thing what is still not obvious for me is that, there are
> >> > several types of discard request possible
> >> > 1) Simple discard 
> >> > 2) Secure discard which was proposed here http://lkml.org/lkml/2010/6/24/71
> >> > Should we specify which type should be used in ioctl flags?
> >> > But i hope that we can just stick maximum security scenario
> >> > Use secure discard if possible.
> >> 
> >> First of all, thanks for you review Dimitry. And second, to be honest I
> >> am not entirely familiar with the Secure discard implementation. Right
> >> now it just doing the simple discard like "send TRIM command", so it
> >> does work just for devices which supports it. I suppose we can just
> >> check blk_queue_discard() at some level and then decide whether to do
> >> simple discard (TRIM), or secure discard "Write zeroes", when the device
> >> does not support TRIM - if it is what you mean by secure discard.
> >> 
> >> Regards
> >> -Lukas
> >> 
> >
> > When I am thinking about this, it may not be a bad idea to create a
> > completely new ioctl for this purpose of "zeroing all free space". We
> > do the trimming for completely different reasons, and the "secure"
> Actually you may be right here.
> For example it is usual to give some one an usb stick, 
> and always assumes what USB stick is WhatYouSeeIsWhatYouGet storage.
> but this is obviously not true, a man now has full access to that
> device, so  stale data is almost transparently available.
> Off course i can use SECRM but it has runtime overhead.
> So i can easily call SECDISCARD (even in emulation mode) before umount
> in order to be on safe side and then share my USB stick without any
> fears.
> 
> > thing is just an side effect, so we probably should not mix it together.
> >
> > The new ioclt (FISECER ?) and FITRIM can use the same infrastructure in
> > ext3/4, but we should add a flag to distinguish what we need to do -
> > TRIM or secure erase. What do you think ?
> Or we can just add a behavior flags filed
> DISCARD    :will works only for SDD and return ENOTSUPP for others
> SECURE_DEL :will guarantee that data will be zeroed on success.
> 
> DISCARD ->    (simple discard) send discard requests
> SECURE_DEL -> (simple emulation) write free space with zeroes
> (DISCARD|SECURE_DEL) -> send discards request with secure flag enabled.

Ok, so lets finish this simple discard thing first and then, when there
will be a "real" interest in this we can do something about it.

Thanks, Dimitry.
-Lukas

> >
> > Regards
> > -lukas
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Add batched discard support for ext4
  2010-08-07 22:25   ` Jan Kara
@ 2010-08-10 11:32     ` Lukas Czerner
  0 siblings, 0 replies; 14+ messages in thread
From: Lukas Czerner @ 2010-08-10 11:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Lukas Czerner, linux-ext4, dmonakhov, jmoyer, rwheeler, eshishki,
	sandeen, tytso

On Sun, 8 Aug 2010, Jan Kara wrote:

> On Fri 06-08-10 13:31:16, Lukas Czerner wrote:
> ...
> > +/**
> > + * ext4_trim_fs() -- trim ioctl handle function
> > + * @sb:			superblock for filesystem
> > + * @start:		First Byte to trim
> > + * @len:		number of Bytes to trim from start
> > + * @minlen:		minimum extent length in Bytes
> > + *
> > + * ext4_trim_fs goes through all allocation groups containing Bytes from
> > + * start to start+len. For each such a group ext4_trim_all_free function
> > + * is invoked to trim all free space.
> > + */
> > +int ext4_trim_fs(struct super_block *sb, uint64_t start, uint64_t len,
> > +		uint64_t minlen)
> > +{
> > +	struct ext4_buddy e4b;
> > +	ext4_fsblk_t first_group, last_group;
> > +	ext4_group_t group, ngroups = ext4_get_groups_count(sb);
> > +	ext4_grpblk_t cnt, first_block, last_block;
> > +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > +	int ret = 0;
> > +
> > +	start >>= sb->s_blocksize_bits;
> > +	len >>= sb->s_blocksize_bits;
> > +	minlen >>= sb->s_blocksize_bits;
> > +
> > +	if (unlikely(minlen > EXT4_BLOCKS_PER_GROUP(sb)))
> > +		return -EINVAL;
> > +
> > +	/* Determine first and last group to examine based on start and len */
> > +	first_group = (start - le32_to_cpu(es->s_first_data_block)) /
> > +		      EXT4_BLOCKS_PER_GROUP(sb);
> > +	last_group = (start + len - le32_to_cpu(es->s_first_data_block)) /
> > +		     EXT4_BLOCKS_PER_GROUP(sb);
> > +	last_group = (last_group > ngroups - 1) ? ngroups - 1 : last_group;
> > +
> > +	if (first_group > last_group)
> > +		return -EINVAL;
> > +
> > +	first_block = start % EXT4_BLOCKS_PER_GROUP(sb);
>   I think you should substract "le32_to_cpu(es->s_first_data_block)" from
> the start before doing modulo.
> 
> 									Honza

Oh, right thanks.

-Lukas

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-08-10 11:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-06 11:31 [PATCH 0/3] Batched discard support Lukas Czerner
2010-08-06 11:31 ` [PATCH 1/3] Add ioctl FITRIM Lukas Czerner
2010-08-06 11:31 ` [PATCH 2/3] Add batched discard support for ext3 Lukas Czerner
2010-08-06 11:31 ` [PATCH 3/3] Add batched discard support for ext4 Lukas Czerner
2010-08-06 13:03   ` Dmitry Monakhov
2010-08-06 13:23     ` Lukas Czerner
2010-08-07 22:25   ` Jan Kara
2010-08-10 11:32     ` Lukas Czerner
2010-08-06 13:05 ` [PATCH 0/3] Batched discard support Dmitry Monakhov
2010-08-06 13:49   ` Lukas Czerner
2010-08-06 14:24     ` Dmitry Monakhov
2010-08-06 14:32     ` Lukas Czerner
2010-08-06 15:02       ` Dmitry Monakhov
2010-08-10 11:31         ` Lukas Czerner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.