linux-btrfs.vger.kernel.org archive mirror
* [PATCH V4 0/4] Btrfs: batched discard support for btrfs
@ 2011-03-24 10:24 Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 1/4] Btrfs: make update_reserved_bytes() public Li Dongyang
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-24 10:24 UTC (permalink / raw)
  To: linux-btrfs

Dear list,
This is V4 of batched discard support. We now get a full mapping of
the free space on each device for RAID0/1/10/DUP instead of just a single
stripe length. Tested with xfstests 251. Thanks.
Changelog V4:
    *Make btrfs_map_block() return the full mapping.
Changelog V3:
    *Fix style problems.
    *Rebase to 2.6.38-rc7.
Changelog V2:
    *Check whether any device supports trim before trying to trim the fs; also adjust
      minlen according to the discard_granularity.
    *Update reserved extent calculations in btrfs_trim_block_group().
    *Call cond_resched() without checking need_resched().
    *Use bitmap_clear_bits() and unlink_free_space() instead of btrfs_remove_free_space(),
      so we won't search for the same extent twice.
    *Try harder in btrfs_discard_extent(): now we only report an error
     if it is not EOPNOTSUPP.
    *Make sure the block group is cached before trimming it; otherwise we'll see an
     empty caching tree.
    *Minor return value fix in btrfs_discard_block_group().
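
For readers who want to exercise the feature from userspace, here is a minimal sketch of issuing FITRIM against a mounted filesystem. It is not part of the series. The helper name trim_path() is made up for illustration; FITRIM and struct fstrim_range come from <linux/fs.h>. The call needs CAP_SYS_ADMIN, and a filesystem without discard-capable devices is expected to fail with EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FITRIM, struct fstrim_range */

/*
 * Hypothetical helper: ask the filesystem mounted at 'path' to discard
 * every free extent of at least 'minlen' bytes. Returns 0 or -errno.
 * On success, *trimmed (if non-NULL) receives the byte count that the
 * kernel wrote back into range.len.
 */
static int trim_path(const char *path, uint64_t minlen, uint64_t *trimmed)
{
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX,	/* cover the whole filesystem */
		.minlen = minlen,
	};
	int fd, ret = 0;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FITRIM, &range) < 0)
		ret = -errno;
	else if (trimmed)
		*trimmed = range.len;

	close(fd);
	return ret;
}
```

A caller would pass the mount point, for example trim_path("/mnt", 0, &bytes), and treat -EOPNOTSUPP as "nothing to do" rather than a hard failure.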


* [PATCH V4 1/4] Btrfs: make update_reserved_bytes() public
  2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
@ 2011-03-24 10:24 ` Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 2/4] Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Li Dongyang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-24 10:24 UTC (permalink / raw)
  To: linux-btrfs

Make the function public, since we need to update the reserved-extent
accounting after taking an extent out of the allocator for trimming.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 fs/btrfs/ctree.h        |    2 ++
 fs/btrfs/extent-tree.c  |   16 +++++++---------
 2 files changed, 9 insertions(+), 9 deletions(-)
 create mode 100644 fs/btrfs/Module.symvers

diff --git a/fs/btrfs/Module.symvers b/fs/btrfs/Module.symvers
new file mode 100644
index 0000000..e69de29
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7f78cc7..2c84551 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2157,6 +2157,8 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      u64 root_objectid, u64 owner, u64 offset);
 
 int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len);
+int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
+				u64 num_bytes, int reserve, int sinfo);
 int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
 int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7b3089b..caa4254 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -36,8 +36,6 @@
 static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root,
 			      u64 bytenr, u64 num_bytes, int alloc);
-static int update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				 u64 num_bytes, int reserve, int sinfo);
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root,
 				u64 bytenr, u64 num_bytes, u64 parent,
@@ -4223,8 +4221,8 @@ int btrfs_pin_extent(struct btrfs_root *root,
  * update size of reserved extents. this function may return -EAGAIN
  * if 'reserve' is true or 'sinfo' is false.
  */
-static int update_reserved_bytes(struct btrfs_block_group_cache *cache,
-				 u64 num_bytes, int reserve, int sinfo)
+int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
+				u64 num_bytes, int reserve, int sinfo)
 {
 	int ret = 0;
 	if (sinfo) {
@@ -4704,10 +4702,10 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		WARN_ON(test_bit(EXTENT_BUFFER_DIRTY, &buf->bflags));
 
 		btrfs_add_free_space(cache, buf->start, buf->len);
-		ret = update_reserved_bytes(cache, buf->len, 0, 0);
+		ret = btrfs_update_reserved_bytes(cache, buf->len, 0, 0);
 		if (ret == -EAGAIN) {
 			/* block group became read-only */
-			update_reserved_bytes(cache, buf->len, 0, 1);
+			btrfs_update_reserved_bytes(cache, buf->len, 0, 1);
 			goto out;
 		}
 
@@ -5191,7 +5189,7 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = update_reserved_bytes(block_group, num_bytes, 1,
+		ret = btrfs_update_reserved_bytes(block_group, num_bytes, 1,
 					    (data & BTRFS_BLOCK_GROUP_DATA));
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
@@ -5415,7 +5413,7 @@ int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len)
 	ret = btrfs_discard_extent(root, start, len);
 
 	btrfs_add_free_space(cache, start, len);
-	update_reserved_bytes(cache, len, 0, 1);
+	btrfs_update_reserved_bytes(cache, len, 0, 1);
 	btrfs_put_block_group(cache);
 
 	return ret;
@@ -5614,7 +5612,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 		put_caching_control(caching_ctl);
 	}
 
-	ret = update_reserved_bytes(block_group, ins->offset, 1, 1);
+	ret = btrfs_update_reserved_bytes(block_group, ins->offset, 1, 1);
 	BUG_ON(ret);
 	btrfs_put_block_group(block_group);
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
-- 
1.7.4.1



* [PATCH V4 2/4] Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP
  2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 1/4] Btrfs: make update_reserved_bytes() public Li Dongyang
@ 2011-03-24 10:24 ` Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 3/4] Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Li Dongyang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-24 10:24 UTC (permalink / raw)
  To: linux-btrfs

btrfs_map_block() only returns a mapping for a single stripe length, but when
trimming an extent we want the full extent mapped onto each disk. Add a length
field to btrfs_bio_stripe and fill it in when mapping for REQ_DISCARD.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 fs/btrfs/volumes.c |  150 ++++++++++++++++++++++++++++++++++++++++++++--------
 fs/btrfs/volumes.h |    1 +
 2 files changed, 129 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd13eb8..e81cce6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2962,7 +2962,10 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 	struct extent_map_tree *em_tree = &map_tree->map_tree;
 	u64 offset;
 	u64 stripe_offset;
+	u64 stripe_end_offset;
 	u64 stripe_nr;
+	u64 stripe_nr_orig;
+	u64 stripe_nr_end;
 	int stripes_allocated = 8;
 	int stripes_required = 1;
 	int stripe_index;
@@ -2971,7 +2974,7 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 	int max_errors = 0;
 	struct btrfs_multi_bio *multi = NULL;
 
-	if (multi_ret && !(rw & REQ_WRITE))
+	if (multi_ret && !(rw & (REQ_WRITE | REQ_DISCARD)))
 		stripes_allocated = 1;
 again:
 	if (multi_ret) {
@@ -3017,7 +3020,15 @@ again:
 			max_errors = 1;
 		}
 	}
-	if (multi_ret && (rw & REQ_WRITE) &&
+	if (rw & REQ_DISCARD) {
+		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
+				 BTRFS_BLOCK_GROUP_RAID1 |
+				 BTRFS_BLOCK_GROUP_DUP |
+				 BTRFS_BLOCK_GROUP_RAID10)) {
+			stripes_required = map->num_stripes;
+		}
+	}
+	if (multi_ret && (rw & (REQ_WRITE | REQ_DISCARD)) &&
 	    stripes_allocated < stripes_required) {
 		stripes_allocated = map->num_stripes;
 		free_extent_map(em);
@@ -3037,12 +3048,15 @@ again:
 	/* stripe_offset is the offset of this block in its stripe*/
 	stripe_offset = offset - stripe_offset;
 
-	if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
-			 BTRFS_BLOCK_GROUP_RAID10 |
-			 BTRFS_BLOCK_GROUP_DUP)) {
+	if (rw & REQ_DISCARD)
+		*length = min_t(u64, em->len - offset, *length);
+	else if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
+			      BTRFS_BLOCK_GROUP_RAID1 |
+			      BTRFS_BLOCK_GROUP_RAID10 |
+			      BTRFS_BLOCK_GROUP_DUP)) {
 		/* we limit the length of each bio to what fits in a stripe */
 		*length = min_t(u64, em->len - offset,
-			      map->stripe_len - stripe_offset);
+				map->stripe_len - stripe_offset);
 	} else {
 		*length = em->len - offset;
 	}
@@ -3052,8 +3066,19 @@ again:
 
 	num_stripes = 1;
 	stripe_index = 0;
-	if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
-		if (unplug_page || (rw & REQ_WRITE))
+	stripe_nr_orig = stripe_nr;
+	stripe_nr_end = (offset + *length + map->stripe_len - 1) &
+			(~(map->stripe_len - 1));
+	do_div(stripe_nr_end, map->stripe_len);
+	stripe_end_offset = stripe_nr_end * map->stripe_len -
+			    (offset + *length);
+	if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
+		if (rw & REQ_DISCARD)
+			num_stripes = min_t(u64, map->num_stripes,
+					    stripe_nr_end - stripe_nr_orig);
+		stripe_index = do_div(stripe_nr, map->num_stripes);
+	} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
+		if (unplug_page || (rw & (REQ_WRITE | REQ_DISCARD)))
 			num_stripes = map->num_stripes;
 		else if (mirror_num)
 			stripe_index = mirror_num - 1;
@@ -3064,7 +3089,7 @@ again:
 		}
 
 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
-		if (rw & REQ_WRITE)
+		if (rw & (REQ_WRITE | REQ_DISCARD))
 			num_stripes = map->num_stripes;
 		else if (mirror_num)
 			stripe_index = mirror_num - 1;
@@ -3077,6 +3102,10 @@ again:
 
 		if (unplug_page || (rw & REQ_WRITE))
 			num_stripes = map->sub_stripes;
+		else if (rw & REQ_DISCARD)
+			num_stripes = min_t(u64, map->sub_stripes *
+					    (stripe_nr_end - stripe_nr_orig),
+					    map->num_stripes);
 		else if (mirror_num)
 			stripe_index += mirror_num - 1;
 		else {
@@ -3094,24 +3123,101 @@ again:
 	}
 	BUG_ON(stripe_index >= map->num_stripes);
 
-	for (i = 0; i < num_stripes; i++) {
-		if (unplug_page) {
-			struct btrfs_device *device;
-			struct backing_dev_info *bdi;
-
-			device = map->stripes[stripe_index].dev;
-			if (device->bdev) {
-				bdi = blk_get_backing_dev_info(device->bdev);
-				if (bdi->unplug_io_fn)
-					bdi->unplug_io_fn(bdi, unplug_page);
-			}
-		} else {
+	if (rw & REQ_DISCARD) {
+		for (i = 0; i < num_stripes; i++) {
 			multi->stripes[i].physical =
 				map->stripes[stripe_index].physical +
 				stripe_offset + stripe_nr * map->stripe_len;
 			multi->stripes[i].dev = map->stripes[stripe_index].dev;
+
+			if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
+				u64 stripes;
+				int last_stripe = (stripe_nr_end - 1) %
+					map->num_stripes;
+				int j;
+
+				for (j = 0; j < map->num_stripes; j++) {
+					if ((stripe_nr_end - 1 - j) %
+					      map->num_stripes == stripe_index)
+						break;
+				}
+				stripes = stripe_nr_end - 1 - j;
+				do_div(stripes, map->num_stripes);
+				multi->stripes[i].length = map->stripe_len *
+					(stripes - stripe_nr + 1);
+
+				if (i == 0) {
+					multi->stripes[i].length -=
+						stripe_offset;
+					stripe_offset = 0;
+				}
+				if (stripe_index == last_stripe)
+					multi->stripes[i].length -=
+						stripe_end_offset;
+			} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
+				u64 stripes;
+				int j;
+				int factor = map->num_stripes /
+					     map->sub_stripes;
+				int last_stripe = (stripe_nr_end - 1) % factor;
+				last_stripe *= map->sub_stripes;
+
+				for (j = 0; j < factor; j++) {
+					if ((stripe_nr_end - 1 - j) % factor ==
+					    stripe_index / map->sub_stripes)
+						break;
+				}
+				stripes = stripe_nr_end - 1 - j;
+				do_div(stripes, factor);
+				multi->stripes[i].length = map->stripe_len *
+					(stripes - stripe_nr + 1);
+
+				if (i < map->sub_stripes) {
+					multi->stripes[i].length -=
+						stripe_offset;
+					if (i == map->sub_stripes - 1)
+						stripe_offset = 0;
+				}
+				if (stripe_index >= last_stripe &&
+				    stripe_index <= (last_stripe +
+						     map->sub_stripes - 1)) {
+					multi->stripes[i].length -=
+						stripe_end_offset;
+				}
+			} else
+				multi->stripes[i].length = *length;
+
+			stripe_index++;
+			if (stripe_index == map->num_stripes) {
+				/* This could only happen for RAID0/10 */
+				stripe_index = 0;
+				stripe_nr++;
+			}
+		}
+	} else {
+		for (i = 0; i < num_stripes; i++) {
+			if (unplug_page) {
+				struct btrfs_device *device;
+				struct backing_dev_info *bdi;
+
+				device = map->stripes[stripe_index].dev;
+				if (device->bdev) {
+					bdi = blk_get_backing_dev_info(device->
+								       bdev);
+					if (bdi->unplug_io_fn)
+						bdi->unplug_io_fn(bdi,
+								  unplug_page);
+				}
+			} else {
+				multi->stripes[i].physical =
+					map->stripes[stripe_index].physical +
+					stripe_offset +
+					stripe_nr * map->stripe_len;
+				multi->stripes[i].dev =
+					map->stripes[stripe_index].dev;
+			}
+			stripe_index++;
 		}
-		stripe_index++;
 	}
 	if (multi_ret) {
 		*multi_ret = multi;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 7fb59d4..5ae2569 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -126,6 +126,7 @@ struct btrfs_fs_devices {
 struct btrfs_bio_stripe {
 	struct btrfs_device *dev;
 	u64 physical;
+	u64 length; /* only used for discard mappings */
 };
 
 struct btrfs_multi_bio {
-- 
1.7.4.1
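
To make the goal of the RAID0 branch in patch 2/4 easier to check, here is a small userspace reference model. It is not the kernel code and does not reuse its arithmetic: it simply walks the logical range stripe by stripe and adds up how many bytes land on each device, which is the per-device length the discard mapping is meant to report. The function name and parameters are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model (not kernel code): for a logical range
 * [offset, offset + length) inside a RAID0 chunk, compute how many
 * bytes fall on each of the 'num_stripes' devices. 'per_dev' must
 * have room for num_stripes entries.
 */
static void raid0_discard_lengths(uint64_t offset, uint64_t length,
				  uint64_t stripe_len, unsigned num_stripes,
				  uint64_t *per_dev)
{
	uint64_t end = offset + length;
	unsigned i;

	assert(stripe_len > 0 && num_stripes > 0 && per_dev != NULL);
	for (i = 0; i < num_stripes; i++)
		per_dev[i] = 0;

	while (offset < end) {
		/* index of the stripe that contains 'offset' */
		uint64_t stripe_nr = offset / stripe_len;
		uint64_t stripe_end = (stripe_nr + 1) * stripe_len;
		/* part of this stripe covered by the range */
		uint64_t piece = (end < stripe_end ? end : stripe_end) - offset;

		/* RAID0 places consecutive stripes on devices round-robin */
		per_dev[stripe_nr % num_stripes] += piece;
		offset += piece;
	}
}
```

With a 64KiB stripe length on two devices, a 128KiB range starting 32KiB into the chunk touches three stripes (32KiB, 64KiB, 32KiB) and should come out as 64KiB per device. The loop is linear in the number of stripes, so it is only suitable as a cross-check for the closed-form computation, not as a replacement.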



* [PATCH V4 3/4] Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes
  2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 1/4] Btrfs: make update_reserved_bytes() public Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 2/4] Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Li Dongyang
@ 2011-03-24 10:24 ` Li Dongyang
  2011-03-24 10:24 ` [PATCH V4 4/4] Btrfs: add btrfs_trim_fs() to handle FITRIM Li Dongyang
  2011-03-27 18:10 ` [PATCH V4 0/4] Btrfs: batched discard support for btrfs Chris Mason
  4 siblings, 0 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-24 10:24 UTC (permalink / raw)
  To: linux-btrfs

Callers of btrfs_discard_extent() should now check for themselves whether we
are mounted with -o discard, as we want FITRIM to work even if the fs is not
mounted with -o discard.
Also, use REQ_DISCARD when mapping the free extent so that we get a full mapping.
Finally, only return an error if
1. the error is not EOPNOTSUPP, or
2. no device supports discard

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 fs/btrfs/ctree.h       |    2 +-
 fs/btrfs/disk-io.c     |    5 ++++-
 fs/btrfs/extent-tree.c |   45 ++++++++++++++++++++++++++-------------------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c84551..94bb772 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2229,7 +2229,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
 int btrfs_error_unpin_extent_range(struct btrfs_root *root,
 				   u64 start, u64 end);
 int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
-			       u64 num_bytes);
+			       u64 num_bytes, u64 *actual_bytes);
 int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
 			    struct btrfs_root *root, u64 type);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 100b07f..98b60b0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2947,7 +2947,10 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
 			break;
 
 		/* opt_discard */
-		ret = btrfs_error_discard_extent(root, start, end + 1 - start);
+		if (btrfs_test_opt(root, DISCARD))
+			ret = btrfs_error_discard_extent(root, start,
+							 end + 1 - start,
+							 NULL);
 
 		clear_extent_dirty(unpin, start, end, GFP_NOFS);
 		btrfs_error_unpin_extent_range(root, start, end);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index caa4254..10e542a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1738,40 +1738,44 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static void btrfs_issue_discard(struct block_device *bdev,
+static int btrfs_issue_discard(struct block_device *bdev,
 				u64 start, u64 len)
 {
-	blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL, 0);
+	return blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL, 0);
 }
 
 static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
-				u64 num_bytes)
+				u64 num_bytes, u64 *actual_bytes)
 {
 	int ret;
-	u64 map_length = num_bytes;
+	u64 discarded_bytes = 0;
 	struct btrfs_multi_bio *multi = NULL;
 
-	if (!btrfs_test_opt(root, DISCARD))
-		return 0;
-
 	/* Tell the block device(s) that the sectors can be discarded */
-	ret = btrfs_map_block(&root->fs_info->mapping_tree, READ,
-			      bytenr, &map_length, &multi, 0);
+	ret = btrfs_map_block(&root->fs_info->mapping_tree, REQ_DISCARD,
+			      bytenr, &num_bytes, &multi, 0);
 	if (!ret) {
 		struct btrfs_bio_stripe *stripe = multi->stripes;
 		int i;
 
-		if (map_length > num_bytes)
-			map_length = num_bytes;
-
 		for (i = 0; i < multi->num_stripes; i++, stripe++) {
-			btrfs_issue_discard(stripe->dev->bdev,
-					    stripe->physical,
-					    map_length);
+			ret = btrfs_issue_discard(stripe->dev->bdev,
+						  stripe->physical,
+						  stripe->length);
+			if (!ret)
+				discarded_bytes += stripe->length;
+			else if (ret != -EOPNOTSUPP)
+				break;
 		}
 		kfree(multi);
 	}
 
+	if (discarded_bytes && ret == -EOPNOTSUPP)
+		ret = 0;
+
+	if (actual_bytes)
+		*actual_bytes = discarded_bytes;
+
 	return ret;
 }
 
@@ -4361,7 +4365,9 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 		if (ret)
 			break;
 
-		ret = btrfs_discard_extent(root, start, end + 1 - start);
+		if (btrfs_test_opt(root, DISCARD))
+			ret = btrfs_discard_extent(root, start,
+						   end + 1 - start, NULL);
 
 		clear_extent_dirty(unpin, start, end, GFP_NOFS);
 		unpin_extent_range(root, start, end);
@@ -5410,7 +5416,8 @@ int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len)
 		return -ENOSPC;
 	}
 
-	ret = btrfs_discard_extent(root, start, len);
+	if (btrfs_test_opt(root, DISCARD))
+		ret = btrfs_discard_extent(root, start, len, NULL);
 
 	btrfs_add_free_space(cache, start, len);
 	btrfs_update_reserved_bytes(cache, len, 0, 1);
@@ -8728,7 +8735,7 @@ int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 end)
 }
 
 int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
-			       u64 num_bytes)
+			       u64 num_bytes, u64 *actual_bytes)
 {
-	return btrfs_discard_extent(root, bytenr, num_bytes);
+	return btrfs_discard_extent(root, bytenr, num_bytes, actual_bytes);
 }
-- 
1.7.4.1
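
The error policy described in patch 3/4 can be sketched outside the kernel as follows. This is a userspace model of the loop in btrfs_discard_extent(): the per-stripe return codes and lengths are passed in as arrays instead of coming from blkdev_issue_discard(), and the function name is made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model: 'rets[i]' is the result of discarding stripe i and
 * 'lens[i]' its length. Returns the aggregated error code and stores
 * the number of bytes actually discarded in *actual_bytes (optional).
 */
static int aggregate_discard(const int *rets, const uint64_t *lens,
			     size_t n, uint64_t *actual_bytes)
{
	uint64_t discarded = 0;
	int ret = 0;
	size_t i;

	assert(n == 0 || (rets != NULL && lens != NULL));
	for (i = 0; i < n; i++) {
		ret = rets[i];
		if (!ret)
			discarded += lens[i];
		else if (ret != -EOPNOTSUPP)
			break;		/* a real error: stop and report it */
	}

	/* some device trimmed something, so "unsupported" is not an error */
	if (discarded && ret == -EOPNOTSUPP)
		ret = 0;

	if (actual_bytes)
		*actual_bytes = discarded;
	return ret;
}
```

In this model a mix of one working device and one device without discard support yields 0 with only the working device's bytes counted, while a set of devices that all return -EOPNOTSUPP yields -EOPNOTSUPP with zero bytes.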



* [PATCH V4 4/4] Btrfs: add btrfs_trim_fs() to handle FITRIM
  2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
                   ` (2 preceding siblings ...)
  2011-03-24 10:24 ` [PATCH V4 3/4] Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Li Dongyang
@ 2011-03-24 10:24 ` Li Dongyang
  2011-03-27 18:10 ` [PATCH V4 0/4] Btrfs: batched discard support for btrfs Chris Mason
  4 siblings, 0 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-24 10:24 UTC (permalink / raw)
  To: linux-btrfs

We take a free extent out of the allocator, trim it, then put it back.
Before trimming a block group we must make sure it is cached, so this
also includes a small change that lets cache_block_group() run without a
transaction.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 fs/btrfs/ctree.h            |    1 +
 fs/btrfs/extent-tree.c      |   50 +++++++++++++++++++++++-
 fs/btrfs/free-space-cache.c |   92 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/free-space-cache.h |    2 +
 fs/btrfs/ioctl.c            |   46 +++++++++++++++++++++
 5 files changed, 190 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 94bb772..df206c1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2232,6 +2232,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
 			       u64 num_bytes, u64 *actual_bytes);
 int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
 			    struct btrfs_root *root, u64 type);
+int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range);
 
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 10e542a..d876759 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -440,7 +440,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	 * allocate blocks for the tree root we can't do the fast caching since
 	 * we likely hold important locks.
 	 */
-	if (!trans->transaction->in_commit &&
+	if (trans && (!trans->transaction->in_commit) &&
 	    (root && root != root->fs_info->tree_root)) {
 		spin_lock(&cache->lock);
 		if (cache->cached != BTRFS_CACHE_NO) {
@@ -8739,3 +8739,51 @@ int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
 {
 	return btrfs_discard_extent(root, bytenr, num_bytes, actual_bytes);
 }
+
+int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_block_group_cache *cache = NULL;
+	u64 group_trimmed;
+	u64 start;
+	u64 end;
+	u64 trimmed = 0;
+	int ret = 0;
+
+	cache = btrfs_lookup_block_group(fs_info, range->start);
+
+	while (cache) {
+		if (cache->key.objectid >= (range->start + range->len)) {
+			btrfs_put_block_group(cache);
+			break;
+		}
+
+		start = max(range->start, cache->key.objectid);
+		end = min(range->start + range->len,
+				cache->key.objectid + cache->key.offset);
+
+		if (end - start >= range->minlen) {
+			if (!block_group_cache_done(cache)) {
+				ret = cache_block_group(cache, NULL, root, 0);
+				if (!ret)
+					wait_block_group_cache_done(cache);
+			}
+			ret = btrfs_trim_block_group(cache,
+						     &group_trimmed,
+						     start,
+						     end,
+						     range->minlen);
+
+			trimmed += group_trimmed;
+			if (ret) {
+				btrfs_put_block_group(cache);
+				break;
+			}
+		}
+
+		cache = next_block_group(fs_info->tree_root, cache);
+	}
+
+	range->len = trimmed;
+	return ret;
+}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..d0dc812 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2154,3 +2154,95 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster)
 	cluster->block_group = NULL;
 }
 
+int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
+			   u64 *trimmed, u64 start, u64 end, u64 minlen)
+{
+	struct btrfs_free_space *entry = NULL;
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
+	u64 bytes = 0;
+	u64 actually_trimmed;
+	int ret = 0;
+
+	*trimmed = 0;
+
+	while (start < end) {
+		spin_lock(&block_group->tree_lock);
+
+		if (block_group->free_space < minlen) {
+			spin_unlock(&block_group->tree_lock);
+			break;
+		}
+
+		entry = tree_search_offset(block_group, start, 0, 1);
+		if (!entry)
+			entry = tree_search_offset(block_group,
+						   offset_to_bitmap(block_group,
+								    start),
+						   1, 1);
+
+		if (!entry || entry->offset >= end) {
+			spin_unlock(&block_group->tree_lock);
+			break;
+		}
+
+		if (entry->bitmap) {
+			ret = search_bitmap(block_group, entry, &start, &bytes);
+			if (!ret) {
+				if (start >= end) {
+					spin_unlock(&block_group->tree_lock);
+					break;
+				}
+				bytes = min(bytes, end - start);
+				bitmap_clear_bits(block_group, entry,
+						  start, bytes);
+				if (entry->bytes == 0)
+					free_bitmap(block_group, entry);
+			} else {
+				start = entry->offset + BITS_PER_BITMAP *
+					block_group->sectorsize;
+				spin_unlock(&block_group->tree_lock);
+				ret = 0;
+				continue;
+			}
+		} else {
+			start = entry->offset;
+			bytes = min(entry->bytes, end - start);
+			unlink_free_space(block_group, entry);
+			kfree(entry);
+		}
+
+		spin_unlock(&block_group->tree_lock);
+
+		if (bytes >= minlen) {
+			int update_ret;
+			update_ret = btrfs_update_reserved_bytes(block_group,
+								 bytes, 1, 1);
+
+			ret = btrfs_error_discard_extent(fs_info->extent_root,
+							 start,
+							 bytes,
+							 &actually_trimmed);
+
+			btrfs_add_free_space(block_group,
+					     start, bytes);
+			if (!update_ret)
+				btrfs_update_reserved_bytes(block_group,
+							    bytes, 0, 1);
+
+			if (ret)
+				break;
+			*trimmed += actually_trimmed;
+		}
+		start += bytes;
+		bytes = 0;
+
+		if (fatal_signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		cond_resched();
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index e49ca5c..65c3b93 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -68,4 +68,6 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
 int btrfs_return_cluster_to_free_space(
 			       struct btrfs_block_group_cache *block_group,
 			       struct btrfs_free_cluster *cluster);
+int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
+			   u64 *trimmed, u64 start, u64 end, u64 minlen);
 #endif
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..eff9228 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -40,6 +40,7 @@
 #include <linux/xattr.h>
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
+#include <linux/blkdev.h>
 #include "compat.h"
 #include "ctree.h"
 #include "disk-io.h"
@@ -225,6 +226,49 @@ static int btrfs_ioctl_getversion(struct file *file, int __user *arg)
 	return put_user(inode->i_generation, arg);
 }
 
+static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
+{
+	struct btrfs_root *root = fdentry(file)->d_sb->s_fs_info;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_device *device;
+	struct request_queue *q;
+	struct fstrim_range range;
+	u64 minlen = ULLONG_MAX;
+	u64 num_devices = 0;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
+	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+		q = bdev_get_queue(device->bdev);
+		if (blk_queue_discard(q)) {
+			num_devices++;
+			minlen = min((u64)q->limits.discard_granularity,
+				     minlen);
+		}
+	}
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+	if (!num_devices)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&range, arg, sizeof(range)))
+		return -EFAULT;
+
+	range.minlen = max(range.minlen, minlen);
+	ret = btrfs_trim_fs(root, &range);
+	if (ret < 0)
+		return ret;
+
+	if (copy_to_user(arg, &range, sizeof(range)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static noinline int create_subvol(struct btrfs_root *root,
 				  struct dentry *dentry,
 				  char *name, int namelen,
@@ -2388,6 +2432,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_setflags(file, argp);
 	case FS_IOC_GETVERSION:
 		return btrfs_ioctl_getversion(file, argp);
+	case FITRIM:
+		return btrfs_ioctl_fitrim(file, argp);
 	case BTRFS_IOC_SNAP_CREATE:
 		return btrfs_ioctl_snap_create(file, argp, 0);
 	case BTRFS_IOC_SNAP_CREATE_V2:
-- 
1.7.4.1
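
The minlen handling in btrfs_ioctl_fitrim() from patch 4/4 can be modelled in userspace like this. The struct and function names below are hypothetical stand-ins for the per-device request queue limits; only the logic mirrors the patch: skip devices without discard support, take the smallest discard_granularity among the rest, and never go below what the caller asked for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue limits the ioctl inspects. */
struct dev_model {
	int supports_discard;
	uint64_t discard_granularity;
};

/*
 * Userspace model: returns -EOPNOTSUPP when no device can discard,
 * otherwise 0 with the effective minimum extent length in *out.
 */
static int effective_minlen(const struct dev_model *devs, size_t n,
			    uint64_t user_minlen, uint64_t *out)
{
	uint64_t minlen = UINT64_MAX;
	size_t i, capable = 0;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		if (!devs[i].supports_discard)
			continue;
		capable++;
		if (devs[i].discard_granularity < minlen)
			minlen = devs[i].discard_granularity;
	}
	if (!capable)
		return -EOPNOTSUPP;

	/* same as range.minlen = max(range.minlen, minlen) in the patch */
	*out = user_minlen > minlen ? user_minlen : minlen;
	return 0;
}
```

For two discard-capable devices with granularities of 4096 and 512 bytes, a caller passing minlen 0 ends up with 512, and a caller passing 8192 keeps 8192.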



* Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
  2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
                   ` (3 preceding siblings ...)
  2011-03-24 10:24 ` [PATCH V4 4/4] Btrfs: add btrfs_trim_fs() to handle FITRIM Li Dongyang
@ 2011-03-27 18:10 ` Chris Mason
  2011-03-28  1:30   ` Chris Mason
  4 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2011-03-27 18:10 UTC (permalink / raw)
  To: Li Dongyang; +Cc: linux-btrfs

Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> Dear list,
> This is V4 of batched discard support, now we will get full mapping of
> the free space on each device for RAID0/1/10/DUP instead of just a single
> stripe length, and tested with xfsstests 251, Thanks.

I've pushed this out into the for-linus branch, along with a full merge
to 2.6.39 current git.

Please take a look and make sure I've merged it correctly.

Thanks!

-chris



* Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
  2011-03-27 18:10 ` [PATCH V4 0/4] Btrfs: batched discard support for btrfs Chris Mason
@ 2011-03-28  1:30   ` Chris Mason
  2011-03-28  1:39     ` Chris Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2011-03-28  1:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: Li Dongyang, linux-btrfs

Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> > Dear list,
> > This is V4 of batched discard support, now we will get full mapping of
> > the free space on each device for RAID0/1/10/DUP instead of just a single
> > stripe length, and tested with xfsstests 251, Thanks.
> 
> I've pushed this out into the for-linus branch, along with a full merge
> to 2.6.39 current git.
> 
> Please take a look and make sure I've merged it correctly.

Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
compile at all on 32 bit machines.  I've fixed it up and pushed the
result out to for-linus.  Please check the math ;)

-chris



* Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
  2011-03-28  1:30   ` Chris Mason
@ 2011-03-28  1:39     ` Chris Mason
  2011-03-28  9:25       ` Li Dongyang
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2011-03-28  1:39 UTC (permalink / raw)
  To: Chris Mason; +Cc: Li Dongyang, linux-btrfs

Excerpts from Chris Mason's message of 2011-03-27 21:30:20 -0400:
> Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> > Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> > > Dear list,
> > > This is V4 of batched discard support, now we will get full mapping of
> > > the free space on each device for RAID0/1/10/DUP instead of just a single
> > > stripe length, and tested with xfsstests 251, Thanks.
> > 
> > I've pushed this out into the for-linus branch, along with a full merge
> > to 2.6.39 current git.
> > 
> > Please take a look and make sure I've merged it correctly.
> 
> Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
> compile at all on 32 bit machines.  I've fixed it up and pushed the
> result out to for-linus.  Please check the math ;)

BTW, I just rebased this so the incremental fix was before merging into
Linus' tree.

-chris

> 
> -chris
> 
> > 
> > Thanks!
> > 
> > -chris
> > 
> > > Changelog V4:
> > >     *make btrfs_map_block() return full mapping.
> > > Changelog V3:
> > >     *fix style problems.
> > >     *rebase to 2.6.38-rc7.
> > > Changelog V2:
> > >     *Check if we have devices support trim before trying to trim the fs, also adjust
> > >       minlen according to the discard_granularity.
> > >     *Update reserved extent calculations in btrfs_trim_block_group().
> > >     *Call cond_resched() without checking need_resched()
> > >     *Use bitmap_clear_bits() and unlink_free_space() instead of btrfs_remove_free_space(),
> > >       so we won't search the same extent for twice.
> > >     *Try harder in btrfs_discard_extent(), now we won't report errors
> > >      if it's not a EOPNOTSUPP.
> > >     *make sure the block group is cached before trimming it,or we'll see an empty caching
> > >      tree if the block group is not cached.
> > >     *Minor return value fix in btrfs_discard_block_group().


* Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
  2011-03-28  1:39     ` Chris Mason
@ 2011-03-28  9:25       ` Li Dongyang
  0 siblings, 0 replies; 9+ messages in thread
From: Li Dongyang @ 2011-03-28  9:25 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Monday, March 28, 2011 09:39:26 AM Chris Mason wrote:
> Excerpts from Chris Mason's message of 2011-03-27 21:30:20 -0400:
> > Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> > > Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> > > > Dear list,
> > > > This is V4 of batched discard support, now we will get full mapping of
> > > > the free space on each device for RAID0/1/10/DUP instead of just a single
> > > > stripe length, and tested with xfsstests 251, Thanks.
> > > 
> > > I've pushed this out into the for-linus branch, along with a full merge
> > > to 2.6.39 current git.
> > > 
> > > > Please take a look and make sure I've merged it correctly.

Looks good to me.

> > 
> > Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
> > compile at all on 32 bit machines.  I've fixed it up and pushed the
> > result out to for-linus.  Please check the math ;)
> 
Sorry for being so stupid, and thanks for fixing it ;-)

Br,
Li Dongyang

> BTW, I just rebased this so the incremental fix was before merging into
> Linus' tree.
> 
> -chris
> 
> > 
> > -chris
> > 
> > > 
> > > Thanks!
> > > 
> > > -chris
> > > 
> > > > Changelog V4:
> > > >     *make btrfs_map_block() return full mapping.
> > > > Changelog V3:
> > > >     *fix style problems.
> > > >     *rebase to 2.6.38-rc7.
> > > > Changelog V2:
> > > >     *Check if we have devices support trim before trying to trim the fs, also adjust
> > > >       minlen according to the discard_granularity.
> > > >     *Update reserved extent calculations in btrfs_trim_block_group().
> > > >     *Call cond_resched() without checking need_resched()
> > > >     *Use bitmap_clear_bits() and unlink_free_space() instead of btrfs_remove_free_space(),
> > > >       so we won't search the same extent for twice.
> > > >     *Try harder in btrfs_discard_extent(), now we won't report errors
> > > >      if it's not a EOPNOTSUPP.
> > > >     *make sure the block group is cached before trimming it,or we'll see an empty caching
> > > >      tree if the block group is not cached.
> > > >     *Minor return value fix in btrfs_discard_block_group().
> 


end of thread, newest message 2011-03-28  9:25 UTC

Thread overview: 9+ messages
2011-03-24 10:24 [PATCH V4 0/4] Btrfs: batched discard support for btrfs Li Dongyang
2011-03-24 10:24 ` [PATCH V4 1/4] Btrfs: make update_reserved_bytes() public Li Dongyang
2011-03-24 10:24 ` [PATCH V4 2/4] Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Li Dongyang
2011-03-24 10:24 ` [PATCH V4 3/4] Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Li Dongyang
2011-03-24 10:24 ` [PATCH V4 4/4] Btrfs: add btrfs_trim_fs() to handle FITRIM Li Dongyang
2011-03-27 18:10 ` [PATCH V4 0/4] Btrfs: batched discard support for btrfs Chris Mason
2011-03-28  1:30   ` Chris Mason
2011-03-28  1:39     ` Chris Mason
2011-03-28  9:25       ` Li Dongyang
