Linux-BTRFS Archive on lore.kernel.org
* [PATCH v2 00/12] FITRIM improvements
@ 2019-02-11  8:34 Nikolay Borisov
  2019-02-11  8:34 ` [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:34 UTC
  To: linux-btrfs; +Cc: Nikolay Borisov

Here is the second version of the FITRIM patchset. For background information,
consult the previous post [0]. Changes since v1:

 * Dropped some cleanup patches as they have been merged in the meantime. 

 * In Patch 2 switched list iteration to list_for_each_entry_safe in
   btrfs_cleanup_one_transaction. (Johannes Thumshirn)

 * Fixed a bug in Patch 12 which led to an infinite loop. Now
   find_first_clear_extent_bit returns void since it doesn't have enough
   context to tell whether it exhausted the search space.

 * Fixed kerneldoc format in Patch 11 (Johannes Thumshirn)

 * Rebased on latest misc-next branch

 * Further testing of the patchset with the newly enabled shared/298 test
   succeeds.
 
 [0] https://lore.kernel.org/linux-btrfs/20190130145102.4708-1-nborisov@suse.com/

Jeff Mahoney (2):
  btrfs: combine device update operations during transaction commit
  btrfs: replace pending/pinned chunks lists with io tree

Nikolay Borisov (10):
  btrfs: Honour FITRIM range constraints during free space trim
  btrfs: Handle pending/pinned chunks before blockgroup relocation
    during device shrink
  btrfs: Rename and export clear_btree_io_tree
  btrfs: Populate ->orig_block_len during read_one_chunk
  btrfs: Introduce new bits for device allocation tree
  btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
  btrfs: Factor out in_range macro
  btrfs: Optimize unallocated chunks discard
  btrfs: Implement find_first_clear_extent_bit
  btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit

 fs/btrfs/ctree.h            |   8 +-
 fs/btrfs/dev-replace.c      |   2 +-
 fs/btrfs/disk-io.c          |  18 +--
 fs/btrfs/extent-tree.c      | 102 +++++--------
 fs/btrfs/extent_io.c        | 103 +++++++++++++-
 fs/btrfs/extent_io.h        |  19 ++-
 fs/btrfs/extent_map.c       |  38 +++++
 fs/btrfs/extent_map.h       |   1 -
 fs/btrfs/free-space-cache.c |   4 -
 fs/btrfs/transaction.c      |  53 ++-----
 fs/btrfs/transaction.h      |   2 +-
 fs/btrfs/volumes.c          | 277 ++++++++++++++----------------------
 fs/btrfs/volumes.h          |  23 ++-
 13 files changed, 332 insertions(+), 318 deletions(-)

-- 
2.17.1



* [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
@ 2019-02-11  8:34 ` Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 02/12] btrfs: combine device update operations during transaction commit Nikolay Borisov
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:34 UTC
  To: linux-btrfs; +Cc: Nikolay Borisov

Up until now trimming the free space was done irrespective of what the
arguments of the FITRIM ioctl were. For example, fstrim's -o/-l arguments
were entirely ignored. Fix it by correctly handling those parameters.
This requires breaking out of the loop if the found free space extent
lies past the end of the passed range, as well as completing the trim
after trimming fstrim_range::len bytes.

Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9f012c2facbe..33ecd4128898 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11258,9 +11258,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
  * held back allocations.
  */
 static int btrfs_trim_free_extents(struct btrfs_device *device,
-				   u64 minlen, u64 *trimmed)
+				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = 0, len = 0;
+	u64 start = range->start, len = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11303,8 +11303,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (!trans)
 			up_read(&fs_info->commit_root_sem);
 
-		ret = find_free_dev_extent_start(trans, device, minlen, start,
-						 &start, &len);
+		ret = find_free_dev_extent_start(trans, device, range->minlen,
+						 start, &start, &len);
 		if (trans) {
 			up_read(&fs_info->commit_root_sem);
 			btrfs_put_transaction(trans);
@@ -11317,6 +11317,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 			break;
 		}
 
+		/* If we are out of the passed range break */
+		if (start > range->start + range->len - 1) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
+
+		start = max(range->start, start);
+		len = min(range->len, len);
+
 		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
 		mutex_unlock(&fs_info->chunk_mutex);
 
@@ -11326,6 +11336,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		start += len;
 		*trimmed += bytes;
 
+		/* We've trimmed enough */
+		if (*trimmed >= range->len)
+			break;
+
 		if (fatal_signal_pending(current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -11409,8 +11423,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	devices = &fs_info->fs_devices->devices;
 	list_for_each_entry(device, devices, dev_list) {
-		ret = btrfs_trim_free_extents(device, range->minlen,
-					      &group_trimmed);
+		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
 		if (ret) {
 			dev_failed++;
 			dev_ret = ret;
-- 
2.17.1
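
The range handling described in the commit message of patch 01 amounts to intersecting each free extent with the range requested through FITRIM. Below is a minimal user-space sketch of that interval arithmetic. It is not the code from the patch: `struct trim_range` and `clamp_to_range()` are hypothetical names standing in for the start/len pair of `struct fstrim_range`, and the overflow guard is an assumption added so that a length of `UINT64_MAX` (meaning "to the end of the device") does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/len fields of struct fstrim_range. */
struct trim_range {
	uint64_t start;	/* first byte the caller wants trimmed */
	uint64_t len;	/* number of bytes the caller wants trimmed */
};

/*
 * Intersect the free extent [*start, *start + *len) with the requested
 * range. Returns false when the two do not overlap, so the caller can
 * skip the extent or stop searching. On success *start and *len describe
 * only the overlapping part.
 */
static bool clamp_to_range(const struct trim_range *r,
			   uint64_t *start, uint64_t *len)
{
	/* Exclusive end of the request, saturated instead of wrapping. */
	uint64_t req_end = (r->len > UINT64_MAX - r->start) ?
			   UINT64_MAX : r->start + r->len;
	/* Exclusive end of the extent; assumed not to overflow. */
	uint64_t ext_end = *start + *len;

	if (*len == 0 || ext_end <= r->start || *start >= req_end)
		return false;

	if (*start < r->start)
		*start = r->start;
	if (ext_end > req_end)
		ext_end = req_end;

	*len = ext_end - *start;
	return true;
}
```

With a request of start=100, len=50, a free extent covering [80, 120) is reduced to [100, 120), and an extent starting at 150 or later is rejected, which corresponds to the "out of the passed range" break in the patch.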



* [PATCH v2 02/12] btrfs: combine device update operations during transaction commit
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
  2019-02-11  8:34 ` [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-28 16:52   ` David Sterba
  2019-02-11  8:35 ` [PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC
  To: linux-btrfs; +Cc: Jeff Mahoney, Nikolay Borisov

From: Jeff Mahoney <jeffm@suse.com>

We currently overload the pending_chunks list to handle updating
btrfs_device->commit_bytes_used.  We don't actually care about
the extent mapping or even the device mapping for the chunk - we
just need the device, and we can end up processing it multiple
times.  The fs_devices->resized_list does more or less the same
thing, but with the disk size.  They are called consecutively
during commit and have more or less the same purpose.

We can combine the two lists into a single list that attaches
to the transaction and contains a list of devices that need
updating.  Since we always add the device to a list when we
change bytes_used or disk_total_size, there's no harm in
copying both values at once.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/disk-io.c     |  7 ++++
 fs/btrfs/transaction.c |  7 ++--
 fs/btrfs/transaction.h |  1 +
 fs/btrfs/volumes.c     | 84 ++++++++++++++++++------------------------
 fs/btrfs/volumes.h     | 13 ++-----
 6 files changed, 52 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 13863354ff9d..f335cbf3bf1f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -662,7 +662,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	btrfs_device_set_disk_total_bytes(tgt_device,
 					  src_device->disk_total_bytes);
 	btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used);
-	ASSERT(list_empty(&src_device->resized_list));
+	ASSERT(list_empty(&src_device->post_commit_list));
 	tgt_device->commit_total_bytes = src_device->commit_total_bytes;
 	tgt_device->commit_bytes_used = src_device->bytes_used;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c0038de73ee..f1c42d242d48 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4483,10 +4483,17 @@ void btrfs_cleanup_dirty_bgs(struct btrfs_transaction *cur_trans,
 void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 				   struct btrfs_fs_info *fs_info)
 {
+	struct btrfs_device *dev, *tmp;
+
 	btrfs_cleanup_dirty_bgs(cur_trans, fs_info);
 	ASSERT(list_empty(&cur_trans->dirty_bgs));
 	ASSERT(list_empty(&cur_trans->io_bgs));
 
+	list_for_each_entry_safe(dev, tmp, &cur_trans->dev_update_list,
+				 post_commit_list) {
+		list_del_init(&dev->post_commit_list);
+	}
+
 	btrfs_destroy_delayed_refs(cur_trans, fs_info);
 
 	cur_trans->state = TRANS_STATE_COMMIT_START;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index acdad6d658f5..d12c9c17d9e9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -75,6 +75,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 			btrfs_put_block_group_trimming(cache);
 			btrfs_put_block_group(cache);
 		}
+		BUG_ON(!list_empty(&transaction->dev_update_list));
 		kfree(transaction);
 	}
 }
@@ -264,6 +265,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 
 	INIT_LIST_HEAD(&cur_trans->pending_snapshots);
 	INIT_LIST_HEAD(&cur_trans->pending_chunks);
+	INIT_LIST_HEAD(&cur_trans->dev_update_list);
 	INIT_LIST_HEAD(&cur_trans->switch_commits);
 	INIT_LIST_HEAD(&cur_trans->dirty_bgs);
 	INIT_LIST_HEAD(&cur_trans->io_bgs);
@@ -550,7 +552,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 * and then we deadlock with somebody doing a freeze.
 	 *
 	 * If we are ATTACH, it means we just want to catch the current
-	 * transaction and commit it, so we needn't do sb_start_intwrite(). 
+	 * transaction and commit it, so we needn't do sb_start_intwrite().
 	 */
 	if (type & __TRANS_FREEZABLE)
 		sb_start_intwrite(fs_info->sb);
@@ -2204,8 +2206,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	memcpy(fs_info->super_for_commit, fs_info->super_copy,
 	       sizeof(*fs_info->super_copy));
 
-	btrfs_update_commit_device_size(fs_info);
-	btrfs_update_commit_device_bytes_used(cur_trans);
+	btrfs_commit_device_sizes(cur_trans);
 
 	clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 	clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index f1ba78949d1b..e0a04fa4de66 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -52,6 +52,7 @@ struct btrfs_transaction {
 	wait_queue_head_t commit_wait;
 	struct list_head pending_snapshots;
 	struct list_head pending_chunks;
+	struct list_head dev_update_list;
 	struct list_head switch_commits;
 	struct list_head dirty_bgs;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 03f223aa7194..2d763f944298 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -318,7 +318,6 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 	mutex_init(&fs_devs->device_list_mutex);
 
 	INIT_LIST_HEAD(&fs_devs->devices);
-	INIT_LIST_HEAD(&fs_devs->resized_devices);
 	INIT_LIST_HEAD(&fs_devs->alloc_list);
 	INIT_LIST_HEAD(&fs_devs->fs_list);
 	if (fsid)
@@ -334,6 +333,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 
 void btrfs_free_device(struct btrfs_device *device)
 {
+	BUG_ON(!list_empty(&device->post_commit_list));
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
@@ -402,7 +402,7 @@ static struct btrfs_device *__alloc_device(void)
 
 	INIT_LIST_HEAD(&dev->dev_list);
 	INIT_LIST_HEAD(&dev->dev_alloc_list);
-	INIT_LIST_HEAD(&dev->resized_list);
+	INIT_LIST_HEAD(&dev->post_commit_list);
 
 	spin_lock_init(&dev->io_lock);
 
@@ -2880,9 +2880,9 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 	btrfs_device_set_total_bytes(device, new_size);
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	btrfs_clear_space_info_full(device->fs_info);
-	if (list_empty(&device->resized_list))
-		list_add_tail(&device->resized_list,
-			      &fs_devices->resized_devices);
+	if (list_empty(&device->post_commit_list))
+		list_add_tail(&device->post_commit_list,
+			      &trans->transaction->dev_update_list);
 	mutex_unlock(&fs_info->chunk_mutex);
 
 	return btrfs_update_device(trans, device);
@@ -4871,9 +4871,9 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	btrfs_device_set_disk_total_bytes(device, new_size);
-	if (list_empty(&device->resized_list))
-		list_add_tail(&device->resized_list,
-			      &fs_info->fs_devices->resized_devices);
+	if (list_empty(&device->post_commit_list))
+		list_add_tail(&device->post_commit_list,
+			      &trans->transaction->dev_update_list);
 
 	WARN_ON(diff > old_total);
 	btrfs_set_super_total_bytes(super_copy,
@@ -5222,9 +5222,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto error_del_extent;
 
-	for (i = 0; i < map->num_stripes; i++)
-		btrfs_device_set_bytes_used(map->stripes[i].dev,
-				map->stripes[i].dev->bytes_used + stripe_size);
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_device *dev = map->stripes[i].dev;
+
+		btrfs_device_set_bytes_used(dev, dev->bytes_used + stripe_size);
+		if (list_empty(&dev->post_commit_list))
+			list_add_tail(&dev->post_commit_list,
+				      &trans->transaction->dev_update_list);
+	}
 
 	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
 
@@ -7674,51 +7679,34 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat
 }
 
 /*
- * Update the size of all devices, which is used for writing out the
- * super blocks.
+ * Update the size and bytes used for each device where it changed.
+ * This is delayed since we would otherwise get errors while writing
+ * out the superblocks.
+ *
+ * Must be invoked during transaction commit.
  */
-void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
+void btrfs_commit_device_sizes(struct btrfs_transaction *trans)
 {
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *curr, *next;
 
-	if (list_empty(&fs_devices->resized_devices))
-		return;
-
-	mutex_lock(&fs_devices->device_list_mutex);
-	mutex_lock(&fs_info->chunk_mutex);
-	list_for_each_entry_safe(curr, next, &fs_devices->resized_devices,
-				 resized_list) {
-		list_del_init(&curr->resized_list);
-		curr->commit_total_bytes = curr->disk_total_bytes;
-	}
-	mutex_unlock(&fs_info->chunk_mutex);
-	mutex_unlock(&fs_devices->device_list_mutex);
-}
-
-/* Must be invoked during the transaction commit */
-void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans)
-{
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct extent_map *em;
-	struct map_lookup *map;
-	struct btrfs_device *dev;
-	int i;
+	BUG_ON(trans->state != TRANS_STATE_COMMIT_DOING);
 
-	if (list_empty(&trans->pending_chunks))
+	if (list_empty(&trans->dev_update_list))
 		return;
 
-	/* In order to kick the device replace finish process */
-	mutex_lock(&fs_info->chunk_mutex);
-	list_for_each_entry(em, &trans->pending_chunks, list) {
-		map = em->map_lookup;
-
-		for (i = 0; i < map->num_stripes; i++) {
-			dev = map->stripes[i].dev;
-			dev->commit_bytes_used = dev->bytes_used;
-		}
+	/*
+	 * We don't need the device_list_mutex here.  This list is owned
+	 * by the transaction and the transaction must complete before
+	 * the device is released.
+	 */
+	mutex_lock(&trans->fs_info->chunk_mutex);
+	list_for_each_entry_safe(curr, next, &trans->dev_update_list,
+				 post_commit_list) {
+		list_del_init(&curr->post_commit_list);
+		curr->commit_total_bytes = curr->disk_total_bytes;
+		curr->commit_bytes_used = curr->bytes_used;
 	}
-	mutex_unlock(&fs_info->chunk_mutex);
+	mutex_unlock(&trans->fs_info->chunk_mutex);
 }
 
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3ad9d58d1b66..a0f09aad3770 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -45,6 +45,7 @@ struct btrfs_pending_bios {
 struct btrfs_device {
 	struct list_head dev_list;
 	struct list_head dev_alloc_list;
+	struct list_head post_commit_list; /* chunk mutex */
 	struct btrfs_fs_devices *fs_devices;
 	struct btrfs_fs_info *fs_info;
 
@@ -102,18 +103,12 @@ struct btrfs_device {
 	 * size of the device on the current transaction
 	 *
 	 * This variant is update when committing the transaction,
-	 * and protected by device_list_mutex
+	 * and protected by chunk mutex
 	 */
 	u64 commit_total_bytes;
 
 	/* bytes used on the current transaction */
 	u64 commit_bytes_used;
-	/*
-	 * used to manage the device which is resized
-	 *
-	 * It is protected by chunk_lock.
-	 */
-	struct list_head resized_list;
 
 	/* for sending down flush barriers */
 	struct bio *flush_bio;
@@ -235,7 +230,6 @@ struct btrfs_fs_devices {
 	struct mutex device_list_mutex;
 	struct list_head devices;
 
-	struct list_head resized_devices;
 	/* devices not currently being allocated */
 	struct list_head alloc_list;
 
@@ -558,8 +552,7 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 
 const char *get_raid_name(enum btrfs_raid_types type);
 
-void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info);
-void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans);
+void btrfs_commit_device_sizes(struct btrfs_transaction *trans);
 
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
-- 
2.17.1
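
The idea behind patch 02 (one per-transaction list of devices, with both size values copied at commit) can be modelled in user space as follows. This is only a sketch under stated assumptions: the list helpers, `struct device`, `struct transaction`, `mark_device_dirty()` and `commit_device_sizes()` are hypothetical simplifications of the kernel structures, and the locking done under chunk_mutex in the patch is omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_remove_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical, reduced versions of btrfs_device and btrfs_transaction. */
struct device {
	uint64_t bytes_used, disk_total_bytes;		/* live values */
	uint64_t commit_bytes_used, commit_total_bytes;	/* last committed */
	struct list_node post_commit;
};

struct transaction {
	struct list_node dev_update_list;
};

/* Queue a device at most once per transaction, whichever value changed. */
static void mark_device_dirty(struct transaction *t, struct device *d)
{
	if (list_is_empty(&d->post_commit))
		list_append(&d->post_commit, &t->dev_update_list);
}

/* Copy both values for every queued device; returns devices processed. */
static int commit_device_sizes(struct transaction *t)
{
	int processed = 0;

	while (!list_is_empty(&t->dev_update_list)) {
		struct list_node *node = t->dev_update_list.next;
		struct device *d = (struct device *)((char *)node -
				offsetof(struct device, post_commit));

		list_remove_init(node);
		d->commit_total_bytes = d->disk_total_bytes;
		d->commit_bytes_used = d->bytes_used;
		processed++;
	}
	return processed;
}
```

Because a device is queued only when its list node is empty, changing both bytes_used and disk_total_bytes in the same transaction still results in a single pass over that device at commit time.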



* [PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
  2019-02-11  8:34 ` [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 02/12] btrfs: combine device update operations during transaction commit Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC
  To: linux-btrfs; +Cc: Nikolay Borisov

During device shrink, pinned/pending chunks (i.e. those which have been
deleted/created respectively in the current transaction and haven't
touched disk) need to be accounted for. Presently this happens after the
main relocation loop in btrfs_shrink_device, which can force another
pass through the body of the function.

Since there is no hard requirement to perform pinned/pending chunk
handling after the relocation loop, move the code before it. This
simplifies the code flow: the pending chunk check no longer needs
'goto again'.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 54 ++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d763f944298..f0d91b1fda1c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4722,15 +4722,15 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	int slot;
 	int failed = 0;
 	bool retried = false;
-	bool checked_pending_chunks = false;
 	struct extent_buffer *l;
 	struct btrfs_key key;
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
 	u64 old_total = btrfs_super_total_bytes(super_copy);
 	u64 old_size = btrfs_device_get_total_bytes(device);
 	u64 diff;
+	u64 start;
 
-	new_size = round_down(new_size, fs_info->sectorsize);
+	start = new_size = round_down(new_size, fs_info->sectorsize);
 	diff = round_down(old_size - new_size, fs_info->sectorsize);
 
 	if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
@@ -4742,6 +4742,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 
 	path->reada = READA_BACK;
 
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	mutex_lock(&fs_info->chunk_mutex);
 
 	btrfs_device_set_total_bytes(device, new_size);
@@ -4749,7 +4753,21 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		device->fs_devices->total_rw_bytes -= diff;
 		atomic64_sub(diff, &fs_info->free_chunk_space);
 	}
-	mutex_unlock(&fs_info->chunk_mutex);
+
+	/*
+	 * Once the device's size has been set to the new size, ensure all
+	 * in-memory chunks are synced to disk so that the loop below sees them
+	 * and relocates them accordingly.
+	 */
+	if (contains_pending_extent(trans->transaction, device, &start, diff)) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		ret = btrfs_commit_transaction(trans);
+		if (ret)
+			goto done;
+	} else {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_end_transaction(trans);
+	}
 
 again:
 	key.objectid = device->devid;
@@ -4840,36 +4858,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
-
-	/*
-	 * We checked in the above loop all device extents that were already in
-	 * the device tree. However before we have updated the device's
-	 * total_bytes to the new size, we might have had chunk allocations that
-	 * have not complete yet (new block groups attached to transaction
-	 * handles), and therefore their device extents were not yet in the
-	 * device tree and we missed them in the loop above. So if we have any
-	 * pending chunk using a device extent that overlaps the device range
-	 * that we can not use anymore, commit the current transaction and
-	 * repeat the search on the device tree - this way we guarantee we will
-	 * not have chunks using device extents that end beyond 'new_size'.
-	 */
-	if (!checked_pending_chunks) {
-		u64 start = new_size;
-		u64 len = old_size - new_size;
-
-		if (contains_pending_extent(trans->transaction, device,
-					    &start, len)) {
-			mutex_unlock(&fs_info->chunk_mutex);
-			checked_pending_chunks = true;
-			failed = 0;
-			retried = false;
-			ret = btrfs_commit_transaction(trans);
-			if (ret)
-				goto done;
-			goto again;
-		}
-	}
-
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,
-- 
2.17.1
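
The decision patch 03 moves ahead of the relocation loop can be pictured as a simple overlap test: does any not-yet-committed device extent reach into the part of the device that is being cut off? The sketch below is a user-space illustration only. `struct dev_extent`, `pending_extent_beyond()` and `pre_relocation_step()` are hypothetical names, and the array of pending extents is an assumption standing in for whatever `contains_pending_extent()` consults in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one pending (in-memory only) device extent. */
struct dev_extent {
	uint64_t start;
	uint64_t len;
};

/* True when any pending extent overlaps the dropped tail [new_size, old_size). */
static bool pending_extent_beyond(const struct dev_extent *pending, size_t n,
				  uint64_t new_size, uint64_t old_size)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t end = pending[i].start + pending[i].len; /* exclusive */

		if (pending[i].len && end > new_size &&
		    pending[i].start < old_size)
			return true;
	}
	return false;
}

enum shrink_action {
	SHRINK_END_TRANSACTION,		/* nothing pending, just drop the handle */
	SHRINK_COMMIT_TRANSACTION,	/* flush pending chunks to disk first */
};

/* Decide once, before the relocation loop, how to finish the transaction. */
static enum shrink_action pre_relocation_step(const struct dev_extent *pending,
					      size_t n, uint64_t new_size,
					      uint64_t old_size)
{
	return pending_extent_beyond(pending, n, new_size, old_size) ?
	       SHRINK_COMMIT_TRANSACTION : SHRINK_END_TRANSACTION;
}
```

Making the check up front means the relocation loop only ever runs against a device tree that already contains every extent it has to move, which is why the second pass described in the commit message is no longer needed for this case.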



* [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (2 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-28 16:53   ` David Sterba
  2019-02-11  8:35 ` [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC
  To: linux-btrfs; +Cc: Nikolay Borisov

This function is going to be used to clear out the device extent
allocation information. Give it a more generic name and export it. This
is in preparation for replacing the pending/pinned chunk lists with an
extent tree. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c   | 28 ++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h   |  1 +
 fs/btrfs/transaction.c | 37 ++++---------------------------------
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a9e74bd6c434..ae1049824739 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -241,6 +241,34 @@ void extent_io_tree_init(struct extent_io_tree *tree,
 	tree->private_data = private_data;
 }
 
+void extent_io_tree_release(struct extent_io_tree *tree)
+{
+	spin_lock(&tree->lock);
+	/*
+	 * Do a single barrier for the waitqueue_active check here, the state
+	 * of the waitqueue should not change once clear_btree_io_tree is
+	 * called.
+	 */
+	smp_mb();
+	while (!RB_EMPTY_ROOT(&tree->state)) {
+		struct rb_node *node;
+		struct extent_state *state;
+
+		node = rb_first(&tree->state);
+		state = rb_entry(node, struct extent_state, rb_node);
+		rb_erase(&state->rb_node, &tree->state);
+		RB_CLEAR_NODE(&state->rb_node);
+		/*
+		 * btree io trees aren't supposed to have tasks waiting for
+		 * changes in the flags of extent states ever.
+		 */
+		ASSERT(!waitqueue_active(&state->wq));
+		free_extent_state(state);
+
+		cond_resched_lock(&tree->lock);
+	}
+	spin_unlock(&tree->lock);
+}
 static struct extent_state *alloc_extent_state(gfp_t mask)
 {
 	struct extent_state *state;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 08749e0b9c32..d7beb2b3bc7d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -240,6 +240,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 					  int create);
 
 void extent_io_tree_init(struct extent_io_tree *tree, void *private_data);
+void extent_io_tree_release(struct extent_io_tree *tree);
 int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d12c9c17d9e9..07f5477abd0a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -80,35 +80,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 	}
 }
 
-static void clear_btree_io_tree(struct extent_io_tree *tree)
-{
-	spin_lock(&tree->lock);
-	/*
-	 * Do a single barrier for the waitqueue_active check here, the state
-	 * of the waitqueue should not change once clear_btree_io_tree is
-	 * called.
-	 */
-	smp_mb();
-	while (!RB_EMPTY_ROOT(&tree->state)) {
-		struct rb_node *node;
-		struct extent_state *state;
-
-		node = rb_first(&tree->state);
-		state = rb_entry(node, struct extent_state, rb_node);
-		rb_erase(&state->rb_node, &tree->state);
-		RB_CLEAR_NODE(&state->rb_node);
-		/*
-		 * btree io trees aren't supposed to have tasks waiting for
-		 * changes in the flags of extent states ever.
-		 */
-		ASSERT(!waitqueue_active(&state->wq));
-		free_extent_state(state);
-
-		cond_resched_lock(&tree->lock);
-	}
-	spin_unlock(&tree->lock);
-}
-
 static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -122,7 +93,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 		root->commit_root = btrfs_root_node(root);
 		if (is_fstree(root->root_key.objectid))
 			btrfs_unpin_free_ino(root);
-		clear_btree_io_tree(&root->dirty_log_pages);
+		extent_io_tree_release(&root->dirty_log_pages);
 		btrfs_qgroup_clean_swapped_blocks(root);
 	}
 
@@ -930,7 +901,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 		 * superblock that points to btree nodes/leafs for which
 		 * writeback hasn't finished yet (and without errors).
 		 * We cleanup any entries left in the io tree when committing
-		 * the transaction (through clear_btree_io_tree()).
+		 * the transaction (through extent_io_tree_release()).
 		 */
 		if (err == -ENOMEM) {
 			err = 0;
@@ -975,7 +946,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info,
 		 * left in the io tree. For a log commit, we don't remove them
 		 * after committing the log because the tree can be accessed
 		 * concurrently - we do it only at transaction commit time when
-		 * it's safe to do it (through clear_btree_io_tree()).
+		 * it's safe to do it (through extent_io_tree_release()).
 		 */
 		err = clear_extent_bit(dirty_pages, start, end,
 				       EXTENT_NEED_WAIT, 0, 0, &cached_state);
@@ -1053,7 +1024,7 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
 	blk_finish_plug(&plug);
 	ret2 = btrfs_wait_extents(fs_info, dirty_pages);
 
-	clear_btree_io_tree(&trans->transaction->dirty_pages);
+	extent_io_tree_release(&trans->transaction->dirty_pages);
 
 	if (ret)
 		return ret;
-- 
2.17.1



* [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (3 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-28 16:53   ` David Sterba
  2019-02-11  8:35 ` [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC
  To: linux-btrfs; +Cc: Nikolay Borisov

Chunks read from disk currently don't get their ->orig_block_len member
set. In contrast, when a new chunk is allocated, the respective
extent_map's ->orig_block_len is assigned the size of the stripe of this
chunk. Let's apply the same strategy for chunks which are read from
disk. Not only does this codify the invariant that ->orig_block_len
always contains the size of the stripe for a chunk (when the em belongs
to the mapping tree), it's also a preparatory patch for further work
around tracking chunk allocation in an extent tree rather than
pinned/pending lists.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f0d91b1fda1c..4c654e0bd618 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6811,6 +6811,26 @@ static void btrfs_report_missing_device(struct btrfs_fs_info *fs_info,
 			      devid, uuid);
 }
 
+static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
+{
+	int index = btrfs_bg_flags_to_raid_index(type);
+	int ncopies = btrfs_raid_array[index].ncopies;
+	int data_stripes;
+
+	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case BTRFS_BLOCK_GROUP_RAID5:
+		data_stripes = num_stripes - 1;
+		break;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		data_stripes = num_stripes - 2;
+		break;
+	default:
+		data_stripes = num_stripes / ncopies;
+		break;
+	}
+	return div_u64(chunk_len, data_stripes);
+}
+
 static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 			  struct extent_buffer *leaf,
 			  struct btrfs_chunk *chunk)
@@ -6870,6 +6890,8 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 	map->type = btrfs_chunk_type(leaf, chunk);
 	map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
 	map->verified_stripes = 0;
+	em->orig_block_len = calc_stripe_length(map->type, em->len,
+						map->num_stripes);
 	for (i = 0; i < num_stripes; i++) {
 		map->stripes[i].physical =
 			btrfs_stripe_offset_nr(leaf, chunk, i);
@@ -7727,25 +7749,6 @@ int btrfs_bg_type_to_factor(u64 flags)
 }
 
 
-static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
-{
-	int index = btrfs_bg_flags_to_raid_index(type);
-	int ncopies = btrfs_raid_array[index].ncopies;
-	int data_stripes;
-
-	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
-	case BTRFS_BLOCK_GROUP_RAID5:
-		data_stripes = num_stripes - 1;
-		break;
-	case BTRFS_BLOCK_GROUP_RAID6:
-		data_stripes = num_stripes - 2;
-		break;
-	default:
-		data_stripes = num_stripes / ncopies;
-		break;
-	}
-	return div_u64(chunk_len, data_stripes);
-}
 
 static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
 				 u64 chunk_offset, u64 devid,
-- 
2.17.1



* [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (4 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-28 16:57   ` David Sterba
  2019-02-11  8:35 ` [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Rather than hijacking the existing defines, let's define new bits with
more descriptive names. Instead of consuming yet more bits for the new
flags (18 are already in use), rely on the fact that these flags are
specific to the device allocation tree and define them as aliases of
existing EXTENT_* flags.
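
The aliasing idea can be sketched in isolation as below. This is only an
illustration: the bit value and the demo_state structure are assumptions
made for the example, not the definitions from extent_io.h. Because the
alias is only ever applied to the per-device allocation tree, it cannot
collide with the original meaning of the bit in other trees.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real EXTENT_* bits are defined in extent_io.h. */
#define EXTENT_DIRTY	(1U << 0)

/* Alias that is meaningful only inside the per-device allocation tree. */
#define CHUNK_ALLOCATED	EXTENT_DIRTY

/* Hypothetical stand-in for one record in an extent io tree. */
struct demo_state {
	unsigned long long start;
	unsigned long long end;	/* inclusive */
	unsigned int bits;
};

/* Returns true if the record is marked as allocated chunk space. */
static bool demo_is_allocated(const struct demo_state *s)
{
	assert(s->start <= s->end);
	return (s->bits & CHUNK_ALLOCATED) != 0;
}
```

No new bit position is consumed: the alias and the original name expand to
the same value, and only the tree they are used on decides the meaning.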

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d7beb2b3bc7d..af7e00a3678c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -29,6 +29,10 @@
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
 
+
+/* Redefined bits above which are used only in the device allocation tree */
+#define CHUNK_ALLOCATED EXTENT_DIRTY
+
 /*
  * flags for bio submission. The high bits indicate the compression
  * type for this bio
-- 
2.17.1



* [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (5 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-12 14:13   ` [PATCH] btrfs: Transpose btrfs_close_devices/btrfs_mapping_tree_free in close_ctree Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jeff Mahoney, Nikolay Borisov

From: Jeff Mahoney <jeffm@suse.com>

The pending chunks list contains chunks that are allocated in the
current transaction but haven't been created yet. The pinned chunks
list contains chunks that are being released in the current transaction.
Both describe chunks that are not reflected on disk as in use but are
unavailable just the same.

The pending chunks list is anchored by the transaction handle, which
means that we need to hold a reference to a transaction when working
with the list.

We use these lists to ensure that we don't end up discarding chunks
that are allocated or released in the current transaction.

The way we use them is by iterating over both lists to perform
comparisons on the stripes they describe for each device. This is
backwards and requires that we keep a transaction handle open while
we're trimming.

This patch adds an extent_io_tree to btrfs_device that maintains
the allocation state of the device.  Extents are set dirty when
chunks are first allocated -- when the extent maps are added to the
mapping tree. They're cleared when last removed -- when the extent
maps are removed from the mapping tree. This matches the lifespan
of the pending and pinned chunks lists and allows us to do trims
on unallocated space safely without pinning the transaction for what
may be a lengthy operation. We can also use this io tree to mark
which chunks have already been trimmed so we don't repeat the operation.
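
As a rough illustration of the per-device bookkeeping, the sketch below
computes the inclusive physical range one stripe occupies and tests
whether a candidate hole overlaps it. The names demo_range,
demo_stripe_range and demo_hole_overlaps are hypothetical, and the
overlap test is the textbook interval check rather than the exact
expression used in contains_pending_extent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive byte range on one device. */
struct demo_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Physical range covered by one stripe of a chunk: [physical, physical + size - 1]. */
static struct demo_range demo_stripe_range(uint64_t physical, uint64_t stripe_size)
{
	struct demo_range r;

	assert(stripe_size > 0);
	r.start = physical;
	r.end = physical + stripe_size - 1;
	return r;
}

/*
 * True if the half-open hole [hole_start, hole_start + len) shares at
 * least one byte with the allocated inclusive range.
 */
static bool demo_hole_overlaps(uint64_t hole_start, uint64_t len,
			       struct demo_range alloc)
{
	return len > 0 && hole_start <= alloc.end &&
	       alloc.start < hole_start + len;
}
```

A hole that ends exactly where a stripe begins does not overlap it, while
a hole that starts on the last byte of the stripe does.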

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h            |  6 ---
 fs/btrfs/disk-io.c          | 11 -----
 fs/btrfs/extent-tree.c      | 28 -----------
 fs/btrfs/extent_io.c        |  2 +-
 fs/btrfs/extent_io.h        |  6 ++-
 fs/btrfs/extent_map.c       | 36 ++++++++++++++
 fs/btrfs/extent_map.h       |  1 -
 fs/btrfs/free-space-cache.c |  4 --
 fs/btrfs/transaction.c      |  9 ----
 fs/btrfs/transaction.h      |  1 -
 fs/btrfs/volumes.c          | 96 +++++++++++++------------------------
 fs/btrfs/volumes.h          |  2 +
 12 files changed, 76 insertions(+), 126 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9306925b6790..86dbf2160ae2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1149,12 +1149,6 @@ struct btrfs_fs_info {
 	struct mutex unused_bg_unpin_mutex;
 	struct mutex delete_unused_bgs_mutex;
 
-	/*
-	 * Chunks that can't be freed yet (under a trim/discard operation)
-	 * and will be latter freed. Protected by fs_info->chunk_mutex.
-	 */
-	struct list_head pinned_chunks;
-
 	/* Cached block sizes */
 	u32 nodesize;
 	u32 sectorsize;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f1c42d242d48..acf312203cd1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2775,8 +2775,6 @@ int open_ctree(struct super_block *sb,
 	init_waitqueue_head(&fs_info->async_submit_wait);
 	init_waitqueue_head(&fs_info->delayed_iputs_wait);
 
-	INIT_LIST_HEAD(&fs_info->pinned_chunks);
-
 	/* Usable values until the real ones are cached from the superblock */
 	fs_info->nodesize = 4096;
 	fs_info->sectorsize = 4096;
@@ -4051,15 +4049,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_free_ref_cache(fs_info);
-
-	while (!list_empty(&fs_info->pinned_chunks)) {
-		struct extent_map *em;
-
-		em = list_first_entry(&fs_info->pinned_chunks,
-				      struct extent_map, list);
-		list_del_init(&em->list);
-		free_extent_map(em);
-	}
 }
 
 int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 33ecd4128898..48bf3df0b194 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10895,10 +10895,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	memcpy(&key, &block_group->key, sizeof(key));
 
 	mutex_lock(&fs_info->chunk_mutex);
-	if (!list_empty(&em->list)) {
-		/* We're in the transaction->pending_chunks list. */
-		free_extent_map(em);
-	}
 	spin_lock(&block_group->lock);
 	block_group->removed = 1;
 	/*
@@ -10925,25 +10921,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	 * the transaction commit has completed.
 	 */
 	remove_em = (atomic_read(&block_group->trimming) == 0);
-	/*
-	 * Make sure a trimmer task always sees the em in the pinned_chunks list
-	 * if it sees block_group->removed == 1 (needs to lock block_group->lock
-	 * before checking block_group->removed).
-	 */
-	if (!remove_em) {
-		/*
-		 * Our em might be in trans->transaction->pending_chunks which
-		 * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks),
-		 * and so is the fs_info->pinned_chunks list.
-		 *
-		 * So at this point we must be holding the chunk_mutex to avoid
-		 * any races with chunk allocation (more specifically at
-		 * volumes.c:contains_pending_extent()), to ensure it always
-		 * sees the em, either in the pending_chunks list or in the
-		 * pinned_chunks list.
-		 */
-		list_move_tail(&em->list, &fs_info->pinned_chunks);
-	}
 	spin_unlock(&block_group->lock);
 
 	if (remove_em) {
@@ -10951,11 +10928,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 		em_tree = &fs_info->mapping_tree.map_tree;
 		write_lock(&em_tree->lock);
-		/*
-		 * The em might be in the pending_chunks list, so make sure the
-		 * chunk mutex is locked, since remove_extent_mapping() will
-		 * delete us from that list.
-		 */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
 		/* once for the tree */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ae1049824739..9733790881ae 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -891,7 +891,7 @@ static void cache_state(struct extent_state *state,
  * [start, end] is inclusive This takes the tree lock.
  */
 
-static int __must_check
+int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 unsigned bits, unsigned exclusive_bits,
 		 u64 *failed_start, struct extent_state **cached_state,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index af7e00a3678c..d4227e40c8ee 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -314,7 +314,11 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   unsigned bits, u64 *failed_start,
 		   struct extent_state **cached_state, gfp_t mask);
-
+int
+__set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+		 unsigned bits, unsigned exclusive_bits,
+		 u64 *failed_start, struct extent_state **cached_state,
+		 gfp_t mask, struct extent_changeset *changeset);
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 		u64 end, unsigned bits)
 {
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 928f729c55ba..0820f6fcf3a6 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -4,6 +4,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include "ctree.h"
+#include "volumes.h"
 #include "extent_map.h"
 #include "compression.h"
 
@@ -336,6 +337,37 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
 	else
 		try_merge_map(tree, em);
 }
+static void extent_map_device_set_bits(struct extent_map *em, unsigned bits)
+{
+	struct map_lookup *map = em->map_lookup;
+	u64 stripe_size = em->orig_block_len;
+	int i;
+
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_bio_stripe *stripe = &map->stripes[i];
+		struct btrfs_device *device = stripe->dev;
+
+		__set_extent_bit(&device->alloc_state, stripe->physical,
+				 stripe->physical + stripe_size - 1, bits,
+				 0, NULL, NULL, GFP_NOWAIT, NULL);
+	}
+}
+
+static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
+{
+	struct map_lookup *map = em->map_lookup;
+	u64 stripe_size = em->orig_block_len;
+	int i;
+
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_bio_stripe *stripe = &map->stripes[i];
+		struct btrfs_device *device = stripe->dev;
+
+		__clear_extent_bit(&device->alloc_state, stripe->physical,
+				   stripe->physical + stripe_size - 1, bits,
+				   0, 0, NULL, GFP_NOWAIT, NULL);
+	}
+}
 
 /**
  * add_extent_mapping - add new extent map to the extent tree
@@ -357,6 +389,8 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		goto out;
 
 	setup_extent_mapping(tree, em, modified);
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+		extent_map_device_set_bits(em, CHUNK_ALLOCATED);
 out:
 	return ret;
 }
@@ -438,6 +472,8 @@ void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em)
 	rb_erase_cached(&em->rb_node, &tree->map);
 	if (!test_bit(EXTENT_FLAG_LOGGING, &em->flags))
 		list_del_init(&em->list);
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+		extent_map_device_clear_bits(em, CHUNK_ALLOCATED);
 	RB_CLEAR_NODE(&em->rb_node);
 }
 
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 473f039fcd7c..72b46833f236 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -91,7 +91,6 @@ void replace_extent_mapping(struct extent_map_tree *tree,
 			    struct extent_map *cur,
 			    struct extent_map *new,
 			    int modified);
-
 struct extent_map *alloc_extent_map(void);
 void free_extent_map(struct extent_map *em);
 int __init extent_map_init(void);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 74aa552f4793..207fb50dcc7a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3366,10 +3366,6 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group)
 		em = lookup_extent_mapping(em_tree, block_group->key.objectid,
 					   1);
 		BUG_ON(!em); /* logic error, can't happen */
-		/*
-		 * remove_extent_mapping() will delete us from the pinned_chunks
-		 * list, which is protected by the chunk mutex.
-		 */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
 		mutex_unlock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 07f5477abd0a..e91f1a98d0dd 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -50,14 +50,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 			btrfs_err(transaction->fs_info,
 				  "pending csums is %llu",
 				  transaction->delayed_refs.pending_csums);
-		while (!list_empty(&transaction->pending_chunks)) {
-			struct extent_map *em;
-
-			em = list_first_entry(&transaction->pending_chunks,
-					      struct extent_map, list);
-			list_del_init(&em->list);
-			free_extent_map(em);
-		}
 		/*
 		 * If any block groups are found in ->deleted_bgs then it's
 		 * because the transaction was aborted and a commit did not
@@ -235,7 +227,6 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->delayed_refs.lock);
 
 	INIT_LIST_HEAD(&cur_trans->pending_snapshots);
-	INIT_LIST_HEAD(&cur_trans->pending_chunks);
 	INIT_LIST_HEAD(&cur_trans->dev_update_list);
 	INIT_LIST_HEAD(&cur_trans->switch_commits);
 	INIT_LIST_HEAD(&cur_trans->dirty_bgs);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index e0a04fa4de66..134d1a0bd92f 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -51,7 +51,6 @@ struct btrfs_transaction {
 	wait_queue_head_t writer_wait;
 	wait_queue_head_t commit_wait;
 	struct list_head pending_snapshots;
-	struct list_head pending_chunks;
 	struct list_head dev_update_list;
 	struct list_head switch_commits;
 	struct list_head dirty_bgs;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4c654e0bd618..f8baa9d4c796 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device)
 {
 	BUG_ON(!list_empty(&device->post_commit_list));
 	rcu_string_free(device->name);
+	if (!in_softirq())
+		extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
@@ -411,6 +413,7 @@ static struct btrfs_device *__alloc_device(void)
 	btrfs_device_data_ordered_init(dev);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+	extent_io_tree_init(&dev->alloc_state, NULL);
 
 	return dev;
 }
@@ -1269,6 +1272,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
 		fs_devices->missing_devices--;
 
+	/* Remove alloc state now since it cannot be done in RCU context */
+	extent_io_tree_release(&device->alloc_state);
+
 	btrfs_close_bdev(device);
 
 	new_device = btrfs_alloc_device(NULL, &device->devid,
@@ -1505,58 +1511,29 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	return device;
 }
 
-static int contains_pending_extent(struct btrfs_transaction *transaction,
-				   struct btrfs_device *device,
-				   u64 *start, u64 len)
-{
-	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct extent_map *em;
-	struct list_head *search_list = &fs_info->pinned_chunks;
-	int ret = 0;
-	u64 physical_start = *start;
-
-	if (transaction)
-		search_list = &transaction->pending_chunks;
-again:
-	list_for_each_entry(em, search_list, list) {
-		struct map_lookup *map;
-		int i;
-
-		map = em->map_lookup;
-		for (i = 0; i < map->num_stripes; i++) {
-			u64 end;
-
-			if (map->stripes[i].dev != device)
-				continue;
-			if (map->stripes[i].physical >= physical_start + len ||
-			    map->stripes[i].physical + em->orig_block_len <=
-			    physical_start)
-				continue;
-			/*
-			 * Make sure that while processing the pinned list we do
-			 * not override our *start with a lower value, because
-			 * we can have pinned chunks that fall within this
-			 * device hole and that have lower physical addresses
-			 * than the pending chunks we processed before. If we
-			 * do not take this special care we can end up getting
-			 * 2 pending chunks that start at the same physical
-			 * device offsets because the end offset of a pinned
-			 * chunk can be equal to the start offset of some
-			 * pending chunk.
-			 */
-			end = map->stripes[i].physical + em->orig_block_len;
-			if (end > *start) {
-				*start = end;
-				ret = 1;
-			}
+/*
+ * Tries to find a chunk that intersects [start, start +len] range and when one
+ * such is found, records the end of it in *start
+ */
+#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
+static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
+				    u64 len)
+{
+	u64 physical_start, physical_end;
+	lockdep_assert_held(&device->fs_info->chunk_mutex);
+
+	if (!find_first_extent_bit(&device->alloc_state, *start,
+				   &physical_start, &physical_end,
+				   CHUNK_ALLOCATED, NULL)) {
+
+		if (in_range(physical_start, *start, len) ||
+		    in_range(*start, physical_start,
+			     physical_end - physical_start)) {
+			*start = physical_end + 1;
+			return true;
 		}
 	}
-	if (search_list != &fs_info->pinned_chunks) {
-		search_list = &fs_info->pinned_chunks;
-		goto again;
-	}
-
-	return ret;
+	return false;
 }
 
 
@@ -1667,15 +1644,12 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 			 * Have to check before we set max_hole_start, otherwise
 			 * we could end up sending back this offset anyway.
 			 */
-			if (contains_pending_extent(transaction, device,
-						    &search_start,
+			if (contains_pending_extent(device, &search_start,
 						    hole_size)) {
-				if (key.offset >= search_start) {
+				if (key.offset >= search_start)
 					hole_size = key.offset - search_start;
-				} else {
-					WARN_ON_ONCE(1);
+				else
 					hole_size = 0;
-				}
 			}
 
 			if (hole_size > max_hole_size) {
@@ -1716,8 +1690,7 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 	if (search_end > search_start) {
 		hole_size = search_end - search_start;
 
-		if (contains_pending_extent(transaction, device, &search_start,
-					    hole_size)) {
+		if (contains_pending_extent(device, &search_start, hole_size)) {
 			btrfs_release_path(path);
 			goto again;
 		}
@@ -4759,7 +4732,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	 * in-memory chunks are synced to disk so that the loop below sees them
 	 * and relocates them accordingly.
 	 */
-	if (contains_pending_extent(trans->transaction, device, &start, diff)) {
+	if (contains_pending_extent(device, &start, diff)) {
 		mutex_unlock(&fs_info->chunk_mutex);
 		ret = btrfs_commit_transaction(trans);
 		if (ret)
@@ -5201,9 +5174,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		free_extent_map(em);
 		goto error;
 	}
-
-	list_add_tail(&em->list, &trans->transaction->pending_chunks);
-	refcount_inc(&em->refs);
 	write_unlock(&em_tree->lock);
 
 	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
@@ -5236,8 +5206,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	free_extent_map(em);
 	/* One for the tree reference */
 	free_extent_map(em);
-	/* One for the pending_chunks list reference */
-	free_extent_map(em);
 error:
 	kfree(devices_info);
 	return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index a0f09aad3770..49dc737d8a54 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -134,6 +134,8 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+
+	struct extent_io_tree alloc_state;
 };
 
 /*
-- 
2.17.1



* [PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (6 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 09/12] btrfs: Factor out in_range macro Nikolay Borisov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Now that those functions no longer require a transaction handle to
inspect pending/pinned chunks, the argument can be removed. At the same
time, also remove the surrounding code that acquired the handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 36 +++---------------------------------
 fs/btrfs/volumes.c     | 11 ++++-------
 fs/btrfs/volumes.h     |  8 +++-----
 3 files changed, 10 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 48bf3df0b194..39647ddb2195 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9864,12 +9864,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
  */
 int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 {
-	struct btrfs_root *root = fs_info->extent_root;
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_space_info *space_info;
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *device;
-	struct btrfs_trans_handle *trans;
 	u64 min_free;
 	u64 dev_min = 1;
 	u64 dev_nr = 0;
@@ -9968,13 +9966,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		min_free = div64_u64(min_free, dev_min);
 	}
 
-	/* We need to do this so that we can look at pending chunks */
-	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
-		goto out;
-	}
-
 	mutex_lock(&fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
 		u64 dev_offset;
@@ -9985,7 +9976,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		 */
 		if (device->total_bytes > device->bytes_used + min_free &&
 		    !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
-			ret = find_free_dev_extent(trans, device, min_free,
+			ret = find_free_dev_extent(device, min_free,
 						   &dev_offset, NULL);
 			if (!ret)
 				dev_nr++;
@@ -10001,7 +9992,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 			   "no space to allocate a new chunk for block group %llu",
 			   block_group->key.objectid);
 	mutex_unlock(&fs_info->chunk_mutex);
-	btrfs_end_transaction(trans);
 out:
 	btrfs_put_block_group(block_group);
 	return ret;
@@ -11253,34 +11243,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 
 	while (1) {
 		struct btrfs_fs_info *fs_info = device->fs_info;
-		struct btrfs_transaction *trans;
 		u64 bytes;
 
 		ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
 		if (ret)
 			break;
 
-		ret = down_read_killable(&fs_info->commit_root_sem);
-		if (ret) {
-			mutex_unlock(&fs_info->chunk_mutex);
-			break;
-		}
-
-		spin_lock(&fs_info->trans_lock);
-		trans = fs_info->running_transaction;
-		if (trans)
-			refcount_inc(&trans->use_count);
-		spin_unlock(&fs_info->trans_lock);
-
-		if (!trans)
-			up_read(&fs_info->commit_root_sem);
-
-		ret = find_free_dev_extent_start(trans, device, range->minlen,
-						 start, &start, &len);
-		if (trans) {
-			up_read(&fs_info->commit_root_sem);
-			btrfs_put_transaction(trans);
-		}
+		ret = find_free_dev_extent_start(device, range->minlen, start,
+						 &start, &len);
 
 		if (ret) {
 			mutex_unlock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f8baa9d4c796..ece4e5fad9c6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1558,8 +1558,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
  * But if we don't find suitable free space, it is used to store the size of
  * the max free space.
  */
-int find_free_dev_extent_start(struct btrfs_transaction *transaction,
-			       struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 			       u64 search_start, u64 *start, u64 *len)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
@@ -1715,13 +1714,11 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 	return ret;
 }
 
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *len)
 {
 	/* FIXME use last free of some kind */
-	return find_free_dev_extent_start(trans->transaction, device,
-					  num_bytes, 0, start, len);
+	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
 }
 
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
@@ -5040,7 +5037,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (total_avail == 0)
 			continue;
 
-		ret = find_free_dev_extent(trans, device,
+		ret = find_free_dev_extent(device,
 					   max_stripe_size * dev_stripes,
 					   &dev_offset, &max_avail);
 		if (ret && ret != -ENOSPC)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 49dc737d8a54..30c1f3002a81 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -445,11 +445,9 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
 int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset);
-int find_free_dev_extent_start(struct btrfs_transaction *transaction,
-			 struct btrfs_device *device, u64 num_bytes,
-			 u64 search_start, u64 *start, u64 *max_avail);
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
+			       u64 search_start, u64 *start, u64 *max_avail);
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *max_avail);
 void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index);
 int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
-- 
2.17.1



* [PATCH v2 09/12] btrfs: Factor out in_range macro
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (7 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-28 18:00   ` David Sterba
  2019-02-11  8:35 ` [PATCH v2 10/12] btrfs: Optimize unallocated chunks discard Nikolay Borisov
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This is used in more than one place, so let's factor it out into ctree.h.
No functional changes.
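
For reference, the macro treats its range as half-open, [first, first + len).
The wrapper below is a hypothetical helper added only to demonstrate
the boundaries; the macro body is copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in the patch: true for first <= b < first + len. */
#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

/* Hypothetical wrapper, used here only to exercise the macro. */
static int demo_block_in_window(uint64_t block, uint64_t first, uint64_t len)
{
	assert(first + len >= first);	/* no wrap-around in this demo */
	return in_range(block, first, len);
}
```

Note that the macro evaluates `first` twice, so arguments with side
effects should be avoided.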

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h       | 2 ++
 fs/btrfs/extent-tree.c | 1 -
 fs/btrfs/volumes.c     | 1 -
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 86dbf2160ae2..c61fff4c294d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3808,6 +3808,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 	return signal_pending(current);
 }
 
+#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
+
 /* Sanity test specific functions */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 void btrfs_test_inode_set_ops(struct inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 39647ddb2195..188774ed7795 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1905,7 +1905,6 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
 static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ece4e5fad9c6..6fd8df6e3964 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1515,7 +1515,6 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
  * Tries to find a chunk that intersects [start, start +len] range and when one
  * such is found, records the end of it in *start
  */
-#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
 static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 				    u64 len)
 {
-- 
2.17.1



* [PATCH v2 10/12] btrfs: Optimize unallocated chunks discard
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (8 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 09/12] btrfs: Factor out in_range macro Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Currently unallocated chunks are always trimmed. For example,
two consecutive trims on large storage would trim free space twice,
irrespective of whether the space was actually allocated between
those trims.

Optimise this behavior by exploiting the newly introduced alloc_state
tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark
those unallocated chunks which have been trimmed and have not been
allocated afterwards. On chunk allocation the respective underlying devices'
physical space will have its CHUNK_TRIMMED flag cleared. This avoids
submitting discards for space which hasn't been changed since the last
time discard was issued.
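
The intended lifecycle of the flag can be modelled with a toy,
block-granular device as sketched below. Everything here (demo_device,
the DEMO_* bits, the fixed block count) is an assumption made for
illustration; the patch itself tracks byte ranges in the device's
alloc_state extent io tree.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ALLOCATED	0x1u
#define DEMO_TRIMMED	0x2u
#define DEMO_NR_BLOCKS	8

/* Hypothetical device: one flag word per fixed-size block. */
struct demo_device {
	unsigned int bits[DEMO_NR_BLOCKS];
};

/*
 * "Discard" every block in [first, first + nr) that is neither allocated
 * nor already trimmed, mark it trimmed, and return how many were issued.
 */
static size_t demo_trim(struct demo_device *dev, size_t first, size_t nr)
{
	size_t i, issued = 0;

	assert(first + nr <= DEMO_NR_BLOCKS);
	for (i = first; i < first + nr; i++) {
		if (dev->bits[i] & (DEMO_ALLOCATED | DEMO_TRIMMED))
			continue;
		dev->bits[i] |= DEMO_TRIMMED;
		issued++;
	}
	return issued;
}

/* Allocation invalidates the trimmed state of the blocks it covers. */
static void demo_alloc(struct demo_device *dev, size_t first, size_t nr)
{
	size_t i;

	assert(first + nr <= DEMO_NR_BLOCKS);
	for (i = first; i < first + nr; i++)
		dev->bits[i] = (dev->bits[i] & ~DEMO_TRIMMED) | DEMO_ALLOCATED;
}

/* Freeing clears only the allocated bit; the space must be trimmed again. */
static void demo_free(struct demo_device *dev, size_t first, size_t nr)
{
	size_t i;

	assert(first + nr <= DEMO_NR_BLOCKS);
	for (i = first; i < first + nr; i++)
		dev->bits[i] &= ~DEMO_ALLOCATED;
}
```

In this model a second trim over untouched space issues nothing, and only
blocks that were allocated and freed in between are discarded again.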

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h   |  8 +++++-
 fs/btrfs/extent_map.c  |  4 ++-
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 188774ed7795..2d4d597c8ca4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11198,6 +11198,54 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
+static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
+{
+	u64 trimmed_start = 0, trimmed_end = 0;
+	u64 end = *start + *len - 1;
+
+	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
+				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
+		u64 trimmed_len = trimmed_end - trimmed_start + 1;
+
+		if (*start < trimmed_start) {
+			if (in_range(end, trimmed_start, trimmed_len) ||
+			    end > trimmed_end) {
+				/*
+				 * start|------|end
+				 *      ts|--|trimmed_len
+				 *      OR
+				 * start|-----|end
+				 *      ts|-----|trimmed_len
+				 */
+				*len = trimmed_start - *start;
+				return false;
+			} else if (end < trimmed_start) {
+				/*
+				 * start|------|end
+				 *             ts|--|trimmed_len
+				 */
+				return false;
+			}
+		} else if (in_range(*start, trimmed_start, trimmed_len)) {
+			if (in_range(end, trimmed_start, trimmed_len)) {
+				/*
+				 * start|------|end
+				 *  ts|----------|trimmed_len
+				 */
+				return true;
+			} else {
+				/*
+				 * start|-----------|end
+				 *  ts|----------|trimmed_len
+				 */
+				*start = trimmed_end + 1;
+				*len = end - *start + 1;
+				return false;
+			}
+		}
+	}
+	return false;
+}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11268,7 +11316,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
+		if (!should_skip_trim(device, &start, &len)) {
+			ret = btrfs_issue_discard(device->bdev, start, len,
+						  &bytes);
+			if (!ret)
+				set_extent_bits(&device->alloc_state, start,
+						start + bytes - 1,
+						CHUNK_TRIMMED);
+		}
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d4227e40c8ee..d238efd628cf 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -30,8 +30,14 @@
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
 
 
-/* Redefined bits above which are used only in the device allocation tree */
+/*
+ * Redefined bits above which are used only in the device allocation tree,
+ * shouldn't be using EXTENT_IOBITS(EXTENT_LOCKED/EXTENT_WRITEBACK) /
+ * EXTENT_BOUNDARY / EXTENT_CLEAR_META_RESV / EXTENT_CLEAR_DATA_RESV because
+ * they have special meaning to the bit manipulation functions
+ */
 #define CHUNK_ALLOCATED EXTENT_DIRTY
+#define CHUNK_TRIMMED   EXTENT_DEFRAG
 
 /*
  * flags for bio submission. The high bits indicate the compression
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 0820f6fcf3a6..9e8c0904f623 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -389,8 +389,10 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		goto out;
 
 	setup_extent_mapping(tree, em, modified);
-	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) {
 		extent_map_device_set_bits(em, CHUNK_ALLOCATED);
+		extent_map_device_clear_bits(em, CHUNK_TRIMMED);
+	}
 out:
 	return ret;
 }
-- 
2.17.1


* [PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (9 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 10/12] btrfs: Optimize unallocated chunks discard Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  2019-02-11  8:35 ` [PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This function is very similar to find_first_extent_bit except that it
locates the first contiguous span of space which does not have bits set.
Its intended use is in the freespace trimming code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
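
The intended semantics can be illustrated with a userspace model. This is
a sketch, not the kernel implementation: it replaces the rbtree with a
sorted array of non-overlapping records, and struct rec and
first_clear_range are hypothetical names. Gaps between records count as
clear, and UINT64_MAX plays the role of the -1 end marker described in
the kerneldoc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent_state: inclusive range plus bits. */
struct rec {
	uint64_t start;
	uint64_t end;
	unsigned bits;
};

/*
 * Find the first range at or after @start in which none of @bits is set.
 * @recs must be sorted and non-overlapping, and no record with @bits set
 * may end at UINT64_MAX (the sketch does not handle that wraparound).
 */
static void first_clear_range(const struct rec *recs, size_t n,
			      uint64_t start, unsigned bits,
			      uint64_t *start_ret, uint64_t *end_ret)
{
	size_t i = 0;

	assert(start_ret && end_ret);

	/* Skip records that end before @start. */
	while (i < n && recs[i].end < start)
		i++;

	/* Move @start past records that contain it and have @bits set. */
	while (i < n && recs[i].start <= start && (recs[i].bits & bits)) {
		start = recs[i].end + 1;
		i++;
	}
	*start_ret = start;

	/* Extend the range until the next record that has @bits set. */
	*end_ret = UINT64_MAX;
	for (; i < n; i++) {
		if (recs[i].bits & bits) {
			*end_ret = recs[i].start - 1;
			break;
		}
	}
}
```

When the clear range runs past the last record, the caller still has to
clamp the returned end to the size of the device.
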
 fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h |  2 ++
 2 files changed, 75 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9733790881ae..95b9af7376c7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1505,6 +1505,79 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 	return ret;
 }
 
+/**
+ * find_first_clear_extent_bit - finds the first range that has @bits not set
+ * and that starts after @start
+ *
+ * @tree - the tree to search
+ * @start - the offset at/after which the found extent should start
+ * @start_ret - records the beginning of the range
+ * @end_ret - records the end of the range (inclusive)
+ * @bits - the set of bits which must be unset
+ *
+ * Since unallocated range is also considered one which doesn't have the bits
+ * set it's possible that @end_ret contains -1, this happens in case the range
+ * spans (last_range_end, end of device]. In this case it's up to the caller to
+ * trim @end_ret to the appropriate size.
+ */
+void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
+				 u64 *start_ret, u64 *end_ret, unsigned bits)
+{
+	struct extent_state *state;
+	struct rb_node *node, *prev = NULL, *next;
+
+	spin_lock(&tree->lock);
+
+	/* Find first extent with bits cleared */
+	while (1) {
+		node = __etree_search(tree, start, &next, &prev, NULL, NULL);
+		if (!node) {
+			node = next;
+			if (!node) {
+				/*
+				 * We are past the last allocated chunk,
+				 * set start at the end of the last extent. The
+				 * device alloc tree should never be empty so
+				 * prev is always set.
+				 */
+				ASSERT(prev);
+				state = rb_entry(prev, struct extent_state, rb_node);
+				*start_ret = state->end + 1;
+				*end_ret = -1;
+				goto out;
+			}
+		}
+		state = rb_entry(node, struct extent_state, rb_node);
+		if (in_range(start, state->start, state->end - state->start + 1) &&
+			(state->state & bits)) {
+			start = state->end + 1;
+		} else {
+			*start_ret = start;
+			break;
+		}
+	}
+
+	/*
+	 * Find the longest stretch from start until an entry which has the
+	 * bits set
+	 */
+	while (1) {
+		state = rb_entry(node, struct extent_state, rb_node);
+		if (state->end >= start && !(state->state & bits)) {
+			*end_ret = state->end;
+		} else {
+			*end_ret = state->start - 1;
+			break;
+		}
+
+		node = rb_next(node);
+		if (!node)
+			break;
+	}
+out:
+	spin_unlock(&tree->lock);
+}
+
 /*
  * find a contiguous range of bytes in the file marked as delalloc, not
  * more than 'max_bytes'.  start and end are used to return the range,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d238efd628cf..7012ff27da82 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -391,6 +391,8 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
 			  struct extent_state **cached_state);
+void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
+				 u64 *start_ret, u64 *end_ret, unsigned bits);
 int extent_invalidatepage(struct extent_io_tree *tree,
 			  struct page *page, unsigned long offset);
 int extent_write_full_page(struct page *page, struct writeback_control *wbc);
-- 
2.17.1


* [PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
  2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
                   ` (10 preceding siblings ...)
  2019-02-11  8:35 ` [PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
@ 2019-02-11  8:35 ` Nikolay Borisov
  11 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-11  8:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Instead of always calling the allocator to search for a free extent,
that satisfies the input criteria, switch btrfs_trim_free_extents to
using find_first_clear_extent_bit. With this change it's no longer
necessary to read the device tree in order to figure out holes in
the devices.

Now the code always searches an in-memory data structure to figure out
the space range which contains the requested range, which should result
in speedups.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
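
The clamping step that follows the call to find_first_clear_extent_bit
can be sketched in userspace as below. The helper name clamped_len is
hypothetical. The sketch assumes that the last addressable byte of a
device is total_bytes - 1, so it clamps the inclusive end to that value;
this is an assumption of the example, not a statement about what the
patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length of the inclusive range [start, end] after
 * clamping @end to the last byte of a device of @total_bytes bytes.
 * Returns 0 when nothing of the range lies on the device.
 */
static uint64_t clamped_len(uint64_t start, uint64_t end,
			    uint64_t total_bytes)
{
	uint64_t last;

	assert(total_bytes > 0);
	last = total_bytes - 1;

	if (start > last)
		return 0;
	if (end > last)
		end = last;	/* covers the end == -1 (UINT64_MAX) case */
	if (end < start)
		return 0;
	return end - start + 1;
}
```

A result of 0 corresponds to the "We didn't find any extents" exit, and a
result below range->minlen to the "keep going" branch.
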
 fs/btrfs/extent-tree.c | 89 ++++++++++++------------------------------
 1 file changed, 26 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2d4d597c8ca4..cb56fbd84e6a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11198,54 +11198,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
-static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
-{
-	u64 trimmed_start = 0, trimmed_end = 0;
-	u64 end = *start + *len - 1;
-
-	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
-				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
-		u64 trimmed_len = trimmed_end - trimmed_start + 1;
-
-		if (*start < trimmed_start) {
-			if (in_range(end, trimmed_start, trimmed_len) ||
-			    end > trimmed_end) {
-				/*
-				 * start|------|end
-				 *      ts|--|trimmed_len
-				 *      OR
-				 * start|-----|end
-				 *      ts|-----|trimmed_len
-				 */
-				*len = trimmed_start - *start;
-				return false;
-			} else if (end < trimmed_start) {
-				/*
-				 * start|------|end
-				 *             ts|--|trimmed_len
-				 */
-				return false;
-			}
-		} else if (in_range(*start, trimmed_start, trimmed_len)) {
-			if (in_range(end, trimmed_start, trimmed_len)) {
-				/*
-				 * start|------|end
-				 *  ts|----------|trimmed_len
-				 */
-				return true;
-			} else {
-				/*
-				 * start|-----------|end
-				 *  ts|----------|trimmed_len
-				 */
-				*start = trimmed_end + 1;
-				*len = end - *start + 1;
-				return false;
-			}
-		}
-	}
-	return false;
-}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11269,7 +11221,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
 static int btrfs_trim_free_extents(struct btrfs_device *device,
 				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = range->start, len = 0;
+	u64 start = max_t(u64, range->start, SZ_1M), len = 0, end = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11296,34 +11248,45 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (ret)
 			break;
 
-		ret = find_free_dev_extent_start(device, range->minlen, start,
-						 &start, &len);
+		find_first_clear_extent_bit(&device->alloc_state, start,
+					    &start, &end,
+					    CHUNK_TRIMMED | CHUNK_ALLOCATED);
+		/* If find_first_clear_extent_bit finds a range that spans the
+		 * end of the device it will set end to -1, in this case it's up
+		 * to the caller to trim the value to the size of the device.
+		 */
+		end = min(end, device->total_bytes);
+		len = end - start + 1;
 
-		if (ret) {
+		/* We didn't find any extents */
+		if (!len) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			if (ret == -ENOSPC)
-				ret = 0;
+			ret = 0;
 			break;
 		}
 
+		/* Keep going until we satisfy minlen or reach end of space */
+		if (len < range->minlen) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			start += len;
+			continue;
+		}
+
 		/* If we are out of the passed range break */
 		if (start > range->start + range->len - 1) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			ret = 0;
 			break;
 		}
 
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		if (!should_skip_trim(device, &start, &len)) {
-			ret = btrfs_issue_discard(device->bdev, start, len,
-						  &bytes);
-			if (!ret)
-				set_extent_bits(&device->alloc_state, start,
-						start + bytes - 1,
-						CHUNK_TRIMMED);
-		}
+		ret = btrfs_issue_discard(device->bdev, start, len,
+					  &bytes);
+		if (!ret)
+			set_extent_bits(&device->alloc_state, start,
+					start + bytes - 1,
+					CHUNK_TRIMMED);
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
-- 
2.17.1


* [PATCH] btrfs: Transpose btrfs_close_devices/btrfs_mapping_tree_free in close_ctree
  2019-02-11  8:35 ` [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
@ 2019-02-12 14:13   ` Nikolay Borisov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-02-12 14:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Following the introduction of the alloc_state tree, some of the callees
of btrfs_mapping_tree_free will have to interact with the btrfs_device
of the constituent devices. Enable this by moving the code responsible
for freeing devices after the last user (btrfs_mapping_tree_free).
Otherwise the kernel could crash due to a use-after-free (UAF).

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

Hello David, 

The following patch either needs to come before "btrfs: replace pending/pinned
chunks lists with io tree" or has to be squashed in said commit. I'd prefer it 
to be a separate patch to have the explanation of the order of functions in 
the commit log. 


 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f1c42d242d48..4f74942f1d97 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4041,8 +4041,8 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 		btrfsic_unmount(fs_info->fs_devices);
 #endif
 
-	btrfs_close_devices(fs_info->fs_devices);
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
+	btrfs_close_devices(fs_info->fs_devices);
 
 	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
 	percpu_counter_destroy(&fs_info->delalloc_bytes);
-- 
2.17.1


* Re: [PATCH v2 02/12] btrfs: combine device update operations during transaction commit
  2019-02-11  8:35 ` [PATCH v2 02/12] btrfs: combine device update operations during transaction commit Nikolay Borisov
@ 2019-02-28 16:52   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-02-28 16:52 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, Jeff Mahoney

On Mon, Feb 11, 2019 at 10:35:00AM +0200, Nikolay Borisov wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> We currently overload the pending_chunks list to handle updating
> btrfs_device->commit_bytes_used.  We don't actually care about
> the extent mapping or even the device mapping for the chunk - we
> just need the device, and we can end up processing it multiple
> times.  The fs_devices->resized_list does more or less the same
> thing, but with the disk size.  They are called consecutively
> during commit and have more or less the same purpose.
> 
> We can combine the two lists into a single list that attaches
> to the transaction and contains a list of devices that need
> updating.  Since we always add the device to a list when we
> change bytes_used or disk_total_size, there's no harm in
> copying both values at once.

Agreed, overall sounds good.

> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -75,6 +75,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
>  			btrfs_put_block_group_trimming(cache);
>  			btrfs_put_block_group(cache);
>  		}
> +		BUG_ON(!list_empty(&transaction->dev_update_list));

Why BUG_ON? Would ASSERT or WARN_ON be enough?

>  		kfree(transaction);
>  	}
>  }
> @@ -334,6 +333,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
>  
>  void btrfs_free_device(struct btrfs_device *device)
>  {
> +	BUG_ON(!list_empty(&device->post_commit_list));

Same here

>  	rcu_string_free(device->name);
>  	bio_put(device->flush_bio);
>  	kfree(device);
> +	/*
> +	 * We don't need the device_list_mutex here.  This list is owned
> +	 * by the transaction and the transaction must complete before
> +	 * the device is released.
> +	 */
> +	mutex_lock(&trans->fs_info->chunk_mutex);
> +	list_for_each_entry_safe(curr, next, &trans->dev_update_list,
> +				 post_commit_list) {
> +		list_del_init(&curr->post_commit_list);
> +		curr->commit_total_bytes = curr->disk_total_bytes;
> +		curr->commit_bytes_used = curr->bytes_used;

Agreed, the chunk_mutex should be enough from what I've seen.

>  	}
> -	mutex_unlock(&fs_info->chunk_mutex);
> +	mutex_unlock(&trans->fs_info->chunk_mutex);
>  }
>  
>  void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info)

* Re: [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree
  2019-02-11  8:35 ` [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
@ 2019-02-28 16:53   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-02-28 16:53 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 11, 2019 at 10:35:02AM +0200, Nikolay Borisov wrote:
> This function is going to be used to clear out the device extent
> allocation information. Give it a more generic name and export it. This
> is in preparation to replacing the pending/pinned chunk lists with an
> extent tree. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

* Re: [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk
  2019-02-11  8:35 ` [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
@ 2019-02-28 16:53   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-02-28 16:53 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 11, 2019 at 10:35:03AM +0200, Nikolay Borisov wrote:
> Chunks read from disk currently don't get their ->orig_block_len member
> set. In contrast, when a new chunk is allocated, the respective
> extent_map's ->orig_block_len is assigned the size of the stripe of this
> chunk. Let's apply the same strategy for chunks which are read from
> disk. Not only does this codify the invariant that ->orig_block_len
> always contains the size of the stripe for a chunk (when the em belongs
> to the mapping tree), but it's also a preparatory patch for further work
> around tracking chunk allocation in an extent tree rather than
> pinned/pending lists.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

* Re: [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree
  2019-02-11  8:35 ` [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
@ 2019-02-28 16:57   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-02-28 16:57 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 11, 2019 at 10:35:04AM +0200, Nikolay Borisov wrote:
> Rather than hijacking the existing defines let's just define new bits,
> with more descriptive names. Instead of using yet more (currently at 18)
> bits for the new flags, use the fact those flags will be specific to
> the device allocation tree so define them using existing EXTENT_* flags.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/extent_io.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index d7beb2b3bc7d..af7e00a3678c 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -29,6 +29,10 @@
>  				 EXTENT_CLEAR_DATA_RESV)
>  #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
>  
> +
> +/* Redefined bits above which are used only in the device allocation tree */
> +#define CHUNK_ALLOCATED EXTENT_DIRTY

I see that 18 and more flags can be a lot, though for clarity could we
use unique values? The bitmask is u32 or unsigned, so there are still
some bits left to use. And I've seen only 2 more new ones, so it's not
like you need 10, where it would make more sense to reuse.

As the extent_io tree is used in different contexts, we can add some
sanity checks based on the disjoint sets of flags, but that's just an
idea and I don't have concrete examples.
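
One way such a check could look, purely as a sketch: all DEMO_* names,
bit positions and the owner enum below are hypothetical and chosen only
for illustration; none of them exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_EXTENT_DIRTY	(1U << 0)
#define DEMO_EXTENT_DEFRAG	(1U << 1)
#define DEMO_CHUNK_ALLOCATED	(1U << 20)
#define DEMO_CHUNK_TRIMMED	(1U << 21)

#define DEMO_INODE_TREE_BITS	(DEMO_EXTENT_DIRTY | DEMO_EXTENT_DEFRAG)
#define DEMO_DEVICE_TREE_BITS	(DEMO_CHUNK_ALLOCATED | DEMO_CHUNK_TRIMMED)

/* With unique values the two sets can be proven disjoint at build time. */
_Static_assert((DEMO_INODE_TREE_BITS & DEMO_DEVICE_TREE_BITS) == 0,
	       "inode and device tree bits must not overlap");

/* Hypothetical tag describing which context owns a given io tree. */
enum demo_tree_owner {
	DEMO_OWNER_INODE,
	DEMO_OWNER_DEVICE,
};

/* Return true if @bits only uses flags that belong to @owner's set. */
static bool demo_bits_valid(enum demo_tree_owner owner, unsigned bits)
{
	if (owner == DEMO_OWNER_DEVICE)
		return (bits & ~DEMO_DEVICE_TREE_BITS) == 0;
	return (bits & DEMO_DEVICE_TREE_BITS) == 0;
}
```

A check of this shape only works when the device tree bits do not alias
existing EXTENT_* values, which is the point of using unique values.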

* Re: [PATCH v2 09/12] btrfs: Factor out in_range macro
  2019-02-11  8:35 ` [PATCH v2 09/12] btrfs: Factor out in_range macro Nikolay Borisov
@ 2019-02-28 18:00   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-02-28 18:00 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Mon, Feb 11, 2019 at 10:35:07AM +0200, Nikolay Borisov wrote:
> This is used in more than one place so let's factor it out in ctree.h.
> No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

> ---
>  fs/btrfs/ctree.h       | 2 ++
>  fs/btrfs/extent-tree.c | 1 -
>  fs/btrfs/volumes.c     | 1 -
>  3 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 86dbf2160ae2..c61fff4c294d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3808,6 +3808,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
>  	return signal_pending(current);
>  }
>  
> +#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

I think it's time to add a new file for such small helpers and not
clutter ctree.h. There's math.h but it's too specific (and we did object
against that back then but, well). Something like utils.h or common.h
would be good, with e.g. the assert macros or cond_wake_mb, and the math
helpers too. That's for later and does not affect this patchset.
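
For reference, the macro describes the half-open interval
[first, first + len). A small userspace sketch of that behaviour follows;
offset_in_range is a hypothetical wrapper added only so the macro can be
exercised with 64-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the macro quoted above. */
#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

/* Hypothetical wrapper; the sketch assumes first + len does not wrap. */
static bool offset_in_range(uint64_t b, uint64_t first, uint64_t len)
{
	assert(first + len >= first);
	return in_range(b, first, len);
}
```

The first byte of the range is included and first + len is not, which is
why callers holding an inclusive end pass end - start + 1 as the length.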

Thread overview: 19+ messages
2019-02-11  8:34 [PATCH v2 00/12] FITRIM improvements Nikolay Borisov
2019-02-11  8:34 ` [PATCH v2 01/12] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 02/12] btrfs: combine device update operations during transaction commit Nikolay Borisov
2019-02-28 16:52   ` David Sterba
2019-02-11  8:35 ` [PATCH v2 03/12] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 04/12] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
2019-02-28 16:53   ` David Sterba
2019-02-11  8:35 ` [PATCH v2 05/12] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
2019-02-28 16:53   ` David Sterba
2019-02-11  8:35 ` [PATCH v2 06/12] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
2019-02-28 16:57   ` David Sterba
2019-02-11  8:35 ` [PATCH v2 07/12] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
2019-02-12 14:13   ` [PATCH] btrfs: Transpose btrfs_close_devices/btrfs_mapping_tree_free in close_ctree Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 08/12] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 09/12] btrfs: Factor out in_range macro Nikolay Borisov
2019-02-28 18:00   ` David Sterba
2019-02-11  8:35 ` [PATCH v2 10/12] btrfs: Optimize unallocated chunks discard Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 11/12] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
2019-02-11  8:35 ` [PATCH v2 12/12] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
