Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH 00/15] Improvements to fitrim 
@ 2019-01-30 14:50 Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
                   ` (14 more replies)
  0 siblings, 15 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Here's a series that spruces up btrfs' trim implementation. The main goal is to 
optimise trim of freespace so that when a range is trimmed once and not allocated, 
subsequent trims will skip it, thus improving performance. 

First 3 patches are misc cleanups which are mostly independent. Of them, perhaps
the first one is the most important since it adds support for FITRIM ranges and 
is likely stable material. Currently btrfs will entirely ignore range options 
of the FITRIM ioctl. 

Patch 4 is from Jeff and it simplifies management of resized_devices/pending_chunks
lists. Apart from a nice cleanup it's a prep patch for further removal of 
pinned/pending chunks lists

Patch 5 is a self-contained cleanup that resulted from me investigating how 
contains_pending_extent is being used. The gist is that the check for pending
extents can be done before the main shrink loop in btrfs_shrink_device. This
makes the code more linear. 

Patches 6/7/8 small, self-explanatory, prep patches for removal of pinned/pending
chunk lists. 

Patch 9 replaces the now-reduced scope of pending/pinned chunks with an 
extent_io_tree. The patch was initially developed by Jeff and I just finished 
his work by fixing several bugs. Those were mainly around memory allocation 
contexts and locking.

Patches 10/11 perform refactoring which is enabled by the previous patch. Mainly, 
simplifying find_free_dev_extent and no longer requiring a transaction be held 
open during trim. This is already a performance improvement in and of itself. 
Though find_free_dev_extent_start is still used that is the device tree is being
read.

Patch 12 makes trimming code smarter by recording which trange has been trimmed 
and avoid trimming when possible - i.e a range has been trimmed and subsequently 
not allocated. 

Patch 13 minor quality-of-life patch, self-explanatory

Patch 14/15 switch btrfs_trim_free_extents to using an entirely in-memory 
datastructure when looking for ranges to trim. The idea is this should be faster
than working with the device tree. In practice, though, I'm not entirely sure 
since :

1. We still have to hold the chunk_mutex
2. I suspect that the device tree will be more or less cached in memory. 

Review of the series is welcome, I'd also urge people to pay attention  when 
reviewing code responsible for calculating trim ranges. An error there could
lead to data loss if allocated space is trimmed. I have manually tested the 
ranged trim support to ensure this won't happen but one can never get enough 
testing :). Also, this series survived multiple xfstest runs. 

Jeff Mahoney (2):
  btrfs: combine device update operations during transaction commit
  btrfs: replace pending/pinned chunks lists with io tree

Nikolay Borisov (13):
  btrfs: Honour FITRIM range constraints during free space trim
  btrfs: Make WARN_ON in a canonical form
  btrfs: Remove  EXTENT_FIRST_DELALLOC bit
  btrfs: Handle pending/pinned chunks before blockgroup relocation
    during device shrink
  btrfs: Rename and export clear_btree_io_tree
  btrfs: Populate ->orig_block_len during read_one_chunk
  btrfs: Introduce new bits for device allocation tree
  btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
  btrfs: Factor out in_range macro
  btrfs: Optimize unallocated chunks discard
  btrfs: Fix gross misnaming
  btrfs: Implement find_first_clear_extent_bit
  btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit

 fs/btrfs/ctree.h            |   8 +-
 fs/btrfs/dev-replace.c      |   2 +-
 fs/btrfs/disk-io.c          |  20 ++-
 fs/btrfs/extent-tree.c      | 107 +++++---------
 fs/btrfs/extent_io.c        | 126 ++++++++++++++--
 fs/btrfs/extent_io.h        |  34 +++--
 fs/btrfs/extent_map.c       |  38 +++++
 fs/btrfs/extent_map.h       |   1 -
 fs/btrfs/free-space-cache.c |   4 -
 fs/btrfs/transaction.c      |  53 ++-----
 fs/btrfs/transaction.h      |   2 +-
 fs/btrfs/volumes.c          | 277 ++++++++++++++----------------------
 fs/btrfs/volumes.h          |  23 ++-
 13 files changed, 354 insertions(+), 341 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-31 15:21   ` David Sterba
  2019-01-30 14:50 ` [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Nikolay Borisov
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Up until know trimming the freespace was done irrespective of what the
arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
will be entirely ignored. Fix it by correctly handling those paramter.
This requires breaking if the found freespace extent is after the end
of the passed range as well as completing trim after trimming
fstrim_range::len bytes.

Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1fdba38761f7..fc3f6acc3c9b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
  * held back allocations.
  */
 static int btrfs_trim_free_extents(struct btrfs_device *device,
-				   u64 minlen, u64 *trimmed)
+				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = 0, len = 0;
+	u64 start = range->start, len = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (!trans)
 			up_read(&fs_info->commit_root_sem);
 
-		ret = find_free_dev_extent_start(trans, device, minlen, start,
-						 &start, &len);
+		ret = find_free_dev_extent_start(trans, device, range->minlen,
+						 start, &start, &len);
 		if (trans) {
 			up_read(&fs_info->commit_root_sem);
 			btrfs_put_transaction(trans);
@@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 			break;
 		}
 
+		/* If we are out of the passed range break */
+		if (start > range->start + range->len - 1) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			ret = 0;
+			break;
+		}
+
+		start = max(range->start, start);
+		len = min(range->len, len);
+
 		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
 		mutex_unlock(&fs_info->chunk_mutex);
 
@@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		start += len;
 		*trimmed += bytes;
 
+		/* We've trimmed enough */
+		if (*trimmed >= range->len)
+			break;
+
 		if (fatal_signal_pending(current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	devices = &fs_info->fs_devices->devices;
 	list_for_each_entry(device, devices, dev_list) {
-		ret = btrfs_trim_free_extents(device, range->minlen,
-					      &group_trimmed);
+		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
 		if (ret) {
 			dev_failed++;
 			dev_ret = ret;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 02/15] btrfs: Make WARN_ON in a canonical form
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-31 15:22   ` David Sterba
  2019-02-04 13:12   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Nikolay Borisov
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

There is no point in using a construct like 'if (!condition)
WARN_ON(1)'. Use WARN_ON(!condition) directly. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fc3f6acc3c9b..79d0bc813a72 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10798,13 +10798,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	}
 
 	spin_lock(&trans->transaction->dirty_bgs_lock);
-	if (!list_empty(&block_group->dirty_list)) {
-		WARN_ON(1);
-	}
-	if (!list_empty(&block_group->io_list)) {
-		WARN_ON(1);
-	}
+	WARN_ON(!list_empty(&block_group->dirty_list));
+	WARN_ON(!list_empty(&block_group->io_list));
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
+
 	btrfs_remove_free_space_cache(block_group);
 
 	spin_lock(&block_group->space_info->lock);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 03/15] btrfs: Remove  EXTENT_FIRST_DELALLOC bit
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-31 15:23   ` David Sterba
  2019-02-04 13:15   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 04/15] btrfs: combine device update operations during transaction commit Nikolay Borisov
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

With the refactoring introduced in 8b62f87bad9c ("Btrfs: reworki
outstanding_extents") this flag became unused. Remove it and renumber
the following flags accordingly. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c |  2 --
 fs/btrfs/extent_io.h | 15 +++++++--------
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 71db9f8bcdaa..01f596d42f59 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -585,7 +585,6 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 	if (delete)
 		bits |= ~EXTENT_CTLBITS;
-	bits |= EXTENT_FIRST_DELALLOC;
 
 	if (bits & (EXTENT_IOBITS | EXTENT_BOUNDARY))
 		clear = 1;
@@ -850,7 +849,6 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 	btrfs_debug_check_extent_io_range(tree, start, end);
 
-	bits |= EXTENT_FIRST_DELALLOC;
 again:
 	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		/*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 9673be3f3d1f..08749e0b9c32 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -18,17 +18,16 @@
 #define EXTENT_BOUNDARY		(1U << 9)
 #define EXTENT_NODATASUM	(1U << 10)
 #define EXTENT_CLEAR_META_RESV	(1U << 11)
-#define EXTENT_FIRST_DELALLOC	(1U << 12)
-#define EXTENT_NEED_WAIT	(1U << 13)
-#define EXTENT_DAMAGED		(1U << 14)
-#define EXTENT_NORESERVE	(1U << 15)
-#define EXTENT_QGROUP_RESERVED	(1U << 16)
-#define EXTENT_CLEAR_DATA_RESV	(1U << 17)
-#define EXTENT_DELALLOC_NEW	(1U << 18)
+#define EXTENT_NEED_WAIT	(1U << 12)
+#define EXTENT_DAMAGED		(1U << 13)
+#define EXTENT_NORESERVE	(1U << 14)
+#define EXTENT_QGROUP_RESERVED	(1U << 15)
+#define EXTENT_CLEAR_DATA_RESV	(1U << 16)
+#define EXTENT_DELALLOC_NEW	(1U << 17)
 #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
-#define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
+#define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
 
 /*
  * flags for bio submission. The high bits indicate the compression
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 04/15] btrfs: combine device update operations during transaction commit
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (2 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-02-04 13:25   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm

From: Jeff Mahoney <jeffm@suse.com>

We currently overload the pending_chunks list to handle updating
btrfs_device->commit_bytes used.  We don't actually care about
the extent mapping or even the device mapping for the chunk - we
just need the device, and we can end up processing it multiple
times.  The fs_devices->resized_list does more or less the same
thing, but with the disk size.  They are called consecutively
during commit and have more or less the same purpose.

We can combine the two lists into a single list that attaches
to the transaction and contains a list of devices that need
updating.  Since we always add the device to a list when we
change bytes_used or disk_total_size, there's no harm in
copying both values at once.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/disk-io.c     |  9 +++++
 fs/btrfs/transaction.c |  7 ++--
 fs/btrfs/transaction.h |  1 +
 fs/btrfs/volumes.c     | 84 ++++++++++++++++++------------------------
 fs/btrfs/volumes.h     | 13 ++-----
 6 files changed, 54 insertions(+), 62 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 13863354ff9d..f335cbf3bf1f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -662,7 +662,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	btrfs_device_set_disk_total_bytes(tgt_device,
 					  src_device->disk_total_bytes);
 	btrfs_device_set_bytes_used(tgt_device, src_device->bytes_used);
-	ASSERT(list_empty(&src_device->resized_list));
+	ASSERT(list_empty(&src_device->post_commit_list));
 	tgt_device->commit_total_bytes = src_device->commit_total_bytes;
 	tgt_device->commit_bytes_used = src_device->bytes_used;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 888d72dda794..25dab68070dc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4498,6 +4498,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 	ASSERT(list_empty(&cur_trans->dirty_bgs));
 	ASSERT(list_empty(&cur_trans->io_bgs));
 
+	while (!list_empty(&cur_trans->dev_update_list)) {
+		struct btrfs_device *dev;
+
+		dev = list_first_entry(&cur_trans->dev_update_list,
+				       struct btrfs_device,
+				       post_commit_list);
+		list_del_init(&dev->post_commit_list);
+	}
+
 	btrfs_destroy_delayed_refs(cur_trans, fs_info);
 
 	cur_trans->state = TRANS_STATE_COMMIT_START;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 84ffb381098d..8d340957f7d6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -75,6 +75,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 			btrfs_put_block_group_trimming(cache);
 			btrfs_put_block_group(cache);
 		}
+		BUG_ON(!list_empty(&transaction->dev_update_list));
 		kfree(transaction);
 	}
 }
@@ -263,6 +264,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 
 	INIT_LIST_HEAD(&cur_trans->pending_snapshots);
 	INIT_LIST_HEAD(&cur_trans->pending_chunks);
+	INIT_LIST_HEAD(&cur_trans->dev_update_list);
 	INIT_LIST_HEAD(&cur_trans->switch_commits);
 	INIT_LIST_HEAD(&cur_trans->dirty_bgs);
 	INIT_LIST_HEAD(&cur_trans->io_bgs);
@@ -549,7 +551,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 * and then we deadlock with somebody doing a freeze.
 	 *
 	 * If we are ATTACH, it means we just want to catch the current
-	 * transaction and commit it, so we needn't do sb_start_intwrite(). 
+	 * transaction and commit it, so we needn't do sb_start_intwrite().
 	 */
 	if (type & __TRANS_FREEZABLE)
 		sb_start_intwrite(fs_info->sb);
@@ -2198,8 +2200,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	memcpy(fs_info->super_for_commit, fs_info->super_copy,
 	       sizeof(*fs_info->super_copy));
 
-	btrfs_update_commit_device_size(fs_info);
-	btrfs_update_commit_device_bytes_used(cur_trans);
+	btrfs_commit_device_sizes(cur_trans);
 
 	clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 	clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index f1ba78949d1b..e0a04fa4de66 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -52,6 +52,7 @@ struct btrfs_transaction {
 	wait_queue_head_t commit_wait;
 	struct list_head pending_snapshots;
 	struct list_head pending_chunks;
+	struct list_head dev_update_list;
 	struct list_head switch_commits;
 	struct list_head dirty_bgs;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0572831947ab..4f645e1b5539 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -318,7 +318,6 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 	mutex_init(&fs_devs->device_list_mutex);
 
 	INIT_LIST_HEAD(&fs_devs->devices);
-	INIT_LIST_HEAD(&fs_devs->resized_devices);
 	INIT_LIST_HEAD(&fs_devs->alloc_list);
 	INIT_LIST_HEAD(&fs_devs->fs_list);
 	if (fsid)
@@ -334,6 +333,7 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
 
 void btrfs_free_device(struct btrfs_device *device)
 {
+	BUG_ON(!list_empty(&device->post_commit_list));
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
@@ -402,7 +402,7 @@ static struct btrfs_device *__alloc_device(void)
 
 	INIT_LIST_HEAD(&dev->dev_list);
 	INIT_LIST_HEAD(&dev->dev_alloc_list);
-	INIT_LIST_HEAD(&dev->resized_list);
+	INIT_LIST_HEAD(&dev->post_commit_list);
 
 	spin_lock_init(&dev->io_lock);
 
@@ -2870,9 +2870,9 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 	btrfs_device_set_total_bytes(device, new_size);
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	btrfs_clear_space_info_full(device->fs_info);
-	if (list_empty(&device->resized_list))
-		list_add_tail(&device->resized_list,
-			      &fs_devices->resized_devices);
+	if (list_empty(&device->post_commit_list))
+		list_add_tail(&device->post_commit_list,
+			      &trans->transaction->dev_update_list);
 	mutex_unlock(&fs_info->chunk_mutex);
 
 	return btrfs_update_device(trans, device);
@@ -4861,9 +4861,9 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	btrfs_device_set_disk_total_bytes(device, new_size);
-	if (list_empty(&device->resized_list))
-		list_add_tail(&device->resized_list,
-			      &fs_info->fs_devices->resized_devices);
+	if (list_empty(&device->post_commit_list))
+		list_add_tail(&device->post_commit_list,
+			      &trans->transaction->dev_update_list);
 
 	WARN_ON(diff > old_total);
 	btrfs_set_super_total_bytes(super_copy,
@@ -5212,9 +5212,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto error_del_extent;
 
-	for (i = 0; i < map->num_stripes; i++)
-		btrfs_device_set_bytes_used(map->stripes[i].dev,
-				map->stripes[i].dev->bytes_used + stripe_size);
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_device *dev = map->stripes[i].dev;
+
+		btrfs_device_set_bytes_used(dev, dev->bytes_used + stripe_size);
+		if (list_empty(&dev->post_commit_list))
+			list_add_tail(&dev->post_commit_list,
+				      &trans->transaction->dev_update_list);
+	}
 
 	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
 
@@ -7664,51 +7669,34 @@ void btrfs_scratch_superblocks(struct block_device *bdev, const char *device_pat
 }
 
 /*
- * Update the size of all devices, which is used for writing out the
- * super blocks.
+ * Update the size and bytes used for each device where it changed.
+ * This is delayed since we would otherwise get errors while writing
+ * out the superblocks.
+ *
+ * Must be invoked during transaction commit.
  */
-void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
+void btrfs_commit_device_sizes(struct btrfs_transaction *trans)
 {
-	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *curr, *next;
 
-	if (list_empty(&fs_devices->resized_devices))
-		return;
-
-	mutex_lock(&fs_devices->device_list_mutex);
-	mutex_lock(&fs_info->chunk_mutex);
-	list_for_each_entry_safe(curr, next, &fs_devices->resized_devices,
-				 resized_list) {
-		list_del_init(&curr->resized_list);
-		curr->commit_total_bytes = curr->disk_total_bytes;
-	}
-	mutex_unlock(&fs_info->chunk_mutex);
-	mutex_unlock(&fs_devices->device_list_mutex);
-}
-
-/* Must be invoked during the transaction commit */
-void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans)
-{
-	struct btrfs_fs_info *fs_info = trans->fs_info;
-	struct extent_map *em;
-	struct map_lookup *map;
-	struct btrfs_device *dev;
-	int i;
+	BUG_ON(trans->state != TRANS_STATE_COMMIT_DOING);
 
-	if (list_empty(&trans->pending_chunks))
+	if (list_empty(&trans->dev_update_list))
 		return;
 
-	/* In order to kick the device replace finish process */
-	mutex_lock(&fs_info->chunk_mutex);
-	list_for_each_entry(em, &trans->pending_chunks, list) {
-		map = em->map_lookup;
-
-		for (i = 0; i < map->num_stripes; i++) {
-			dev = map->stripes[i].dev;
-			dev->commit_bytes_used = dev->bytes_used;
-		}
+	/*
+	 * We don't need the device_list_mutex here.  This list is owned
+	 * by the transaction and the transaction must complete before
+	 * the device is released.
+	 */
+	mutex_lock(&trans->fs_info->chunk_mutex);
+	list_for_each_entry_safe(curr, next, &trans->dev_update_list,
+				 post_commit_list) {
+		list_del_init(&curr->post_commit_list);
+		curr->commit_total_bytes = curr->disk_total_bytes;
+		curr->commit_bytes_used = curr->bytes_used;
 	}
-	mutex_unlock(&fs_info->chunk_mutex);
+	mutex_unlock(&trans->fs_info->chunk_mutex);
 }
 
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 656ea8d85770..b61935286e17 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -45,6 +45,7 @@ struct btrfs_pending_bios {
 struct btrfs_device {
 	struct list_head dev_list;
 	struct list_head dev_alloc_list;
+	struct list_head post_commit_list; /* chunk mutex */
 	struct btrfs_fs_devices *fs_devices;
 	struct btrfs_fs_info *fs_info;
 
@@ -102,18 +103,12 @@ struct btrfs_device {
 	 * size of the device on the current transaction
 	 *
 	 * This variant is update when committing the transaction,
-	 * and protected by device_list_mutex
+	 * and protected by chunk mutex
 	 */
 	u64 commit_total_bytes;
 
 	/* bytes used on the current transaction */
 	u64 commit_bytes_used;
-	/*
-	 * used to manage the device which is resized
-	 *
-	 * It is protected by chunk_lock.
-	 */
-	struct list_head resized_list;
 
 	/* for sending down flush barriers */
 	struct bio *flush_bio;
@@ -235,7 +230,6 @@ struct btrfs_fs_devices {
 	struct mutex device_list_mutex;
 	struct list_head devices;
 
-	struct list_head resized_devices;
 	/* devices not currently being allocated */
 	struct list_head alloc_list;
 
@@ -557,8 +551,7 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 
 const char *get_raid_name(enum btrfs_raid_types type);
 
-void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info);
-void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans);
+void btrfs_commit_device_sizes(struct btrfs_transaction *trans);
 
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (3 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 04/15] btrfs: combine device update operations during transaction commit Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-02-04 13:29   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

During device shrink pinned/pending chunks (i.e those which have been
deleted/created respectively, in the current transaction and haven't
touched disk) need to be accounted when doing device shrink. Presently
this happens after the main relocation loop in btrfs_shrink_device,
which could lead to making another go in the body of the function.

Since there is no hard requirement to perform pinned/pending chunks
handling after the relocation loop, move the code before it. This leads
to simplifying the code flow around - i.e no need to use 'goto again'.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 54 ++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f645e1b5539..be91432697ab 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4712,15 +4712,15 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	int slot;
 	int failed = 0;
 	bool retried = false;
-	bool checked_pending_chunks = false;
 	struct extent_buffer *l;
 	struct btrfs_key key;
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
 	u64 old_total = btrfs_super_total_bytes(super_copy);
 	u64 old_size = btrfs_device_get_total_bytes(device);
 	u64 diff;
+	u64 start;
 
-	new_size = round_down(new_size, fs_info->sectorsize);
+	start = new_size = round_down(new_size, fs_info->sectorsize);
 	diff = round_down(old_size - new_size, fs_info->sectorsize);
 
 	if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
@@ -4732,6 +4732,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 
 	path->reada = READA_BACK;
 
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	mutex_lock(&fs_info->chunk_mutex);
 
 	btrfs_device_set_total_bytes(device, new_size);
@@ -4739,7 +4743,21 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		device->fs_devices->total_rw_bytes -= diff;
 		atomic64_sub(diff, &fs_info->free_chunk_space);
 	}
-	mutex_unlock(&fs_info->chunk_mutex);
+
+	/*
+	 * Once the device's size has been set to the new size, ensure all
+	 * in-memory chunks are synced to disk so that the loop below sees them
+	 * and relocates them accordingly.
+	 */
+	if (contains_pending_extent(trans->transaction, device, &start, diff)) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		ret = btrfs_commit_transaction(trans);
+		if (ret)
+			goto done;
+	} else {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_end_transaction(trans);
+	}
 
 again:
 	key.objectid = device->devid;
@@ -4830,36 +4848,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	}
 
 	mutex_lock(&fs_info->chunk_mutex);
-
-	/*
-	 * We checked in the above loop all device extents that were already in
-	 * the device tree. However before we have updated the device's
-	 * total_bytes to the new size, we might have had chunk allocations that
-	 * have not complete yet (new block groups attached to transaction
-	 * handles), and therefore their device extents were not yet in the
-	 * device tree and we missed them in the loop above. So if we have any
-	 * pending chunk using a device extent that overlaps the device range
-	 * that we can not use anymore, commit the current transaction and
-	 * repeat the search on the device tree - this way we guarantee we will
-	 * not have chunks using device extents that end beyond 'new_size'.
-	 */
-	if (!checked_pending_chunks) {
-		u64 start = new_size;
-		u64 len = old_size - new_size;
-
-		if (contains_pending_extent(trans->transaction, device,
-					    &start, len)) {
-			mutex_unlock(&fs_info->chunk_mutex);
-			checked_pending_chunks = true;
-			failed = 0;
-			retried = false;
-			ret = btrfs_commit_transaction(trans);
-			if (ret)
-				goto done;
-			goto again;
-		}
-	}
-
 	btrfs_device_set_disk_total_bytes(device, new_size);
 	if (list_empty(&device->post_commit_list))
 		list_add_tail(&device->post_commit_list,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (4 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-02-04 13:31   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 07/15] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

This function is going to be used to clear out the device extent
allocation information. Give it a more generic name and export it. This
is in preparation to replacing the pending/pinned chunk lists with an
extent tree. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c   | 28 ++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h   |  1 +
 fs/btrfs/transaction.c | 37 ++++---------------------------------
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 01f596d42f59..ab1c81a1918e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -210,6 +210,34 @@ void extent_io_tree_init(struct extent_io_tree *tree,
 	tree->private_data = private_data;
 }
 
+void extent_io_tree_release(struct extent_io_tree *tree)
+{
+	spin_lock(&tree->lock);
+	/*
+	 * Do a single barrier for the waitqueue_active check here, the state
+	 * of the waitqueue should not change once clear_btree_io_tree is
+	 * called.
+	 */
+	smp_mb();
+	while (!RB_EMPTY_ROOT(&tree->state)) {
+		struct rb_node *node;
+		struct extent_state *state;
+
+		node = rb_first(&tree->state);
+		state = rb_entry(node, struct extent_state, rb_node);
+		rb_erase(&state->rb_node, &tree->state);
+		RB_CLEAR_NODE(&state->rb_node);
+		/*
+		 * btree io trees aren't supposed to have tasks waiting for
+		 * changes in the flags of extent states ever.
+		 */
+		ASSERT(!waitqueue_active(&state->wq));
+		free_extent_state(state);
+
+		cond_resched_lock(&tree->lock);
+	}
+	spin_unlock(&tree->lock);
+}
 static struct extent_state *alloc_extent_state(gfp_t mask)
 {
 	struct extent_state *state;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 08749e0b9c32..d7beb2b3bc7d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -240,6 +240,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 					  int create);
 
 void extent_io_tree_init(struct extent_io_tree *tree, void *private_data);
+void extent_io_tree_release(struct extent_io_tree *tree);
 int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8d340957f7d6..0f5c8d9b50d7 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -80,35 +80,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 	}
 }
 
-static void clear_btree_io_tree(struct extent_io_tree *tree)
-{
-	spin_lock(&tree->lock);
-	/*
-	 * Do a single barrier for the waitqueue_active check here, the state
-	 * of the waitqueue should not change once clear_btree_io_tree is
-	 * called.
-	 */
-	smp_mb();
-	while (!RB_EMPTY_ROOT(&tree->state)) {
-		struct rb_node *node;
-		struct extent_state *state;
-
-		node = rb_first(&tree->state);
-		state = rb_entry(node, struct extent_state, rb_node);
-		rb_erase(&state->rb_node, &tree->state);
-		RB_CLEAR_NODE(&state->rb_node);
-		/*
-		 * btree io trees aren't supposed to have tasks waiting for
-		 * changes in the flags of extent states ever.
-		 */
-		ASSERT(!waitqueue_active(&state->wq));
-		free_extent_state(state);
-
-		cond_resched_lock(&tree->lock);
-	}
-	spin_unlock(&tree->lock);
-}
-
 static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -122,7 +93,7 @@ static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 		root->commit_root = btrfs_root_node(root);
 		if (is_fstree(root->root_key.objectid))
 			btrfs_unpin_free_ino(root);
-		clear_btree_io_tree(&root->dirty_log_pages);
+		extent_io_tree_release(&root->dirty_log_pages);
 	}
 
 	/* We can free old roots now. */
@@ -938,7 +909,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 		 * superblock that points to btree nodes/leafs for which
 		 * writeback hasn't finished yet (and without errors).
 		 * We cleanup any entries left in the io tree when committing
-		 * the transaction (through clear_btree_io_tree()).
+		 * the transaction (through extent_io_tree_release()).
 		 */
 		if (err == -ENOMEM) {
 			err = 0;
@@ -983,7 +954,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info,
 		 * left in the io tree. For a log commit, we don't remove them
 		 * after committing the log because the tree can be accessed
 		 * concurrently - we do it only at transaction commit time when
-		 * it's safe to do it (through clear_btree_io_tree()).
+		 * it's safe to do it (through extent_io_tree_release()).
 		 */
 		err = clear_extent_bit(dirty_pages, start, end,
 				       EXTENT_NEED_WAIT, 0, 0, &cached_state);
@@ -1061,7 +1032,7 @@ static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
 	blk_finish_plug(&plug);
 	ret2 = btrfs_wait_extents(fs_info, dirty_pages);
 
-	clear_btree_io_tree(&trans->transaction->dirty_pages);
+	extent_io_tree_release(&trans->transaction->dirty_pages);
 
 	if (ret)
 		return ret;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 07/15] btrfs: Populate ->orig_block_len during read_one_chunk
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (5 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 08/15] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Chunks read from disk currently don't get their ->orig_block_len member
set, in contrast when a new chunk is allocated, the respective
extent_map's ->orig_block_len is assigned the size of the stripe of this
chunk. Let's apply the same strategy for chunks which are read from
disk, not only does this codify the invariant that ->orig_block_len
always contains the size of the stripe for a chunk (when the em belongs
to the mapping tree). But it's also a preparatory patch for further work
around tracking chunk allocation in an extent tree rather than
pinned/pending lists.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index be91432697ab..e678cad34bd3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6801,6 +6801,26 @@ static void btrfs_report_missing_device(struct btrfs_fs_info *fs_info,
 			      devid, uuid);
 }
 
+static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
+{
+	int index = btrfs_bg_flags_to_raid_index(type);
+	int ncopies = btrfs_raid_array[index].ncopies;
+	int data_stripes;
+
+	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case BTRFS_BLOCK_GROUP_RAID5:
+		data_stripes = num_stripes - 1;
+		break;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		data_stripes = num_stripes - 2;
+		break;
+	default:
+		data_stripes = num_stripes / ncopies;
+		break;
+	}
+	return div_u64(chunk_len, data_stripes);
+}
+
 static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 			  struct extent_buffer *leaf,
 			  struct btrfs_chunk *chunk)
@@ -6860,6 +6880,8 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 	map->type = btrfs_chunk_type(leaf, chunk);
 	map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
 	map->verified_stripes = 0;
+	em->orig_block_len = calc_stripe_length(map->type, em->len,
+						map->num_stripes);
 	for (i = 0; i < num_stripes; i++) {
 		map->stripes[i].physical =
 			btrfs_stripe_offset_nr(leaf, chunk, i);
@@ -7717,25 +7739,6 @@ int btrfs_bg_type_to_factor(u64 flags)
 }
 
 
-static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
-{
-	int index = btrfs_bg_flags_to_raid_index(type);
-	int ncopies = btrfs_raid_array[index].ncopies;
-	int data_stripes;
-
-	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
-	case BTRFS_BLOCK_GROUP_RAID5:
-		data_stripes = num_stripes - 1;
-		break;
-	case BTRFS_BLOCK_GROUP_RAID6:
-		data_stripes = num_stripes - 2;
-		break;
-	default:
-		data_stripes = num_stripes / ncopies;
-		break;
-	}
-	return div_u64(chunk_len, data_stripes);
-}
 
 static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
 				 u64 chunk_offset, u64 devid,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 08/15] btrfs: Introduce new bits for device allocation tree
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (6 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 07/15] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 09/15] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Rather than hijacking the existing defines let's just define new bits,
with more descriptive names. Instead of using yet more (currently at 18)
bits for the new flags, use the fact those flags will be specific to
the device allocation tree so define them using existing EXTENT_* flags.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d7beb2b3bc7d..af7e00a3678c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -29,6 +29,10 @@
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
 
+
+/* Redefined bits above which are used only in the device allocation tree */
+#define CHUNK_ALLOCATED EXTENT_DIRTY
+
 /*
  * flags for bio submission. The high bits indicate the compression
  * type for this bio
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 09/15] btrfs: replace pending/pinned chunks lists with io tree
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (7 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 08/15] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-30 14:50 ` [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

From: Jeff Mahoney <jeffm@suse.com>

The pending chunks list contains chunks that are allocated in the
current transaction but haven't been created yet. The pinned chunks
list contains chunks that are being released in the current transaction.
Both describe chunks that are not reflected on disk as in use but are
unavailable just the same.

The pending chunks list is anchored by the transaction handle, which
means that we need to hold a reference to a transaction when working
with the list.

We use these lists to ensure that we don't end up discarding chunks
that are allocated or released in the current transaction.  What we r

The way we use them is by iterating over both lists to perform
comparisons on the stripes they describe for each device. This is
backwards and requires that we keep a transaction handle open while
we're trimming.

This patchset adds an extent_io_tree to btrfs_device that maintains
the allocation state of the device.  Extents are set dirty when
chunks are first allocated -- when the extent maps are added to the
mapping tree. They're cleared when last removed -- when the extent
maps are removed from the mapping tree. This matches the lifespan
of the pending and pinned chunks list and allows us to do trims
on unallocated space safely without pinning the transaction for what
may be a lengthy operation. We can also use this io tree to mark
which chunks have already been trimmed so we don't repeat the operation.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h            |  6 ---
 fs/btrfs/disk-io.c          | 11 -----
 fs/btrfs/extent-tree.c      | 28 -----------
 fs/btrfs/extent_io.c        |  2 +-
 fs/btrfs/extent_io.h        |  6 ++-
 fs/btrfs/extent_map.c       | 36 ++++++++++++++
 fs/btrfs/extent_map.h       |  1 -
 fs/btrfs/free-space-cache.c |  4 --
 fs/btrfs/transaction.c      |  9 ----
 fs/btrfs/transaction.h      |  1 -
 fs/btrfs/volumes.c          | 96 +++++++++++++------------------------
 fs/btrfs/volumes.h          |  2 +
 12 files changed, 76 insertions(+), 126 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fecc64d8e285..a08791dc5cc2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1148,12 +1148,6 @@ struct btrfs_fs_info {
 	struct mutex unused_bg_unpin_mutex;
 	struct mutex delete_unused_bgs_mutex;
 
-	/*
-	 * Chunks that can't be freed yet (under a trim/discard operation)
-	 * and will be latter freed. Protected by fs_info->chunk_mutex.
-	 */
-	struct list_head pinned_chunks;
-
 	/* Cached block sizes */
 	u32 nodesize;
 	u32 sectorsize;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 25dab68070dc..8ef7b8f9e4be 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2774,8 +2774,6 @@ int open_ctree(struct super_block *sb,
 	init_waitqueue_head(&fs_info->transaction_blocked_wait);
 	init_waitqueue_head(&fs_info->async_submit_wait);
 
-	INIT_LIST_HEAD(&fs_info->pinned_chunks);
-
 	/* Usable values until the real ones are cached from the superblock */
 	fs_info->nodesize = 4096;
 	fs_info->sectorsize = 4096;
@@ -4050,15 +4048,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_free_ref_cache(fs_info);
-
-	while (!list_empty(&fs_info->pinned_chunks)) {
-		struct extent_map *em;
-
-		em = list_first_entry(&fs_info->pinned_chunks,
-				      struct extent_map, list);
-		list_del_init(&em->list);
-		free_extent_map(em);
-	}
 }
 
 int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 79d0bc813a72..a5613fe35b90 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10824,10 +10824,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	memcpy(&key, &block_group->key, sizeof(key));
 
 	mutex_lock(&fs_info->chunk_mutex);
-	if (!list_empty(&em->list)) {
-		/* We're in the transaction->pending_chunks list. */
-		free_extent_map(em);
-	}
 	spin_lock(&block_group->lock);
 	block_group->removed = 1;
 	/*
@@ -10854,25 +10850,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	 * the transaction commit has completed.
 	 */
 	remove_em = (atomic_read(&block_group->trimming) == 0);
-	/*
-	 * Make sure a trimmer task always sees the em in the pinned_chunks list
-	 * if it sees block_group->removed == 1 (needs to lock block_group->lock
-	 * before checking block_group->removed).
-	 */
-	if (!remove_em) {
-		/*
-		 * Our em might be in trans->transaction->pending_chunks which
-		 * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks),
-		 * and so is the fs_info->pinned_chunks list.
-		 *
-		 * So at this point we must be holding the chunk_mutex to avoid
-		 * any races with chunk allocation (more specifically at
-		 * volumes.c:contains_pending_extent()), to ensure it always
-		 * sees the em, either in the pending_chunks list or in the
-		 * pinned_chunks list.
-		 */
-		list_move_tail(&em->list, &fs_info->pinned_chunks);
-	}
 	spin_unlock(&block_group->lock);
 
 	if (remove_em) {
@@ -10880,11 +10857,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 
 		em_tree = &fs_info->mapping_tree.map_tree;
 		write_lock(&em_tree->lock);
-		/*
-		 * The em might be in the pending_chunks list, so make sure the
-		 * chunk mutex is locked, since remove_extent_mapping() will
-		 * delete us from that list.
-		 */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
 		/* once for the tree */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ab1c81a1918e..e788d4aa53d1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -860,7 +860,7 @@ static void cache_state(struct extent_state *state,
  * [start, end] is inclusive This takes the tree lock.
  */
 
-static int __must_check
+int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 unsigned bits, unsigned exclusive_bits,
 		 u64 *failed_start, struct extent_state **cached_state,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index af7e00a3678c..d4227e40c8ee 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -314,7 +314,11 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   unsigned bits, u64 *failed_start,
 		   struct extent_state **cached_state, gfp_t mask);
-
+int
+__set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+		 unsigned bits, unsigned exclusive_bits,
+		 u64 *failed_start, struct extent_state **cached_state,
+		 gfp_t mask, struct extent_changeset *changeset);
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 		u64 end, unsigned bits)
 {
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 928f729c55ba..0820f6fcf3a6 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -4,6 +4,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include "ctree.h"
+#include "volumes.h"
 #include "extent_map.h"
 #include "compression.h"
 
@@ -336,6 +337,37 @@ static inline void setup_extent_mapping(struct extent_map_tree *tree,
 	else
 		try_merge_map(tree, em);
 }
+static void extent_map_device_set_bits(struct extent_map *em, unsigned bits)
+{
+	struct map_lookup *map = em->map_lookup;
+	u64 stripe_size = em->orig_block_len;
+	int i;
+
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_bio_stripe *stripe = &map->stripes[i];
+		struct btrfs_device *device = stripe->dev;
+
+		__set_extent_bit(&device->alloc_state, stripe->physical,
+				 stripe->physical + stripe_size - 1, bits,
+				 0, NULL, NULL, GFP_NOWAIT, NULL);
+	}
+}
+
+static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
+{
+	struct map_lookup *map = em->map_lookup;
+	u64 stripe_size = em->orig_block_len;
+	int i;
+
+	for (i = 0; i < map->num_stripes; i++) {
+		struct btrfs_bio_stripe *stripe = &map->stripes[i];
+		struct btrfs_device *device = stripe->dev;
+
+		__clear_extent_bit(&device->alloc_state, stripe->physical,
+				   stripe->physical + stripe_size - 1, bits,
+				   0, 0, NULL, GFP_NOWAIT, NULL);
+	}
+}
 
 /**
  * add_extent_mapping - add new extent map to the extent tree
@@ -357,6 +389,8 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		goto out;
 
 	setup_extent_mapping(tree, em, modified);
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+		extent_map_device_set_bits(em, CHUNK_ALLOCATED);
 out:
 	return ret;
 }
@@ -438,6 +472,8 @@ void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em)
 	rb_erase_cached(&em->rb_node, &tree->map);
 	if (!test_bit(EXTENT_FLAG_LOGGING, &em->flags))
 		list_del_init(&em->list);
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+		extent_map_device_clear_bits(em, CHUNK_ALLOCATED);
 	RB_CLEAR_NODE(&em->rb_node);
 }
 
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index 473f039fcd7c..72b46833f236 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -91,7 +91,6 @@ void replace_extent_mapping(struct extent_map_tree *tree,
 			    struct extent_map *cur,
 			    struct extent_map *new,
 			    int modified);
-
 struct extent_map *alloc_extent_map(void);
 void free_extent_map(struct extent_map *em);
 int __init extent_map_init(void);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 74aa552f4793..207fb50dcc7a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3366,10 +3366,6 @@ void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *block_group)
 		em = lookup_extent_mapping(em_tree, block_group->key.objectid,
 					   1);
 		BUG_ON(!em); /* logic error, can't happen */
-		/*
-		 * remove_extent_mapping() will delete us from the pinned_chunks
-		 * list, which is protected by the chunk mutex.
-		 */
 		remove_extent_mapping(em_tree, em);
 		write_unlock(&em_tree->lock);
 		mutex_unlock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0f5c8d9b50d7..32af91b0a132 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -50,14 +50,6 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
 			btrfs_err(transaction->fs_info,
 				  "pending csums is %llu",
 				  transaction->delayed_refs.pending_csums);
-		while (!list_empty(&transaction->pending_chunks)) {
-			struct extent_map *em;
-
-			em = list_first_entry(&transaction->pending_chunks,
-					      struct extent_map, list);
-			list_del_init(&em->list);
-			free_extent_map(em);
-		}
 		/*
 		 * If any block groups are found in ->deleted_bgs then it's
 		 * because the transaction was aborted and a commit did not
@@ -234,7 +226,6 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->delayed_refs.lock);
 
 	INIT_LIST_HEAD(&cur_trans->pending_snapshots);
-	INIT_LIST_HEAD(&cur_trans->pending_chunks);
 	INIT_LIST_HEAD(&cur_trans->dev_update_list);
 	INIT_LIST_HEAD(&cur_trans->switch_commits);
 	INIT_LIST_HEAD(&cur_trans->dirty_bgs);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index e0a04fa4de66..134d1a0bd92f 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -51,7 +51,6 @@ struct btrfs_transaction {
 	wait_queue_head_t writer_wait;
 	wait_queue_head_t commit_wait;
 	struct list_head pending_snapshots;
-	struct list_head pending_chunks;
 	struct list_head dev_update_list;
 	struct list_head switch_commits;
 	struct list_head dirty_bgs;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e678cad34bd3..25a3c1bdfc99 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -335,6 +335,8 @@ void btrfs_free_device(struct btrfs_device *device)
 {
 	BUG_ON(!list_empty(&device->post_commit_list));
 	rcu_string_free(device->name);
+	if (!in_softirq())
+		extent_io_tree_release(&device->alloc_state);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
@@ -411,6 +413,7 @@ static struct btrfs_device *__alloc_device(void)
 	btrfs_device_data_ordered_init(dev);
 	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
 	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
+	extent_io_tree_init(&dev->alloc_state, NULL);
 
 	return dev;
 }
@@ -1270,6 +1273,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
 		fs_devices->missing_devices--;
 
+	/* Remove alloc state now since it cannot be done in RCU context */
+	extent_io_tree_release(&device->alloc_state);
+
 	btrfs_close_bdev(device);
 
 	new_device = btrfs_alloc_device(NULL, &device->devid,
@@ -1495,58 +1501,29 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	return device;
 }
 
-static int contains_pending_extent(struct btrfs_transaction *transaction,
-				   struct btrfs_device *device,
-				   u64 *start, u64 len)
-{
-	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct extent_map *em;
-	struct list_head *search_list = &fs_info->pinned_chunks;
-	int ret = 0;
-	u64 physical_start = *start;
-
-	if (transaction)
-		search_list = &transaction->pending_chunks;
-again:
-	list_for_each_entry(em, search_list, list) {
-		struct map_lookup *map;
-		int i;
-
-		map = em->map_lookup;
-		for (i = 0; i < map->num_stripes; i++) {
-			u64 end;
-
-			if (map->stripes[i].dev != device)
-				continue;
-			if (map->stripes[i].physical >= physical_start + len ||
-			    map->stripes[i].physical + em->orig_block_len <=
-			    physical_start)
-				continue;
-			/*
-			 * Make sure that while processing the pinned list we do
-			 * not override our *start with a lower value, because
-			 * we can have pinned chunks that fall within this
-			 * device hole and that have lower physical addresses
-			 * than the pending chunks we processed before. If we
-			 * do not take this special care we can end up getting
-			 * 2 pending chunks that start at the same physical
-			 * device offsets because the end offset of a pinned
-			 * chunk can be equal to the start offset of some
-			 * pending chunk.
-			 */
-			end = map->stripes[i].physical + em->orig_block_len;
-			if (end > *start) {
-				*start = end;
-				ret = 1;
-			}
+/*
+ * Tries to find a chunk that intersects [start, start +len] range and when one
+ * such is found, records the end of it in *start
+ */
+#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
+static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
+				    u64 len)
+{
+	u64 physical_start, physical_end;
+	lockdep_assert_held(&device->fs_info->chunk_mutex);
+
+	if (!find_first_extent_bit(&device->alloc_state, *start,
+				   &physical_start, &physical_end,
+				   CHUNK_ALLOCATED, NULL)) {
+
+		if (in_range(physical_start, *start, len) ||
+		    in_range(*start, physical_start,
+			     physical_end - physical_start)) {
+			*start = physical_end + 1;
+			return true;
 		}
 	}
-	if (search_list != &fs_info->pinned_chunks) {
-		search_list = &fs_info->pinned_chunks;
-		goto again;
-	}
-
-	return ret;
+	return false;
 }
 
 
@@ -1657,15 +1634,12 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 			 * Have to check before we set max_hole_start, otherwise
 			 * we could end up sending back this offset anyway.
 			 */
-			if (contains_pending_extent(transaction, device,
-						    &search_start,
+			if (contains_pending_extent(device, &search_start,
 						    hole_size)) {
-				if (key.offset >= search_start) {
+				if (key.offset >= search_start)
 					hole_size = key.offset - search_start;
-				} else {
-					WARN_ON_ONCE(1);
+				else
 					hole_size = 0;
-				}
 			}
 
 			if (hole_size > max_hole_size) {
@@ -1706,8 +1680,7 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 	if (search_end > search_start) {
 		hole_size = search_end - search_start;
 
-		if (contains_pending_extent(transaction, device, &search_start,
-					    hole_size)) {
+		if (contains_pending_extent(device, &search_start, hole_size)) {
 			btrfs_release_path(path);
 			goto again;
 		}
@@ -4749,7 +4722,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 	 * in-memory chunks are synced to disk so that the loop below sees them
 	 * and relocates them accordingly.
 	 */
-	if (contains_pending_extent(trans->transaction, device, &start, diff)) {
+	if (contains_pending_extent(device, &start, diff)) {
 		mutex_unlock(&fs_info->chunk_mutex);
 		ret = btrfs_commit_transaction(trans);
 		if (ret)
@@ -5191,9 +5164,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		free_extent_map(em);
 		goto error;
 	}
-
-	list_add_tail(&em->list, &trans->transaction->pending_chunks);
-	refcount_inc(&em->refs);
 	write_unlock(&em_tree->lock);
 
 	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
@@ -5226,8 +5196,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	free_extent_map(em);
 	/* One for the tree reference */
 	free_extent_map(em);
-	/* One for the pending_chunks list reference */
-	free_extent_map(em);
 error:
 	kfree(devices_info);
 	return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b61935286e17..8fbeba92025d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -134,6 +134,8 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+
+	struct extent_io_tree alloc_state;
 };
 
 /*
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (8 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 09/15] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-02-04 14:36   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 11/15] btrfs: Factor out in_range macro Nikolay Borisov
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Now that those function no longer require a handle to transaction to
inspect pending/pinned chunks the argument can be removed. At the same
time also remove any surrounding code which acquired the handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 36 +++---------------------------------
 fs/btrfs/volumes.c     | 11 ++++-------
 fs/btrfs/volumes.h     |  8 +++-----
 3 files changed, 10 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5613fe35b90..86e6d19871f9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9787,12 +9787,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
  */
 int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 {
-	struct btrfs_root *root = fs_info->extent_root;
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_space_info *space_info;
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *device;
-	struct btrfs_trans_handle *trans;
 	u64 min_free;
 	u64 dev_min = 1;
 	u64 dev_nr = 0;
@@ -9891,13 +9889,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		min_free = div64_u64(min_free, dev_min);
 	}
 
-	/* We need to do this so that we can look at pending chunks */
-	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
-		goto out;
-	}
-
 	mutex_lock(&fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
 		u64 dev_offset;
@@ -9908,7 +9899,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		 */
 		if (device->total_bytes > device->bytes_used + min_free &&
 		    !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
-			ret = find_free_dev_extent(trans, device, min_free,
+			ret = find_free_dev_extent(device, min_free,
 						   &dev_offset, NULL);
 			if (!ret)
 				dev_nr++;
@@ -9924,7 +9915,6 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 			   "no space to allocate a new chunk for block group %llu",
 			   block_group->key.objectid);
 	mutex_unlock(&fs_info->chunk_mutex);
-	btrfs_end_transaction(trans);
 out:
 	btrfs_put_block_group(block_group);
 	return ret;
@@ -11182,34 +11172,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 
 	while (1) {
 		struct btrfs_fs_info *fs_info = device->fs_info;
-		struct btrfs_transaction *trans;
 		u64 bytes;
 
 		ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
 		if (ret)
 			break;
 
-		ret = down_read_killable(&fs_info->commit_root_sem);
-		if (ret) {
-			mutex_unlock(&fs_info->chunk_mutex);
-			break;
-		}
-
-		spin_lock(&fs_info->trans_lock);
-		trans = fs_info->running_transaction;
-		if (trans)
-			refcount_inc(&trans->use_count);
-		spin_unlock(&fs_info->trans_lock);
-
-		if (!trans)
-			up_read(&fs_info->commit_root_sem);
-
-		ret = find_free_dev_extent_start(trans, device, range->minlen,
-						 start, &start, &len);
-		if (trans) {
-			up_read(&fs_info->commit_root_sem);
-			btrfs_put_transaction(trans);
-		}
+		ret = find_free_dev_extent_start(device, range->minlen, start,
+						 &start, &len);
 
 		if (ret) {
 			mutex_unlock(&fs_info->chunk_mutex);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 25a3c1bdfc99..cb01d40993dd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1548,8 +1548,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
  * But if we don't find suitable free space, it is used to store the size of
  * the max free space.
  */
-int find_free_dev_extent_start(struct btrfs_transaction *transaction,
-			       struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
 			       u64 search_start, u64 *start, u64 *len)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
@@ -1705,13 +1704,11 @@ int find_free_dev_extent_start(struct btrfs_transaction *transaction,
 	return ret;
 }
 
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *len)
 {
 	/* FIXME use last free of some kind */
-	return find_free_dev_extent_start(trans->transaction, device,
-					  num_bytes, 0, start, len);
+	return find_free_dev_extent_start(device, num_bytes, 0, start, len);
 }
 
 static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
@@ -5030,7 +5027,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		if (total_avail == 0)
 			continue;
 
-		ret = find_free_dev_extent(trans, device,
+		ret = find_free_dev_extent(device,
 					   max_stripe_size * dev_stripes,
 					   &dev_offset, &max_avail);
 		if (ret && ret != -ENOSPC)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 8fbeba92025d..91f0fc889678 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -444,11 +444,9 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info);
 int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info);
 int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset);
-int find_free_dev_extent_start(struct btrfs_transaction *transaction,
-			 struct btrfs_device *device, u64 num_bytes,
-			 u64 search_start, u64 *start, u64 *max_avail);
-int find_free_dev_extent(struct btrfs_trans_handle *trans,
-			 struct btrfs_device *device, u64 num_bytes,
+int find_free_dev_extent_start(struct btrfs_device *device, u64 num_bytes,
+			       u64 search_start, u64 *start, u64 *max_avail);
+int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
 			 u64 *start, u64 *max_avail);
 void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index);
 int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 11/15] btrfs: Factor out in_range macro
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (9 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-02-04 13:57   ` Johannes Thumshirn
  2019-01-30 14:50 ` [PATCH 12/15] btrfs: Optimize unallocated chunks discard Nikolay Borisov
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

This is used in more than one places so let's factor it out in ctree.h.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h       | 2 ++
 fs/btrfs/extent-tree.c | 1 -
 fs/btrfs/volumes.c     | 1 -
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a08791dc5cc2..648c3504ffcc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3779,6 +3779,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 	return signal_pending(current);
 }
 
+#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
+
 /* Sanity test specific functions */
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 void btrfs_test_inode_set_ops(struct inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 86e6d19871f9..610bed028511 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1905,7 +1905,6 @@ static int remove_extent_backref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
 static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cb01d40993dd..0657ad510664 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1505,7 +1505,6 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
  * Tries to find a chunk that intersects [start, start +len] range and when one
  * such is found, records the end of it in *start
  */
-#define in_range(b, first, len)        ((b) >= (first) && (b) < (first) + (len))
 static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 				    u64 len)
 {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 12/15] btrfs: Optimize unallocated chunks discard
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (10 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 11/15] btrfs: Factor out in_range macro Nikolay Borisov
@ 2019-01-30 14:50 ` Nikolay Borisov
  2019-01-30 14:51 ` [PATCH 13/15] btrfs: Fix gross misnaming Nikolay Borisov
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:50 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Currently unallocated chunks are always trimmed. For example
2 consecutive trims on large storage would trim freespace twice
irrespective of whether the space was actually allocated or not between
those trims.

Optimise this behavior by exploiting the newly introduced alloc_state
tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark
those unallocated chunks which have been trimmed and have not been
allocated afterwards. On chunk allocation the respective underlying devices'
physical space will have its CHUNK_TRIMMED flag cleared. This avoids
submitting discards for space which hasn't been changed since the last
time discard was issued.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h   |  8 +++++-
 fs/btrfs/extent_map.c  |  4 ++-
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 610bed028511..f5005ef39f98 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11127,6 +11127,54 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
+static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
+{
+	u64 trimmed_start = 0, trimmed_end = 0;
+	u64 end = *start + *len - 1;
+
+	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
+				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
+		u64 trimmed_len = trimmed_end - trimmed_start + 1;
+
+		if (*start < trimmed_start) {
+			if (in_range(end, trimmed_start, trimmed_len) ||
+			    end > trimmed_end) {
+				/*
+				 * start|------|end
+				 *      ts|--|trimmed_len
+				 *      OR
+				 * start|-----|end
+				 *      ts|-----|trimmed_len
+				 */
+				*len = trimmed_start - *start;
+				return false;
+			} else if (end < trimmed_start) {
+				/*
+				 * start|------|end
+				 *             ts|--|trimmed_len
+				 */
+				return false;
+			}
+		} else if (in_range(*start, trimmed_start, trimmed_len)) {
+			if (in_range(end, trimmed_start, trimmed_len)) {
+				/*
+				 * start|------|end
+				 *  ts|----------|trimmed_len
+				 */
+				return true;
+			} else {
+				/*
+				 * start|-----------|end
+				 *  ts|----------|trimmed_len
+				 */
+				*start = trimmed_end + 1;
+				*len = end - *start + 1;
+				return false;
+			}
+		}
+	}
+	return false;
+}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11197,7 +11245,14 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
+		if (!should_skip_trim(device, &start, &len)) {
+			ret = btrfs_issue_discard(device->bdev, start, len,
+						  &bytes);
+			if (!ret)
+				set_extent_bits(&device->alloc_state, start,
+						start + bytes - 1,
+						CHUNK_TRIMMED);
+		}
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d4227e40c8ee..d238efd628cf 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -30,8 +30,14 @@
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
 
 
-/* Redefined bits above which are used only in the device allocation tree */
+/*
+ * Redefined bits above which are used only in the device allocation tree,
+ * shouldn't be using EXTENT_IOBITS(EXTENT_LOCKED/EXTENT_WRITEBACK) /
+ * EXTENT_BOUNDARY / EXTENT_CLEAR_META_RESV / EXTENT_CLEAR_DATA_RESV because
+ * they have special meaning to the bit manipulation functions
+ */
 #define CHUNK_ALLOCATED EXTENT_DIRTY
+#define CHUNK_TRIMMED   EXTENT_DEFRAG
 
 /*
  * flags for bio submission. The high bits indicate the compression
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 0820f6fcf3a6..9e8c0904f623 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -389,8 +389,10 @@ int add_extent_mapping(struct extent_map_tree *tree,
 		goto out;
 
 	setup_extent_mapping(tree, em, modified);
-	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags))
+	if (test_bit(EXTENT_FLAG_FS_MAPPING, &em->flags)) {
 		extent_map_device_set_bits(em, CHUNK_ALLOCATED);
+		extent_map_device_clear_bits(em, CHUNK_TRIMMED);
+	}
 out:
 	return ret;
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 13/15] btrfs: Fix gross misnaming
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (11 preceding siblings ...)
  2019-01-30 14:50 ` [PATCH 12/15] btrfs: Optimize unallocated chunks discard Nikolay Borisov
@ 2019-01-30 14:51 ` Nikolay Borisov
  2019-01-30 14:51 ` [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
  2019-01-30 14:51 ` [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
  14 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

The variables and function parameters of __etree_search which pertain to
prev/next are grossly misnamed. Namely, prev_ret holds the next state
and not the previous. Similarly, next_ret actually holds the previous
extent state relating to the offset we are interested in. Fix this by
renaming the variables as well as switching the arguments order. No
functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e788d4aa53d1..d5525d18803d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -309,8 +309,8 @@ static struct rb_node *tree_insert(struct rb_root *root,
 }
 
 static struct rb_node *__etree_search(struct extent_io_tree *tree, u64 offset,
-				      struct rb_node **prev_ret,
 				      struct rb_node **next_ret,
+				      struct rb_node **prev_ret,
 				      struct rb_node ***p_ret,
 				      struct rb_node **parent_ret)
 {
@@ -339,23 +339,23 @@ static struct rb_node *__etree_search(struct extent_io_tree *tree, u64 offset,
 	if (parent_ret)
 		*parent_ret = prev;
 
-	if (prev_ret) {
+	if (next_ret) {
 		orig_prev = prev;
 		while (prev && offset > prev_entry->end) {
 			prev = rb_next(prev);
 			prev_entry = rb_entry(prev, struct tree_entry, rb_node);
 		}
-		*prev_ret = prev;
+		*next_ret = prev;
 		prev = orig_prev;
 	}
 
-	if (next_ret) {
+	if (prev_ret) {
 		prev_entry = rb_entry(prev, struct tree_entry, rb_node);
 		while (prev && offset < prev_entry->start) {
 			prev = rb_prev(prev);
 			prev_entry = rb_entry(prev, struct tree_entry, rb_node);
 		}
-		*next_ret = prev;
+		*prev_ret = prev;
 	}
 	return NULL;
 }
@@ -366,12 +366,12 @@ tree_search_for_insert(struct extent_io_tree *tree,
 		       struct rb_node ***p_ret,
 		       struct rb_node **parent_ret)
 {
-	struct rb_node *prev = NULL;
+	struct rb_node *next= NULL;
 	struct rb_node *ret;
 
-	ret = __etree_search(tree, offset, &prev, NULL, p_ret, parent_ret);
+	ret = __etree_search(tree, offset, &next, NULL, p_ret, parent_ret);
 	if (!ret)
-		return prev;
+		return next;
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (12 preceding siblings ...)
  2019-01-30 14:51 ` [PATCH 13/15] btrfs: Fix gross misnaming Nikolay Borisov
@ 2019-01-30 14:51 ` Nikolay Borisov
  2019-02-04 14:04   ` Johannes Thumshirn
  2019-02-04 16:57   ` Nikolay Borisov
  2019-01-30 14:51 ` [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
  14 siblings, 2 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

This function is very similar to find_first_extent_bit except that it
locates the first contiguous span of space which does not have bits set.
It's intended use is in the freespace trimming code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5525d18803d..542eaab9c0c1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1474,6 +1474,84 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 	return ret;
 }
 
+/* find_first_clear_extent_bit - finds the first range that has @bits not set
+ * and that starts after @start
+ *
+ * @tree - the tree to search
+ * @start - the offset at/after which the found extent should start
+ * @start_ret - records the beginning of the range
+ * @end_ret - records the end of the range (inclusive)
+ * @bits - the set of bits which must be unset
+ *
+ * Returns 0 when a range is found and @start_ret/@end_ret are initialised
+ * accordingly, 1 otherwise. Since unallocated range is also considered one
+ * which doesn't have the bits set it's possible that @end_ret contains -1, this
+ * happens in case the range spans (last_range_end, end of device]. In this case
+ * it's up to the caller to trim @end_ret to the appropriate size.
+ */
+int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
+				u64 *start_ret, u64 *end_ret, unsigned bits)
+{
+	struct extent_state *state;
+	struct rb_node *node, *prev = NULL, *next;
+	int ret = 1;
+
+	spin_lock(&tree->lock);
+
+	/* Find first extent with bits cleared */
+	while (1) {
+		node = __etree_search(tree, start, &next, &prev, NULL, NULL);
+		if (!node) {
+			node = next;
+			if (!node) {
+				/*
+				 * We are past the last allocated chunk,
+				 * set start at the end of the last extent. The
+				 * device alloc tree should never be empty so
+				 * prev is always set.
+				 */
+				ASSERT(prev);
+				state = rb_entry(prev, struct extent_state, rb_node);
+				*start_ret = state->end + 1;
+				*end_ret = -1;
+				ret = 0;
+				goto out;
+			}
+		}
+		state = rb_entry(node, struct extent_state, rb_node);
+		if (in_range(start, state->start, state->end - state->start + 1) &&
+			(state->state & bits)) {
+			start = state->end + 1;
+		} else {
+			*start_ret = start;
+			break;
+		}
+	}
+
+	/*
+	 * Find the longest stretch from start until an entry which has the
+	 * bits set
+	 */
+	while (1) {
+		state = rb_entry(node, struct extent_state, rb_node);
+		if (state->end >= start && !(state->state & bits)) {
+			*end_ret = state->end;
+			ret = 0;
+		} else {
+			*end_ret = state->start - 1;
+			ret = 0;
+			break;
+		}
+
+		node = rb_next(node);
+		if (!node)
+			break;
+	}
+out:
+	spin_unlock(&tree->lock);
+	return ret;
+}
+
 /*
  * find a contiguous range of bytes in the file marked as delalloc, not
  * more than 'max_bytes'.  start and end are used to return the range,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d238efd628cf..7ddb3ec70023 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -391,6 +391,8 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
 			  struct extent_state **cached_state);
+int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
+				u64 *start_ret, u64 *end_ret, unsigned bits);
 int extent_invalidatepage(struct extent_io_tree *tree,
 			  struct page *page, unsigned long offset);
 int extent_write_full_page(struct page *page, struct writeback_control *wbc);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
  2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
                   ` (13 preceding siblings ...)
  2019-01-30 14:51 ` [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
@ 2019-01-30 14:51 ` Nikolay Borisov
  2019-01-31 15:38   ` [PATCH v2] " Nikolay Borisov
  2019-01-31 15:41   ` [PATCH v3] " Nikolay Borisov
  14 siblings, 2 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm, Nikolay Borisov

Instead of always calling the allocator to search for a free extent,
that satisfies the input criteria, switch btrfs_trim_free_extents to
using find_first_clear_extent_bit. With this change it's no longer
necessary to read the device tree in order to figure out holes in
the devices.

Now the code always searches in-memory data structure to figure out the
space range which contains the requested which should result in speed
oups.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 87 ++++++++++++------------------------------
 1 file changed, 24 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5005ef39f98..7a2b144bdb76 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11127,54 +11127,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
-static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
-{
-	u64 trimmed_start = 0, trimmed_end = 0;
-	u64 end = *start + *len - 1;
-
-	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
-				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
-		u64 trimmed_len = trimmed_end - trimmed_start + 1;
-
-		if (*start < trimmed_start) {
-			if (in_range(end, trimmed_start, trimmed_len) ||
-			    end > trimmed_end) {
-				/*
-				 * start|------|end
-				 *      ts|--|trimmed_len
-				 *      OR
-				 * start|-----|end
-				 *      ts|-----|trimmed_len
-				 */
-				*len = trimmed_start - *start;
-				return false;
-			} else if (end < trimmed_start) {
-				/*
-				 * start|------|end
-				 *             ts|--|trimmed_len
-				 */
-				return false;
-			}
-		} else if (in_range(*start, trimmed_start, trimmed_len)) {
-			if (in_range(end, trimmed_start, trimmed_len)) {
-				/*
-				 * start|------|end
-				 *  ts|----------|trimmed_len
-				 */
-				return true;
-			} else {
-				/*
-				 * start|-----------|end
-				 *  ts|----------|trimmed_len
-				 */
-				*start = trimmed_end + 1;
-				*len = end - *start + 1;
-				return false;
-			}
-		}
-	}
-	return false;
-}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11198,7 +11150,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
 static int btrfs_trim_free_extents(struct btrfs_device *device,
 				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = range->start, len = 0;
+	u64 start = range->start, len = 0, end = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11225,34 +11177,43 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (ret)
 			break;
 
-		ret = find_free_dev_extent_start(device, range->minlen, start,
-						 &start, &len);
-
+		ret = find_first_clear_extent_bit(&device->alloc_state, start,
+				&start, &end,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
 		if (ret) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			if (ret == -ENOSPC)
-				ret = 0;
+			ret = 0;
 			break;
 		}
+		/* If find_first_clear_extent_bit find a range that spans the
+		 * end of the device it will set end to -1, in this case it's up
+		 * to the caller to trim the value to the size of the device.
+		 */
+		end = min(end, device->total_bytes);
+		len = end - start + 1;
+
+		/* Keep going until we satisfy minlen or reach end of space */
+		if (len < range->minlen) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			start += len;
+			continue;
+		}
 
 		/* If we are out of the passed range break */
 		if (start > range->start + range->len - 1) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			ret = 0;
 			break;
 		}
 
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		if (!should_skip_trim(device, &start, &len)) {
-			ret = btrfs_issue_discard(device->bdev, start, len,
-						  &bytes);
-			if (!ret)
-				set_extent_bits(&device->alloc_state, start,
-						start + bytes - 1,
-						CHUNK_TRIMMED);
-		}
+		ret = btrfs_issue_discard(device->bdev, start, len,
+					  &bytes);
+		if (!ret)
+			set_extent_bits(&device->alloc_state, start,
+					start + bytes - 1,
+					CHUNK_TRIMMED);
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-30 14:50 ` [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
@ 2019-01-31 15:21   ` David Sterba
  2019-01-31 15:35     ` Nikolay Borisov
  0 siblings, 1 reply; 36+ messages in thread
From: David Sterba @ 2019-01-31 15:21 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, jeffm

On Wed, Jan 30, 2019 at 04:50:48PM +0200, Nikolay Borisov wrote:
> Up until know trimming the freespace was done irrespective of what the
> arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
> will be entirely ignored. Fix it by correctly handling those paramter.
> This requires breaking if the found freespace extent is after the end
> of the passed range as well as completing trim after trimming
> fstrim_range::len bytes.

How does this work with with multipe-device filesystems? The fstrim
range would apply to all of them. Which does make some sense, though
might be unexpected as this does not happen for other filesystems.

The FITRIM range is in the physical coordinates, so eg. the taking the
maximum size of all devices and iterating over that by 1GiB steps would
go though the whole filesystem. Something to put to the changelog and
documentation.

> Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1fdba38761f7..fc3f6acc3c9b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
>   * held back allocations.
>   */
>  static int btrfs_trim_free_extents(struct btrfs_device *device,
> -				   u64 minlen, u64 *trimmed)
> +				   struct fstrim_range *range, u64 *trimmed)
>  {
> -	u64 start = 0, len = 0;
> +	u64 start = range->start, len = 0;
>  	int ret;
>  
>  	*trimmed = 0;
> @@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>  		if (!trans)
>  			up_read(&fs_info->commit_root_sem);
>  
> -		ret = find_free_dev_extent_start(trans, device, minlen, start,
> -						 &start, &len);
> +		ret = find_free_dev_extent_start(trans, device, range->minlen,
> +						 start, &start, &len);
>  		if (trans) {
>  			up_read(&fs_info->commit_root_sem);
>  			btrfs_put_transaction(trans);
> @@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>  			break;
>  		}
>  
> +		/* If we are out of the passed range break */
> +		if (start > range->start + range->len - 1) {
> +			mutex_unlock(&fs_info->chunk_mutex);
> +			ret = 0;
> +			break;
> +		}
> +
> +		start = max(range->start, start);
> +		len = min(range->len, len);
> +
>  		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
>  		mutex_unlock(&fs_info->chunk_mutex);
>  
> @@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>  		start += len;
>  		*trimmed += bytes;
>  
> +		/* We've trimmed enough */
> +		if (*trimmed >= range->len)
> +			break;
> +
>  		if (fatal_signal_pending(current)) {
>  			ret = -ERESTARTSYS;
>  			break;
> @@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>  	devices = &fs_info->fs_devices->devices;
>  	list_for_each_entry(device, devices, dev_list) {
> -		ret = btrfs_trim_free_extents(device, range->minlen,
> -					      &group_trimmed);
> +		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
>  		if (ret) {
>  			dev_failed++;
>  			dev_ret = ret;
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/15] btrfs: Make WARN_ON in a canonical form
  2019-01-30 14:50 ` [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Nikolay Borisov
@ 2019-01-31 15:22   ` David Sterba
  2019-02-04 13:12   ` Johannes Thumshirn
  1 sibling, 0 replies; 36+ messages in thread
From: David Sterba @ 2019-01-31 15:22 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, jeffm

On Wed, Jan 30, 2019 at 04:50:49PM +0200, Nikolay Borisov wrote:
> There is no point in using a construct like 'if (!condition)
> WARN_ON(1)'. Use WARN_ON(!condition) directly. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/15] btrfs: Remove  EXTENT_FIRST_DELALLOC bit
  2019-01-30 14:50 ` [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Nikolay Borisov
@ 2019-01-31 15:23   ` David Sterba
  2019-02-04 13:15   ` Johannes Thumshirn
  1 sibling, 0 replies; 36+ messages in thread
From: David Sterba @ 2019-01-31 15:23 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, jeffm

On Wed, Jan 30, 2019 at 04:50:50PM +0200, Nikolay Borisov wrote:
> With the refactoring introduced in 8b62f87bad9c ("Btrfs: reworki
> outstanding_extents") this flag became unused. Remove it and renumber
> the following flags accordingly. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-31 15:21   ` David Sterba
@ 2019-01-31 15:35     ` Nikolay Borisov
  2019-01-31 15:48       ` David Sterba
  2019-01-31 19:30       ` Jeff Mahoney
  0 siblings, 2 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-31 15:35 UTC (permalink / raw)
  To: dsterba, linux-btrfs, jeffm



On 31.01.19 г. 17:21 ч., David Sterba wrote:
> On Wed, Jan 30, 2019 at 04:50:48PM +0200, Nikolay Borisov wrote:
>> Up until know trimming the freespace was done irrespective of what the
>> arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
>> will be entirely ignored. Fix it by correctly handling those paramter.
>> This requires breaking if the found freespace extent is after the end
>> of the passed range as well as completing trim after trimming
>> fstrim_range::len bytes.
> 
> How does this work with with multipe-device filesystems? The fstrim
> range would apply to all of them. Which does make some sense, though
> might be unexpected as this does not happen for other filesystems.

Well FITRIM doesn't have support for multiple device so it's to the
discretion of the fs how exactly this is implemented. And this is indeed
the way things work currently.

> 
> The FITRIM range is in the physical coordinates, so eg. the taking the
> maximum size of all devices and iterating over that by 1GiB steps would
> go though the whole filesystem. Something to put to the changelog and
> documentation.

I don't follow, trimming would just trim the physical range as passed by
fstrim -o/-l options. That 1gb value is not hardcoded anywhere. If
someone wants to trim all of the freespace (which is what the majority
of the user do) then they can run FITRIM with 0 for offset and -1 for
length.

> 
>> Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 1fdba38761f7..fc3f6acc3c9b 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
>>   * held back allocations.
>>   */
>>  static int btrfs_trim_free_extents(struct btrfs_device *device,
>> -				   u64 minlen, u64 *trimmed)
>> +				   struct fstrim_range *range, u64 *trimmed)
>>  {
>> -	u64 start = 0, len = 0;
>> +	u64 start = range->start, len = 0;
>>  	int ret;
>>  
>>  	*trimmed = 0;
>> @@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>  		if (!trans)
>>  			up_read(&fs_info->commit_root_sem);
>>  
>> -		ret = find_free_dev_extent_start(trans, device, minlen, start,
>> -						 &start, &len);
>> +		ret = find_free_dev_extent_start(trans, device, range->minlen,
>> +						 start, &start, &len);
>>  		if (trans) {
>>  			up_read(&fs_info->commit_root_sem);
>>  			btrfs_put_transaction(trans);
>> @@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>  			break;
>>  		}
>>  
>> +		/* If we are out of the passed range break */
>> +		if (start > range->start + range->len - 1) {
>> +			mutex_unlock(&fs_info->chunk_mutex);
>> +			ret = 0;
>> +			break;
>> +		}
>> +
>> +		start = max(range->start, start);
>> +		len = min(range->len, len);
>> +
>>  		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
>>  		mutex_unlock(&fs_info->chunk_mutex);
>>  
>> @@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>  		start += len;
>>  		*trimmed += bytes;
>>  
>> +		/* We've trimmed enough */
>> +		if (*trimmed >= range->len)
>> +			break;
>> +
>>  		if (fatal_signal_pending(current)) {
>>  			ret = -ERESTARTSYS;
>>  			break;
>> @@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>>  	devices = &fs_info->fs_devices->devices;
>>  	list_for_each_entry(device, devices, dev_list) {
>> -		ret = btrfs_trim_free_extents(device, range->minlen,
>> -					      &group_trimmed);
>> +		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
>>  		if (ret) {
>>  			dev_failed++;
>>  			dev_ret = ret;
>> -- 
>> 2.17.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
  2019-01-30 14:51 ` [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
@ 2019-01-31 15:38   ` " Nikolay Borisov
  2019-01-31 15:41   ` [PATCH v3] " Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-31 15:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jeffm, Nikolay Borisov

Instead of always calling the allocator to search for a free extent,
that satisfies the input criteria, switch btrfs_trim_free_extents to
using find_first_clear_extent_bit. With this change it's no longer
necessary to read the device tree in order to figure out holes in
the devices.

Now the code always searches in-memory data structure to figure out the
space range which contains the requested which should result in speed
oups.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

V2: 
 Explicitly ensure that the offset of the passed range is at least 1mb. This 
 mimics the behavior of the chunk allocator where the first megabyte of every 
 device is never touched. This ensures we are not screwing up with any bootloaders
 that might occupy the first megabyte. 

 fs/btrfs/extent-tree.c | 87 ++++++++++++------------------------------
 1 file changed, 24 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5005ef39f98..23bda5d58d9f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11127,54 +11127,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
-static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
-{
-	u64 trimmed_start = 0, trimmed_end = 0;
-	u64 end = *start + *len - 1;
-
-	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
-				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
-		u64 trimmed_len = trimmed_end - trimmed_start + 1;
-
-		if (*start < trimmed_start) {
-			if (in_range(end, trimmed_start, trimmed_len) ||
-			    end > trimmed_end) {
-				/*
-				 * start|------|end
-				 *      ts|--|trimmed_len
-				 *      OR
-				 * start|-----|end
-				 *      ts|-----|trimmed_len
-				 */
-				*len = trimmed_start - *start;
-				return false;
-			} else if (end < trimmed_start) {
-				/*
-				 * start|------|end
-				 *             ts|--|trimmed_len
-				 */
-				return false;
-			}
-		} else if (in_range(*start, trimmed_start, trimmed_len)) {
-			if (in_range(end, trimmed_start, trimmed_len)) {
-				/*
-				 * start|------|end
-				 *  ts|----------|trimmed_len
-				 */
-				return true;
-			} else {
-				/*
-				 * start|-----------|end
-				 *  ts|----------|trimmed_len
-				 */
-				*start = trimmed_end + 1;
-				*len = end - *start + 1;
-				return false;
-			}
-		}
-	}
-	return false;
-}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11198,7 +11150,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
 static int btrfs_trim_free_extents(struct btrfs_device *device,
 				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = range->start, len = 0;
+	u64 start = max(range->start, SZ_1M), len = 0, end = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11225,34 +11177,43 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (ret)
 			break;
 
-		ret = find_free_dev_extent_start(device, range->minlen, start,
-						 &start, &len);
-
+		ret = find_first_clear_extent_bit(&device->alloc_state, start,
+				&start, &end,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
 		if (ret) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			if (ret == -ENOSPC)
-				ret = 0;
+			ret = 0;
 			break;
 		}
+		/* If find_first_clear_extent_bit find a range that spans the
+		 * end of the device it will set end to -1, in this case it's up
+		 * to the caller to trim the value to the size of the device.
+		 */
+		end = min(end, device->total_bytes);
+		len = end - start + 1;
+
+		/* Keep going until we satisfy minlen or reach end of space */
+		if (len < range->minlen) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			start += len;
+			continue;
+		}
 
 		/* If we are out of the passed range break */
 		if (start > range->start + range->len - 1) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			ret = 0;
 			break;
 		}
 
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		if (!should_skip_trim(device, &start, &len)) {
-			ret = btrfs_issue_discard(device->bdev, start, len,
-						  &bytes);
-			if (!ret)
-				set_extent_bits(&device->alloc_state, start,
-						start + bytes - 1,
-						CHUNK_TRIMMED);
-		}
+		ret = btrfs_issue_discard(device->bdev, start, len,
+					  &bytes);
+		if (!ret)
+			set_extent_bits(&device->alloc_state, start,
+					start + bytes - 1,
+					CHUNK_TRIMMED);
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit
  2019-01-30 14:51 ` [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
  2019-01-31 15:38   ` [PATCH v2] " Nikolay Borisov
@ 2019-01-31 15:41   ` " Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-31 15:41 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, jeffm, Nikolay Borisov

Instead of always calling the allocator to search for a free extent,
that satisfies the input criteria, switch btrfs_trim_free_extents to
using find_first_clear_extent_bit. With this change it's no longer
necessary to read the device tree in order to figure out holes in
the devices.

Now the code always searches in-memory data structure to figure out the
space range which contains the requested which should result in speed
oups.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
V3: 
 * Switch to using max_t to avoid build warning
V2: 
 * Explicitly ensure that the offset of the passed range is at least 1mb. This 
   mimics the behavior of the chunk allocator where the first megabyte of every 
   device is never touched. This ensures we are not screwing up with any bootloaders
   that might occupy the first megabyte. 
 fs/btrfs/extent-tree.c | 87 ++++++++++++------------------------------
 1 file changed, 24 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f5005ef39f98..04c5bd555177 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11127,54 +11127,6 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
 	return unpin_extent_range(fs_info, start, end, false);
 }
 
-static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
-{
-	u64 trimmed_start = 0, trimmed_end = 0;
-	u64 end = *start + *len - 1;
-
-	if (!find_first_extent_bit(&device->alloc_state, *start, &trimmed_start,
-				   &trimmed_end, CHUNK_TRIMMED, NULL)) {
-		u64 trimmed_len = trimmed_end - trimmed_start + 1;
-
-		if (*start < trimmed_start) {
-			if (in_range(end, trimmed_start, trimmed_len) ||
-			    end > trimmed_end) {
-				/*
-				 * start|------|end
-				 *      ts|--|trimmed_len
-				 *      OR
-				 * start|-----|end
-				 *      ts|-----|trimmed_len
-				 */
-				*len = trimmed_start - *start;
-				return false;
-			} else if (end < trimmed_start) {
-				/*
-				 * start|------|end
-				 *             ts|--|trimmed_len
-				 */
-				return false;
-			}
-		} else if (in_range(*start, trimmed_start, trimmed_len)) {
-			if (in_range(end, trimmed_start, trimmed_len)) {
-				/*
-				 * start|------|end
-				 *  ts|----------|trimmed_len
-				 */
-				return true;
-			} else {
-				/*
-				 * start|-----------|end
-				 *  ts|----------|trimmed_len
-				 */
-				*start = trimmed_end + 1;
-				*len = end - *start + 1;
-				return false;
-			}
-		}
-	}
-	return false;
-}
 /*
  * It used to be that old block groups would be left around forever.
  * Iterating over them would be enough to trim unused space.  Since we
@@ -11198,7 +11150,7 @@ static bool should_skip_trim(struct btrfs_device *device, u64 *start, u64 *len)
 static int btrfs_trim_free_extents(struct btrfs_device *device,
 				   struct fstrim_range *range, u64 *trimmed)
 {
-	u64 start = range->start, len = 0;
+	u64 start = max_t(u64, range->start, SZ_1M), len = 0, end = 0;
 	int ret;
 
 	*trimmed = 0;
@@ -11225,34 +11177,43 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 		if (ret)
 			break;
 
-		ret = find_free_dev_extent_start(device, range->minlen, start,
-						 &start, &len);
-
+		ret = find_first_clear_extent_bit(&device->alloc_state, start,
+				&start, &end,
+				CHUNK_TRIMMED | CHUNK_ALLOCATED);
 		if (ret) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			if (ret == -ENOSPC)
-				ret = 0;
+			ret = 0;
 			break;
 		}
+		/* If find_first_clear_extent_bit find a range that spans the
+		 * end of the device it will set end to -1, in this case it's up
+		 * to the caller to trim the value to the size of the device.
+		 */
+		end = min(end, device->total_bytes);
+		len = end - start + 1;
+
+		/* Keep going until we satisfy minlen or reach end of space */
+		if (len < range->minlen) {
+			mutex_unlock(&fs_info->chunk_mutex);
+			start += len;
+			continue;
+		}
 
 		/* If we are out of the passed range break */
 		if (start > range->start + range->len - 1) {
 			mutex_unlock(&fs_info->chunk_mutex);
-			ret = 0;
 			break;
 		}
 
 		start = max(range->start, start);
 		len = min(range->len, len);
 
-		if (!should_skip_trim(device, &start, &len)) {
-			ret = btrfs_issue_discard(device->bdev, start, len,
-						  &bytes);
-			if (!ret)
-				set_extent_bits(&device->alloc_state, start,
-						start + bytes - 1,
-						CHUNK_TRIMMED);
-		}
+		ret = btrfs_issue_discard(device->bdev, start, len,
+					  &bytes);
+		if (!ret)
+			set_extent_bits(&device->alloc_state, start,
+					start + bytes - 1,
+					CHUNK_TRIMMED);
 		mutex_unlock(&fs_info->chunk_mutex);
 
 		if (ret)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-31 15:35     ` Nikolay Borisov
@ 2019-01-31 15:48       ` David Sterba
  2019-01-31 19:30       ` Jeff Mahoney
  1 sibling, 0 replies; 36+ messages in thread
From: David Sterba @ 2019-01-31 15:48 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, linux-btrfs, jeffm

On Thu, Jan 31, 2019 at 05:35:10PM +0200, Nikolay Borisov wrote:
> On 31.01.19 г. 17:21 ч., David Sterba wrote:
> > On Wed, Jan 30, 2019 at 04:50:48PM +0200, Nikolay Borisov wrote:
> >> Up until know trimming the freespace was done irrespective of what the
> >> arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
> >> will be entirely ignored. Fix it by correctly handling those paramter.
> >> This requires breaking if the found freespace extent is after the end
> >> of the passed range as well as completing trim after trimming
> >> fstrim_range::len bytes.
> > 
> > How does this work with with multipe-device filesystems? The fstrim
> > range would apply to all of them. Which does make some sense, though
> > might be unexpected as this does not happen for other filesystems.
> 
> Well FITRIM doesn't have support for multiple device so it's to the
> discretion of the fs how exactly this is implemented. And this is indeed
> the way things work currently.
> 
> > The FITRIM range is in the physical coordinates, so eg. the taking the
> > maximum size of all devices and iterating over that by 1GiB steps would
> > go though the whole filesystem. Something to put to the changelog and
> > documentation.
> 
> I don't follow, trimming would just trim the physical range as passed by
> fstrim -o/-l options. That 1gb value is not hardcoded anywhere. If
> someone wants to trim all of the freespace (which is what the majority
> of the user do) then they can run FITRIM with 0 for offset and -1 for
> length.

Yeah but I wanted to give an example of usecase when the range is not
0/-1 and how this is could to be used.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-31 15:35     ` Nikolay Borisov
  2019-01-31 15:48       ` David Sterba
@ 2019-01-31 19:30       ` Jeff Mahoney
  2019-01-31 21:45         ` Nikolay Borisov
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff Mahoney @ 2019-01-31 19:30 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs

On 1/31/19 10:35 AM, Nikolay Borisov wrote:
> 
> 
> On 31.01.19 г. 17:21 ч., David Sterba wrote:
>> On Wed, Jan 30, 2019 at 04:50:48PM +0200, Nikolay Borisov wrote:
>>> Up until know trimming the freespace was done irrespective of what the
>>> arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
>>> will be entirely ignored. Fix it by correctly handling those paramter.
>>> This requires breaking if the found freespace extent is after the end
>>> of the passed range as well as completing trim after trimming
>>> fstrim_range::len bytes.
>>
>> How does this work with with multipe-device filesystems? The fstrim
>> range would apply to all of them. Which does make some sense, though
>> might be unexpected as this does not happen for other filesystems.
> 
> Well FITRIM doesn't have support for multiple device so it's to the
> discretion of the fs how exactly this is implemented. And this is indeed
> the way things work currently.
> 
>>
>> The FITRIM range is in the physical coordinates, so eg. the taking the
>> maximum size of all devices and iterating over that by 1GiB steps would
>> go though the whole filesystem. Something to put to the changelog and
>> documentation.
> 
> I don't follow, trimming would just trim the physical range as passed by
> fstrim -o/-l options. That 1gb value is not hardcoded anywhere. If
> someone wants to trim all of the freespace (which is what the majority
> of the user do) then they can run FITRIM with 0 for offset and -1 for
> length.

But what does physical range mean in this context?  The address space
presented to the user and which FITRIM operates on is the logical
address space that the extent tree uses.  For allocated chunks, there's
a mapping to physical devices.  It can have holes in it.  What does it
mean when that offset/length hits a hole?

I'm okay with (0,-1) referring to all the space on all devices.  That
seems obvious enough.  Perhaps for bounded trims, it only applies to the
parts within the range that are allocated?

-Jeff


>>> Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
>>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 1fdba38761f7..fc3f6acc3c9b 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
>>>   * held back allocations.
>>>   */
>>>  static int btrfs_trim_free_extents(struct btrfs_device *device,
>>> -				   u64 minlen, u64 *trimmed)
>>> +				   struct fstrim_range *range, u64 *trimmed)
>>>  {
>>> -	u64 start = 0, len = 0;
>>> +	u64 start = range->start, len = 0;
>>>  	int ret;
>>>  
>>>  	*trimmed = 0;
>>> @@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>  		if (!trans)
>>>  			up_read(&fs_info->commit_root_sem);
>>>  
>>> -		ret = find_free_dev_extent_start(trans, device, minlen, start,
>>> -						 &start, &len);
>>> +		ret = find_free_dev_extent_start(trans, device, range->minlen,
>>> +						 start, &start, &len);
>>>  		if (trans) {
>>>  			up_read(&fs_info->commit_root_sem);
>>>  			btrfs_put_transaction(trans);
>>> @@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>  			break;
>>>  		}
>>>  
>>> +		/* If we are out of the passed range break */
>>> +		if (start > range->start + range->len - 1) {
>>> +			mutex_unlock(&fs_info->chunk_mutex);
>>> +			ret = 0;
>>> +			break;
>>> +		}
>>> +
>>> +		start = max(range->start, start);
>>> +		len = min(range->len, len);
>>> +
>>>  		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
>>>  		mutex_unlock(&fs_info->chunk_mutex);
>>>  
>>> @@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>  		start += len;
>>>  		*trimmed += bytes;
>>>  
>>> +		/* We've trimmed enough */
>>> +		if (*trimmed >= range->len)
>>> +			break;
>>> +
>>>  		if (fatal_signal_pending(current)) {
>>>  			ret = -ERESTARTSYS;
>>>  			break;
>>> @@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>>>  	devices = &fs_info->fs_devices->devices;
>>>  	list_for_each_entry(device, devices, dev_list) {
>>> -		ret = btrfs_trim_free_extents(device, range->minlen,
>>> -					      &group_trimmed);
>>> +		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
>>>  		if (ret) {
>>>  			dev_failed++;
>>>  			dev_ret = ret;
>>> -- 
>>> 2.17.1
> 


-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim
  2019-01-31 19:30       ` Jeff Mahoney
@ 2019-01-31 21:45         ` Nikolay Borisov
  0 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-01-31 21:45 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, linux-btrfs



On 31.01.19 г. 21:30 ч., Jeff Mahoney wrote:
> On 1/31/19 10:35 AM, Nikolay Borisov wrote:
>>
>>
>> On 31.01.19 г. 17:21 ч., David Sterba wrote:
>>> On Wed, Jan 30, 2019 at 04:50:48PM +0200, Nikolay Borisov wrote:
>>>> Up until know trimming the freespace was done irrespective of what the
>>>> arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
>>>> will be entirely ignored. Fix it by correctly handling those paramter.
>>>> This requires breaking if the found freespace extent is after the end
>>>> of the passed range as well as completing trim after trimming
>>>> fstrim_range::len bytes.
>>>
>>> How does this work with with multipe-device filesystems? The fstrim
>>> range would apply to all of them. Which does make some sense, though
>>> might be unexpected as this does not happen for other filesystems.
>>
>> Well FITRIM doesn't have support for multiple device so it's to the
>> discretion of the fs how exactly this is implemented. And this is indeed
>> the way things work currently.
>>
>>>
>>> The FITRIM range is in the physical coordinates, so eg. the taking the
>>> maximum size of all devices and iterating over that by 1GiB steps would
>>> go though the whole filesystem. Something to put to the changelog and
>>> documentation.
>>
>> I don't follow, trimming would just trim the physical range as passed by
>> fstrim -o/-l options. That 1gb value is not hardcoded anywhere. If
>> someone wants to trim all of the freespace (which is what the majority
>> of the user do) then they can run FITRIM with 0 for offset and -1 for
>> length.
> 
> But what does physical range mean in this context?  The address space
> presented to the user and which FITRIM operates on is the logical
> address space that the extent tree uses.  For allocated chunks, there's
> a mapping to physical devices.  It can have holes in it.  What does it
> mean when that offset/length hits a hole?

Physical in this context mean the actual space on the disk themselves.
This really has repercussion when trimming freespace. So for example when

fstrim -o 50m -l 20g is run this means that all space between 50m and
20g in the block groups (logical free space) will be trimmed.
Additionally, the physical (that is the space on the actual devices)
will also be trimmed. I don't think we can avoid this.

> 
> I'm okay with (0,-1) referring to all the space on all devices.  That
> seems obvious enough.  Perhaps for bounded trims, it only applies to the
> parts within the range that are allocated?
> 
> -Jeff
> 
> 
>>>> Fixes: 499f377f49f0 ("btrfs: iterate over unused chunk space in FITRIM")
>>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>>> ---
>>>>  fs/btrfs/extent-tree.c | 25 +++++++++++++++++++------
>>>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 1fdba38761f7..fc3f6acc3c9b 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -11190,9 +11190,9 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
>>>>   * held back allocations.
>>>>   */
>>>>  static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>> -				   u64 minlen, u64 *trimmed)
>>>> +				   struct fstrim_range *range, u64 *trimmed)
>>>>  {
>>>> -	u64 start = 0, len = 0;
>>>> +	u64 start = range->start, len = 0;
>>>>  	int ret;
>>>>  
>>>>  	*trimmed = 0;
>>>> @@ -11235,8 +11235,8 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>>  		if (!trans)
>>>>  			up_read(&fs_info->commit_root_sem);
>>>>  
>>>> -		ret = find_free_dev_extent_start(trans, device, minlen, start,
>>>> -						 &start, &len);
>>>> +		ret = find_free_dev_extent_start(trans, device, range->minlen,
>>>> +						 start, &start, &len);
>>>>  		if (trans) {
>>>>  			up_read(&fs_info->commit_root_sem);
>>>>  			btrfs_put_transaction(trans);
>>>> @@ -11249,6 +11249,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>>  			break;
>>>>  		}
>>>>  
>>>> +		/* If we are out of the passed range break */
>>>> +		if (start > range->start + range->len - 1) {
>>>> +			mutex_unlock(&fs_info->chunk_mutex);
>>>> +			ret = 0;
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		start = max(range->start, start);
>>>> +		len = min(range->len, len);
>>>> +
>>>>  		ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
>>>>  		mutex_unlock(&fs_info->chunk_mutex);
>>>>  
>>>> @@ -11258,6 +11268,10 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>>  		start += len;
>>>>  		*trimmed += bytes;
>>>>  
>>>> +		/* We've trimmed enough */
>>>> +		if (*trimmed >= range->len)
>>>> +			break;
>>>> +
>>>>  		if (fatal_signal_pending(current)) {
>>>>  			ret = -ERESTARTSYS;
>>>>  			break;
>>>> @@ -11341,8 +11355,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>>>>  	devices = &fs_info->fs_devices->devices;
>>>>  	list_for_each_entry(device, devices, dev_list) {
>>>> -		ret = btrfs_trim_free_extents(device, range->minlen,
>>>> -					      &group_trimmed);
>>>> +		ret = btrfs_trim_free_extents(device, range, &group_trimmed);
>>>>  		if (ret) {
>>>>  			dev_failed++;
>>>>  			dev_ret = ret;
>>>> -- 
>>>> 2.17.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/15] btrfs: Make WARN_ON in a canonical form
  2019-01-30 14:50 ` [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Nikolay Borisov
  2019-01-31 15:22   ` David Sterba
@ 2019-02-04 13:12   ` Johannes Thumshirn
  1 sibling, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:12 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit
  2019-01-30 14:50 ` [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Nikolay Borisov
  2019-01-31 15:23   ` David Sterba
@ 2019-02-04 13:15   ` Johannes Thumshirn
  1 sibling, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:15 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/15] btrfs: combine device update operations during transaction commit
  2019-01-30 14:50 ` [PATCH 04/15] btrfs: combine device update operations during transaction commit Nikolay Borisov
@ 2019-02-04 13:25   ` Johannes Thumshirn
  2019-02-04 14:48     ` Nikolay Borisov
  0 siblings, 1 reply; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:25 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

On 30/01/2019 15:50, Nikolay Borisov wrote:
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 888d72dda794..25dab68070dc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4498,6 +4498,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
>  	ASSERT(list_empty(&cur_trans->dirty_bgs));
>  	ASSERT(list_empty(&cur_trans->io_bgs));
>  
> +	while (!list_empty(&cur_trans->dev_update_list)) {
> +		struct btrfs_device *dev;
> +
> +		dev = list_first_entry(&cur_trans->dev_update_list,
> +				       struct btrfs_device,
> +				       post_commit_list);
> +		list_del_init(&dev->post_commit_list);
> +	}
> +

Why not?

	list_for_each_entry_safe(dev, next, &cur_trans->dev_update_list,
				 post_commit_list)
		list_del_init(&dev->post_commit_list);

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink
  2019-01-30 14:50 ` [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
@ 2019-02-04 13:29   ` Johannes Thumshirn
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:29 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

Looks good to me,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree
  2019-01-30 14:50 ` [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
@ 2019-02-04 13:31   ` Johannes Thumshirn
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:31 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

Looks good,
Reviewed-by: Johannes Thumshirn <jtumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/15] btrfs: Factor out in_range macro
  2019-01-30 14:50 ` [PATCH 11/15] btrfs: Factor out in_range macro Nikolay Borisov
@ 2019-02-04 13:57   ` Johannes Thumshirn
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 13:57 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

On 30/01/2019 15:50, Nikolay Borisov wrote:
> This is used in more than one places so let's factor it out in ctree.h.
> No functional changes.

You could move this before 9/15 of your series so you don't need to
duplicate the macro in 9/15.

And also (wanted to comment this to 9/15 but OK here as well) the macro
is prone to overflows. Not yet sure what the impact would be apart from
failing the detection if it is in a range or not. If it lives in ctree.h
this fact at least needs documentation, I guess.

	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit
  2019-01-30 14:51 ` [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
@ 2019-02-04 14:04   ` Johannes Thumshirn
  2019-02-04 16:57   ` Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 14:04 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

On 30/01/2019 15:51, Nikolay Borisov wrote:
[...]
>  
> +/* find_first_clear_extent_bit - finds the first range that has @bits not set
> + * and that starts after @start
> + *

NIT: Shouldn't kerneldoc comments start with a /** and a newline afterwards?

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
  2019-01-30 14:50 ` [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
@ 2019-02-04 14:36   ` Johannes Thumshirn
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-04 14:36 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
--
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/15] btrfs: combine device update operations during transaction commit
  2019-02-04 13:25   ` Johannes Thumshirn
@ 2019-02-04 14:48     ` Nikolay Borisov
  2019-02-05  9:21       ` Johannes Thumshirn
  0 siblings, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2019-02-04 14:48 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs; +Cc: jeffm



On 4.02.19 г. 15:25 ч., Johannes Thumshirn wrote:
> On 30/01/2019 15:50, Nikolay Borisov wrote:
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 888d72dda794..25dab68070dc 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -4498,6 +4498,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
>>  	ASSERT(list_empty(&cur_trans->dirty_bgs));
>>  	ASSERT(list_empty(&cur_trans->io_bgs));
>>  
>> +	while (!list_empty(&cur_trans->dev_update_list)) {
>> +		struct btrfs_device *dev;
>> +
>> +		dev = list_first_entry(&cur_trans->dev_update_list,
>> +				       struct btrfs_device,
>> +				       post_commit_list);
>> +		list_del_init(&dev->post_commit_list);
>> +	}
>> +
> 
> Why not?
> 
> 	list_for_each_entry_safe(dev, next, &cur_trans->dev_update_list,
> 				 post_commit_list)
> 		list_del_init(&dev->post_commit_list);

Yep, that should work as well, given that there could be no new updates
in the transaction at the point we are cleaning it up.
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit
  2019-01-30 14:51 ` [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
  2019-02-04 14:04   ` Johannes Thumshirn
@ 2019-02-04 16:57   ` Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2019-02-04 16:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jeffm



On 30.01.19 г. 16:51 ч., Nikolay Borisov wrote:
> This function is very similar to find_first_extent_bit except that it
> locates the first contiguous span of space which does not have bits set.
> It's intended use is in the freespace trimming code.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/extent_io.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/extent_io.h |  2 ++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d5525d18803d..542eaab9c0c1 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1474,6 +1474,84 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
>  	return ret;
>  }
>  
> +/* find_first_clear_extent_bit - finds the first range that has @bits not set
> + * and that starts after @start
> + *
> + * @tree - the tree to search
> + * @start - the offset at/after which the found extent should start
> + * @start_ret - records the beginning of the range
> + * @end_ret - records the end of the range (inclusive)
> + * @bits - the set of bits which must be unset
> + *
> + * Returns 0 when a range is found and @start_ret/@end_ret are initialised
> + * accordingly, 1 otherwise. Since unallocated range is also considered one
> + * which doesn't have the bits set it's possible that @end_ret contains -1, this
> + * happens in case the range spans (last_range_end, end of device]. In this case
> + * it's up to the caller to trim @end_ret to the appropriate size.
> + */
> +int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
> +				u64 *start_ret, u64 *end_ret, unsigned bits)
> +{
> +	struct extent_state *state;
> +	struct rb_node *node, *prev = NULL, *next;
> +	int ret = 1;
> +
> +	spin_lock(&tree->lock);
> +
> +	/* Find first extent with bits cleared */
> +	while (1) {
> +		node = __etree_search(tree, start, &next, &prev, NULL, NULL);
> +		if (!node) {
> +			node = next;
> +			if (!node) {
> +				/*
> +				 * We are past the last allocated chunk,
> +				 * set start at the end of the last extent. The
> +				 * device alloc tree should never be empty so
> +				 * prev is always set.
> +				 */
> +				ASSERT(prev);
> +				state = rb_entry(prev, struct extent_state, rb_node);
> +				*start_ret = state->end + 1;
> +				*end_ret = -1;
> +				ret = 0;
> +				goto out;

Bummer this is not working as expected. For example, for a fs which
doesn't have any holes it will return 0 but in fact it should return 1
and terminate the search. Have to figure how to distinguish between
"everything after last extent until -1 is not allocated" VS "there is no
extent which satisfies the condition so just return 1/failure"

> +			}
> +		}
> +		state = rb_entry(node, struct extent_state, rb_node);
> +		if (in_range(start, state->start, state->end - state->start + 1) &&
> +			(state->state & bits)) {
> +			start = state->end + 1;
> +		} else {
> +			*start_ret = start;
> +			break;
> +		}
> +	}
> +
> +	/*
> +	 * Find the longest stretch from start until an entry which has the
> +	 * bits set
> +	 */
> +	while (1) {
> +		state = rb_entry(node, struct extent_state, rb_node);
> +		if (state->end >= start && !(state->state & bits)) {
> +			*end_ret = state->end;
> +			ret = 0;
> +		} else {
> +			*end_ret = state->start - 1;
> +			ret = 0;
> +			break;
> +		}
> +
> +		node = rb_next(node);
> +		if (!node)
> +			break;
> +	}
> +out:
> +	spin_unlock(&tree->lock);
> +	return ret;
> +}
> +
>  /*
>   * find a contiguous range of bytes in the file marked as delalloc, not
>   * more than 'max_bytes'.  start and end are used to return the range,
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index d238efd628cf..7ddb3ec70023 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -391,6 +391,8 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
>  int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
>  			  u64 *start_ret, u64 *end_ret, unsigned bits,
>  			  struct extent_state **cached_state);
> +int find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
> +				u64 *start_ret, u64 *end_ret, unsigned bits);
>  int extent_invalidatepage(struct extent_io_tree *tree,
>  			  struct page *page, unsigned long offset);
>  int extent_write_full_page(struct page *page, struct writeback_control *wbc);
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/15] btrfs: combine device update operations during transaction commit
  2019-02-04 14:48     ` Nikolay Borisov
@ 2019-02-05  9:21       ` Johannes Thumshirn
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Thumshirn @ 2019-02-05  9:21 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: jeffm

On 04/02/2019 15:48, Nikolay Borisov wrote:
> 
> 
> On 4.02.19 г. 15:25 ч., Johannes Thumshirn wrote:
>> On 30/01/2019 15:50, Nikolay Borisov wrote:
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 888d72dda794..25dab68070dc 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -4498,6 +4498,15 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
>>>  	ASSERT(list_empty(&cur_trans->dirty_bgs));
>>>  	ASSERT(list_empty(&cur_trans->io_bgs));
>>>  
>>> +	while (!list_empty(&cur_trans->dev_update_list)) {
>>> +		struct btrfs_device *dev;
>>> +
>>> +		dev = list_first_entry(&cur_trans->dev_update_list,
>>> +				       struct btrfs_device,
>>> +				       post_commit_list);
>>> +		list_del_init(&dev->post_commit_list);
>>> +	}
>>> +
>>
>> Why not?
>>
>> 	list_for_each_entry_safe(dev, next, &cur_trans->dev_update_list,
>> 				 post_commit_list)
>> 		list_del_init(&dev->post_commit_list);
> 
> Yep, that should work as well, given that there could be no new updates
> in the transaction at the point we are cleaning it up.

Well if there are updates added to the dev_update_list
list_for_each_entry_safe() won't work yes, but if there are concurrent
updates the dev_update_list must be protected by a lock or both of the
above constructs will break into pieces.

	Johannes

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, back to index

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-30 14:50 [PATCH 00/15] Improvements to fitrim Nikolay Borisov
2019-01-30 14:50 ` [PATCH 01/15] btrfs: Honour FITRIM range constraints during free space trim Nikolay Borisov
2019-01-31 15:21   ` David Sterba
2019-01-31 15:35     ` Nikolay Borisov
2019-01-31 15:48       ` David Sterba
2019-01-31 19:30       ` Jeff Mahoney
2019-01-31 21:45         ` Nikolay Borisov
2019-01-30 14:50 ` [PATCH 02/15] btrfs: Make WARN_ON in a canonical form Nikolay Borisov
2019-01-31 15:22   ` David Sterba
2019-02-04 13:12   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 03/15] btrfs: Remove EXTENT_FIRST_DELALLOC bit Nikolay Borisov
2019-01-31 15:23   ` David Sterba
2019-02-04 13:15   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 04/15] btrfs: combine device update operations during transaction commit Nikolay Borisov
2019-02-04 13:25   ` Johannes Thumshirn
2019-02-04 14:48     ` Nikolay Borisov
2019-02-05  9:21       ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 05/15] btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink Nikolay Borisov
2019-02-04 13:29   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 06/15] btrfs: Rename and export clear_btree_io_tree Nikolay Borisov
2019-02-04 13:31   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 07/15] btrfs: Populate ->orig_block_len during read_one_chunk Nikolay Borisov
2019-01-30 14:50 ` [PATCH 08/15] btrfs: Introduce new bits for device allocation tree Nikolay Borisov
2019-01-30 14:50 ` [PATCH 09/15] btrfs: replace pending/pinned chunks lists with io tree Nikolay Borisov
2019-01-30 14:50 ` [PATCH 10/15] btrfs: Remove 'trans' argument from find_free_dev_extent(_start) Nikolay Borisov
2019-02-04 14:36   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 11/15] btrfs: Factor out in_range macro Nikolay Borisov
2019-02-04 13:57   ` Johannes Thumshirn
2019-01-30 14:50 ` [PATCH 12/15] btrfs: Optimize unallocated chunks discard Nikolay Borisov
2019-01-30 14:51 ` [PATCH 13/15] btrfs: Fix gross misnaming Nikolay Borisov
2019-01-30 14:51 ` [PATCH 14/15] btrfs: Implement find_first_clear_extent_bit Nikolay Borisov
2019-02-04 14:04   ` Johannes Thumshirn
2019-02-04 16:57   ` Nikolay Borisov
2019-01-30 14:51 ` [PATCH 15/15] btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit Nikolay Borisov
2019-01-31 15:38   ` [PATCH v2] " Nikolay Borisov
2019-01-31 15:41   ` [PATCH v3] " Nikolay Borisov

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox