Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH 0/5] Misc cleanups in balance code
@ 2018-10-26 11:43 Nikolay Borisov
  2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

While investigating the balance hang I came across various inconsistencies in 
the source. This series aims to fix those. 

The first patch is (I believe) a fix to a longstanding bug that could cause 
balance to fail due to ENOSPC. The code no properly ensures that there is 
at least 1g of unallocated space on every device being balance.

Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
previous cleanup

Patches 3/4/5 remove some redundant code from various functions. 

This series has survived multiple xfstest runs. 

Nikolay Borisov (5):
  btrfs: Ensure at least 1g is free for balance
  btrfs: Refactor btrfs_can_relocate
  btrfs: Remove superfluous check form btrfs_remove_chunk
  btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
  btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range

 fs/btrfs/ctree.h                 |  2 +-
 fs/btrfs/extent-tree.c           | 39 ++++++++++++++-------------------------
 fs/btrfs/extent_io.c             | 13 ++++++-------
 fs/btrfs/extent_io.h             |  2 +-
 fs/btrfs/tests/extent-io-tests.c | 10 +++++-----
 fs/btrfs/volumes.c               | 17 +++++++----------
 6 files changed, 34 insertions(+), 49 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
@ 2018-10-26 11:43 ` Nikolay Borisov
  2018-10-26 12:04   ` Qu Wenruo
  2018-10-26 12:09   ` Hans van Kranenburg
  2018-10-26 11:43 ` [PATCH 2/5] btrfs: Refactor btrfs_can_relocate Nikolay Borisov
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The first part of balance operation is to shrink every constituting
device to ensure there is free space for chunk allocation. However, the code
has been buggy ever since its introduction since calculating the space to shrink
the device by was bounded by 1 mb. Most likely the original intention was to
have an upper bound of 1g and not 1m, since the largest chunk size is 1g. This 
means the first stage in __btrfs_balance so far has been a null op since it
effectively freed just a single megabyte.

Fix this by setting an upper bound of size_to_free of 1g. 

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..8b0fd7bf3447 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
 	list_for_each_entry(device, devices, dev_list) {
 		old_size = btrfs_device_get_total_bytes(device);
 		size_to_free = div_factor(old_size, 1);
-		size_to_free = min_t(u64, size_to_free, SZ_1M);
+		size_to_free = min_t(u64, size_to_free, SZ_1G);
 		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
 		    btrfs_device_get_total_bytes(device) -
 		    btrfs_device_get_bytes_used(device) > size_to_free ||
-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
  2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
@ 2018-10-26 11:43 ` Nikolay Borisov
  2018-10-26 12:35   ` Qu Wenruo
  2018-11-17  1:29   ` Anand Jain
  2018-10-26 11:43 ` [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk Nikolay Borisov
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

btrfs_can_relocate returns 0 when it concludes the given chunk can be
relocated and -1 otherwise. Since this function is used as a predicated
and it return a binary value it makes no sense to have it's return
value as an int so change it to bool. Furthermore, remove a stale
leftover comment from e6ec716f0ddb ("Btrfs: make raid attr array more
readable").

Finally make the logic in the list_for_each_entry loop a bit more obvious. Up
until now the fact that we are returning success (0) was a bit masked by the
fact that the 0 is re-used from the return value of find_free_dev_extent.
Instead, now just increment dev_nr if we find a suitable extent and explicitly
set can_reloc to true if enough devices with unallocated space are present.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 39 ++++++++++++++-------------------------
 fs/btrfs/volumes.c     |  3 +--
 3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 68ca41dbbef3..06edc4f9ceb2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2680,7 +2680,7 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
 int btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr);
 int btrfs_free_block_groups(struct btrfs_fs_info *info);
 int btrfs_read_block_groups(struct btrfs_fs_info *info);
-int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
+bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
 int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 			   u64 bytes_used, u64 type, u64 chunk_offset,
 			   u64 size);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a1febf155747..816bca482358 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9423,10 +9423,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
 /*
  * checks to see if its even possible to relocate this block group.
  *
- * @return - -1 if it's not a good idea to relocate this block group, 0 if its
- * ok to go ahead and try.
+ * @return - false if not enough space can be found for relocation, true
+ * otherwise
  */
-int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
+bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 {
 	struct btrfs_root *root = fs_info->extent_root;
 	struct btrfs_block_group_cache *block_group;
@@ -9441,7 +9441,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 	int debug;
 	int index;
 	int full = 0;
-	int ret = 0;
+	bool can_reloc = true;
 
 	debug = btrfs_test_opt(fs_info, ENOSPC_DEBUG);
 
@@ -9453,7 +9453,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 			btrfs_warn(fs_info,
 				   "can't find block group for bytenr %llu",
 				   bytenr);
-		return -1;
+		return false;
 	}
 
 	min_free = btrfs_block_group_used(&block_group->item);
@@ -9489,16 +9489,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 	 * group is going to be restriped, run checks against the target
 	 * profile instead of the current one.
 	 */
-	ret = -1;
+	can_reloc = false;
 
-	/*
-	 * index:
-	 *      0: raid10
-	 *      1: raid1
-	 *      2: dup
-	 *      3: raid0
-	 *      4: single
-	 */
 	target = get_restripe_target(fs_info, block_group->flags);
 	if (target) {
 		index = btrfs_bg_flags_to_raid_index(extended_to_chunk(target));
@@ -9534,10 +9526,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 
 	/* We need to do this so that we can look at pending chunks */
 	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
+	if (IS_ERR(trans))
 		goto out;
-	}
 
 	mutex_lock(&fs_info->chunk_mutex);
 	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
@@ -9549,18 +9539,17 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 		 */
 		if (device->total_bytes > device->bytes_used + min_free &&
 		    !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
-			ret = find_free_dev_extent(trans, device, min_free,
-						   &dev_offset, NULL);
-			if (!ret)
+			if (!find_free_dev_extent(trans, device, min_free,
+						   &dev_offset, NULL))
 				dev_nr++;
 
-			if (dev_nr >= dev_min)
+			if (dev_nr >= dev_min) {
+				can_reloc = true;
 				break;
-
-			ret = -1;
+			}
 		}
 	}
-	if (debug && ret == -1)
+	if (debug && !can_reloc)
 		btrfs_warn(fs_info,
 			   "no space to allocate a new chunk for block group %llu",
 			   block_group->key.objectid);
@@ -9568,7 +9557,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
 	btrfs_end_transaction(trans);
 out:
 	btrfs_put_block_group(block_group);
-	return ret;
+	return can_reloc;
 }
 
 static int find_first_block_group(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8b0fd7bf3447..dc53d94a62aa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	 */
 	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
 
-	ret = btrfs_can_relocate(fs_info, chunk_offset);
-	if (ret)
+	if (!btrfs_can_relocate(fs_info, chunk_offset))
 		return -ENOSPC;
 
 	/* step one, relocate all the extents inside this chunk */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
  2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
  2018-10-26 11:43 ` [PATCH 2/5] btrfs: Refactor btrfs_can_relocate Nikolay Borisov
@ 2018-10-26 11:43 ` Nikolay Borisov
  2018-10-26 12:40   ` Qu Wenruo
  2018-11-16 23:57   ` Anand Jain
  2018-10-26 11:43 ` [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument Nikolay Borisov
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

It's unnecessary to check map->stripes[i].dev for NULL given its value
is already set and dereferenced above the the check. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dc53d94a62aa..f0db43d08456 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2797,13 +2797,11 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
 			mutex_unlock(&fs_info->chunk_mutex);
 		}
 
-		if (map->stripes[i].dev) {
-			ret = btrfs_update_device(trans, map->stripes[i].dev);
-			if (ret) {
-				mutex_unlock(&fs_devices->device_list_mutex);
-				btrfs_abort_transaction(trans, ret);
-				goto out;
-			}
+		ret = btrfs_update_device(trans, device);
+		if (ret) {
+			mutex_unlock(&fs_devices->device_list_mutex);
+			btrfs_abort_transaction(trans, ret);
+			goto out;
 		}
 	}
 	mutex_unlock(&fs_devices->device_list_mutex);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
                   ` (2 preceding siblings ...)
  2018-10-26 11:43 ` [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk Nikolay Borisov
@ 2018-10-26 11:43 ` Nikolay Borisov
  2018-10-26 12:42   ` Qu Wenruo
  2018-11-17  0:53   ` Anand Jain
  2018-10-26 11:43 ` [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range Nikolay Borisov
  2018-11-16 15:18 ` [PATCH 0/5] Misc cleanups in balance code David Sterba
  5 siblings, 2 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

All callers of this function pass BTRFS_MAX_EXTENT_SIZE (128M) so let's
reduce the argument count and make that a local variable. No functional
changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c             | 11 +++++------
 fs/btrfs/extent_io.h             |  2 +-
 fs/btrfs/tests/extent-io-tests.c | 10 +++++-----
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6877a74c7469..1a9a521aefe5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1566,8 +1566,9 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
 				    struct extent_io_tree *tree,
 				    struct page *locked_page, u64 *start,
-				    u64 *end, u64 max_bytes)
+				    u64 *end)
 {
+	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
 	u64 delalloc_start;
 	u64 delalloc_end;
 	u64 found;
@@ -1647,10 +1648,9 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
 u64 btrfs_find_lock_delalloc_range(struct inode *inode,
 				    struct extent_io_tree *tree,
 				    struct page *locked_page, u64 *start,
-				    u64 *end, u64 max_bytes)
+				    u64 *end)
 {
-	return find_lock_delalloc_range(inode, tree, locked_page, start, end,
-			max_bytes);
+	return find_lock_delalloc_range(inode, tree, locked_page, start, end);
 }
 #endif
 
@@ -3233,8 +3233,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
 		nr_delalloc = find_lock_delalloc_range(inode, tree,
 					       page,
 					       &delalloc_start,
-					       &delalloc_end,
-					       BTRFS_MAX_EXTENT_SIZE);
+					       &delalloc_end);
 		if (nr_delalloc == 0) {
 			delalloc_start = delalloc_end + 1;
 			continue;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 369daa5d4f73..7ec4f93caf78 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -549,7 +549,7 @@ int free_io_failure(struct extent_io_tree *failure_tree,
 u64 btrfs_find_lock_delalloc_range(struct inode *inode,
 				      struct extent_io_tree *tree,
 				      struct page *locked_page, u64 *start,
-				      u64 *end, u64 max_bytes);
+				      u64 *end);
 #endif
 struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 					       u64 start);
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index 9e0f4a01be14..8ea8c2aa6696 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -107,7 +107,7 @@ static int test_find_delalloc(u32 sectorsize)
 	start = 0;
 	end = 0;
 	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
-					 &end, max_bytes);
+					 &end);
 	if (!found) {
 		test_err("should have found at least one delalloc");
 		goto out_bits;
@@ -138,7 +138,7 @@ static int test_find_delalloc(u32 sectorsize)
 	start = test_start;
 	end = 0;
 	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
-					 &end, max_bytes);
+					 &end);
 	if (!found) {
 		test_err("couldn't find delalloc in our range");
 		goto out_bits;
@@ -172,7 +172,7 @@ static int test_find_delalloc(u32 sectorsize)
 	start = test_start;
 	end = 0;
 	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
-					 &end, max_bytes);
+					 &end);
 	if (found) {
 		test_err("found range when we shouldn't have");
 		goto out_bits;
@@ -193,7 +193,7 @@ static int test_find_delalloc(u32 sectorsize)
 	start = test_start;
 	end = 0;
 	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
-					 &end, max_bytes);
+					 &end);
 	if (!found) {
 		test_err("didn't find our range");
 		goto out_bits;
@@ -234,7 +234,7 @@ static int test_find_delalloc(u32 sectorsize)
 	 * tests expected behavior.
 	 */
 	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
-					 &end, max_bytes);
+					 &end);
 	if (!found) {
 		test_err("didn't find our range");
 		goto out_bits;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
                   ` (3 preceding siblings ...)
  2018-10-26 11:43 ` [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument Nikolay Borisov
@ 2018-10-26 11:43 ` Nikolay Borisov
  2018-10-26 12:44   ` Qu Wenruo
  2018-11-17  1:02   ` Anand Jain
  2018-11-16 15:18 ` [PATCH 0/5] Misc cleanups in balance code David Sterba
  5 siblings, 2 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 11:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

lock_delalloc_pages should only return 2 values - 0 in case of success
and -EAGAIN if the range of pages to be locked should be shrunk due to
some of gone. Manual inspections confirms that this is
indeed the case since __process_pages_contig is where lock_delalloc_pages
gets its return value. The latter always returns 0  or -EAGAIN so the
invariant holds. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1a9a521aefe5..94bc53472031 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1606,6 +1606,7 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
 	/* step two, lock all the pages after the page that has start */
 	ret = lock_delalloc_pages(inode, locked_page,
 				  delalloc_start, delalloc_end);
+	ASSERT(!ret || ret == -EAGAIN);
 	if (ret == -EAGAIN) {
 		/* some of the pages are gone, lets avoid looping by
 		 * shortening the size of the delalloc range we're searching
@@ -1621,7 +1622,6 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
 			goto out_failed;
 		}
 	}
-	BUG_ON(ret); /* Only valid values are 0 and -EAGAIN */
 
 	/* step three, lock the state bits for the whole range */
 	lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
@ 2018-10-26 12:04   ` Qu Wenruo
  2018-10-26 12:08     ` Nikolay Borisov
  2018-10-26 12:09   ` Hans van Kranenburg
  1 sibling, 1 reply; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:04 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午7:43, Nikolay Borisov wrote:
> The first part of balance operation is to shrink every constituting
> device to ensure there is free space for chunk allocation. However, the code
> has been buggy ever since its introduction since calculating the space to shrink
> the device by was bounded by 1 mb. Most likely the original intention was to
> have an upper bound of 1g and not 1m, since the largest chunk size is 1g.

Minor nitpick, largest chunk size -> largest chunk stripe size.

As for data chunk, it's possible to get a 10G chunk, but still only 1G
stripe up limit.

> This 
> means the first stage in __btrfs_balance so far has been a null op since it
> effectively freed just a single megabyte.
> 
> Fix this by setting an upper bound of size_to_free of 1g. 

One question come to me naturally, what if we failed to shrink the device?

In fact if btrfs_shrink_device() returns ENOSPC we just skip to
relocation part, so it doesn't look like to cause regression.

If this can be mentioned in the commit message, it would save reviewer
minutes to read the code.



BTW, I think for that (ret == ENOSPC) after btrfs_shrink_device(), we
should continue other than break, to get more chance to secure
unallocated space.

Thanks,
Qu

> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f435d397019e..8b0fd7bf3447 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>  	list_for_each_entry(device, devices, dev_list) {
>  		old_size = btrfs_device_get_total_bytes(device);
>  		size_to_free = div_factor(old_size, 1);
> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>  		    btrfs_device_get_total_bytes(device) -
>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 12:04   ` Qu Wenruo
@ 2018-10-26 12:08     ` Nikolay Borisov
  2018-10-26 12:32       ` Qu Wenruo
  0 siblings, 1 reply; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 12:08 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 26.10.2018 15:04, Qu Wenruo wrote:
> 
> 
> On 2018/10/26 下午7:43, Nikolay Borisov wrote:
>> The first part of balance operation is to shrink every constituting
>> device to ensure there is free space for chunk allocation. However, the code
>> has been buggy ever since its introduction since calculating the space to shrink
>> the device by was bounded by 1 mb. Most likely the original intention was to
>> have an upper bound of 1g and not 1m, since the largest chunk size is 1g.
> 
> Minor nitpick, largest chunk size -> largest chunk stripe size.
> 
> As for data chunk, it's possible to get a 10G chunk, but still only 1G
> stripe up limit.
> 
>> This 
>> means the first stage in __btrfs_balance so far has been a null op since it
>> effectively freed just a single megabyte.
>>
>> Fix this by setting an upper bound of size_to_free of 1g. 
> 
> One question come to me naturally, what if we failed to shrink the device?
> 
> In fact if btrfs_shrink_device() returns ENOSPC we just skip to
> relocation part, so it doesn't look like to cause regression.
> 
> If this can be mentioned in the commit message, it would save reviewer
> minutes to read the code.

Will incorporate it in v2.

> 
> 
> 
> BTW, I think for that (ret == ENOSPC) after btrfs_shrink_device(), we
> should continue other than break, to get more chance to secure
> unallocated space.

I agree but this should be done in a separate patch, this one deals with
the silly upper bound of 1m.

> 
> Thanks,
> Qu
> 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/volumes.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f435d397019e..8b0fd7bf3447 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>  	list_for_each_entry(device, devices, dev_list) {
>>  		old_size = btrfs_device_get_total_bytes(device);
>>  		size_to_free = div_factor(old_size, 1);
>> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
>> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>>  		    btrfs_device_get_total_bytes(device) -
>>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
  2018-10-26 12:04   ` Qu Wenruo
@ 2018-10-26 12:09   ` Hans van Kranenburg
  2018-10-26 12:16     ` Nikolay Borisov
  1 sibling, 1 reply; 25+ messages in thread
From: Hans van Kranenburg @ 2018-10-26 12:09 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

On 10/26/18 1:43 PM, Nikolay Borisov wrote:
> The first part of balance operation is to shrink every constituting
> device to ensure there is free space for chunk allocation.

A very useful thing to be able to do if there's no unallocated raw disk
space left, is use balance to rewrite some blockgroup into the free
space of other ones.

This does not need any raw space for a chunk allocation. Requiring so
makes it unneccesarily more complicated for users to fix the situation
they got in.

> However, the code
> has been buggy ever since its introduction since calculating the space to shrink
> the device by was bounded by 1 mb. Most likely the original intention was to
> have an upper bound of 1g and not 1m, since the largest chunk size is 1g. This 
> means the first stage in __btrfs_balance so far has been a null op since it
> effectively freed just a single megabyte.
> 
> Fix this by setting an upper bound of size_to_free of 1g. 
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f435d397019e..8b0fd7bf3447 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>  	list_for_each_entry(device, devices, dev_list) {
>  		old_size = btrfs_device_get_total_bytes(device);
>  		size_to_free = div_factor(old_size, 1);
> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>  		    btrfs_device_get_total_bytes(device) -
>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
> 


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 12:09   ` Hans van Kranenburg
@ 2018-10-26 12:16     ` Nikolay Borisov
  2018-10-26 12:36       ` Hans van Kranenburg
  0 siblings, 1 reply; 25+ messages in thread
From: Nikolay Borisov @ 2018-10-26 12:16 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs; +Cc: Chris Mason


(Adding Chris to CC since he is the original author of the code)

On 26.10.2018 15:09, Hans van Kranenburg wrote:
> On 10/26/18 1:43 PM, Nikolay Borisov wrote:
>> The first part of balance operation is to shrink every constituting
>> device to ensure there is free space for chunk allocation.
> 
> A very useful thing to be able to do if there's no unallocated raw disk
> space left, is use balance to rewrite some blockgroup into the free
> space of other ones.
> 
> This does not need any raw space for a chunk allocation. Requiring so
> makes it unneccesarily more complicated for users to fix the situation
> they got in.

The current logic of the code doesn't preclude this use case since if we
can't shrink we just proceed to relocation. So even if 1g is replaced
with 5eb and shrink_device always fails balancing will still proceed.

However, all of this really makes me wonder do we even need this "first
stage" code in balancing (Chris, any thoughts?).

> 
>> However, the code
>> has been buggy ever since its introduction since calculating the space to shrink
>> the device by was bounded by 1 mb. Most likely the original intention was to
>> have an upper bound of 1g and not 1m, since the largest chunk size is 1g. This 
>> means the first stage in __btrfs_balance so far has been a null op since it
>> effectively freed just a single megabyte.
>>
>> Fix this by setting an upper bound of size_to_free of 1g. 
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/volumes.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f435d397019e..8b0fd7bf3447 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>  	list_for_each_entry(device, devices, dev_list) {
>>  		old_size = btrfs_device_get_total_bytes(device);
>>  		size_to_free = div_factor(old_size, 1);
>> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
>> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>>  		    btrfs_device_get_total_bytes(device) -
>>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
>>
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 12:08     ` Nikolay Borisov
@ 2018-10-26 12:32       ` Qu Wenruo
  0 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:32 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午8:08, Nikolay Borisov wrote:
> 
> 
> On 26.10.2018 15:04, Qu Wenruo wrote:
>>
>>
>> On 2018/10/26 下午7:43, Nikolay Borisov wrote:
>>> The first part of balance operation is to shrink every constituting
>>> device to ensure there is free space for chunk allocation. However, the code
>>> has been buggy ever since its introduction since calculating the space to shrink
>>> the device by was bounded by 1 mb. Most likely the original intention was to
>>> have an upper bound of 1g and not 1m, since the largest chunk size is 1g.
>>
>> Minor nitpick, largest chunk size -> largest chunk stripe size.
>>
>> As for data chunk, it's possible to get a 10G chunk, but still only 1G
>> stripe up limit.
>>
>>> This 
>>> means the first stage in __btrfs_balance so far has been a null op since it
>>> effectively freed just a single megabyte.
>>>
>>> Fix this by setting an upper bound of size_to_free of 1g. 
>>
>> One question come to me naturally, what if we failed to shrink the device?
>>
>> In fact if btrfs_shrink_device() returns ENOSPC we just skip to
>> relocation part, so it doesn't look like to cause regression.
>>
>> If this can be mentioned in the commit message, it would save reviewer
>> minutes to read the code.
> 
> Will incorporate it in v2.
> 
>>
>>
>>
>> BTW, I think for that (ret == ENOSPC) after btrfs_shrink_device(), we
>> should continue other than break, to get more chance to secure
>> unallocated space.
> 
> I agree but this should be done in a separate patch, this one deals with
> the silly upper bound of 1m.

No problem, just a hint for a new patch :)

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  fs/btrfs/volumes.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index f435d397019e..8b0fd7bf3447 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>>  	list_for_each_entry(device, devices, dev_list) {
>>>  		old_size = btrfs_device_get_total_bytes(device);
>>>  		size_to_free = div_factor(old_size, 1);
>>> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
>>> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>>>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>>>  		    btrfs_device_get_total_bytes(device) -
>>>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
>>>
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-10-26 11:43 ` [PATCH 2/5] btrfs: Refactor btrfs_can_relocate Nikolay Borisov
@ 2018-10-26 12:35   ` Qu Wenruo
  2018-11-17  1:29   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:35 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午7:43, Nikolay Borisov wrote:
> btrfs_can_relocate returns 0 when it concludes the given chunk can be
> relocated and -1 otherwise. Since this function is used as a predicated
> and it return a binary value it makes no sense to have it's return
> value as an int so change it to bool. Furthermore, remove a stale
> leftover comment from e6ec716f0ddb ("Btrfs: make raid attr array more
> readable").
> 
> Finally make the logic in the list_for_each_entry loop a bit more obvious. Up
> until now the fact that we are returning success (0) was a bit masked by the
> fact that the 0 is re-used from the return value of find_free_dev_extent.
> Instead, now just increment dev_nr if we find a suitable extent and explicitly
> set can_reloc to true if enough devices with unallocated space are present.
> No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/ctree.h       |  2 +-
>  fs/btrfs/extent-tree.c | 39 ++++++++++++++-------------------------
>  fs/btrfs/volumes.c     |  3 +--
>  3 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 68ca41dbbef3..06edc4f9ceb2 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2680,7 +2680,7 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
>  int btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr);
>  int btrfs_free_block_groups(struct btrfs_fs_info *info);
>  int btrfs_read_block_groups(struct btrfs_fs_info *info);
> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
> +bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
>  int btrfs_make_block_group(struct btrfs_trans_handle *trans,
>  			   u64 bytes_used, u64 type, u64 chunk_offset,
>  			   u64 size);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a1febf155747..816bca482358 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9423,10 +9423,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
>  /*
>   * checks to see if its even possible to relocate this block group.
>   *
> - * @return - -1 if it's not a good idea to relocate this block group, 0 if its
> - * ok to go ahead and try.
> + * @return - false if not enough space can be found for relocation, true
> + * otherwise
>   */
> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
> +bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  {
>  	struct btrfs_root *root = fs_info->extent_root;
>  	struct btrfs_block_group_cache *block_group;
> @@ -9441,7 +9441,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  	int debug;
>  	int index;
>  	int full = 0;
> -	int ret = 0;
> +	bool can_reloc = true;
>  
>  	debug = btrfs_test_opt(fs_info, ENOSPC_DEBUG);
>  
> @@ -9453,7 +9453,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  			btrfs_warn(fs_info,
>  				   "can't find block group for bytenr %llu",
>  				   bytenr);
> -		return -1;
> +		return false;
>  	}
>  
>  	min_free = btrfs_block_group_used(&block_group->item);
> @@ -9489,16 +9489,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  	 * group is going to be restriped, run checks against the target
>  	 * profile instead of the current one.
>  	 */
> -	ret = -1;
> +	can_reloc = false;
>  
> -	/*
> -	 * index:
> -	 *      0: raid10
> -	 *      1: raid1
> -	 *      2: dup
> -	 *      3: raid0
> -	 *      4: single
> -	 */
>  	target = get_restripe_target(fs_info, block_group->flags);
>  	if (target) {
>  		index = btrfs_bg_flags_to_raid_index(extended_to_chunk(target));
> @@ -9534,10 +9526,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  
>  	/* We need to do this so that we can look at pending chunks */
>  	trans = btrfs_join_transaction(root);
> -	if (IS_ERR(trans)) {
> -		ret = PTR_ERR(trans);
> +	if (IS_ERR(trans))
>  		goto out;
> -	}
>  
>  	mutex_lock(&fs_info->chunk_mutex);
>  	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
> @@ -9549,18 +9539,17 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  		 */
>  		if (device->total_bytes > device->bytes_used + min_free &&
>  		    !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
> -			ret = find_free_dev_extent(trans, device, min_free,
> -						   &dev_offset, NULL);
> -			if (!ret)
> +			if (!find_free_dev_extent(trans, device, min_free,
> +						   &dev_offset, NULL))
>  				dev_nr++;
>  
> -			if (dev_nr >= dev_min)
> +			if (dev_nr >= dev_min) {
> +				can_reloc = true;
>  				break;
> -
> -			ret = -1;
> +			}
>  		}
>  	}
> -	if (debug && ret == -1)
> +	if (debug && !can_reloc)
>  		btrfs_warn(fs_info,
>  			   "no space to allocate a new chunk for block group %llu",
>  			   block_group->key.objectid);
> @@ -9568,7 +9557,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>  	btrfs_end_transaction(trans);
>  out:
>  	btrfs_put_block_group(block_group);
> -	return ret;
> +	return can_reloc;
>  }
>  
>  static int find_first_block_group(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8b0fd7bf3447..dc53d94a62aa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>  	 */
>  	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
>  
> -	ret = btrfs_can_relocate(fs_info, chunk_offset);
> -	if (ret)
> +	if (!btrfs_can_relocate(fs_info, chunk_offset))
>  		return -ENOSPC;
>  
>  	/* step one, relocate all the extents inside this chunk */
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] btrfs: Ensure at least 1g is free for balance
  2018-10-26 12:16     ` Nikolay Borisov
@ 2018-10-26 12:36       ` Hans van Kranenburg
  0 siblings, 0 replies; 25+ messages in thread
From: Hans van Kranenburg @ 2018-10-26 12:36 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: Chris Mason

On 10/26/18 2:16 PM, Nikolay Borisov wrote:
> 
> (Adding Chris to CC since he is the original author of the code)
> 
> On 26.10.2018 15:09, Hans van Kranenburg wrote:
>> On 10/26/18 1:43 PM, Nikolay Borisov wrote:
>>> The first part of balance operation is to shrink every constituting
>>> device to ensure there is free space for chunk allocation.
>>
>> A very useful thing to be able to do if there's no unallocated raw disk
>> space left, is use balance to rewrite some blockgroup into the free
>> space of other ones.
>>
>> This does not need any raw space for a chunk allocation. Requiring so
>> makes it unneccesarily more complicated for users to fix the situation
>> they got in.
> 
> The current logic of the code doesn't preclude this use case since if we
> can't shrink we just proceed to relocation. So even if 1g is replaced
> with 5eb and shrink_device always fails balancing will still proceed.
> 
> However, all of this really makes me wonder do we even need this "first
> stage" code in balancing (Chris, any thoughts?).

Having something that tries to free up 1G on each device can be very
useful when converting to another profile. I guess that's why it is there.

And shrink device + grow device is a nice way to try packing some data
into other existing block groups. Only... if shrink+grow is done for
each device after each other, and there's one with 1G unallocated left,
and it can't easily move it into existing free space, then it may just
be moving data around to the next disk all the time I guess, ending up
with the same situation as before. :D

>>> However, the code
>>> has been buggy ever since its introduction since calculating the space to shrink
>>> the device by was bounded by 1 mb. Most likely the original intention was to
>>> have an upper bound of 1g and not 1m, since the largest chunk size is 1g. This 
>>> means the first stage in __btrfs_balance so far has been a null op since it
>>> effectively freed just a single megabyte.
>>>
>>> Fix this by setting an upper bound of size_to_free of 1g. 
>>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  fs/btrfs/volumes.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index f435d397019e..8b0fd7bf3447 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -3467,7 +3467,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>>  	list_for_each_entry(device, devices, dev_list) {
>>>  		old_size = btrfs_device_get_total_bytes(device);
>>>  		size_to_free = div_factor(old_size, 1);
>>> -		size_to_free = min_t(u64, size_to_free, SZ_1M);
>>> +		size_to_free = min_t(u64, size_to_free, SZ_1G);
>>>  		if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) ||
>>>  		    btrfs_device_get_total_bytes(device) -
>>>  		    btrfs_device_get_bytes_used(device) > size_to_free ||
>>>
>>
>>


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk
  2018-10-26 11:43 ` [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk Nikolay Borisov
@ 2018-10-26 12:40   ` Qu Wenruo
  2018-11-16 23:57   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:40 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午7:43, Nikolay Borisov wrote:
> It's unnecessary to check map->stripes[i].dev for NULL given its value
> is already set and dereferenced above the the check. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/volumes.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dc53d94a62aa..f0db43d08456 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2797,13 +2797,11 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>  			mutex_unlock(&fs_info->chunk_mutex);
>  		}
>  
> -		if (map->stripes[i].dev) {
> -			ret = btrfs_update_device(trans, map->stripes[i].dev);
> -			if (ret) {
> -				mutex_unlock(&fs_devices->device_list_mutex);
> -				btrfs_abort_transaction(trans, ret);
> -				goto out;
> -			}
> +		ret = btrfs_update_device(trans, device);
> +		if (ret) {
> +			mutex_unlock(&fs_devices->device_list_mutex);
> +			btrfs_abort_transaction(trans, ret);
> +			goto out;
>  		}
>  	}
>  	mutex_unlock(&fs_devices->device_list_mutex);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
  2018-10-26 11:43 ` [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument Nikolay Borisov
@ 2018-10-26 12:42   ` Qu Wenruo
  2018-11-17  0:53   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:42 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午7:43, Nikolay Borisov wrote:
> All callers of this function pass BTRFS_MAX_EXTENT_SIZE (128M) so let's
> reduce the argument count and make that a local variable. No functional
> changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/extent_io.c             | 11 +++++------
>  fs/btrfs/extent_io.h             |  2 +-
>  fs/btrfs/tests/extent-io-tests.c | 10 +++++-----
>  3 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6877a74c7469..1a9a521aefe5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1566,8 +1566,9 @@ static noinline int lock_delalloc_pages(struct inode *inode,
>  static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>  				    struct extent_io_tree *tree,
>  				    struct page *locked_page, u64 *start,
> -				    u64 *end, u64 max_bytes)
> +				    u64 *end)
>  {
> +	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
>  	u64 delalloc_start;
>  	u64 delalloc_end;
>  	u64 found;
> @@ -1647,10 +1648,9 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>  u64 btrfs_find_lock_delalloc_range(struct inode *inode,
>  				    struct extent_io_tree *tree,
>  				    struct page *locked_page, u64 *start,
> -				    u64 *end, u64 max_bytes)
> +				    u64 *end)
>  {
> -	return find_lock_delalloc_range(inode, tree, locked_page, start, end,
> -			max_bytes);
> +	return find_lock_delalloc_range(inode, tree, locked_page, start, end);
>  }
>  #endif
>  
> @@ -3233,8 +3233,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
>  		nr_delalloc = find_lock_delalloc_range(inode, tree,
>  					       page,
>  					       &delalloc_start,
> -					       &delalloc_end,
> -					       BTRFS_MAX_EXTENT_SIZE);
> +					       &delalloc_end);
>  		if (nr_delalloc == 0) {
>  			delalloc_start = delalloc_end + 1;
>  			continue;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 369daa5d4f73..7ec4f93caf78 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -549,7 +549,7 @@ int free_io_failure(struct extent_io_tree *failure_tree,
>  u64 btrfs_find_lock_delalloc_range(struct inode *inode,
>  				      struct extent_io_tree *tree,
>  				      struct page *locked_page, u64 *start,
> -				      u64 *end, u64 max_bytes);
> +				      u64 *end);
>  #endif
>  struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>  					       u64 start);
> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
> index 9e0f4a01be14..8ea8c2aa6696 100644
> --- a/fs/btrfs/tests/extent-io-tests.c
> +++ b/fs/btrfs/tests/extent-io-tests.c
> @@ -107,7 +107,7 @@ static int test_find_delalloc(u32 sectorsize)
>  	start = 0;
>  	end = 0;
>  	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>  	if (!found) {
>  		test_err("should have found at least one delalloc");
>  		goto out_bits;
> @@ -138,7 +138,7 @@ static int test_find_delalloc(u32 sectorsize)
>  	start = test_start;
>  	end = 0;
>  	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>  	if (!found) {
>  		test_err("couldn't find delalloc in our range");
>  		goto out_bits;
> @@ -172,7 +172,7 @@ static int test_find_delalloc(u32 sectorsize)
>  	start = test_start;
>  	end = 0;
>  	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>  	if (found) {
>  		test_err("found range when we shouldn't have");
>  		goto out_bits;
> @@ -193,7 +193,7 @@ static int test_find_delalloc(u32 sectorsize)
>  	start = test_start;
>  	end = 0;
>  	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>  	if (!found) {
>  		test_err("didn't find our range");
>  		goto out_bits;
> @@ -234,7 +234,7 @@ static int test_find_delalloc(u32 sectorsize)
>  	 * tests expected behavior.
>  	 */
>  	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>  	if (!found) {
>  		test_err("didn't find our range");
>  		goto out_bits;
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
  2018-10-26 11:43 ` [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range Nikolay Borisov
@ 2018-10-26 12:44   ` Qu Wenruo
  2018-11-17  1:02   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2018-10-26 12:44 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2018/10/26 下午7:43, Nikolay Borisov wrote:
> lock_delalloc_pages should only return 2 values - 0 in case of success
> and -EAGAIN if the range of pages to be locked should be shrunk due to
> some of gone. Manual inspections confirms that this is
> indeed the case since __process_pages_contig is where lock_delalloc_pages
> gets its return value. The latter always returns 0  or -EAGAIN so the
> invariant holds. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/extent_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1a9a521aefe5..94bc53472031 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1606,6 +1606,7 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>  	/* step two, lock all the pages after the page that has start */
>  	ret = lock_delalloc_pages(inode, locked_page,
>  				  delalloc_start, delalloc_end);
> +	ASSERT(!ret || ret == -EAGAIN);
>  	if (ret == -EAGAIN) {
>  		/* some of the pages are gone, lets avoid looping by
>  		 * shortening the size of the delalloc range we're searching
> @@ -1621,7 +1622,6 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>  			goto out_failed;
>  		}
>  	}
> -	BUG_ON(ret); /* Only valid values are 0 and -EAGAIN */
>  
>  	/* step three, lock the state bits for the whole range */
>  	lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] Misc cleanups in balance code
  2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
                   ` (4 preceding siblings ...)
  2018-10-26 11:43 ` [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range Nikolay Borisov
@ 2018-11-16 15:18 ` David Sterba
  2018-11-16 15:36   ` Nikolay Borisov
  5 siblings, 1 reply; 25+ messages in thread
From: David Sterba @ 2018-11-16 15:18 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Fri, Oct 26, 2018 at 02:43:16PM +0300, Nikolay Borisov wrote:
> While investigating the balance hang I came across various inconsistencies in 
> the source. This series aims to fix those. 
> 
> The first patch is (I believe) a fix to a longstanding bug that could cause 
> balance to fail due to ENOSPC. The code no properly ensures that there is 
> at least 1g of unallocated space on every device being balance.
> 
> Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
> previous cleanup
> 
> Patches 3/4/5 remove some redundant code from various functions. 
> 
> This series has survived multiple xfstest runs. 
> 
> Nikolay Borisov (5):
>   btrfs: Ensure at least 1g is free for balance
>   btrfs: Refactor btrfs_can_relocate
>   btrfs: Remove superfluous check form btrfs_remove_chunk
>   btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
>   btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range

Patches 2-5 on the way to misc-next, thanks. The first one can have user
visible consequences, so I'd rather first find out why the 1MB was there
and if it's really a bug and what exactly will change when it's 1G.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] Misc cleanups in balance code
  2018-11-16 15:18 ` [PATCH 0/5] Misc cleanups in balance code David Sterba
@ 2018-11-16 15:36   ` Nikolay Borisov
  0 siblings, 0 replies; 25+ messages in thread
From: Nikolay Borisov @ 2018-11-16 15:36 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 16.11.18 г. 17:18 ч., David Sterba wrote:
> On Fri, Oct 26, 2018 at 02:43:16PM +0300, Nikolay Borisov wrote:
>> While investigating the balance hang I came across various inconsistencies in 
>> the source. This series aims to fix those. 
>>
>> The first patch is (I believe) a fix to a longstanding bug that could cause 
>> balance to fail due to ENOSPC. The code no properly ensures that there is 
>> at least 1g of unallocated space on every device being balance.
>>
>> Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
>> previous cleanup
>>
>> Patches 3/4/5 remove some redundant code from various functions. 
>>
>> This series has survived multiple xfstest runs. 
>>
>> Nikolay Borisov (5):
>>   btrfs: Ensure at least 1g is free for balance
>>   btrfs: Refactor btrfs_can_relocate
>>   btrfs: Remove superfluous check form btrfs_remove_chunk
>>   btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
>>   btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
> 
> Patches 2-5 on the way to misc-next, thanks. The first one can have user
> visible consequences, so I'd rather first find out why the 1MB was there
> and if it's really a bug and what exactly will change when it's 1G.

I'm fine with that, one thing I don't agree with, though, is the
conclusion that patch 1 has user visible consequence. As a matter of
fact it does not. If btrfs_shrink_device fails with ENOSPC we just
break, i/e we don't try to free balance space for any of the other
devices, but this doesn't stop from balance actually continuing. As it
stands today I think "step one" in __btrfs_balance is more or less
null-op, i.e cycles are wasted since all we do is shrink every device by
1mb, ensuring one more mb is already free. I think this is very
insufficient for a balance operation and when it succeeds during normal
operation it's due to the device already having enough unallocated space
before the balance.

Anyway, I will speak with Chris to try and find out why the code uses 1mb

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk
  2018-10-26 11:43 ` [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk Nikolay Borisov
  2018-10-26 12:40   ` Qu Wenruo
@ 2018-11-16 23:57   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Anand Jain @ 2018-11-16 23:57 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 10/26/2018 07:43 PM, Nikolay Borisov wrote:
> It's unnecessary to check map->stripes[i].dev for NULL given its value
> is already set and dereferenced above the the check. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/volumes.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dc53d94a62aa..f0db43d08456 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2797,13 +2797,11 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>   			mutex_unlock(&fs_info->chunk_mutex);
>   		}
>   
> -		if (map->stripes[i].dev) {
> -			ret = btrfs_update_device(trans, map->stripes[i].dev);
> -			if (ret) {
> -				mutex_unlock(&fs_devices->device_list_mutex);
> -				btrfs_abort_transaction(trans, ret);
> -				goto out;
> -			}
> +		ret = btrfs_update_device(trans, device);
> +		if (ret) {
> +			mutex_unlock(&fs_devices->device_list_mutex);
> +			btrfs_abort_transaction(trans, ret);
> +			goto out;
>   		}
>   	}
>   	mutex_unlock(&fs_devices->device_list_mutex);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
  2018-10-26 11:43 ` [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument Nikolay Borisov
  2018-10-26 12:42   ` Qu Wenruo
@ 2018-11-17  0:53   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Anand Jain @ 2018-11-17  0:53 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 10/26/2018 07:43 PM, Nikolay Borisov wrote:
> All callers of this function pass BTRFS_MAX_EXTENT_SIZE (128M) so let's
> reduce the argument count and make that a local variable. No functional
> changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

   Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/extent_io.c             | 11 +++++------
>   fs/btrfs/extent_io.h             |  2 +-
>   fs/btrfs/tests/extent-io-tests.c | 10 +++++-----
>   3 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6877a74c7469..1a9a521aefe5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1566,8 +1566,9 @@ static noinline int lock_delalloc_pages(struct inode *inode,
>   static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>   				    struct extent_io_tree *tree,
>   				    struct page *locked_page, u64 *start,
> -				    u64 *end, u64 max_bytes)
> +				    u64 *end)
>   {
> +	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
>   	u64 delalloc_start;
>   	u64 delalloc_end;
>   	u64 found;
> @@ -1647,10 +1648,9 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>   u64 btrfs_find_lock_delalloc_range(struct inode *inode,
>   				    struct extent_io_tree *tree,
>   				    struct page *locked_page, u64 *start,
> -				    u64 *end, u64 max_bytes)
> +				    u64 *end)
>   {
> -	return find_lock_delalloc_range(inode, tree, locked_page, start, end,
> -			max_bytes);
> +	return find_lock_delalloc_range(inode, tree, locked_page, start, end);
>   }
>   #endif
>   
> @@ -3233,8 +3233,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
>   		nr_delalloc = find_lock_delalloc_range(inode, tree,
>   					       page,
>   					       &delalloc_start,
> -					       &delalloc_end,
> -					       BTRFS_MAX_EXTENT_SIZE);
> +					       &delalloc_end);
>   		if (nr_delalloc == 0) {
>   			delalloc_start = delalloc_end + 1;
>   			continue;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 369daa5d4f73..7ec4f93caf78 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -549,7 +549,7 @@ int free_io_failure(struct extent_io_tree *failure_tree,
>   u64 btrfs_find_lock_delalloc_range(struct inode *inode,
>   				      struct extent_io_tree *tree,
>   				      struct page *locked_page, u64 *start,
> -				      u64 *end, u64 max_bytes);
> +				      u64 *end);
>   #endif
>   struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>   					       u64 start);
> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
> index 9e0f4a01be14..8ea8c2aa6696 100644
> --- a/fs/btrfs/tests/extent-io-tests.c
> +++ b/fs/btrfs/tests/extent-io-tests.c
> @@ -107,7 +107,7 @@ static int test_find_delalloc(u32 sectorsize)
>   	start = 0;
>   	end = 0;
>   	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>   	if (!found) {
>   		test_err("should have found at least one delalloc");
>   		goto out_bits;
> @@ -138,7 +138,7 @@ static int test_find_delalloc(u32 sectorsize)
>   	start = test_start;
>   	end = 0;
>   	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>   	if (!found) {
>   		test_err("couldn't find delalloc in our range");
>   		goto out_bits;
> @@ -172,7 +172,7 @@ static int test_find_delalloc(u32 sectorsize)
>   	start = test_start;
>   	end = 0;
>   	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>   	if (found) {
>   		test_err("found range when we shouldn't have");
>   		goto out_bits;
> @@ -193,7 +193,7 @@ static int test_find_delalloc(u32 sectorsize)
>   	start = test_start;
>   	end = 0;
>   	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>   	if (!found) {
>   		test_err("didn't find our range");
>   		goto out_bits;
> @@ -234,7 +234,7 @@ static int test_find_delalloc(u32 sectorsize)
>   	 * tests expected behavior.
>   	 */
>   	found = btrfs_find_lock_delalloc_range(inode, &tmp, locked_page, &start,
> -					 &end, max_bytes);
> +					 &end);
>   	if (!found) {
>   		test_err("didn't find our range");
>   		goto out_bits;
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
  2018-10-26 11:43 ` [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range Nikolay Borisov
  2018-10-26 12:44   ` Qu Wenruo
@ 2018-11-17  1:02   ` Anand Jain
  1 sibling, 0 replies; 25+ messages in thread
From: Anand Jain @ 2018-11-17  1:02 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 10/26/2018 07:43 PM, Nikolay Borisov wrote:
> lock_delalloc_pages should only return 2 values - 0 in case of success
> and -EAGAIN if the range of pages to be locked should be shrunk due to
> some of gone. Manual inspections confirms that this is
> indeed the case since __process_pages_contig is where lock_delalloc_pages
> gets its return value. The latter always returns 0  or -EAGAIN so the
> invariant holds. No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/extent_io.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1a9a521aefe5..94bc53472031 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1606,6 +1606,7 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>   	/* step two, lock all the pages after the page that has start */
>   	ret = lock_delalloc_pages(inode, locked_page,
>   				  delalloc_start, delalloc_end);
> +	ASSERT(!ret || ret == -EAGAIN);
>   	if (ret == -EAGAIN) {
>   		/* some of the pages are gone, lets avoid looping by
>   		 * shortening the size of the delalloc range we're searching
> @@ -1621,7 +1622,6 @@ static noinline_for_stack u64 find_lock_delalloc_range(struct inode *inode,
>   			goto out_failed;
>   		}
>   	}
> -	BUG_ON(ret); /* Only valid values are 0 and -EAGAIN */
>   
>   	/* step three, lock the state bits for the whole range */
>   	lock_extent_bits(tree, delalloc_start, delalloc_end, &cached_state);
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-10-26 11:43 ` [PATCH 2/5] btrfs: Refactor btrfs_can_relocate Nikolay Borisov
  2018-10-26 12:35   ` Qu Wenruo
@ 2018-11-17  1:29   ` Anand Jain
  2018-12-03 17:25     ` David Sterba
  1 sibling, 1 reply; 25+ messages in thread
From: Anand Jain @ 2018-11-17  1:29 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 10/26/2018 07:43 PM, Nikolay Borisov wrote:
> btrfs_can_relocate returns 0 when it concludes the given chunk can be
> relocated and -1 otherwise. Since this function is used as a predicated
> and it return a binary value it makes no sense to have it's return
> value as an int so change it to bool. Furthermore, remove a stale
> leftover comment from e6ec716f0ddb ("Btrfs: make raid attr array more
> readable").
> 
> Finally make the logic in the list_for_each_entry loop a bit more obvious. Up
> until now the fact that we are returning success (0) was a bit masked by the
> fact that the 0 is re-used from the return value of find_free_dev_extent.
> Instead, now just increment dev_nr if we find a suitable extent and explicitly
> set can_reloc to true if enough devices with unallocated space are present.
> No functional changes.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>   fs/btrfs/ctree.h       |  2 +-
>   fs/btrfs/extent-tree.c | 39 ++++++++++++++-------------------------
>   fs/btrfs/volumes.c     |  3 +--
>   3 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 68ca41dbbef3..06edc4f9ceb2 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2680,7 +2680,7 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
>   int btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr);
>   int btrfs_free_block_groups(struct btrfs_fs_info *info);
>   int btrfs_read_block_groups(struct btrfs_fs_info *info);
> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
> +bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
>   int btrfs_make_block_group(struct btrfs_trans_handle *trans,
>   			   u64 bytes_used, u64 type, u64 chunk_offset,
>   			   u64 size);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a1febf155747..816bca482358 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9423,10 +9423,10 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
>   /*
>    * checks to see if its even possible to relocate this block group.
>    *
> - * @return - -1 if it's not a good idea to relocate this block group, 0 if its
> - * ok to go ahead and try.
> + * @return - false if not enough space can be found for relocation, true
> + * otherwise
>    */
> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
> +bool btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   {
>   	struct btrfs_root *root = fs_info->extent_root;
>   	struct btrfs_block_group_cache *block_group;
> @@ -9441,7 +9441,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   	int debug;
>   	int index;
>   	int full = 0;
> -	int ret = 0;
> +	bool can_reloc = true;
>   
>   	debug = btrfs_test_opt(fs_info, ENOSPC_DEBUG);
>   
> @@ -9453,7 +9453,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   			btrfs_warn(fs_info,
>   				   "can't find block group for bytenr %llu",
>   				   bytenr);
> -		return -1;
> +		return false;
>   	}
>   
>   	min_free = btrfs_block_group_used(&block_group->item);
> @@ -9489,16 +9489,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   	 * group is going to be restriped, run checks against the target
>   	 * profile instead of the current one.
>   	 */
> -	ret = -1;
> +	can_reloc = false;
>   
> -	/*
> -	 * index:
> -	 *      0: raid10
> -	 *      1: raid1
> -	 *      2: dup
> -	 *      3: raid0
> -	 *      4: single
> -	 */
>   	target = get_restripe_target(fs_info, block_group->flags);
>   	if (target) {
>   		index = btrfs_bg_flags_to_raid_index(extended_to_chunk(target));
> @@ -9534,10 +9526,8 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   
>   	/* We need to do this so that we can look at pending chunks */
>   	trans = btrfs_join_transaction(root);
> -	if (IS_ERR(trans)) {
> -		ret = PTR_ERR(trans);
> +	if (IS_ERR(trans))
>   		goto out;
> -	}
>   
>   	mutex_lock(&fs_info->chunk_mutex);
>   	list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
> @@ -9549,18 +9539,17 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   		 */
>   		if (device->total_bytes > device->bytes_used + min_free &&
>   		    !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
> -			ret = find_free_dev_extent(trans, device, min_free,
> -						   &dev_offset, NULL);
> -			if (!ret)
> +			if (!find_free_dev_extent(trans, device, min_free,
> +						   &dev_offset, NULL))

  This can return -ENOMEM;

>   				dev_nr++;
>   
> -			if (dev_nr >= dev_min)
> +			if (dev_nr >= dev_min) {
> +				can_reloc = true;
>   				break;
> -
> -			ret = -1;
> +			}
>   		}
>   	}
> -	if (debug && ret == -1)
> +	if (debug && !can_reloc)
>   		btrfs_warn(fs_info,
>   			   "no space to allocate a new chunk for block group %llu",
>   			   block_group->key.objectid);
> @@ -9568,7 +9557,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>   	btrfs_end_transaction(trans);
>   out:
>   	btrfs_put_block_group(block_group);
> -	return ret;
> +	return can_reloc;
>   }
>   
>   static int find_first_block_group(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8b0fd7bf3447..dc53d94a62aa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>   	 */
>   	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
>   
> -	ret = btrfs_can_relocate(fs_info, chunk_offset);
> -	if (ret)
> +	if (!btrfs_can_relocate(fs_info, chunk_offset))
>   		return -ENOSPC;

  And ends up converting -ENOMEM to -ENOSPC.

  Its better to propagate the accurate error.

Thanks, Anand


>   
>   	/* step one, relocate all the extents inside this chunk */
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-11-17  1:29   ` Anand Jain
@ 2018-12-03 17:25     ` David Sterba
  2018-12-04  6:34       ` Nikolay Borisov
  0 siblings, 1 reply; 25+ messages in thread
From: David Sterba @ 2018-12-03 17:25 UTC (permalink / raw)
  To: Anand Jain; +Cc: Nikolay Borisov, linux-btrfs

On Sat, Nov 17, 2018 at 09:29:27AM +0800, Anand Jain wrote:
> > -			ret = find_free_dev_extent(trans, device, min_free,
> > -						   &dev_offset, NULL);
> > -			if (!ret)
> > +			if (!find_free_dev_extent(trans, device, min_free,
> > +						   &dev_offset, NULL))
> 
>   This can return -ENOMEM;
> 
> > @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >   	 */
> >   	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
> >   
> > -	ret = btrfs_can_relocate(fs_info, chunk_offset);
> > -	if (ret)
> > +	if (!btrfs_can_relocate(fs_info, chunk_offset))
> >   		return -ENOSPC;
> 
>   And ends up converting -ENOMEM to -ENOSPC.
> 
>   Its better to propagate the accurate error.

Right, converting to bool is obscuring the reason why the functions
fail. Making the code simpler at this cost does not look like a good
idea to me. I'll remove the patch from misc-next for now.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-12-03 17:25     ` David Sterba
@ 2018-12-04  6:34       ` Nikolay Borisov
  2018-12-04 13:07         ` David Sterba
  0 siblings, 1 reply; 25+ messages in thread
From: Nikolay Borisov @ 2018-12-04  6:34 UTC (permalink / raw)
  To: dsterba, Anand Jain, linux-btrfs



On 3.12.18 г. 19:25 ч., David Sterba wrote:
> On Sat, Nov 17, 2018 at 09:29:27AM +0800, Anand Jain wrote:
>>> -			ret = find_free_dev_extent(trans, device, min_free,
>>> -						   &dev_offset, NULL);
>>> -			if (!ret)
>>> +			if (!find_free_dev_extent(trans, device, min_free,
>>> +						   &dev_offset, NULL))
>>
>>   This can return -ENOMEM;
>>
>>> @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>>   	 */
>>>   	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
>>>   
>>> -	ret = btrfs_can_relocate(fs_info, chunk_offset);
>>> -	if (ret)
>>> +	if (!btrfs_can_relocate(fs_info, chunk_offset))
>>>   		return -ENOSPC;
>>
>>   And ends up converting -ENOMEM to -ENOSPC.
>>
>>   Its better to propagate the accurate error.
> 
> Right, converting to bool is obscuring the reason why the functions
> fail. Making the code simpler at this cost does not look like a good
> idea to me. I'll remove the patch from misc-next for now.

The patch itself does not make the code more obscure than currently is,
because even if ENOMEM is returned it's still converted to ENOSPC in
btrfs_relocate_chunk.
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] btrfs: Refactor btrfs_can_relocate
  2018-12-04  6:34       ` Nikolay Borisov
@ 2018-12-04 13:07         ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2018-12-04 13:07 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, Anand Jain, linux-btrfs

On Tue, Dec 04, 2018 at 08:34:15AM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 19:25 ч., David Sterba wrote:
> > On Sat, Nov 17, 2018 at 09:29:27AM +0800, Anand Jain wrote:
> >>> -			ret = find_free_dev_extent(trans, device, min_free,
> >>> -						   &dev_offset, NULL);
> >>> -			if (!ret)
> >>> +			if (!find_free_dev_extent(trans, device, min_free,
> >>> +						   &dev_offset, NULL))
> >>
> >>   This can return -ENOMEM;
> >>
> >>> @@ -2856,8 +2856,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>>   	 */
> >>>   	lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
> >>>   
> >>> -	ret = btrfs_can_relocate(fs_info, chunk_offset);
> >>> -	if (ret)
> >>> +	if (!btrfs_can_relocate(fs_info, chunk_offset))
> >>>   		return -ENOSPC;
> >>
> >>   And ends up converting -ENOMEM to -ENOSPC.
> >>
> >>   Its better to propagate the accurate error.
> > 
> > Right, converting to bool is obscuring the reason why the functions
> > fail. Making the code simpler at this cost does not look like a good
> > idea to me. I'll remove the patch from misc-next for now.
> 
> The patch itself does not make the code more obscure than currently is,
> because even if ENOMEM is returned it's still converted to ENOSPC in
> btrfs_relocate_chunk.

Sorry, but this can be hardly used as an excuse to keep the code
obscure. btrfs_can_relocate & co are not very often changed so there's
cruft accumulated. The error value propagation was probably not a hot
topic in 2009, btrfs_can_relocate needs the cleanups but please let's do
that properly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-26 11:43 [PATCH 0/5] Misc cleanups in balance code Nikolay Borisov
2018-10-26 11:43 ` [PATCH 1/5] btrfs: Ensure at least 1g is free for balance Nikolay Borisov
2018-10-26 12:04   ` Qu Wenruo
2018-10-26 12:08     ` Nikolay Borisov
2018-10-26 12:32       ` Qu Wenruo
2018-10-26 12:09   ` Hans van Kranenburg
2018-10-26 12:16     ` Nikolay Borisov
2018-10-26 12:36       ` Hans van Kranenburg
2018-10-26 11:43 ` [PATCH 2/5] btrfs: Refactor btrfs_can_relocate Nikolay Borisov
2018-10-26 12:35   ` Qu Wenruo
2018-11-17  1:29   ` Anand Jain
2018-12-03 17:25     ` David Sterba
2018-12-04  6:34       ` Nikolay Borisov
2018-12-04 13:07         ` David Sterba
2018-10-26 11:43 ` [PATCH 3/5] btrfs: Remove superfluous check form btrfs_remove_chunk Nikolay Borisov
2018-10-26 12:40   ` Qu Wenruo
2018-11-16 23:57   ` Anand Jain
2018-10-26 11:43 ` [PATCH 4/5] btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument Nikolay Borisov
2018-10-26 12:42   ` Qu Wenruo
2018-11-17  0:53   ` Anand Jain
2018-10-26 11:43 ` [PATCH 5/5] btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range Nikolay Borisov
2018-10-26 12:44   ` Qu Wenruo
2018-11-17  1:02   ` Anand Jain
2018-11-16 15:18 ` [PATCH 0/5] Misc cleanups in balance code David Sterba
2018-11-16 15:36   ` Nikolay Borisov

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable: git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox