* [PATCH v3 0/2] btrfs: trim enhancement to allow btrfs really trim block groups
@ 2018-08-29  5:15 Qu Wenruo
  2018-08-29  5:15 ` [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better Qu Wenruo
  2018-08-29  5:15 ` [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs Qu Wenruo
  0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2018-08-29  5:15 UTC
  To: linux-btrfs

This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/trim_fix
It is based on the v4.19-rc1 tag.

This patchset introduces two enhancements: one outputs better error
messages during trim, the other ensures we can really trim block groups
whose logical bytenr is beyond the physical device size.

These two patches have been in the wild for a long time and are pretty
small. The 2nd patch in fact fixes a regression, and we already have a
test case for it (btrfs/156).

Changelog:
v2:
  Only report total number of errors and first errno to make it less
  noisy.
  Change message level from warning to debug
v3:
  Rebase to v4.19-rc1.
  Change back message level from debug to warning since it's less noisy
  and will only report total failed bgs and devices.

Qu Wenruo (2):
  btrfs: Enhance btrfs_trim_fs function to handle error better
  btrfs: Ensure btrfs_trim_fs can trim the whole fs

 fs/btrfs/extent-tree.c | 67 ++++++++++++++++++++++++++----------------
 fs/btrfs/ioctl.c       | 11 ++++---
 2 files changed, 49 insertions(+), 29 deletions(-)

-- 
2.18.0


* [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better
  2018-08-29  5:15 [PATCH v3 0/2] btrfs: trim enhancement to allow btrfs really trim block groups Qu Wenruo
@ 2018-08-29  5:15 ` Qu Wenruo
  2018-08-29 13:43   ` Nikolay Borisov
  2018-08-29  5:15 ` [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs Qu Wenruo
  1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2018-08-29  5:15 UTC
  To: linux-btrfs

Function btrfs_trim_fs() doesn't handle errors in a consistent way. If an
error happens when trimming existing block groups, it will skip the
remaining block groups and continue to trim unallocated space for each
device.

And the return value will only reflect the final error from device
trimming.

This patch fixes that behavior by:

1) Recording the last error from block group or device trimming
   So the return value will also reflect the last error during trimming,
   making developers more aware of the problem.

2) Continuing trimming if we can
   If we fail to trim one block group or device, we can still try the
   next block group or device.

3) Reporting the number of failures during block group and device trimming
   So it is less noisy, but still gives the user a brief summary of
   what's going wrong.

Such behavior avoids confusion in cases like a failure to trim the
first block group, after which only unallocated space is trimmed.
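
As a rough illustration of the three points above, here is a minimal
user-space sketch of the "count failures, remember the last error, keep
going" pattern. It is not the kernel code: struct trim_summary,
trim_one_fn, trim_all() and demo_trim_one() are hypothetical names made
up for this example, and the items are plain integers standing in for
block groups or devices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical summary of one trim pass; not a kernel structure. */
struct trim_summary {
    unsigned long long failed;   /* number of items that returned an error */
    int last_err;                /* most recent non-zero return, 0 if none */
    unsigned long long trimmed;  /* bytes reported as trimmed, in total */
};

/* Hypothetical per-item hook: returns 0 or a negative errno value. */
typedef int (*trim_one_fn)(int item, unsigned long long *trimmed);

static struct trim_summary trim_all(const int *items, size_t n,
                                    trim_one_fn trim_one)
{
    struct trim_summary s = { 0, 0, 0 };
    size_t i;

    assert(items != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long long bytes = 0;
        int ret = trim_one(items[i], &bytes);

        /* Count whatever was trimmed, even if the item later failed. */
        s.trimmed += bytes;
        if (ret) {
            /* Record the failure and move on instead of bailing out. */
            s.failed++;
            s.last_err = ret;
            continue;
        }
    }
    return s;
}

/* Demo hook: odd items fail with -EIO, even items trim 4096 bytes. */
static int demo_trim_one(int item, unsigned long long *trimmed)
{
    if (item % 2)
        return -EIO;
    *trimmed = 4096;
    return 0;
}
```

Calling trim_all() with demo_trim_one() on the items 0..4 visits all five
items, so a caller can print one summary line from the returned counters
rather than one line per failure.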

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de6f75f5547b..7768f206196a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
 	return ret;
 }
 
+/*
+ * Trim the whole fs, by:
+ * 1) Trimming free space in each block group
+ * 2) Trimming unallocated space in each device
+ *
+ * Will try to continue trimming even if we failed to trim one block group or
+ * device.
+ * The return value will be the last error during trim.
+ * Or 0 if nothing wrong happened.
+ */
 int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 {
 	struct btrfs_block_group_cache *cache = NULL;
@@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	u64 end;
 	u64 trimmed = 0;
 	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+	u64 bg_failed = 0;
+	u64 dev_failed = 0;
+	int bg_ret = 0;
+	int dev_ret = 0;
 	int ret = 0;
 
 	/*
@@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	else
 		cache = btrfs_lookup_block_group(fs_info, range->start);
 
-	while (cache) {
+	for (; cache; cache = next_block_group(fs_info, cache)) {
 		if (cache->key.objectid >= (range->start + range->len)) {
 			btrfs_put_block_group(cache);
 			break;
@@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 			if (!block_group_cache_done(cache)) {
 				ret = cache_block_group(cache, 0);
 				if (ret) {
-					btrfs_put_block_group(cache);
-					break;
+					bg_failed++;
+					bg_ret = ret;
+					continue;
 				}
 				ret = wait_block_group_cache_done(cache);
 				if (ret) {
-					btrfs_put_block_group(cache);
-					break;
+					bg_failed++;
+					bg_ret = ret;
+					continue;
 				}
 			}
-			ret = btrfs_trim_block_group(cache,
-						     &group_trimmed,
-						     start,
-						     end,
-						     range->minlen);
+			ret = btrfs_trim_block_group(cache, &group_trimmed,
+						start, end, range->minlen);
 
 			trimmed += group_trimmed;
 			if (ret) {
-				btrfs_put_block_group(cache);
-				break;
+				bg_failed++;
+				bg_ret = ret;
+				continue;
 			}
 		}
-
-		cache = next_block_group(fs_info, cache);
 	}
 
+	if (bg_failed)
+		btrfs_warn(fs_info,
+		"failed to trim %llu block group(s), last error was %d",
+			   bg_failed, bg_ret);
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	devices = &fs_info->fs_devices->alloc_list;
 	list_for_each_entry(device, devices, dev_alloc_list) {
 		ret = btrfs_trim_free_extents(device, range->minlen,
 					      &group_trimmed);
-		if (ret)
+		if (ret) {
+			dev_failed++;
+			dev_ret = ret;
 			break;
+		}
 
 		trimmed += group_trimmed;
 	}
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 
+	if (dev_failed)
+		btrfs_warn(fs_info,
+		"failed to trim %llu device(s), last error was %d",
+			   dev_failed, dev_ret);
 	range->len = trimmed;
-	return ret;
+	if (bg_ret)
+		return bg_ret;
+	return dev_ret;
 }
 
 /*
-- 
2.18.0


* [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs
  2018-08-29  5:15 [PATCH v3 0/2] btrfs: trim enhancement to allow btrfs really trim block groups Qu Wenruo
  2018-08-29  5:15 ` [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better Qu Wenruo
@ 2018-08-29  5:15 ` Qu Wenruo
  2018-08-29 14:24   ` Nikolay Borisov
  1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2018-08-29  5:15 UTC
  To: linux-btrfs; +Cc: stable

[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.

[CAUSE]
Before fstrim_range is passed to btrfs_trim_fs(), it gets truncated to the
range [0, super->total_bytes).
So later btrfs_trim_fs() will only be able to trim block groups in the range
[0, super->total_bytes).

For btrfs, however, any bytenr aligned to sector size is valid. Since btrfs
uses its own logical address space, there is nothing limiting the location
where we put block groups.

For a btrfs with routine balance, it's quite easy to relocate all
block groups so that their bytenr starts beyond super->total_bytes.

In that case, btrfs will not trim existing block groups.

[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].
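
To make the effect of the truncation concrete, here is a small user-space
sketch, not the kernel code. struct bg, bgs_in_range() and clamp_len() are
hypothetical names for this example; clamp_len() mimics the removed
min(range.len, total_bytes - range.start) line, and the saturating end
computation is an assumption of the sketch rather than something the
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block group: logical start and length only. */
struct bg {
    uint64_t start;
    uint64_t len;
};

/*
 * Count block groups whose logical start lies in [start, start + len).
 * The end is saturated at UINT64_MAX so the addition cannot wrap.
 */
static size_t bgs_in_range(const struct bg *bgs, size_t n,
                           uint64_t start, uint64_t len)
{
    uint64_t end = (len > UINT64_MAX - start) ? UINT64_MAX : start + len;
    size_t i, hit = 0;

    assert(bgs != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (bgs[i].start >= start && bgs[i].start < end)
            hit++;
    return hit;
}

/*
 * The old behavior: clamp len so the range ends at total_bytes.
 * The caller must ensure start <= total_bytes.
 */
static uint64_t clamp_len(uint64_t start, uint64_t len, uint64_t total_bytes)
{
    uint64_t room = total_bytes - start;

    return len < room ? len : room;
}
```

With a 10GiB total_bytes and two block groups relocated to logical
addresses 20GiB and 21GiB, the clamped range [0, 10GiB) matches no block
group, while the unmodified range matches both.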

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
Cc: <stable@vger.kernel.org> # v4.0+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 10 +---------
 fs/btrfs/ioctl.c       | 11 +++++++----
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7768f206196a..d1478d66c7a5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10851,21 +10851,13 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
-	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	u64 bg_failed = 0;
 	u64 dev_failed = 0;
 	int bg_ret = 0;
 	int dev_ret = 0;
 	int ret = 0;
 
-	/*
-	 * try to trim all FS space, our block group may start from non-zero.
-	 */
-	if (range->len == total_bytes)
-		cache = btrfs_lookup_first_block_group(fs_info, range->start);
-	else
-		cache = btrfs_lookup_block_group(fs_info, range->start);
-
+	cache = btrfs_lookup_first_block_group(fs_info, range->start);
 	for (; cache; cache = next_block_group(fs_info, cache)) {
 		if (cache->key.objectid >= (range->start + range->len)) {
 			btrfs_put_block_group(cache);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 63600dc2ac4c..8165a4bfa579 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -491,7 +491,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 	struct fstrim_range range;
 	u64 minlen = ULLONG_MAX;
 	u64 num_devices = 0;
-	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -515,11 +514,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 		return -EOPNOTSUPP;
 	if (copy_from_user(&range, arg, sizeof(range)))
 		return -EFAULT;
-	if (range.start > total_bytes ||
-	    range.len < fs_info->sb->s_blocksize)
+
+	/*
+	 * NOTE: Don't truncate the range using super->total_bytes.
+	 * Bytenr of btrfs block group is in btrfs logical address space,
+	 * which can be any sector size aligned bytenr in [0, U64_MAX].
+	 */
+	if (range.len < fs_info->sb->s_blocksize)
 		return -EINVAL;
 
-	range.len = min(range.len, total_bytes - range.start);
 	range.minlen = max(range.minlen, minlen);
 	ret = btrfs_trim_fs(fs_info, &range);
 	if (ret < 0)
-- 
2.18.0


* Re: [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better
  2018-08-29  5:15 ` [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better Qu Wenruo
@ 2018-08-29 13:43   ` Nikolay Borisov
  2018-08-29 13:53     ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2018-08-29 13:43 UTC
  To: Qu Wenruo, linux-btrfs



On 29.08.2018 08:15, Qu Wenruo wrote:
> Function btrfs_trim_fs() doesn't handle errors in a consistent way, if
> error happens when trimming existing block groups, it will skip the
> remaining blocks and continue to trim unallocated space for each device.
> 
> And the return value will only reflect the final error from device
> trimming.
> 
> This patch will fix such behavior by:
> 
> 1) Recording last error from block group or device trimming
>    So return value will also reflect the last error during trimming.
>    Make developer more aware of the problem.
> 
> 2) Continuing trimming if we can
>    If we failed to trim one block group or device, we could still try
>    next block group or device.
> 
> 3) Report number of failures during block group and device trimming
>    So it would be less noisy, but still gives user a brief summary of
>    what's going wrong.
> 
> Such behavior can avoid confusion for case like failure to trim the
> first block group and then only unallocated space is trimmed.
> 
> Reported-by: Chris Murphy <lists@colorremedies.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++++++------------
>  1 file changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index de6f75f5547b..7768f206196a 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>  	return ret;
>  }
>  
> +/*
> + * Trim the whole fs, by:
> + * 1) Trimming free space in each block group
> + * 2) Trimming unallocated space in each device
> + *
> + * Will try to continue trimming even if we failed to trim one block group or
> + * device.
> + * The return value will be the last error during trim.
> + * Or 0 if nothing wrong happened.
> + */
>  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  {
>  	struct btrfs_block_group_cache *cache = NULL;
> @@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	u64 end;
>  	u64 trimmed = 0;
>  	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
> +	u64 bg_failed = 0;
> +	u64 dev_failed = 0;
> +	int bg_ret = 0;
> +	int dev_ret = 0;
>  	int ret = 0;
>  
>  	/*
> @@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	else
>  		cache = btrfs_lookup_block_group(fs_info, range->start);
>  
> -	while (cache) {
> +	for (; cache; cache = next_block_group(fs_info, cache)) {
>  		if (cache->key.objectid >= (range->start + range->len)) {
>  			btrfs_put_block_group(cache);
>  			break;
> @@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  			if (!block_group_cache_done(cache)) {
>  				ret = cache_block_group(cache, 0);
>  				if (ret) {
> -					btrfs_put_block_group(cache);
> -					break;
> +					bg_failed++;
> +					bg_ret = ret;
> +					continue;
>  				}
>  				ret = wait_block_group_cache_done(cache);
>  				if (ret) {
> -					btrfs_put_block_group(cache);
> -					break;
> +					bg_failed++;
> +					bg_ret = ret;
> +					continue;
>  				}
>  			}
> -			ret = btrfs_trim_block_group(cache,
> -						     &group_trimmed,
> -						     start,
> -						     end,
> -						     range->minlen);
> +			ret = btrfs_trim_block_group(cache, &group_trimmed,
> +						start, end, range->minlen);
>  
>  			trimmed += group_trimmed;
>  			if (ret) {
> -				btrfs_put_block_group(cache);
> -				break;
> +				bg_failed++;
> +				bg_ret = ret;
> +				continue;
>  			}
>  		}
> -
> -		cache = next_block_group(fs_info, cache);
>  	}
>  
> +	if (bg_failed)
> +		btrfs_warn(fs_info,
> +		"failed to trim %llu block group(s), last error was %d",
> +			   bg_failed, bg_ret);

IMO this error handling strategy doesn't really bring any value. The
only thing which the user really gathers from that error message is that
N block groups failed. But there is no information whether it failed due
to read failure hence cannot load the freespace cache or there was some
error during the actual trimming.

I agree that if we fail for 1 bg we shouldn't terminate the whole
process but just skip it. However, a more useful error handling strategy
would be to have btrfs_warns for every failed block group for every
failed function. I.e one for wait_block_group_cache since the low-level
code in cache_block_group already prints something if it encounters
errors. And one for btrfs_trim_block_group

>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>  	devices = &fs_info->fs_devices->alloc_list;
>  	list_for_each_entry(device, devices, dev_alloc_list) {
>  		ret = btrfs_trim_free_extents(device, range->minlen,
>  					      &group_trimmed);
> -		if (ret)
> +		if (ret) {
> +			dev_failed++;
> +			dev_ret = ret;
>  			break;
> +		}
>  
>  		trimmed += group_trimmed;
>  	}
>  	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>  
> +	if (dev_failed)
> +		btrfs_warn(fs_info,
> +		"failed to trim %llu device(s), last error was %d",
> +			   dev_failed, dev_ret);

Same thing here, I'd rather see one message per device error and also
identify the device by name.

>  	range->len = trimmed;
> -	return ret;
> +	if (bg_ret)
> +		return bg_ret;
> +	return dev_ret;
>  }
>  
>  /*
> 


* Re: [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better
  2018-08-29 13:43   ` Nikolay Borisov
@ 2018-08-29 13:53     ` Qu Wenruo
  2018-08-29 14:40       ` Nikolay Borisov
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2018-08-29 13:53 UTC
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2018/8/29 下午9:43, Nikolay Borisov wrote:
> 
> 
> On 29.08.2018 08:15, Qu Wenruo wrote:
>> Function btrfs_trim_fs() doesn't handle errors in a consistent way, if
>> error happens when trimming existing block groups, it will skip the
>> remaining blocks and continue to trim unallocated space for each device.
>>
>> And the return value will only reflect the final error from device
>> trimming.
>>
>> This patch will fix such behavior by:
>>
>> 1) Recording last error from block group or device trimming
>>    So return value will also reflect the last error during trimming.
>>    Make developer more aware of the problem.
>>
>> 2) Continuing trimming if we can
>>    If we failed to trim one block group or device, we could still try
>>    next block group or device.
>>
>> 3) Report number of failures during block group and device trimming
>>    So it would be less noisy, but still gives user a brief summary of
>>    what's going wrong.
>>
>> Such behavior can avoid confusion for case like failure to trim the
>> first block group and then only unallocated space is trimmed.
>>
>> Reported-by: Chris Murphy <lists@colorremedies.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++++++------------
>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index de6f75f5547b..7768f206196a 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Trim the whole fs, by:
>> + * 1) Trimming free space in each block group
>> + * 2) Trimming unallocated space in each device
>> + *
>> + * Will try to continue trimming even if we failed to trim one block group or
>> + * device.
>> + * The return value will be the last error during trim.
>> + * Or 0 if nothing wrong happened.
>> + */
>>  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  {
>>  	struct btrfs_block_group_cache *cache = NULL;
>> @@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  	u64 end;
>>  	u64 trimmed = 0;
>>  	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>> +	u64 bg_failed = 0;
>> +	u64 dev_failed = 0;
>> +	int bg_ret = 0;
>> +	int dev_ret = 0;
>>  	int ret = 0;
>>  
>>  	/*
>> @@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  	else
>>  		cache = btrfs_lookup_block_group(fs_info, range->start);
>>  
>> -	while (cache) {
>> +	for (; cache; cache = next_block_group(fs_info, cache)) {
>>  		if (cache->key.objectid >= (range->start + range->len)) {
>>  			btrfs_put_block_group(cache);
>>  			break;
>> @@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>  			if (!block_group_cache_done(cache)) {
>>  				ret = cache_block_group(cache, 0);
>>  				if (ret) {
>> -					btrfs_put_block_group(cache);
>> -					break;
>> +					bg_failed++;
>> +					bg_ret = ret;
>> +					continue;
>>  				}
>>  				ret = wait_block_group_cache_done(cache);
>>  				if (ret) {
>> -					btrfs_put_block_group(cache);
>> -					break;
>> +					bg_failed++;
>> +					bg_ret = ret;
>> +					continue;
>>  				}
>>  			}
>> -			ret = btrfs_trim_block_group(cache,
>> -						     &group_trimmed,
>> -						     start,
>> -						     end,
>> -						     range->minlen);
>> +			ret = btrfs_trim_block_group(cache, &group_trimmed,
>> +						start, end, range->minlen);
>>  
>>  			trimmed += group_trimmed;
>>  			if (ret) {
>> -				btrfs_put_block_group(cache);
>> -				break;
>> +				bg_failed++;
>> +				bg_ret = ret;
>> +				continue;
>>  			}
>>  		}
>> -
>> -		cache = next_block_group(fs_info, cache);
>>  	}
>>  
>> +	if (bg_failed)
>> +		btrfs_warn(fs_info,
>> +		"failed to trim %llu block group(s), last error was %d",
>> +			   bg_failed, bg_ret);
> 
> IMO this error handling strategy doesn't really bring any value. The
> only thing which the user really gathers from that error message is that
> N block groups failed. But there is no information whether it failed due
> to read failure hence cannot load the freespace cache or there was some
> error during the actual trimming.
> 
> I agree that if we fail for 1 bg we shouldn't terminate the whole
> process but just skip it. However, a more useful error handling strategy
> would be to have btrfs_warns for every failed block group for every
> failed function.

Yep, the previous version went that way.

But even with btrfs_warn_rl() it could be too noisy.
And as David commented, the user may not even care, so such a noisy
report doesn't make much sense.

E.g. if something really went wrong and made the fs RO, there would
be tons of error messages flooding dmesg (although most of them would be
rate limited), which really makes no sense.

Thanks,
Qu


> I.e one for wait_block_group_cache since the low-level
> code in cache_block_group already prints something if it encounters
> errors. And one for btrfs_trim_block_group
> 
>>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>>  	devices = &fs_info->fs_devices->alloc_list;
>>  	list_for_each_entry(device, devices, dev_alloc_list) {
>>  		ret = btrfs_trim_free_extents(device, range->minlen,
>>  					      &group_trimmed);
>> -		if (ret)
>> +		if (ret) {
>> +			dev_failed++;
>> +			dev_ret = ret;
>>  			break;
>> +		}
>>  
>>  		trimmed += group_trimmed;
>>  	}
>>  	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>  
>> +	if (dev_failed)
>> +		btrfs_warn(fs_info,
>> +		"failed to trim %llu device(s), last error was %d",
>> +			   dev_failed, dev_ret);
> 
> Same thing here, I'd rather see one message per device error and also
> identify the device by name.
> 
>>  	range->len = trimmed;
>> -	return ret;
>> +	if (bg_ret)
>> +		return bg_ret;
>> +	return dev_ret;
>>  }
>>  
>>  /*
>>


* Re: [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs
  2018-08-29  5:15 ` [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs Qu Wenruo
@ 2018-08-29 14:24   ` Nikolay Borisov
  0 siblings, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2018-08-29 14:24 UTC
  To: Qu Wenruo, linux-btrfs; +Cc: stable



On 29.08.2018 08:15, Qu Wenruo wrote:
> [BUG]
> fstrim on some btrfs only trims the unallocated space, not trimming any
> space in existing block groups.
> 
> [CAUSE]
> Before fstrim_range passed to btrfs_trim_fs(), it get truncated to
> range [0, super->total_bytes).
> So later btrfs_trim_fs() will only be able to trim block groups in range
> [0, super->total_bytes).
> 
> While for btrfs, any bytenr aligned to sector size is valid, since btrfs use
> its logical address space, there is nothing limiting the location where
> we put block groups.
> 
> For btrfs with routine balance, it's quite easy to relocate all
> block groups and bytenr of block groups will start beyond super->total_bytes.
> 
> In that case, btrfs will not trim existing block groups.
> 
> [FIX]
> Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
> can get the unmodified range, which is normally set to [0, U64_MAX].
> 
> Reported-by: Chris Murphy <lists@colorremedies.com>
> Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
> Cc: <stable@vger.kernel.org> # v4.0+
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Seems legit,

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 10 +---------
>  fs/btrfs/ioctl.c       | 11 +++++++----
>  2 files changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 7768f206196a..d1478d66c7a5 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10851,21 +10851,13 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>  	u64 start;
>  	u64 end;
>  	u64 trimmed = 0;
> -	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>  	u64 bg_failed = 0;
>  	u64 dev_failed = 0;
>  	int bg_ret = 0;
>  	int dev_ret = 0;
>  	int ret = 0;
>  
> -	/*
> -	 * try to trim all FS space, our block group may start from non-zero.
> -	 */
> -	if (range->len == total_bytes)
> -		cache = btrfs_lookup_first_block_group(fs_info, range->start);
> -	else
> -		cache = btrfs_lookup_block_group(fs_info, range->start);
> -
> +	cache = btrfs_lookup_first_block_group(fs_info, range->start);
>  	for (; cache; cache = next_block_group(fs_info, cache)) {
>  		if (cache->key.objectid >= (range->start + range->len)) {
>  			btrfs_put_block_group(cache);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 63600dc2ac4c..8165a4bfa579 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -491,7 +491,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>  	struct fstrim_range range;
>  	u64 minlen = ULLONG_MAX;
>  	u64 num_devices = 0;
> -	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>  	int ret;
>  
>  	if (!capable(CAP_SYS_ADMIN))
> @@ -515,11 +514,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>  		return -EOPNOTSUPP;
>  	if (copy_from_user(&range, arg, sizeof(range)))
>  		return -EFAULT;
> -	if (range.start > total_bytes ||
> -	    range.len < fs_info->sb->s_blocksize)
> +
> +	/*
> +	 * NOTE: Don't truncate the range using super->total_bytes.
> +	 * Bytenr of btrfs block group is in btrfs logical address space,
> +	 * which can be any sector size aligned bytenr in [0, U64_MAX].
> +	 */
> +	if (range.len < fs_info->sb->s_blocksize)
>  		return -EINVAL;
>  
> -	range.len = min(range.len, total_bytes - range.start);
>  	range.minlen = max(range.minlen, minlen);
>  	ret = btrfs_trim_fs(fs_info, &range);
>  	if (ret < 0)
> 


* Re: [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better
  2018-08-29 13:53     ` Qu Wenruo
@ 2018-08-29 14:40       ` Nikolay Borisov
  2018-09-10 18:42         ` David Sterba
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Borisov @ 2018-08-29 14:40 UTC
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 29.08.2018 16:53, Qu Wenruo wrote:
> 
> 
> On 2018/8/29 下午9:43, Nikolay Borisov wrote:
>>
>>
>> On 29.08.2018 08:15, Qu Wenruo wrote:
>>> Function btrfs_trim_fs() doesn't handle errors in a consistent way, if
>>> error happens when trimming existing block groups, it will skip the
>>> remaining blocks and continue to trim unallocated space for each device.
>>>
>>> And the return value will only reflect the final error from device
>>> trimming.
>>>
>>> This patch will fix such behavior by:
>>>
>>> 1) Recording last error from block group or device trimming
>>>    So return value will also reflect the last error during trimming.
>>>    Make developer more aware of the problem.
>>>
>>> 2) Continuing trimming if we can
>>>    If we failed to trim one block group or device, we could still try
>>>    next block group or device.
>>>
>>> 3) Report number of failures during block group and device trimming
>>>    So it would be less noisy, but still gives user a brief summary of
>>>    what's going wrong.
>>>
>>> Such behavior can avoid confusion for case like failure to trim the
>>> first block group and then only unallocated space is trimmed.
>>>
>>> Reported-by: Chris Murphy <lists@colorremedies.com>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++++++------------
>>>  1 file changed, 41 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index de6f75f5547b..7768f206196a 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * Trim the whole fs, by:
>>> + * 1) Trimming free space in each block group
>>> + * 2) Trimming unallocated space in each device
>>> + *
>>> + * Will try to continue trimming even if we failed to trim one block group or
>>> + * device.
>>> + * The return value will be the last error during trim.
>>> + * Or 0 if nothing wrong happened.
>>> + */
>>>  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>  {
>>>  	struct btrfs_block_group_cache *cache = NULL;
>>> @@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>  	u64 end;
>>>  	u64 trimmed = 0;
>>>  	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
>>> +	u64 bg_failed = 0;
>>> +	u64 dev_failed = 0;
>>> +	int bg_ret = 0;
>>> +	int dev_ret = 0;
>>>  	int ret = 0;
>>>  
>>>  	/*
>>> @@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>  	else
>>>  		cache = btrfs_lookup_block_group(fs_info, range->start);
>>>  
>>> -	while (cache) {
>>> +	for (; cache; cache = next_block_group(fs_info, cache)) {
>>>  		if (cache->key.objectid >= (range->start + range->len)) {
>>>  			btrfs_put_block_group(cache);
>>>  			break;
>>> @@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
>>>  			if (!block_group_cache_done(cache)) {
>>>  				ret = cache_block_group(cache, 0);
>>>  				if (ret) {
>>> -					btrfs_put_block_group(cache);
>>> -					break;
>>> +					bg_failed++;
>>> +					bg_ret = ret;
>>> +					continue;
>>>  				}
>>>  				ret = wait_block_group_cache_done(cache);
>>>  				if (ret) {
>>> -					btrfs_put_block_group(cache);
>>> -					break;
>>> +					bg_failed++;
>>> +					bg_ret = ret;
>>> +					continue;
>>>  				}
>>>  			}
>>> -			ret = btrfs_trim_block_group(cache,
>>> -						     &group_trimmed,
>>> -						     start,
>>> -						     end,
>>> -						     range->minlen);
>>> +			ret = btrfs_trim_block_group(cache, &group_trimmed,
>>> +						start, end, range->minlen);
>>>  
>>>  			trimmed += group_trimmed;
>>>  			if (ret) {
>>> -				btrfs_put_block_group(cache);
>>> -				break;
>>> +				bg_failed++;
>>> +				bg_ret = ret;
>>> +				continue;
>>>  			}
>>>  		}
>>> -
>>> -		cache = next_block_group(fs_info, cache);
>>>  	}
>>>  
>>> +	if (bg_failed)
>>> +		btrfs_warn(fs_info,
>>> +		"failed to trim %llu block group(s), last error was %d",
>>> +			   bg_failed, bg_ret);
>>
>> IMO this error handling strategy doesn't really bring any value. The
>> only thing which the user really gathers from that error message is that
>> N block groups failed. But there is no information whether it failed due
>> to read failure hence cannot load the freespace cache or there was some
>> error during the actual trimming.
>>
>> I agree that if we fail for 1 bg we shouldn't terminate the whole
>> process but just skip it. However, a more useful error handling strategy
>> would be to have btrfs_warns for every failed block group for every
>> failed function.
> 
> Yep, previous version goes that way.
> 
> But even for btrfs_warn_rl() it could be too noisy.
> And just as commented by David, user may not even care, thus such too
> noisy report makes not much sense.
> 
> E.g. if something really went wrong and make the fs RO, then there will
> be tons of error messages flooding dmesg (although most of them will be
> rate limited), and really makes no sense.

Well, in that case I don't see value in retaining the last error code,
so you can just keep the "%llu block groups failed to be trimmed"
messages. The last error is not meaningful.


> 
> Thanks,
> Qu
> 
> 
>> I.e one for wait_block_group_cache since the low-level
>> code in cache_block_group already prints something if it encounters
>> errors. And one for btrfs_trim_block_group
>>
>>>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>>>  	devices = &fs_info->fs_devices->alloc_list;
>>>  	list_for_each_entry(device, devices, dev_alloc_list) {
>>>  		ret = btrfs_trim_free_extents(device, range->minlen,
>>>  					      &group_trimmed);
>>> -		if (ret)
>>> +		if (ret) {
>>> +			dev_failed++;
>>> +			dev_ret = ret;
>>>  			break;
>>> +		}
>>>  
>>>  		trimmed += group_trimmed;
>>>  	}
>>>  	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>  
>>> +	if (dev_failed)
>>> +		btrfs_warn(fs_info,
>>> +		"failed to trim %llu device(s), last error was %d",
>>> +			   dev_failed, dev_ret);
>>
>> Same thing here, I'd rather see one message per device error and also
>> identify the device by name.
>>
>>>  	range->len = trimmed;
>>> -	return ret;
>>> +	if (bg_ret)
>>> +		return bg_ret;
>>> +	return dev_ret;
>>>  }
>>>  
>>>  /*
>>>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better
  2018-08-29 14:40       ` Nikolay Borisov
@ 2018-09-10 18:42         ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2018-09-10 18:42 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

On Wed, Aug 29, 2018 at 05:40:12PM +0300, Nikolay Borisov wrote:
> 
> 
> On 29.08.2018 16:53, Qu Wenruo wrote:
> > 
> > 
> > On 2018/8/29 9:43 PM, Nikolay Borisov wrote:
> >>
> >>
> >> On 29.08.2018 08:15, Qu Wenruo wrote:
> >>> Function btrfs_trim_fs() doesn't handle errors in a consistent way: if
> >>> an error happens when trimming existing block groups, it skips the
> >>> remaining block groups and continues to trim unallocated space for
> >>> each device.
> >>>
> >>> And the return value only reflects the final error from device
> >>> trimming.
> >>>
> >>> This patch fixes such behavior by:
> >>>
> >>> 1) Recording the last error from block group or device trimming
> >>>    So the return value also reflects the last error during trimming,
> >>>    making developers more aware of the problem.
> >>>
> >>> 2) Continuing trimming if we can
> >>>    If we fail to trim one block group or device, we can still try
> >>>    the next block group or device.
> >>>
> >>> 3) Reporting the number of failures during block group and device trimming
> >>>    So it is less noisy, but still gives the user a brief summary of
> >>>    what went wrong.
> >>>
> >>> Such behavior avoids confusion in cases like a failure to trim the
> >>> first block group, after which only unallocated space is trimmed.
> >>>
> >>> Reported-by: Chris Murphy <lists@colorremedies.com>
> >>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>> ---
> >>>  fs/btrfs/extent-tree.c | 57 ++++++++++++++++++++++++++++++------------
> >>>  1 file changed, 41 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >>> index de6f75f5547b..7768f206196a 100644
> >>> --- a/fs/btrfs/extent-tree.c
> >>> +++ b/fs/btrfs/extent-tree.c
> >>> @@ -10832,6 +10832,16 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
> >>>  	return ret;
> >>>  }
> >>>  
> >>> +/*
> >>> + * Trim the whole fs, by:
> >>> + * 1) Trimming free space in each block group
> >>> + * 2) Trimming unallocated space in each device
> >>> + *
> >>> + * Will try to continue trimming even if we failed to trim one block group or
> >>> + * device.
> >>> + * The return value will be the last error during trim.
> >>> + * Or 0 if nothing wrong happened.
> >>> + */
> >>>  int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
> >>>  {
> >>>  	struct btrfs_block_group_cache *cache = NULL;
> >>> @@ -10842,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
> >>>  	u64 end;
> >>>  	u64 trimmed = 0;
> >>>  	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
> >>> +	u64 bg_failed = 0;
> >>> +	u64 dev_failed = 0;
> >>> +	int bg_ret = 0;
> >>> +	int dev_ret = 0;
> >>>  	int ret = 0;
> >>>  
> >>>  	/*
> >>> @@ -10852,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
> >>>  	else
> >>>  		cache = btrfs_lookup_block_group(fs_info, range->start);
> >>>  
> >>> -	while (cache) {
> >>> +	for (; cache; cache = next_block_group(fs_info, cache)) {
> >>>  		if (cache->key.objectid >= (range->start + range->len)) {
> >>>  			btrfs_put_block_group(cache);
> >>>  			break;
> >>> @@ -10866,45 +10880,56 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
> >>>  			if (!block_group_cache_done(cache)) {
> >>>  				ret = cache_block_group(cache, 0);
> >>>  				if (ret) {
> >>> -					btrfs_put_block_group(cache);
> >>> -					break;
> >>> +					bg_failed++;
> >>> +					bg_ret = ret;
> >>> +					continue;
> >>>  				}
> >>>  				ret = wait_block_group_cache_done(cache);
> >>>  				if (ret) {
> >>> -					btrfs_put_block_group(cache);
> >>> -					break;
> >>> +					bg_failed++;
> >>> +					bg_ret = ret;
> >>> +					continue;
> >>>  				}
> >>>  			}
> >>> -			ret = btrfs_trim_block_group(cache,
> >>> -						     &group_trimmed,
> >>> -						     start,
> >>> -						     end,
> >>> -						     range->minlen);
> >>> +			ret = btrfs_trim_block_group(cache, &group_trimmed,
> >>> +						start, end, range->minlen);
> >>>  
> >>>  			trimmed += group_trimmed;
> >>>  			if (ret) {
> >>> -				btrfs_put_block_group(cache);
> >>> -				break;
> >>> +				bg_failed++;
> >>> +				bg_ret = ret;
> >>> +				continue;
> >>>  			}
> >>>  		}
> >>> -
> >>> -		cache = next_block_group(fs_info, cache);
> >>>  	}
> >>>  
> >>> +	if (bg_failed)
> >>> +		btrfs_warn(fs_info,
> >>> +		"failed to trim %llu block group(s), last error was %d",
> >>> +			   bg_failed, bg_ret);
> >>
> >> IMO this error handling strategy doesn't really bring any value. The
> >> only thing which the user really gathers from that error message is that
> >> N block groups failed. But there is no information whether it failed due
> >> to read failure hence cannot load the freespace cache or there was some
> >> error during the actual trimming.
> >>
> >> I agree that if we fail for 1 bg we shouldn't terminate the whole
> >> process but just skip it. However, a more useful error handling strategy
> >> would be to have btrfs_warns for every failed block group for every
> >> failed function.
> > 
> > > Yep, the previous version went that way.
> > > 
> > > But even with btrfs_warn_rl() it could be too noisy.
> > > And as David commented, the user may not even care, so such a noisy
> > > report doesn't make much sense.
> > > 
> > > E.g. if something really went wrong and made the fs RO, there would be
> > > tons of error messages flooding dmesg (although most of them would be
> > > rate limited), which really makes no sense.
> 
> Well, in that case I don't see any value in retaining the last error
> message, so you can just keep the "%llu block groups failed to be trimmed"
> messages. The last error is not meaningful.

Do you mean the error value of the last error, saved in the bg_ret
variable? I'd say it's at least something to return to the user;
I find the bare "%llu failed to trim" not meaningful.

Qu and I discussed last time how best to report the errors from
the trim loops, so I'm open to suggestions, but I don't see other options
if we don't want to flood the logs.
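
For reference, the reporting scheme under discussion (skip a failed item,
count the failures, remember the last errno, print one summary line) can be
sketched in plain userspace C. This is only an illustration and not the
kernel code: struct trim_summary, summarize(), report() and pick_return()
are hypothetical names, and the per-item results are passed in as an array
instead of coming from real trim calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical summary of one trim loop (block groups or devices). */
struct trim_summary {
	unsigned long long failed;	/* number of items that failed */
	int last_err;			/* errno of the latest failure, 0 if none */
};

/*
 * Walk per-item results (0 on success, negative errno on failure).
 * A failure is counted and remembered, then the loop moves on to the
 * next item instead of stopping at the first error.
 */
struct trim_summary summarize(const int *results, size_t n)
{
	struct trim_summary s = { 0, 0 };
	size_t i;

	assert(results != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (results[i]) {
			s.failed++;
			s.last_err = results[i];
			continue;
		}
		/* A real loop would account the trimmed bytes here. */
	}
	return s;
}

/* Emit a single summary line per loop, and only if something failed. */
void report(const char *what, struct trim_summary s)
{
	if (s.failed)
		fprintf(stderr, "failed to trim %llu %s(s), last error was %d\n",
			s.failed, what, s.last_err);
}

/* Same precedence as the end of the patch: bg error first, then device. */
int pick_return(int bg_ret, int dev_ret)
{
	return bg_ret ? bg_ret : dev_ret;
}
```

With results { 0, -EIO, 0, -ENOMEM }, summarize() counts two failures and
keeps -ENOMEM, so the summary line carries one errno and earlier ones are
dropped, which is the trade-off being debated. Note that in the quoted hunk
the device loop still uses break, so dev_failed can only reach 1 there.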

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-10 23:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-29  5:15 [PATCH v3 0/2] btrfs: trim enhancement to allow btrfs really trim block groups Qu Wenruo
2018-08-29  5:15 ` [PATCH v3 1/2] btrfs: Enhance btrfs_trim_fs function to handle error better Qu Wenruo
2018-08-29 13:43   ` Nikolay Borisov
2018-08-29 13:53     ` Qu Wenruo
2018-08-29 14:40       ` Nikolay Borisov
2018-09-10 18:42         ` David Sterba
2018-08-29  5:15 ` [PATCH v3 2/2] btrfs: Ensure btrfs_trim_fs can trim the whole fs Qu Wenruo
2018-08-29 14:24   ` Nikolay Borisov
