Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] btrfs: trim: fix range start validity check
@ 2019-08-07  8:20 Anand Jain
  2019-08-07  8:44 ` Filipe Manana
  0 siblings, 1 reply; 9+ messages in thread
From: Anand Jain @ 2019-08-07  8:20 UTC (permalink / raw)
  To: linux-btrfs; +Cc: wqu

Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
filesystem) makes sure we always trim starting from the first block group.
However it also removed the range.start validity check which is set in the
context of the user, where its range is from 0 to maximum of filesystem
totalbytes and so we have to check its validity in the kernel.

Also as in the fstrim(8) [1] the kernel layers may modify the trim range.

[1]
Further, the kernel block layer reserves the right to adjust the discard
ranges to fit raid stripe geometry, non-trim capable devices in a LVM
setup, etc. These reductions would not be reflected in fstrim_range.len
(the --length option).

This patch undos the deleted range::start validity check.

Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
  With this patch fstests generic/260 is successful now.

 fs/btrfs/ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b431f7877e88..9345fcdf80c7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 		return -EOPNOTSUPP;
 	if (copy_from_user(&range, arg, sizeof(range)))
 		return -EFAULT;
+	if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
+		return -EINVAL;
 
 	/*
 	 * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
-- 
2.21.0 (Apple Git-120)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-07  8:20 [PATCH] btrfs: trim: fix range start validity check Anand Jain
@ 2019-08-07  8:44 ` Filipe Manana
  2019-08-07  8:55   ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Filipe Manana @ 2019-08-07  8:44 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, Qu Wenruo

On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
> filesystem) makes sure we always trim starting from the first block group.
> However it also removed the range.start validity check which is set in the
> context of the user, where its range is from 0 to maximum of filesystem
> totalbytes and so we have to check its validity in the kernel.
>
> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>
> [1]
> Further, the kernel block layer reserves the right to adjust the discard
> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
> setup, etc. These reductions would not be reflected in fstrim_range.len
> (the --length option).
>
> This patch undos the deleted range::start validity check.
>
> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   With this patch fstests generic/260 is successful now.
>
>  fs/btrfs/ioctl.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index b431f7877e88..9345fcdf80c7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>                 return -EOPNOTSUPP;
>         if (copy_from_user(&range, arg, sizeof(range)))
>                 return -EFAULT;
> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
> +               return -EINVAL;

This makes it impossible to trim block groups that start at an offset
greater then the super_total_bytes values. In fact, in extreme cases
it's possible all block groups start at offsets larger then
super_total_bytes.
Have you considered that, or am I missing something?

The change log is also vague to me, doesn't explain why you are
re-adding that check.

Thanks.

>
>         /*
>          * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
> --
> 2.21.0 (Apple Git-120)
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-07  8:44 ` Filipe Manana
@ 2019-08-07  8:55   ` Qu Wenruo
  2019-08-07  9:43     ` Anand Jain
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-08-07  8:55 UTC (permalink / raw)
  To: fdmanana, Anand Jain; +Cc: linux-btrfs, Qu Wenruo

[-- Attachment #1.1: Type: text/plain, Size: 2509 bytes --]



On 2019/8/7 下午4:44, Filipe Manana wrote:
> On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
>> filesystem) makes sure we always trim starting from the first block group.
>> However it also removed the range.start validity check which is set in the
>> context of the user, where its range is from 0 to maximum of filesystem
>> totalbytes and so we have to check its validity in the kernel.
>>
>> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>>
>> [1]
>> Further, the kernel block layer reserves the right to adjust the discard
>> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
>> setup, etc. These reductions would not be reflected in fstrim_range.len
>> (the --length option).
>>
>> This patch undos the deleted range::start validity check.
>>
>> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   With this patch fstests generic/260 is successful now.
>>
>>  fs/btrfs/ioctl.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index b431f7877e88..9345fcdf80c7 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>                 return -EOPNOTSUPP;
>>         if (copy_from_user(&range, arg, sizeof(range)))
>>                 return -EFAULT;
>> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
>> +               return -EINVAL;
> 
> This makes it impossible to trim block groups that start at an offset
> greater then the super_total_bytes values.

Exactly.

> In fact, in extreme cases
> it's possible all block groups start at offsets larger then
> super_total_bytes.
> Have you considered that, or am I missing something?

And, I have already mentioned exactly the same reason in that commit
message.

To address it once again, the bytenr is btrfs logical address space, has
nothing to do with any device.
Thus it can be [0, U64MAX].

Thanks,
Qu

> 
> The change log is also vague to me, doesn't explain why you are
> re-adding that check.
> 
> Thanks.
> 
>>
>>         /*
>>          * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>> --
>> 2.21.0 (Apple Git-120)
>>
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-07  8:55   ` Qu Wenruo
@ 2019-08-07  9:43     ` Anand Jain
  2019-08-07 11:15       ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Anand Jain @ 2019-08-07  9:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, fdmanana, linux-btrfs, Qu Wenruo



> On 7 Aug 2019, at 4:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> 
> 
> On 2019/8/7 下午4:44, Filipe Manana wrote:
>> On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.jain@oracle.com> wrote:
>>> 
>>> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
>>> filesystem) makes sure we always trim starting from the first block group.
>>> However it also removed the range.start validity check which is set in the
>>> context of the user, where its range is from 0 to maximum of filesystem
>>> totalbytes and so we have to check its validity in the kernel.
>>> 
>>> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>>> 
>>> [1]
>>> Further, the kernel block layer reserves the right to adjust the discard
>>> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
>>> setup, etc. These reductions would not be reflected in fstrim_range.len
>>> (the --length option).
>>> 
>>> This patch undos the deleted range::start validity check.
>>> 
>>> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>  With this patch fstests generic/260 is successful now.
>>> 
>>> fs/btrfs/ioctl.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index b431f7877e88..9345fcdf80c7 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>>                return -EOPNOTSUPP;
>>>        if (copy_from_user(&range, arg, sizeof(range)))
>>>                return -EFAULT;
>>> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
>>> +               return -EINVAL;
>> 
>> This makes it impossible to trim block groups that start at an offset
>> greater then the super_total_bytes values.
> 

 what did I miss? As there is no limit on the range::length
 so we can still trim beyond super_total_bytes. So I don’t
 understand why not? More below.


> Exactly.
> 
>> In fact, in extreme cases
>> it's possible all block groups start at offsets larger then
>> super_total_bytes.
>> Have you considered that, or am I missing something?
> 
> And, I have already mentioned exactly the same reason in that commit
> message.
> 
> To address it once again, the bytenr is btrfs logical address space, has
> nothing to do with any device.
> Thus it can be [0, U64MAX].
> 

 Fundamentally, logical address space has no relevance in the user context,
 so also I don’t understand your view on how anyone shall use the range::start
 even if there is no check?

 As in the man page it's ok to adjust the range internally, and as length
 can be up to U64MAX we can still trim beyond super_total_bytes?

Thanks, Anand


> Thanks,
> Qu
> 
>> 
>> The change log is also vague to me, doesn't explain why you are
>> re-adding that check.
>> 
>> Thanks.
>> 
>>> 
>>>        /*
>>>         * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>> --
>>> 2.21.0 (Apple Git-120)
>>> 
>> 
>> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-07  9:43     ` Anand Jain
@ 2019-08-07 11:15       ` Qu Wenruo
  2019-08-08  3:53         ` Anand Jain
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-08-07 11:15 UTC (permalink / raw)
  To: Anand Jain; +Cc: fdmanana, linux-btrfs, Qu Wenruo

[-- Attachment #1.1: Type: text/plain, Size: 4662 bytes --]



On 2019/8/7 下午5:43, Anand Jain wrote:
> 
> 
>> On 7 Aug 2019, at 4:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/8/7 下午4:44, Filipe Manana wrote:
>>> On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
>>>> filesystem) makes sure we always trim starting from the first block group.
>>>> However it also removed the range.start validity check which is set in the
>>>> context of the user, where its range is from 0 to maximum of filesystem
>>>> totalbytes and so we have to check its validity in the kernel.
>>>>
>>>> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>>>>
>>>> [1]
>>>> Further, the kernel block layer reserves the right to adjust the discard
>>>> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
>>>> setup, etc. These reductions would not be reflected in fstrim_range.len
>>>> (the --length option).
>>>>
>>>> This patch undos the deleted range::start validity check.
>>>>
>>>> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>  With this patch fstests generic/260 is successful now.
>>>>
>>>> fs/btrfs/ioctl.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>> index b431f7877e88..9345fcdf80c7 100644
>>>> --- a/fs/btrfs/ioctl.c
>>>> +++ b/fs/btrfs/ioctl.c
>>>> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>>>                return -EOPNOTSUPP;
>>>>        if (copy_from_user(&range, arg, sizeof(range)))
>>>>                return -EFAULT;
>>>> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
>>>> +               return -EINVAL;
>>>
>>> This makes it impossible to trim block groups that start at an offset
>>> greater then the super_total_bytes values.
>>
> 
>  what did I miss? As there is no limit on the range::length
>  so we can still trim beyond super_total_bytes. So I don’t
>  understand why not? More below.
> 
> 
>> Exactly.
>>
>>> In fact, in extreme cases
>>> it's possible all block groups start at offsets larger then
>>> super_total_bytes.
>>> Have you considered that, or am I missing something?
>>
>> And, I have already mentioned exactly the same reason in that commit
>> message.
>>
>> To address it once again, the bytenr is btrfs logical address space, has
>> nothing to do with any device.
>> Thus it can be [0, U64MAX].
>>
> 
>  Fundamentally, logical address space has no relevance in the user context,
>  so also I don’t understand your view on how anyone shall use the range::start
>  even if there is no check?

range::start == bg_bytenr, range::len = bg_len to trim only a bg.

And that bg_bytenr is at 128T, since the fs has gone through several
balance.
But there is only one device, and its size is only 1T.

Please tell me how to trim that block group only?

> 
>  As in the man page it's ok to adjust the range internally, and as length
>  can be up to U64MAX we can still trim beyond super_total_bytes?

As I said already, super_total_bytes makes no sense in logical address
space.
As super_total_bytes is just the sum of all devices, it's a device layer
thing, nothing to do with logical address space.

You're mixing logical bytenr with something not even a device physical
offset, how can it be correct?

Let me make it more clear, btrfs has its own logical address space
unrelated to whatever the devices mapping are.
It's always [0, U64_MAX], no matter how many devices there are.

If btrfs is implemented using dm, it should be more clear.

(single device btrfs)
          |
(dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
  |   |   |    |
  |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
  |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
  |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
  \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).

If we're trim such fs layout, you tell me which offset you should use.

Thanks,
Qu

> 
> Thanks, Anand
> 
> 
>> Thanks,
>> Qu
>>
>>>
>>> The change log is also vague to me, doesn't explain why you are
>>> re-adding that check.
>>>
>>> Thanks.
>>>
>>>>
>>>>        /*
>>>>         * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>> --
>>>> 2.21.0 (Apple Git-120)
>>>>
>>>
>>>
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-07 11:15       ` Qu Wenruo
@ 2019-08-08  3:53         ` Anand Jain
  2019-08-08  5:55           ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Anand Jain @ 2019-08-08  3:53 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, fdmanana, linux-btrfs, Qu Wenruo



> On 7 Aug 2019, at 7:15 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> 
> 
> On 2019/8/7 下午5:43, Anand Jain wrote:
>> 
>> 
>>> On 7 Aug 2019, at 4:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> 
>>> 
>>> 
>>> On 2019/8/7 下午4:44, Filipe Manana wrote:
>>>> On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>> 
>>>>> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
>>>>> filesystem) makes sure we always trim starting from the first block group.
>>>>> However it also removed the range.start validity check which is set in the
>>>>> context of the user, where its range is from 0 to maximum of filesystem
>>>>> totalbytes and so we have to check its validity in the kernel.
>>>>> 
>>>>> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>>>>> 
>>>>> [1]
>>>>> Further, the kernel block layer reserves the right to adjust the discard
>>>>> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
>>>>> setup, etc. These reductions would not be reflected in fstrim_range.len
>>>>> (the --length option).
>>>>> 
>>>>> This patch undos the deleted range::start validity check.
>>>>> 
>>>>> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole filesystem)
>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>> ---
>>>>> With this patch fstests generic/260 is successful now.
>>>>> 
>>>>> fs/btrfs/ioctl.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>> 
>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>> index b431f7877e88..9345fcdf80c7 100644
>>>>> --- a/fs/btrfs/ioctl.c
>>>>> +++ b/fs/btrfs/ioctl.c
>>>>> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>>>>               return -EOPNOTSUPP;
>>>>>       if (copy_from_user(&range, arg, sizeof(range)))
>>>>>               return -EFAULT;
>>>>> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
>>>>> +               return -EINVAL;
>>>> 
>>>> This makes it impossible to trim block groups that start at an offset
>>>> greater then the super_total_bytes values.
>>> 
>> 
>> what did I miss? As there is no limit on the range::length
>> so we can still trim beyond super_total_bytes. So I don’t
>> understand why not? More below.
>> 
>> 
>>> Exactly.
>>> 
>>>> In fact, in extreme cases
>>>> it's possible all block groups start at offsets larger then
>>>> super_total_bytes.
>>>> Have you considered that, or am I missing something?
>>> 
>>> And, I have already mentioned exactly the same reason in that commit
>>> message.
>>> 
>>> To address it once again, the bytenr is btrfs logical address space, has
>>> nothing to do with any device.
>>> Thus it can be [0, U64MAX].
>>> 
>> 
>> Fundamentally, logical address space has no relevance in the user context,
>> so also I don’t understand your view on how anyone shall use the range::start
>> even if there is no check?
> 
> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
> 

 Thanks for the efforts in explaining.

 My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s). May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.

 So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?


> And that bg_bytenr is at 128T, since the fs has gone through several
> balance.
> But there is only one device, and its size is only 1T.
> 
> Please tell me how to trim that block group only?
> 

 Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.

>> 
>> As in the man page it's ok to adjust the range internally, and as length
>> can be up to U64MAX we can still trim beyond super_total_bytes?
> 
> As I said already, super_total_bytes makes no sense in logical address
> space.

 But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.

> As super_total_bytes is just the sum of all devices, it's a device layer
> thing, nothing to do with logical address space.
> 
> You're mixing logical bytenr with something not even a device physical
> offset, how can it be correct?
> 
> Let me make it more clear, btrfs has its own logical address space
> unrelated to whatever the devices mapping are.
> It's always [0, U64_MAX], no matter how many devices there are.
> 
> If btrfs is implemented using dm, it should be more clear.
> 
> (single device btrfs)
>          |
> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>  |   |   |    |
>  |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>  |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>  |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>  \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
> 
> If we're trim such fs layout, you tell me which offset you should use.
> 

 There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range. This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.

Thanks, Anand


> Thanks,
> Qu
> 
>> 
>> Thanks, Anand
>> 
>> 
>>> Thanks,
>>> Qu
>>> 
>>>> 
>>>> The change log is also vague to me, doesn't explain why you are
>>>> re-adding that check.
>>>> 
>>>> Thanks.
>>>> 
>>>>> 
>>>>>       /*
>>>>>        * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>> --
>>>>> 2.21.0 (Apple Git-120)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-08  3:53         ` Anand Jain
@ 2019-08-08  5:55           ` Qu Wenruo
  2019-08-08  8:34             ` Anand Jain
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-08-08  5:55 UTC (permalink / raw)
  To: Anand Jain; +Cc: fdmanana, linux-btrfs, Qu Wenruo

[-- Attachment #1.1: Type: text/plain, Size: 5331 bytes --]

[...]
>>>
>>> Fundamentally, logical address space has no relevance in the user context,
>>> so also I don’t understand your view on how anyone shall use the range::start
>>> even if there is no check?
>>
>> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
>>
> 
>  Thanks for the efforts in explaining.
> 
>  My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s).

That's the common usage, but it doesn't mean it's the only usage.

Above bg range trim is also a valid use case.

> May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.
> 
>  So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?

There are a lot of tools to show the bg bytenr and usage of each bg.
It isn't a problem at all.

> 
> 
>> And that bg_bytenr is at 128T, since the fs has gone through several
>> balance.
>> But there is only one device, and its size is only 1T.
>>
>> Please tell me how to trim that block group only?
>>
> 
>  Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.

Nope, users sometimes care. Especially for the usage of each bg.

Furthermore, we have vusage/vrange filter for balance, so user is not
blocked from the whole bg thing.

> 
>>>
>>> As in the man page it's ok to adjust the range internally, and as length
>>> can be up to U64MAX we can still trim beyond super_total_bytes?
>>
>> As I said already, super_total_bytes makes no sense in logical address
>> space.
> 
>  But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.

Nope, super_total_bytes in fact makes no sense under most cases.
It doesn't even shows the up limit of usable space. (E.g. For all RADI1
profiles, it's only half the space at most. Even for all SINGLE
profiles, it doesn't account the 1M reserved space).

It's a good thing to detect device list corruption, but despite that, it
really doesn't make much sense.

For logical address space, as explains, we have tools (not in
btrfs-progs though) and interface (balance vrange filter) to take use of
them.

> 
>> As super_total_bytes is just the sum of all devices, it's a device layer
>> thing, nothing to do with logical address space.
>>
>> You're mixing logical bytenr with something not even a device physical
>> offset, how can it be correct?
>>
>> Let me make it more clear, btrfs has its own logical address space
>> unrelated to whatever the devices mapping are.
>> It's always [0, U64_MAX], no matter how many devices there are.
>>
>> If btrfs is implemented using dm, it should be more clear.
>>
>> (single device btrfs)
>>          |
>> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>>  |   |   |    |
>>  |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>>  |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>>  |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>>  \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
>>
>> If we're trim such fs layout, you tell me which offset you should use.
>>
> 
>  There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range.

Nope, that's way worse than current behavior.
See the above example, how did you pass devid? Above case is using RAID0
and RAID1 on two devices, how do you handle that?
Furthermore, btrfs can have different devices sizes for RAID profiles,
how to handle that them? Using super total bytes would easily exceed
every devices boundary.

Yes, the current behavior is not the perfect solution either, but you're
attacking from the wrong direction.
In fact, for allocated bgs, the current behavior is the best solution,
you can choose to trim any range and you have the tool like Hans'
python-btrfs.

The not-so-perfect part is about the unallocated range.
IIRC things like thin-provision LVM choose not to trim the unallocated
part, while btrfs choose to trim all the unallocated part.

If you're arguing how btrfs handles unallocated space, I have no word to
defend at all. But for the logical address part? I can't have more words
to spare.

Thanks,
Qu

> This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.
> 
> Thanks, Anand
> 
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> The change log is also vague to me, doesn't explain why you are
>>>>> re-adding that check.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>
>>>>>>       /*
>>>>>>        * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>>> --
>>>>>> 2.21.0 (Apple Git-120)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-08  5:55           ` Qu Wenruo
@ 2019-08-08  8:34             ` Anand Jain
  2019-08-08  8:40               ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Anand Jain @ 2019-08-08  8:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, fdmanana, linux-btrfs, Qu Wenruo



> On 8 Aug 2019, at 1:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> [...]
>>>> 
>>>> Fundamentally, logical address space has no relevance in the user context,
>>>> so also I don’t understand your view on how anyone shall use the range::start
>>>> even if there is no check?
>>> 
>>> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
>>> 
>> 
>> Thanks for the efforts in explaining.
>> 
>> My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s).
> 
> That's the common usage, but it doesn't mean it's the only usage.

 Oh you are right. man page doesn’t restrict range.start to be within super_total_bytes. It's only generic/260 that it is trying to enforce.

> Above bg range trim is also a valid use case.
> 
>> May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.
>> 
>> So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?
> 
> There are a lot of tools to show the bg bytenr and usage of each bg.
> It isn't a problem at all.
> 

External tools sounds better than some logic within kernel to perform such a transformations. Now I get your idea. My bad.

I am withdrawing this patch.

Thanks, Anand

>> 
>> 
>>> And that bg_bytenr is at 128T, since the fs has gone through several
>>> balance.
>>> But there is only one device, and its size is only 1T.
>>> 
>>> Please tell me how to trim that block group only?
>>> 
>> 
>> Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.
> 
> Nope, users sometimes care. Especially for the usage of each bg.
> 
> Furthermore, we have vusage/vrange filter for balance, so user is not
> blocked from the whole bg thing.
> 
>> 
>>>> 
>>>> As in the man page it's ok to adjust the range internally, and as length
>>>> can be up to U64MAX we can still trim beyond super_total_bytes?
>>> 
>>> As I said already, super_total_bytes makes no sense in logical address
>>> space.
>> 
>> But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.
> 
> Nope, super_total_bytes in fact makes no sense under most cases.
> It doesn't even shows the up limit of usable space. (E.g. For all RADI1
> profiles, it's only half the space at most. Even for all SINGLE
> profiles, it doesn't account the 1M reserved space).
> 
> It's a good thing to detect device list corruption, but despite that, it
> really doesn't make much sense.
> 
> For logical address space, as explains, we have tools (not in
> btrfs-progs though) and interface (balance vrange filter) to take use of
> them.
> 
>> 
>>> As super_total_bytes is just the sum of all devices, it's a device layer
>>> thing, nothing to do with logical address space.
>>> 
>>> You're mixing logical bytenr with something not even a device physical
>>> offset, how can it be correct?
>>> 
>>> Let me make it more clear, btrfs has its own logical address space
>>> unrelated to whatever the devices mapping are.
>>> It's always [0, U64_MAX], no matter how many devices there are.
>>> 
>>> If btrfs is implemented using dm, it should be more clear.
>>> 
>>> (single device btrfs)
>>>         |
>>> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>>> |   |   |    |
>>> |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>>> |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>>> |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>>> \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
>>> 
>>> If we're trim such fs layout, you tell me which offset you should use.
>>> 
>> 
>> There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range.
> 
> Nope, that's way worse than current behavior.
> See the above example, how did you pass devid? Above case is using RAID0
> and RAID1 on two devices, how do you handle that?
> Furthermore, btrfs can have different devices sizes for RAID profiles,
> how to handle that them? Using super total bytes would easily exceed
> every devices boundary.
> 
> Yes, the current behavior is not the perfect solution either, but you're
> attacking from the wrong direction.
> In fact, for allocated bgs, the current behavior is the best solution,
> you can choose to trim any range and you have the tool like Hans'
> python-btrfs.
> 
> The not-so-perfect part is about the unallocated range.
> IIRC things like thin-provision LVM choose not to trim the unallocated
> part, while btrfs choose to trim all the unallocated part.
> 
> If you're arguing how btrfs handles unallocated space, I have no word to
> defend at all. But for the logical address part? I can't have more words
> to spare.
> 
> Thanks,
> Qu
> 
>> This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.
>> 
>> Thanks, Anand
>> 
>> 
>>> Thanks,
>>> Qu
>>> 
>>>> 
>>>> Thanks, Anand
>>>> 
>>>> 
>>>>> Thanks,
>>>>> Qu
>>>>> 
>>>>>> 
>>>>>> The change log is also vague to me, doesn't explain why you are
>>>>>> re-adding that check.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>>> 
>>>>>>>      /*
>>>>>>>       * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>>>> --
>>>>>>> 2.21.0 (Apple Git-120)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: trim: fix range start validity check
  2019-08-08  8:34             ` Anand Jain
@ 2019-08-08  8:40               ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-08-08  8:40 UTC (permalink / raw)
  To: Anand Jain; +Cc: fdmanana, linux-btrfs, Qu Wenruo

[-- Attachment #1.1: Type: text/plain, Size: 6233 bytes --]



On 2019/8/8 下午4:34, Anand Jain wrote:
> 
> 
>> On 8 Aug 2019, at 1:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>> [...]
>>>>>
>>>>> Fundamentally, logical address space has no relevance in the user context,
>>>>> so also I don’t understand your view on how anyone shall use the range::start
>>>>> even if there is no check?
>>>>
>>>> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
>>>>
>>>
>>> Thanks for the efforts in explaining.
>>>
>>> My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s).
>>
>> That's the common usage, but it doesn't mean it's the only usage.
> 
>  Oh you are right. man page doesn’t restrict range.start to be within super_total_bytes. It's only generic/260 that it is trying to enforce.

This reminds me again about the test case update of generic/260.

It looks like my previous update is not yet merged...

Thanks,
Qu

> 
>> Above bg range trim is also a valid use case.
>>
>>> May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.
>>>
>>> So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?
>>
>> There are a lot of tools to show the bg bytenr and usage of each bg.
>> It isn't a problem at all.
>>
> 
> External tools sounds better than some logic within kernel to perform such a transformations. Now I get your idea. My bad.
> 
> I am withdrawing this patch.
> 
> Thanks, Anand
> 
>>>
>>>
>>>> And that bg_bytenr is at 128T, since the fs has gone through several
>>>> balance.
>>>> But there is only one device, and its size is only 1T.
>>>>
>>>> Please tell me how to trim that block group only?
>>>>
>>>
>>> Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.
>>
>> Nope, users sometimes care. Especially for the usage of each bg.
>>
>> Furthermore, we have vusage/vrange filter for balance, so user is not
>> blocked from the whole bg thing.
>>
>>>
>>>>>
>>>>> As in the man page it's ok to adjust the range internally, and as length
>>>>> can be up to U64MAX we can still trim beyond super_total_bytes?
>>>>
>>>> As I said already, super_total_bytes makes no sense in logical address
>>>> space.
>>>
>>> But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.
>>
>> Nope, super_total_bytes in fact makes no sense under most cases.
>> It doesn't even shows the up limit of usable space. (E.g. For all RADI1
>> profiles, it's only half the space at most. Even for all SINGLE
>> profiles, it doesn't account the 1M reserved space).
>>
>> It's a good thing to detect device list corruption, but despite that, it
>> really doesn't make much sense.
>>
>> For logical address space, as explains, we have tools (not in
>> btrfs-progs though) and interface (balance vrange filter) to take use of
>> them.
>>
>>>
>>>> As super_total_bytes is just the sum of all devices, it's a device layer
>>>> thing, nothing to do with logical address space.
>>>>
>>>> You're mixing logical bytenr with something not even a device physical
>>>> offset, how can it be correct?
>>>>
>>>> Let me make it more clear, btrfs has its own logical address space
>>>> unrelated to whatever the devices mapping are.
>>>> It's always [0, U64_MAX], no matter how many devices there are.
>>>>
>>>> If btrfs is implemented using dm, it should be more clear.
>>>>
>>>> (single device btrfs)
>>>>         |
>>>> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>>>> |   |   |    |
>>>> |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>>>> |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>>>> |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>>>> \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
>>>>
>>>> If we're trim such fs layout, you tell me which offset you should use.
>>>>
>>>
>>> There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range.
>>
>> Nope, that's way worse than current behavior.
>> See the above example, how did you pass devid? Above case is using RAID0
>> and RAID1 on two devices, how do you handle that?
>> Furthermore, btrfs can have different devices sizes for RAID profiles,
>> how to handle that them? Using super total bytes would easily exceed
>> every devices boundary.
>>
>> Yes, the current behavior is not the perfect solution either, but you're
>> attacking from the wrong direction.
>> In fact, for allocated bgs, the current behavior is the best solution,
>> you can choose to trim any range and you have the tool like Hans'
>> python-btrfs.
>>
>> The not-so-perfect part is about the unallocated range.
>> IIRC things like thin-provision LVM choose not to trim the unallocated
>> part, while btrfs choose to trim all the unallocated part.
>>
>> If you're arguing how btrfs handles unallocated space, I have no word to
>> defend at all. But for the logical address part? I can't have more words
>> to spare.
>>
>> Thanks,
>> Qu
>>
>>> This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.
>>>
>>> Thanks, Anand
>>>
>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> Thanks, Anand
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>>
>>>>>>> The change log is also vague to me, doesn't explain why you are
>>>>>>> re-adding that check.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>>
>>>>>>>>      /*
>>>>>>>>       * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>>>>> --
>>>>>>>> 2.21.0 (Apple Git-120)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-07  8:20 [PATCH] btrfs: trim: fix range start validity check Anand Jain
2019-08-07  8:44 ` Filipe Manana
2019-08-07  8:55   ` Qu Wenruo
2019-08-07  9:43     ` Anand Jain
2019-08-07 11:15       ` Qu Wenruo
2019-08-08  3:53         ` Anand Jain
2019-08-08  5:55           ` Qu Wenruo
2019-08-08  8:34             ` Anand Jain
2019-08-08  8:40               ` Qu Wenruo

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox