Re: [PATCH] btrfs: trim: fix range start validity check

From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Anand Jain <anand.jain@oracle.com>,
	fdmanana@gmail.com, linux-btrfs <linux-btrfs@vger.kernel.org>,
	Qu Wenruo <wqu@suse.com>
Subject: Re: [PATCH] btrfs: trim: fix range start validity check
Date: Thu, 8 Aug 2019 16:34:59 +0800	[thread overview]
Message-ID: <F6A06BCB-D2A7-4B9F-A6F4-04498572AEF5@oracle.com> (raw)
In-Reply-To: <8f289892-1977-6f52-3585-8d7dcbd4d54a@gmx.com>

> On 8 Aug 2019, at 1:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> [...]
>>>> 
>>>> Fundamentally, logical address space has no relevance in the user context,
>>>> so also I don’t understand your view on how anyone shall use the range::start
>>>> even if there is no check?
>>> 
>>> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
>>> 
>> 
>> Thanks for the efforts in explaining.
>> 
>> My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s).
> 
> That's the common usage, but it doesn't mean it's the only usage.

 Oh you are right. man page doesn’t restrict range.start to be within super_total_bytes. It's only generic/260 that it is trying to enforce.

> Above bg range trim is also a valid use case.
> 
>> May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.
>> 
>> So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?
> 
> There are a lot of tools to show the bg bytenr and usage of each bg.
> It isn't a problem at all.
> 

External tools sounds better than some logic within kernel to perform such a transformations. Now I get your idea. My bad.

I am withdrawing this patch.

Thanks, Anand

>> 
>> 
>>> And that bg_bytenr is at 128T, since the fs has gone through several
>>> balance.
>>> But there is only one device, and its size is only 1T.
>>> 
>>> Please tell me how to trim that block group only?
>>> 
>> 
>> Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.
> 
> Nope, users sometimes care. Especially for the usage of each bg.
> 
> Furthermore, we have vusage/vrange filter for balance, so user is not
> blocked from the whole bg thing.
> 
>> 
>>>> 
>>>> As in the man page it's ok to adjust the range internally, and as length
>>>> can be up to U64MAX we can still trim beyond super_total_bytes?
>>> 
>>> As I said already, super_total_bytes makes no sense in logical address
>>> space.
>> 
>> But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.
> 
> Nope, super_total_bytes in fact makes no sense under most cases.
> It doesn't even shows the up limit of usable space. (E.g. For all RADI1
> profiles, it's only half the space at most. Even for all SINGLE
> profiles, it doesn't account the 1M reserved space).
> 
> It's a good thing to detect device list corruption, but despite that, it
> really doesn't make much sense.
> 
> For logical address space, as explains, we have tools (not in
> btrfs-progs though) and interface (balance vrange filter) to take use of
> them.
> 
>> 
>>> As super_total_bytes is just the sum of all devices, it's a device layer
>>> thing, nothing to do with logical address space.
>>> 
>>> You're mixing logical bytenr with something not even a device physical
>>> offset, how can it be correct?
>>> 
>>> Let me make it more clear, btrfs has its own logical address space
>>> unrelated to whatever the devices mapping are.
>>> It's always [0, U64_MAX], no matter how many devices there are.
>>> 
>>> If btrfs is implemented using dm, it should be more clear.
>>> 
>>> (single device btrfs)
>>>         |
>>> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>>> |   |   |    |
>>> |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>>> |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>>> |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>>> \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
>>> 
>>> If we're trim such fs layout, you tell me which offset you should use.
>>> 
>> 
>> There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range.
> 
> Nope, that's way worse than current behavior.
> See the above example, how did you pass devid? Above case is using RAID0
> and RAID1 on two devices, how do you handle that?
> Furthermore, btrfs can have different devices sizes for RAID profiles,
> how to handle that them? Using super total bytes would easily exceed
> every devices boundary.
> 
> Yes, the current behavior is not the perfect solution either, but you're
> attacking from the wrong direction.
> In fact, for allocated bgs, the current behavior is the best solution,
> you can choose to trim any range and you have the tool like Hans'
> python-btrfs.
> 
> The not-so-perfect part is about the unallocated range.
> IIRC things like thin-provision LVM choose not to trim the unallocated
> part, while btrfs choose to trim all the unallocated part.
> 
> If you're arguing how btrfs handles unallocated space, I have no word to
> defend at all. But for the logical address part? I can't have more words
> to spare.
> 
> Thanks,
> Qu
> 
>> This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.
>> 
>> Thanks, Anand
>> 
>> 
>>> Thanks,
>>> Qu
>>> 
>>>> 
>>>> Thanks, Anand
>>>> 
>>>> 
>>>>> Thanks,
>>>>> Qu
>>>>> 
>>>>>> 
>>>>>> The change log is also vague to me, doesn't explain why you are
>>>>>> re-adding that check.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>>> 
>>>>>>>      /*
>>>>>>>       * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>>>> --
>>>>>>> 2.21.0 (Apple Git-120)