linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Anand Jain <anand.jain@oracle.com>,
	fdmanana@gmail.com, linux-btrfs <linux-btrfs@vger.kernel.org>,
	Qu Wenruo <wqu@suse.com>
Subject: Re: [PATCH] btrfs: trim: fix range start validity check
Date: Thu, 8 Aug 2019 16:34:59 +0800	[thread overview]
Message-ID: <F6A06BCB-D2A7-4B9F-A6F4-04498572AEF5@oracle.com> (raw)
In-Reply-To: <8f289892-1977-6f52-3585-8d7dcbd4d54a@gmx.com>



> On 8 Aug 2019, at 1:55 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> 
> [...]
>>>> 
>>>> Fundamentally, logical address space has no relevance in the user context,
>>>> so also I don’t understand your view on how anyone shall use the range::start
>>>> even if there is no check?
>>> 
>>> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
>>> 
>> 
>> Thanks for the efforts in explaining.
>> 
>> My point is- it should not be one single bg but it should rather be all bg(s) in the specified range [start, length] so the %range.start=0 and %range.length=<U64MAX/total_bytes> should trim all the bg(s).
> 
> That's the common usage, but it doesn't mean it's the only usage.

 Oh you are right. man page doesn’t restrict range.start to be within super_total_bytes. It's only generic/260 that it is trying to enforce.

> Above bg range trim is also a valid use case.
> 
>> May be your next question is- as we relocate the chunks how would the user ever know correct range.start to use? for which I don’t have an answer and the same question again applies to your proposal range.start=[0 to U64MAX] as well.
>> 
>> So I am asking you again, even if you allow range.start=[0 to U64MAX] how will the user use it? Can you please explain?
> 
> There are a lot of tools to show the bg bytenr and usage of each bg.
> It isn't a problem at all.
> 

External tools sounds better than some logic within kernel to perform such a transformations. Now I get your idea. My bad.

I am withdrawing this patch.

Thanks, Anand

>> 
>> 
>>> And that bg_bytenr is at 128T, since the fs has gone through several
>>> balance.
>>> But there is only one device, and its size is only 1T.
>>> 
>>> Please tell me how to trim that block group only?
>>> 
>> 
>> Block groups are something internal the users don’t have to worry about it. The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.
> 
> Nope, users sometimes care. Especially for the usage of each bg.
> 
> Furthermore, we have vusage/vrange filter for balance, so user is not
> blocked from the whole bg thing.
> 
>> 
>>>> 
>>>> As in the man page it's ok to adjust the range internally, and as length
>>>> can be up to U64MAX we can still trim beyond super_total_bytes?
>>> 
>>> As I said already, super_total_bytes makes no sense in logical address
>>> space.
>> 
>> But super_total_bytes makes sense in the user land though, on the other hand logical address space which you are trying to expose to the user land does not make sense to me.
> 
> Nope, super_total_bytes in fact makes no sense under most cases.
> It doesn't even shows the up limit of usable space. (E.g. For all RADI1
> profiles, it's only half the space at most. Even for all SINGLE
> profiles, it doesn't account the 1M reserved space).
> 
> It's a good thing to detect device list corruption, but despite that, it
> really doesn't make much sense.
> 
> For logical address space, as explains, we have tools (not in
> btrfs-progs though) and interface (balance vrange filter) to take use of
> them.
> 
>> 
>>> As super_total_bytes is just the sum of all devices, it's a device layer
>>> thing, nothing to do with logical address space.
>>> 
>>> You're mixing logical bytenr with something not even a device physical
>>> offset, how can it be correct?
>>> 
>>> Let me make it more clear, btrfs has its own logical address space
>>> unrelated to whatever the devices mapping are.
>>> It's always [0, U64_MAX], no matter how many devices there are.
>>> 
>>> If btrfs is implemented using dm, it should be more clear.
>>> 
>>> (single device btrfs)
>>>         |
>>> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>>> |   |   |    |
>>> |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>>> |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>>> |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>>> \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
>>> 
>>> If we're trim such fs layout, you tell me which offset you should use.
>>> 
>> 
>> There is no perfect solution, the nearest solution I can think - map range.start and range.len to the physical disk range and search and discard free spaces in that range.
> 
> Nope, that's way worse than current behavior.
> See the above example, how did you pass devid? Above case is using RAID0
> and RAID1 on two devices, how do you handle that?
> Furthermore, btrfs can have different devices sizes for RAID profiles,
> how to handle that them? Using super total bytes would easily exceed
> every devices boundary.
> 
> Yes, the current behavior is not the perfect solution either, but you're
> attacking from the wrong direction.
> In fact, for allocated bgs, the current behavior is the best solution,
> you can choose to trim any range and you have the tool like Hans'
> python-btrfs.
> 
> The not-so-perfect part is about the unallocated range.
> IIRC things like thin-provision LVM choose not to trim the unallocated
> part, while btrfs choose to trim all the unallocated part.
> 
> If you're arguing how btrfs handles unallocated space, I have no word to
> defend at all. But for the logical address part? I can't have more words
> to spare.
> 
> Thanks,
> Qu
> 
>> This idea may be ok for raids/linear profiles, but again as btrfs can relocate the chunks its not perfect.
>> 
>> Thanks, Anand
>> 
>> 
>>> Thanks,
>>> Qu
>>> 
>>>> 
>>>> Thanks, Anand
>>>> 
>>>> 
>>>>> Thanks,
>>>>> Qu
>>>>> 
>>>>>> 
>>>>>> The change log is also vague to me, doesn't explain why you are
>>>>>> re-adding that check.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>>> 
>>>>>>>      /*
>>>>>>>       * NOTE: Don't truncate the range using super->total_bytes.  Bytenr of
>>>>>>> --
>>>>>>> 2.21.0 (Apple Git-120)


  reply	other threads:[~2019-08-08  8:35 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-07  8:20 [PATCH] btrfs: trim: fix range start validity check Anand Jain
2019-08-07  8:44 ` Filipe Manana
2019-08-07  8:55   ` Qu Wenruo
2019-08-07  9:43     ` Anand Jain
2019-08-07 11:15       ` Qu Wenruo
2019-08-08  3:53         ` Anand Jain
2019-08-08  5:55           ` Qu Wenruo
2019-08-08  8:34             ` Anand Jain [this message]
2019-08-08  8:40               ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F6A06BCB-D2A7-4B9F-A6F4-04498572AEF5@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).