linux-btrfs.vger.kernel.org archive mirror
* Report correct filesystem usage / limits on BTRFS subvolumes with quota
@ 2018-07-31 13:49 Thomas Leister
  2018-07-31 14:32 ` Qu Wenruo
  2018-08-14  2:49 ` Jeff Mahoney
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Leister @ 2018-07-31 13:49 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, lxc-devel

Dear David,
hello everyone,

during a recent project of mine involving LXD and BTRFS I found out that
quotas on BTRFS subvolumes are enforced, but file system usage and
limits set via quotas are not reported correctly in LXC containers.

I've found this discussion regarding my problem:
https://github.com/lxc/lxd/issues/2180

There was already a proposal to introduce subvolume quota support some
time ago:
https://marc.info/?l=linux-btrfs&m=147576434114415&w=2

@David as I've seen your response on that topic on the mailing list,
maybe you can tell me if there are any plans to support correct
subvolume quota reporting e.g. for "df -h" calls from within a
container? Maybe there's already something on your / SUSE's roadmap? :-)

As more and more container environments spin up these days, there might
be a growing demand for that :-) Personally I'd really appreciate it if I
could read the current file system usage and limit from within a
container using BTRFS as the storage backend.

Best regards,
Thomas



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-07-31 13:49 Report correct filesystem usage / limits on BTRFS subvolumes with quota Thomas Leister
@ 2018-07-31 14:32 ` Qu Wenruo
  2018-07-31 16:03   ` Austin S. Hemmelgarn
  2018-08-09 17:48   ` Tomasz Pala
  2018-08-14  2:49 ` Jeff Mahoney
  1 sibling, 2 replies; 26+ messages in thread
From: Qu Wenruo @ 2018-07-31 14:32 UTC (permalink / raw)
  To: Thomas Leister, dsterba; +Cc: linux-btrfs, lxc-devel


[-- Attachment #1.1: Type: text/plain, Size: 2956 bytes --]



On 2018年07月31日 21:49, Thomas Leister wrote:
> Dear David,
> hello everyone,
> 
> during a recent project of mine involving LXD and BTRFS I found out that
> quotas on BTRFS subvolumes are enforced, but file system usage and
> limits set via quotas are not reported correctly in LXC containers.
> 
> I've found this discussion regarding my problem:
> https://github.com/lxc/lxd/issues/2180

That's not the expected usage of btrfs qgroup/quota.

Quota only accounts how many bytes are used exclusively or shared
between subvolumes, at the extent level.
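
For illustration (mount point hypothetical), those numbers can be
inspected directly; a minimal sketch, not a usage report:

  # rfer = bytes the subvolume references, excl = bytes only it owns;
  # both are tracked per extent, one 0/<id> qgroup per subvolume.
  btrfs quota enable /mnt
  btrfs qgroup show /mnt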

> 
> There was already a proposal to introduce subvolume quota support some
> time ago:
> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2

It's in fact impossible, unless I'm missing something.

There are several technical problems with the proposal:

1) Multi-level qgroups
   The effective limit is constrained by all related qgroups, including
   higher-level qgroups.
   Such a design makes it pretty hard to calculate the real limit.

2) Different limitations on exclusive/shared bytes
   Btrfs can set different limits on exclusive/shared bytes, further
   complicating the problem.

3) Btrfs quota only accounts data/metadata used by the subvolume
   It lacks all the shared trees (mentioned below), and in fact such
   shared trees can be pretty large (especially the extent tree and csum
   tree).
   Accounting only the quota limit would easily hit real ENOSPC, IMHO.
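
To illustrate 1) and 2), a hypothetical setup (qgroup ids and sizes made
up) where the "real" limit is some combination of three knobs:

  btrfs qgroup create 1/100 /mnt          # higher-level qgroup
  btrfs qgroup assign 0/257 1/100 /mnt    # subvolume 257 joins it
  btrfs qgroup limit 10G 1/100 /mnt       # limit on the parent qgroup
  btrfs qgroup limit 5G /mnt/subvol       # referenced-bytes limit
  btrfs qgroup limit -e 2G /mnt/subvol    # exclusive-bytes limit
  # Whatever df reported would have to combine all three, and the
  # effective number shifts whenever sharing between subvolumes changes.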

> 
> @David as I've seen your response on that topic on the mailing list,
> maybe you can tell me if there are any plans to support correct
> subvolume quota reporting e.g. for "df -h" calls from within a
> container? Maybe there's already something on your / SUSE's roadmap? :-)
> 
> As more and more container environments spin up these days, there might
> be a growing demand for that :-) Personally I'd really appreciate it if I
> could read the current file system usage and limit from within a
> container using BTRFS as the storage backend.

With the current btrfs design, I think it's doubtful such a feature can
be implemented.
The main problem here is that btrfs doesn't do the full LVM work (unlike
ZFS, IIRC).
It doesn't really manage multiple volumes; that's why it's called a
subvolume in btrfs.
A subvolume is not a fully usable fs, it's just a subset of a full fs.
It relies on all the other trees (root tree, extent tree, chunk tree,
csum tree, and quota tree in this case) to do all the work.
Thus it's pretty hard to implement such a special-purpose df call.

On the other hand, isn't it easier to implement a special interface for
containers to get the real disk usage/limit, rather than using the old
vanilla df interface?
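
(A sketch of such an interface, not something that exists today: a
container manager could resolve the container root's subvolume id and
read the matching qgroup numbers itself, bypassing df entirely.)

  id=$(btrfs inspect-internal rootid /containers/c1)  # path hypothetical
  btrfs qgroup show -re /mnt | grep "^0/$id"          # usage and limits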

Thanks,
Qu

> 
> Best regards,
> Thomas
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-07-31 14:32 ` Qu Wenruo
@ 2018-07-31 16:03   ` Austin S. Hemmelgarn
  2018-08-01  1:23     ` Qu Wenruo
  2018-08-09 17:48   ` Tomasz Pala
  1 sibling, 1 reply; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-07-31 16:03 UTC (permalink / raw)
  To: Qu Wenruo, Thomas Leister, dsterba; +Cc: linux-btrfs, lxc-devel

On 2018-07-31 10:32, Qu Wenruo wrote:
> 
> 
> On 2018年07月31日 21:49, Thomas Leister wrote:
>> Dear David,
>> hello everyone,
>>
>> during a recent project of mine involving LXD and BTRFS I found out that
>> quotas on BTRFS subvolumes are enforced, but file system usage and
>> limits set via quotas are not reported correctly in LXC containers.
>>
>> I've found this discussion regarding my problem:
>> https://github.com/lxc/lxd/issues/2180
> 
> That's not the expected usage of btrfs qgroup/quota.
> 
> Quota only accounts how many bytes are used exclusively or shared
> between subvolumes, at the extent level.
> 
>>
>> There was already a proposal to introduce subvolume quota support some
>> time ago:
>> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2
> 
> It's in fact impossible, unless I'm missing something.
> 
> There are several technical problems with the proposal:
> 
> 1) Multi-level qgroups
>     The effective limit is constrained by all related qgroups, including
>     higher-level qgroups.
>     Such a design makes it pretty hard to calculate the real limit.
> 
> 2) Different limitations on exclusive/shared bytes
>     Btrfs can set different limits on exclusive/shared bytes, further
>     complicating the problem.
> 
> 3) Btrfs quota only accounts data/metadata used by the subvolume
>     It lacks all the shared trees (mentioned below), and in fact such
>     shared trees can be pretty large (especially the extent tree and csum
>     tree).
>     Accounting only the quota limit would easily hit real ENOSPC, IMHO.
> 
>>
>> @David as I've seen your response on that topic on the mailing list,
>> maybe you can tell me if there are any plans to support correct
>> subvolume quota reporting e.g. for "df -h" calls from within a
>> container? Maybe there's already something on your / SUSE's roadmap? :-)
>>
>> As more and more container environments spin up these days, there might
>> be a growing demand for that :-) Personally I'd really appreciate it if I
>> could read the current file system usage and limit from within a
>> container using BTRFS as the storage backend.
> 
> With the current btrfs design, I think it's doubtful such a feature can
> be implemented.
> The main problem here is that btrfs doesn't do the full LVM work (unlike
> ZFS, IIRC).
> It doesn't really manage multiple volumes; that's why it's called a
> subvolume in btrfs.
ZFS quotas work the way they do not because it's trivial to implement 
them that way due to the underlying implementation, but because they 
provide the functionality that people actually want.  Being able to put 
proper hard limits on space usage for a given volume/subvolume/dataset 
is _critical_ for a large number of enterprise deployment scenarios. 
Same goes for being able to put a fixed space reservation for a given 
volume/subvolume/dataset.  If we want to even remotely compete (and it 
sure seems like we do), we need equivalent features that work 
intuitively for _regular_ people (not those who have intimate 
understandings of the internal workings of BTRFS).

> A subvolume is not a fully usable fs, it's just a subset of a full fs.
> It relies on all the other trees (root tree, extent tree, chunk tree,
> csum tree, and quota tree in this case) to do all the work.
A ZFS dataset isn't a fully usable FS either.  It's still dependent on 
all the underlying infrastructure from the zpool itself (and so are 
zvols), which, in fact, does a vast majority of the work.  The 
difference here is that a ZFS dataset is far more self-contained than a 
BTRFS subvolume.  If we ever want sane per-subvolume storage profiles or 
mount options, we're going to need to get a lot closer to that anyway.

> Thus it's pretty hard to implement such a special-purpose df call.
To implement it perfectly maybe.  Except most applications don't need it 
to be perfect, they want to know how much space they can actually use. 
Even a trivial blatantly imperfect implementation that just shows you 
the total space that can be used and how much is used based on quotas 
will give better behavior than the current case of just hiding the 
quotas behind a root-only call.  Pretty much anything which does its 
own disk usage management is currently broken on BTRFS when quotas are 
being used.  Just reporting the quota for the total space, and the space 
accounted to the subvolume by the quota would fix almost all such 
applications.
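
(A rough sketch of that trivial version, using only existing commands;
the parsing is illustrative and assumes the subvolume's own 0/<id> qgroup:)

  #!/bin/sh
  # fake-df SUBVOL: report used/total from qgroup numbers, not statfs.
  subvol=$1
  id=$(btrfs inspect-internal rootid "$subvol")
  btrfs qgroup show -re --raw "$subvol" |
      awk -v q="0/$id" '$1 == q { printf "used %s of %s bytes\n", $2, $4 }'
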
> 
> On the other hand, isn't it easier to implement a special interface for
> containers to get the real disk usage/limit, rather than using the old
> vanilla df interface?
This isn't just an issue for containers.  Anybody who is using quotas 
like they are typically used in ZFS deployments has the same issue, and 
there _ARE_ people doing that (see for example OpenSUSE, where they are 
using quotas (if they are enabled because of snapshot support) to limit 
space consumption of paths like /tmp).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-07-31 16:03   ` Austin S. Hemmelgarn
@ 2018-08-01  1:23     ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2018-08-01  1:23 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Thomas Leister, dsterba; +Cc: linux-btrfs, lxc-devel


[-- Attachment #1.1: Type: text/plain, Size: 5624 bytes --]



On 2018年08月01日 00:03, Austin S. Hemmelgarn wrote:
> On 2018-07-31 10:32, Qu Wenruo wrote:
>>
>>
>> On 2018年07月31日 21:49, Thomas Leister wrote:
>>> Dear David,
>>> hello everyone,
>>>
>>> during a recent project of mine involving LXD and BTRFS I found out that
>>> quotas on BTRFS subvolumes are enforced, but file system usage and
>>> limits set via quotas are not reported correctly in LXC containers.
>>>
>>> I've found this discussion regarding my problem:
>>> https://github.com/lxc/lxd/issues/2180
>>
>> That's not the expected usage of btrfs qgroup/quota.
>>
>> Quota only accounts how many bytes are used exclusively or shared
>> between subvolumes, at the extent level.
>>
>>>
>>> There was already a proposal to introduce subvolume quota support some
>>> time ago:
>>> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2
>>
>> It's in fact impossible, unless I'm missing something.
>>
>> There are several technical problems with the proposal:
>>
>> 1) Multi-level qgroups
>>     The effective limit is constrained by all related qgroups, including
>>     higher-level qgroups.
>>     Such a design makes it pretty hard to calculate the real limit.
>>
>> 2) Different limitations on exclusive/shared bytes
>>     Btrfs can set different limits on exclusive/shared bytes, further
>>     complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>     It lacks all the shared trees (mentioned below), and in fact such
>>     shared trees can be pretty large (especially the extent tree and csum
>>     tree).
>>     Accounting only the quota limit would easily hit real ENOSPC, IMHO.
>>
>>>
>>> @David as I've seen your response on that topic on the mailing list,
>>> maybe you can tell me if there are any plans to support correct
>>> subvolume quota reporting e.g. for "df -h" calls from within a
>>> container? Maybe there's already something on your / SUSE's roadmap? :-)
>>>
>>> As more and more container environments spin up these days, there might
>>> be a growing demand for that :-) Personally I'd really appreciate it if I
>>> could read the current file system usage and limit from within a
>>> container using BTRFS as the storage backend.
>>
>> With the current btrfs design, I think it's doubtful such a feature
>> can be implemented.
>> The main problem here is that btrfs doesn't do the full LVM work
>> (unlike ZFS, IIRC).
>> It doesn't really manage multiple volumes; that's why it's called a
>> subvolume in btrfs.
> ZFS quotas work the way they do not because it's trivial to implement
> them that way due to the underlying implementation, but because they
> provide the functionality that people actually want.  Being able to put
> proper hard limits on space usage for a given volume/subvolume/dataset
> is _critical_ for a large number of enterprise deployment scenarios.
> Same goes for being able to put a fixed space reservation for a given
> volume/subvolume/dataset.  If we want to even remotely compete (and it
> sure seems like we do), we need equivalent features that work
> intuitively for _regular_ people (not those who have intimate
> understandings of the internal workings of BTRFS).

Then the design and use case of btrfs quota itself needs to be reworked
from the very beginning.
At least get rid of the higher-level qgroups and the separate
exclusive/referenced limits.

Otherwise there will be no way to report it in df with those two limits.

Thanks,
Qu

> 
>> A subvolume is not a fully usable fs, it's just a subset of a full fs.
>> It relies on all the other trees (root tree, extent tree, chunk tree,
>> csum tree, and quota tree in this case) to do all the work.
> A ZFS dataset isn't a fully usable FS either.  It's still dependent on
> all the underlying infrastructure from the zpool itself (and so are
> zvols), which, in fact, does a vast majority of the work.  The
> difference here is that a ZFS dataset is far more self-contained than a
> BTRFS subvolume.  If we ever want sane per-subvolume storage profiles or
> mount options, we're going to need to get a lot closer to that anyway.
> 
>> Thus it's pretty hard to implement such a special-purpose df call.
> To implement it perfectly maybe.  Except most applications don't need it
> to be perfect, they want to know how much space they can actually use.
> Even a trivial blatantly imperfect implementation that just shows you
> the total space that can be used and how much is used based on quotas
> will give better behavior than the current case of just hiding the
> quotas behind a root-only call.  Pretty much anything which does its
> own disk usage management is currently broken on BTRFS when quotas are
> being used.  Just reporting the quota for the total space, and the space
> accounted to the subvolume by the quota would fix almost all such
> applications.
>>
>> On the other hand, isn't it easier to implement a special interface
>> for containers to get the real disk usage/limit, rather than using the
>> old vanilla df interface?
> This isn't just an issue for containers.  Anybody who is using quotas
> like they are typically used in ZFS deployments has the same issue, and
> there _ARE_ people doing that (see for example OpenSUSE, where they are
> using quotas (if they are enabled because of snapshot support) to limit
> space consumption of paths like /tmp).


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-07-31 14:32 ` Qu Wenruo
  2018-07-31 16:03   ` Austin S. Hemmelgarn
@ 2018-08-09 17:48   ` Tomasz Pala
  2018-08-09 23:35     ` Qu Wenruo
                       ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Tomasz Pala @ 2018-08-09 17:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:

> 2) Different limitations on exclusive/shared bytes
>    Btrfs can set different limits on exclusive/shared bytes, further
>    complicating the problem.
> 
> 3) Btrfs quota only accounts data/metadata used by the subvolume
>    It lacks all the shared trees (mentioned below), and in fact such
>    shared trees can be pretty large (especially the extent tree and csum
>    tree).

I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for the system owner, so he can do larger resource overbooking.

So - the limit set on any user should enforce the maximum and absolute
space he has allocated, including the shared stuff. I could even imagine
that creating a snapshot might immediately "eat" the available quota. In
a way, the quota returned would match (give or take) `du`-reported usage,
unless "do not account reflinks within a single qgroup" were easy to implement.

I.e.: every shared segment should be accounted within quota (at least once).

And the numbers accounted should reflect the uncompressed sizes.
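
(For contrast, what current btrfs reports in that situation - the
commands are real, the paths illustrative:)

  btrfs subvolume snapshot /mnt/base /mnt/copy
  btrfs qgroup show /mnt
  # Today the fresh snapshot shows only a few KiB of excl (metadata);
  # under the semantics proposed above, its full shared data would be
  # charged against the limit immediately.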


Moreover - if there were per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
data, and twice the data in the opposite scenario (like a "dup" profile
on a single-drive filesystem).


In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").

-- 
Tomasz Pala <gotar@pld-linux.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-09 17:48   ` Tomasz Pala
@ 2018-08-09 23:35     ` Qu Wenruo
  2018-08-10  7:17       ` Tomasz Pala
                         ` (2 more replies)
       [not found]     ` <f66b8ff3-d7ec-31ad-e9ca-e09c9eb76474@gmail.com>
  2018-08-10 11:39     ` Austin S. Hemmelgarn
  2 siblings, 3 replies; 26+ messages in thread
From: Qu Wenruo @ 2018-08-09 23:35 UTC (permalink / raw)
  To: Tomasz Pala; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2635 bytes --]



On 8/10/18 1:48 AM, Tomasz Pala wrote:
> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
> 
>> 2) Different limitations on exclusive/shared bytes
>>    Btrfs can set different limits on exclusive/shared bytes, further
>>    complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>    It lacks all the shared trees (mentioned below), and in fact such
>>    shared trees can be pretty large (especially the extent tree and csum
>>    tree).
> 
> I'm not sure about the implications, but just to clarify some things:
> 
> when limiting somebody's data space we usually don't care about the
> underlying "savings" coming from any deduplicating technique - these are
> purely bonuses for the system owner, so he can do larger resource overbooking.

In reality that's definitely not the case.

From what I see, most users would care more about exclusively used space
(excl), rather than the total space one subvolume is referring to (rfer).

The most common case is, you do a snapshot, and the user would only care
how much new space can be written into the subvolume, rather than the
total subvolume size.

> 
> So - the limit set on any user should enforce the maximum and absolute
> space he has allocated, including the shared stuff. I could even imagine
> that creating a snapshot might immediately "eat" the available quota. In
> a way, the quota returned would match (give or take) `du`-reported usage,
> unless "do not account reflinks within a single qgroup" were easy to implement.

In fact, that's the case. In the current implementation, accounting at
the extent level is the easiest (if not the only) way to implement it.

> 
> I.e.: every shared segment should be accounted within quota (at least once).

Already accounted, at least for rfer.

> 
> And the numbers accounted should reflect the uncompressed sizes.

No way with the current extent-based solution.

> 
> 
> Moreover - if there were per-subvolume RAID levels someday, the data
> should be accounted in relation to the "default" (filesystem) RAID level,
> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
> data, and twice the data in the opposite scenario (like a "dup" profile
> on a single-drive filesystem).

Again, not possible with the current extent-based solution.

> 
> 
> In short: values representing quotas are user-oriented ("the numbers one
> bought"), not storage-oriented ("the numbers they actually occupy").

Well, if something is not possible or brings such a big performance
impact, there's no point arguing about how it should work in the first place.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-09 23:35     ` Qu Wenruo
@ 2018-08-10  7:17       ` Tomasz Pala
  2018-08-10  7:55         ` Qu Wenruo
  2018-08-10 11:32       ` Austin S. Hemmelgarn
  2018-08-10 18:07       ` Chris Murphy
  2 siblings, 1 reply; 26+ messages in thread
From: Tomasz Pala @ 2018-08-10  7:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 10, 2018 at 07:35:32 +0800, Qu Wenruo wrote:

>> when limiting somebody's data space we usually don't care about the
>> underlying "savings" coming from any deduplicating technique - these are
>> purely bonuses for the system owner, so he can do larger resource overbooking.
> 
> In reality that's definitely not the case.

Definitely? How do you "sell" disk space when there is no upper bound?
Every, and I mean _every_, resource quota out in the wild gives you a user perspective.
You can assign CPU cores/time, RAM or network bandwidth with a HARD limit.

Only after that you _can_ sometimes assign some best-effort,
non-guaranteed outer limits, like extra network bandwidth or grace
periods for filesystem usage (disregarding technical details - in the case
of quota you move the hard limit further out and apply a lower soft limit).

This is the primary quota usage. Quotas don't save system resources;
quotas are valuables to "sell" (by "sell" I mean every possible
allocation, including inter-organisation accounting).

Quotas are overbookable by design and, like I said before, the underlying
savings mechanisms allow the sysadmin to increase the actual overbooking ratio.

If I run out of CPU, RAM, storage or network I simply need to expand
that resource. I won't shrink quotas in such a case.
Or apply some other resource-saving technique, like LVM with VDO,
swapping, RAM deduplication etc.

If that is not the use case of btrfs quotas, then they should be renamed
so as not to confuse users. Using incorrect terms for widely known things
leads to user frustration at the very least.

> From what I see, most users would care more about exclusively used space
> (excl), rather than the total space one subvolume is referring to (rfer).

Consider this:
1. there is some "template" system-wide snapshot,
2. users X and Y have CoW copies of it - both see "0 bytes exclusive"?
3. sysadm removes "template" - what happens to X and Y quotas?
4. user X removes his copy - what happens to Y quota?

The first thing about virtually every mechanism should be
discoverability and reliability. I expect my quota not to change without
my interaction. Never. How did you cope with this?
If not - how are you going to explain such weird behaviour to users?

Once again: the quota numbers *I* get must not be influenced by external
operations or foreign users.

> The most common case is, you do a snapshot, and the user would only care
> how much new space can be written into the subvolume, rather than the
> total subvolume size.

If only that were the case... then exactly - I do care how much new
data is _guaranteed_ to fit on my storage.

So please tell me, as I might have got it wrong - what happens if the
source subvolume gets removed and the CoWed data is not shared anymore?
Is the quota recalculated? - that would be wrong, as no new data was written.
Is the quota left intact? - that is wrong too, as it gives a false view of the exclusive space taken.

This is just another reincarnation of the famous "btrfs df" problem you
couldn't comprehend for so long - when reporting "disk FREE" status I want
to know the amount of data that is guaranteed to be writable in the current
RAID profile, i.e. ignoring any possible savings from compression etc.


Please note: my assumptions are based on
https://btrfs.wiki.kernel.org/index.php/Quota_support

"File copy and file deletion may both affect limits since the unshared
limit of another qgroup can change if the original volume's files are
deleted and only one copy is remaining"

so if I write something invalid this might be the source of my mistake.


>> And the numbers accounted should reflect the uncompressed sizes.
> 
> No way with the current extent-based solution.

OK, since the data is provided by the user, its "compressibility"
might be considered his saving (we only provide transparency).

>> Moreover - if there were per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
>> data, and twice the data in the opposite scenario (like a "dup" profile
>> on a single-drive filesystem).
> 
> Again, not possible with the current extent-based solution.

Doesn't an extent have information about the devices it's cloned onto? But OK,
this is not important until per-subvolume profiles are available.

>> In short: values representing quotas are user-oriented ("the numbers one
>> bought"), not storage-oriented ("the numbers they actually occupy").
> 
> Well, if something is not possible or brings such a big performance
> impact, there's no point arguing about how it should work in the first place.

Actually I think you did something overcomplicated (shared/exclusive),
which will only lead to user confusion (especially when his data
becomes "exclusive" one day for no known reason), something misnamed... and
not reflecting anything valuable, unless the problems with extent
fragmentation have already been resolved somehow?

So IMHO current quotas are:
- not discoverable for the user (shared->exclusive transition of my data by someone else's action),
- not reliable for the sysadmin (an aggressive write pattern by any user can allocate virtually any space despite the quotas).

-- 
Tomasz Pala <gotar@pld-linux.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
       [not found]     ` <f66b8ff3-d7ec-31ad-e9ca-e09c9eb76474@gmail.com>
@ 2018-08-10  7:33       ` Tomasz Pala
  2018-08-11  5:46         ` Andrei Borzenkov
  0 siblings, 1 reply; 26+ messages in thread
From: Tomasz Pala @ 2018-08-10  7:33 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Qu Wenruo, linux-btrfs

On Fri, Aug 10, 2018 at 07:03:18 +0300, Andrei Borzenkov wrote:

>> So - the limit set on any user
> 
> Does btrfs support per-user quota at all? I am aware only of per-subvolume quotas.

Well, this is a kind of deceptive word usage in "post-truth" times.

In this case both "user" and "quota" are not valid...
- by "user" I ment general word, not unix-user account; such user might
  possess some container running full-blown guest OS,
- by "quota" btrfs means - I guess, dataset-quotas?


In fact: https://btrfs.wiki.kernel.org/index.php/Quota_support
"Quota support in BTRFS is implemented at a subvolume level by the use of quota groups or qgroup"

- what the hell is a "quota group" and how does it differ from a qgroup? According to btrfs-quota(8):

"The quota groups (qgroups) are managed by the subcommand btrfs qgroup(8)"

- they are the same... just completely different from traditional "quotas".


My suggestion would be to completely remove the standalone word "quota"
from the btrfs documentation - there is no "quota"; just "subvolume
quotas", i.e. "qgroups", are supported.

-- 
Tomasz Pala <gotar@pld-linux.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10  7:17       ` Tomasz Pala
@ 2018-08-10  7:55         ` Qu Wenruo
  2018-08-10  9:33           ` Tomasz Pala
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2018-08-10  7:55 UTC (permalink / raw)
  To: Tomasz Pala; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 9127 bytes --]



On 8/10/18 3:17 PM, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:35:32 +0800, Qu Wenruo wrote:
> 
>>> when limiting somebody's data space we usually don't care about the
>>> underlying "savings" coming from any deduplicating technique - these are
>>> purely bonuses for the system owner, so he can do larger resource overbooking.
>>
>> In reality that's definitely not the case.
> 
> Definitely? How do you "sell" disk space when there is no upper bound?
> Every, and I mean _every_, resource quota out in the wild gives you a user perspective.
> You can assign CPU cores/time, RAM or network bandwidth with a HARD limit.
> 
> Only after that you _can_ sometimes assign some best-effort,
> non-guaranteed outer limits, like extra network bandwidth or grace
> periods for filesystem usage (disregarding technical details - in the case
> of quota you move the hard limit further out and apply a lower soft limit).
> 
> This is the primary quota usage. Quotas don't save system resources;
> quotas are valuables to "sell" (by "sell" I mean every possible
> allocation, including inter-organisation accounting).
> 
> Quotas are overbookable by design and, like I said before, the underlying
> savings mechanisms allow the sysadmin to increase the actual overbooking ratio.
> 
> If I run out of CPU, RAM, storage or network I simply need to expand
> that resource. I won't shrink quotas in such a case.
> Or apply some other resource-saving technique, like LVM with VDO,
> swapping, RAM deduplication etc.
> 
> If that is not the use case of btrfs quotas, then they should be renamed
> so as not to confuse users. Using incorrect terms for widely known things
> leads to user frustration at the very least.
> 
>> From what I see, most users would care more about exclusively used space
>> (excl), rather than the total space one subvolume is referring to (rfer).
> 
> Consider this:
> 1. there is some "template" system-wide snapshot,
> 2. users X and Y have CoW copies of it - both see "0 bytes exclusive"?

Yep, although not zero, it's 16K.

> 3. sysadm removes "template" - what happens to X and Y quotas?

Still 16K, unless X or Y drops their copy.

> 4. user X removes his copy - what happens to Y quota?

Now Y owns all of the snapshot exclusively.

In fact, that's not the correct way to organize your qgroups.
In your case, you should create a higher-level qgroup (1/0) containing
the original snapshot and user X's and Y's subvolumes.

In that case, all the snapshot's data and X's/Y's newer data are all
exclusive to qgroup 1/0 (as long as you don't reflink to files outside
of subvolumes X/Y/snapshot).

And then the exclusive number of qgroup 1/0 is your total usage, and
as long as you don't reflink out of the X/Y/snapshot source, your rfer
is the same as your excl, both representing how many bytes are used by
all three subvolumes.

This is in the btrfs-quota(5) man page.
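
Roughly (paths and the 20G figure hypothetical):

  btrfs qgroup create 1/0 /mnt
  for sv in template userX userY; do
      btrfs qgroup assign 0/$(btrfs inspect-internal rootid /mnt/$sv) 1/0 /mnt
  done
  btrfs qgroup limit 20G 1/0 /mnt
  # Ownership moving between the template, X and Y stays inside 1/0, so
  # its excl number doesn't jump when one of them drops their copy.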

> 
> The first thing about virtually every mechanism should be
> discoverability and reliability. I expect my quota not to change without
> my interaction. Never. How did you cope with this?
> If not - how are you going to explain such weird behaviour to users?

Read the manual first.
Not every feature is suitable for every use case.

IIRC LVM thin is pretty much the same in the same case.

> 
> Once again: the quota numbers *I* get must not be influenced by external
> operations or foreign users.
> 
>> The most common case is, you do a snapshot, and the user would only care
>> how much new space can be written into the subvolume, rather than the
>> total subvolume size.
> 
> If only that were the case... then exactly - I do care how much new
> data is _guaranteed_ to fit on my storage.
> 
> So please tell me, as I might have got it wrong - what happens if the
> source subvolume gets removed and the CoWed data is not shared anymore?

It's exclusive to the only owner.

> Is the quota recalculated? - that would be wrong, as no new data was written.

It's recalculated, and due to the owner change the number will change.
It's about extent ownership; as already stated, not every solution suits
every use case.

If you don't think an ownership change should change the quota, then
just don't use btrfs quota (nor LVM thin, if I'm not missing something);
it doesn't fit your use case.

Your use case needs LVM snapshots (dm-snapshot), or follow my
multi-level qgroup setup above.

> Is the quota left intact? - that is wrong too, as it gives a false view of the exclusive space taken.
> 
> This is just another reincarnation of the famous "btrfs df" problem you
> couldn't comprehend for so long - when reporting "disk FREE" status I want
> to know the amount of data that is guaranteed to be writable in the current
> RAID profile, i.e. ignoring any possible savings from compression etc.

Because we have so many ways to use the unallocated space,
it's just impossible to give you a single number for how much space you
can use.

For 4 disks with 1T of free space each, if you're using RAID5 for data,
then you can write 3T of data.
But if you're also using RAID10 for metadata, and you're using the
default inlining, small files can fill the free space, resulting in 2T
of available space.

So in this case how would you calculate the free space? 3T or 2T or
anything in between?

Only you know what the heck you're going to use those 4 disks
with 1T free space each for.
Btrfs can't look into your head and know what you're thinking.
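
(The arithmetic behind those two numbers, spelled out as a sketch:)

  # 4 devices, 1T unallocated each:
  #   data as RAID5      : (4 - 1) * 1T = 3T usable (one disk of parity)
  #   metadata as RAID10 : 4 * 1T / 2   = 2T usable (two copies)
  # Small files inlined into metadata are bounded by the 2T figure, one
  # big file by the 3T figure.
  btrfs filesystem usage /mnt   # prints per-profile totals and an estimate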

> 
> 
> Please note: my assumptions are based on
> https://btrfs.wiki.kernel.org/index.php/Quota_support
> 
> "File copy and file deletion may both affect limits since the unshared
> limit of another qgroup can change if the original volume's files are
> deleted and only one copy is remaining"
> 
> so if I write something invalid this might be the source of my mistake.
> 
> 
>>> And the numbers accounted should reflect the uncompressed sizes.
>>
>> No way with the current extent-based solution.
> 
> OK, since the data is provided by the user, its "compressibility"
> might be considered his saving (we only provide transparency).
> 
>>> Moreover - if there were per-subvolume RAID levels someday, the data
>>> should be accounted in relation to the "default" (filesystem) RAID level,
>>> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
>>> data, and twice the data in the opposite scenario (like a "dup" profile
>>> on a single-drive filesystem).
>>
>> Again, not possible with the current extent-based solution.
> 
> Doesn't an extent have information about the devices it's cloned onto? But OK,
> this is not important until per-subvolume profiles are available.

For device-related info, that's block-group related, and in fact you
shouldn't do the cross-level calculation (mixing the extent and block
group levels together).

At the extent level, there is just a super large plain address space
from 0~U64_MAX.
Without extra inspection of the block group/chunk mapping, we don't
know, and have no need to know, where an extent is located.

Just consider btrfs as a filesystem on a dm-linear device, and parts of
the dm-linear space are mapped using dm-raid1/10/5/6, like:

                 Btrfs logical address space
            /                                  \
 /                                                           \
0///////|       ...           |////////|    ...               u64_max
 \     /                       \      /
 Chunk1                         Chunk N
 SINGLE                         RAID 1
 Mapped using dev A             Mapped using dev B and C
 Physical range X-Y             Physical range B, X-Y, C W-Z

Then you should understand what's going on and why your idea of mixing
the extent and chunk levels makes things worse and more confusing.
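
(That chunk mapping is a real on-disk structure and can be dumped; the
device name is hypothetical:)

  # Each CHUNK_ITEM maps a slice of the logical address space to stripes
  # on specific devices, each chunk with its own profile (single/RAID1/...):
  btrfs inspect-internal dump-tree -t chunk /dev/sdb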

> 
>>> In short: values representing quotas are user-oriented ("the numbers one
>>> bought"), not storage-oriented ("the numbers they actually occupy").
>>
>> Well, if something is not possible or brings such a big performance
>> impact, there's no point arguing about how it should work in the first place.
> 
> Actually I think you did something overcomplicated (shared/exclusive),
> which will only lead to user confusion (especially when his data
> becomes "exclusive" one day for no known reason), something misnamed... and
> not reflecting anything valuable, unless the problems with extent
> fragmentation have already been resolved somehow?

That's been the design from the very beginning of btrfs; yelling at me
makes no sense at all.

If you want some solution to fit your case, I can only tell you what
btrfs can and can't do.
I have tried to explain what btrfs quota does and doesn't do; if it
doesn't fit your use case, that's all.
(Whether you have ever tried to understand it is another problem.)

In fact your idea is pretty hard or even impossible to implement in
btrfs in the near future.

> 
> So IMHO current quotas are:
> - not discoverable for the user (shared->exclusive transition of my data by someone else's action),
> - not reliable for the sysadmin (an aggressive write pattern by any user can allocate virtually any space despite the quotas).


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10  7:55         ` Qu Wenruo
@ 2018-08-10  9:33           ` Tomasz Pala
  2018-08-11  6:54             ` Andrei Borzenkov
  0 siblings, 1 reply; 26+ messages in thread
From: Tomasz Pala @ 2018-08-10  9:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Aug 10, 2018 at 15:55:46 +0800, Qu Wenruo wrote:

>> The first thing about virtually every mechanism should be
>> discoverability and reliability. I expect my quota not to change without
>> my interaction. Never. How did you cope with this?
>> If not - how are you going to explain such weird behaviour to users?
> 
> Read the manual first.
> Not every feature is suitable for every use case.

I, the sysadm, must RTFM.
My users won't comprehend this and moreover - they won't even care.

> IIRC LVM thin is pretty much the same in the same case.

LVM doesn't pretend to be user-oriented; it is system scope.
LVM didn't name its thin provisioning "quotas".

> For 4 disks with 1T of free space each, if you're using RAID5 for data,
> then you can write 3T of data.
> But if you're also using RAID10 for metadata, and you're using the
> default inlining, small files can fill the free space, resulting in 2T
> of available space.
> 
> So in this case how would you calculate the free space? 3T or 2T or
> anything in between?

The answer is pretty simple: 3T. Rationale:
- this is the space I can put in a single data stream,
- people are aware that there is metadata overhead with any object;
  after all, metadata is also data,
- while filling the fs with small files, the free space available would
  self-adjust after every single file put, so after uploading 1T of such
  files df should report 1.5T free. There would be nothing weird(er
  than now) about 1T of data having actually eaten 1.5T of storage.

No crystal-ball calculations, just KISS; since one _can_ put a 3T file
(non-sparse, uncompressible, bulk-written) on the filesystem, the free space is 3T.

> Only you know what the heck you're going to use those 4 disks
> with 1T free space each for.
> Btrfs can't look into your head and know what you're thinking.

It shouldn't. I expect raw data - there is 3TB of unallocated space for
the current data profile.

> That's been the design from the very beginning of btrfs; yelling at me
> makes no sense at all.

Sorry if you perceive me as "yelling" - I honestly must put it down to
my non-native English. I just want to clarify some terminology and
perspective expectations. They are irrelevant to the underlying
technical solutions, but the literal *description* of the solution
you provide should match users' expectations of that terminology.

> I have tried to explain what btrfs quota does and doesn't do; if it
> doesn't fit your use case, that's all.
> (Whether you have ever tried to understand it is another problem.)

I am (more than before) aware of what btrfs quotas are not.

So, my only expectation (except for worldwide peace and other
unrealistic ones) would be to stop using "quotas", "subvolume quotas"
and "qgroups" interchangeably in btrfs context, as IMvHO these are not
plain, well-known "quotas".

-- 
Tomasz Pala <gotar@pld-linux.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-09 23:35     ` Qu Wenruo
  2018-08-10  7:17       ` Tomasz Pala
@ 2018-08-10 11:32       ` Austin S. Hemmelgarn
  2018-08-10 18:07       ` Chris Murphy
  2 siblings, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-10 11:32 UTC (permalink / raw)
  To: Qu Wenruo, Tomasz Pala; +Cc: linux-btrfs

On 2018-08-09 19:35, Qu Wenruo wrote:
> 
> 
> On 8/10/18 1:48 AM, Tomasz Pala wrote:
>> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
>>
>>> 2) Different limitations on exclusive/shared bytes
>>>     Btrfs can set different limits on exclusive/shared bytes, further
>>>     complicating the problem.
>>>
>>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>>     It lacks all the shared trees (mentioned below), and in fact such
>>>     shared trees can be pretty large (especially the extent tree and csum
>>>     tree).
>>
>> I'm not sure about the implications, but just to clarify some things:
>>
>> when limiting somebody's data space we usually don't care about the
>> underlying "savings" coming from any deduplicating technique - these are
>> purely bonuses for the system owner, so he can do larger resource overbooking.
> 
> In reality that's definitely not the case.
> 
>  From what I see, most users would care more about exclusively used space
> (excl), rather than the total space one subvolume is referring to (rfer).
> 
> The most common case is, you do a snapshot, and the user would only care
> how much new space can be written into the subvolume, rather than the
> total subvolume size.
I would really love to know exactly who these users are, because it 
sounds to me like you've heard from exactly zero people who are 
currently using conventional quotas to impose actual resource limits on 
other filesystems (instead of just using them for accounting, which is a 
valid use case but not what they were originally designed for).
> 
>>
>> So - the limit set on any user should enforce the maximum and absolute
>> space he has allocated, including the shared stuff. I could even imagine
>> that creating a snapshot might immediately "eat" the available quota. In
>> a way, the quota returned would match (give or take) `du`-reported usage,
>> unless "do not account reflinks within a single qgroup" were easy to implement.
> 
> In fact, that's the case. In the current implementation, accounting at
> the extent level is the easiest (if not the only) way to implement it.
> 
>>
>> I.e.: every shared segment should be accounted within quota (at least once).
> 
> Already accounted, at least for rfer.
> 
>>
>> And the numbers accounted should reflect the uncompressed sizes.
> 
> No way with the current extent-based solution.
While this may be true, this would be a killer feature to have.
> 
>>
>>
>> Moreover - if there were per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
>> data, and twice the data in the opposite scenario (like a "dup" profile
>> on a single-drive filesystem).
> 
> Again, not possible with the current extent-based solution.
> 
>>
>>
>> In short: values representing quotas are user-oriented ("the numbers one
>> bought"), not storage-oriented ("the numbers they actually occupy").
> 
> Well, if something is not possible or brings such a big performance
> impact, there's no point arguing about how it should work in the first place.
> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-09 17:48   ` Tomasz Pala
  2018-08-09 23:35     ` Qu Wenruo
       [not found]     ` <f66b8ff3-d7ec-31ad-e9ca-e09c9eb76474@gmail.com>
@ 2018-08-10 11:39     ` Austin S. Hemmelgarn
  2018-08-10 18:21       ` Tomasz Pala
  2 siblings, 1 reply; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-10 11:39 UTC (permalink / raw)
  To: Tomasz Pala, Qu Wenruo; +Cc: linux-btrfs

On 2018-08-09 13:48, Tomasz Pala wrote:
> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
> 
>> 2) Different limitations on exclusive/shared bytes
>>     Btrfs can set different limits on exclusive/shared bytes, further
>>     complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>     It lacks all the shared trees (mentioned below), and in fact such
>>     shared trees can be pretty large (especially the extent tree and csum
>>     tree).
> 
> I'm not sure about the implications, but just to clarify some things:
> 
> when limiting somebody's data space we usually don't care about the
> underlying "savings" coming from any deduplicating technique - these are
> purely bonuses for the system owner, so he can do larger resource overbooking.
> 
> So - the limit set on any user should enforce the maximum and absolute
> space he has allocated, including the shared stuff. I could even imagine
> that creating a snapshot might immediately "eat" the available quota. In
> a way, the quota returned would match (give or take) `du`-reported usage,
> unless "do not account reflinks within a single qgroup" were easy to implement.
> 
> I.e.: every shared segment should be accounted within quota (at least once).
I think what you mean to say here is that every shared extent should be 
accounted to quotas for every location it is reflinked from.  IOW, that 
if an extent is shared between two subvolumes each with its own quota, 
they should both have it accounted against their quota.
> 
> And the numbers accounted should reflect the uncompressed sizes.
This is actually inconsistent with pretty much every other VFS-level 
quota system in existence.  Even ZFS does its accounting _after_ 
compression.  At this point, it's actually expected by most sysadmins 
that things behave that way.
> 
> 
> Moreover - if there were per-subvolume RAID levels someday, the data
> should be accounted in relation to the "default" (filesystem) RAID level,
> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
> data, and twice the data in the opposite scenario (like a "dup" profile
> on a single-drive filesystem).
This is irrelevant to your point here.  In fact, it goes against it: 
you're arguing for quotas to report data like `du`, but all of the 
chunk-profile stuff is invisible to `du` (and everything else in 
userspace that doesn't look through BTRFS ioctls).
> 
> 
> In short: values representing quotas are user-oriented ("the numbers one
> bought"), not storage-oriented ("the numbers they actually occupy").

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-09 23:35     ` Qu Wenruo
  2018-08-10  7:17       ` Tomasz Pala
  2018-08-10 11:32       ` Austin S. Hemmelgarn
@ 2018-08-10 18:07       ` Chris Murphy
  2018-08-10 19:10         ` Austin S. Hemmelgarn
  2018-08-11  3:29         ` Duncan
  2 siblings, 2 replies; 26+ messages in thread
From: Chris Murphy @ 2018-08-10 18:07 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Tomasz Pala, Btrfs BTRFS

On Thu, Aug 9, 2018 at 5:35 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 8/10/18 1:48 AM, Tomasz Pala wrote:
>> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
>>
>>> 2) Different limitations on exclusive/shared bytes
>>>    Btrfs can set different limits on exclusive/shared bytes, further
>>>    complicating the problem.
>>>
>>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>>    It lacks all the shared trees (mentioned below), and in fact such
>>>    shared trees can be pretty large (especially the extent tree and csum
>>>    tree).
>>
>> I'm not sure about the implications, but just to clarify some things:
>>
>> when limiting somebody's data space we usually don't care about the
>> underlying "savings" coming from any deduplicating technique - these are
>> purely bonuses for the system owner, so he can do larger resource overbooking.
>
> In reality that's definitely not the case.
>
> From what I see, most users would care more about exclusively used space
>> (excl), rather than the total space one subvolume is referring to (rfer).

I'm confused.

So what happens in the following case with quotas enabled on Btrfs:

1. Provision a user with a directory, pre-populated with files, using
snapshot. Let's say it's 1GiB of files.
2. Set a quota for this user's directory, 1GiB.

The way I'm reading the description of Btrfs quotas, the 1GiB quota
applies to exclusively used space. So for starters, they have 1GiB of
shared data that does not affect their 1GiB quota at all.

3. User creates 500MiB worth of new files, this is exclusive usage.
They are still within their quota limit.
4. The shared data becomes obsolete for all but this one user, and is deleted.

Suddenly, 1GiB of shared data for this user is no longer shared data,
it instantly becomes exclusive data and their quota is busted. Now
consider scaling this to 12TiB of storage, with hundreds of users, and
dozens of abruptly busted quotas following this same scenario on a
weekly basis.

I *might* buy off on the idea that an overlay2 based initial
provisioning would not affect quotas. But whether data is shared or
exclusive seems potentially ephemeral, and not something a sysadmin
can be expected to anticipate, let alone individual users.

Going back to the example, I'd expect to give the user a 2GiB quota,
with 1GiB of initially provisioned data via snapshot, so right off the
bat they are at 50% usage of their quota. If they were to modify every
single provisioned file, they'd in effect go from 100% shared data to
100% exclusive data, but their quota usage would still be 50%. That's
completely sane and easily understandable by a regular user. The idea
that they'd start modifying shared files, and their quota usage climbs
is weird to me. The state of files being shared or exclusive is not
user domain terminology anyway.
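
(That scenario as a concrete sequence - the commands are real, the paths
and sizes illustrative:)

  btrfs subvolume snapshot /srv/template /srv/users/alice
  btrfs qgroup limit -e 1G /srv/users/alice  # limit on *exclusive* bytes
  # Right now alice's excl is a few KiB: the provisioned 1GiB is shared.
  btrfs subvolume delete /srv/template       # last other owner goes away
  # The same untouched 1GiB becomes exclusive to alice, instantly
  # busting the limit - exactly the failure mode described above.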


>
> The most common case is, you do a snapshot, and the user would only care
> how much new space can be written into the subvolume, rather than the
> total subvolume size.

I think that's expecting a lot of users.

I also wonder if it expects a lot from services like samba and NFS,
which have to communicate all of this in some sane way to remote clients. My
expectation is that a remote client shows Free Space on a quota'd
system to be based on the unused amount of the quota. I also expect if
I delete a 1GiB file, that my quota consumption goes down. But you're
saying it would be unchanged if I delete a 1GiB shared file, and would
only go down if I delete a 1GiB exclusive file. Do samba and NFS know
about shared and exclusive files? If samba and NFS don't understand
this, then how is a user supposed to understand it?

And now I'm sufficiently confused I'm ready for the weekend!


>> And the numbers accounted should reflect the uncompressed sizes.
>
> No way with the current extent-based solution.

I'm less concerned about this. But since the extent item shows both
ram and disk byte values, why couldn't the quota and the space
reporting be predicated on the ram value which is always uncompressed?
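
(Both values are recorded per file extent and visible with existing
tools; the device and tree id here are hypothetical:)

  # Each EXTENT_DATA item stores the on-disk (possibly compressed)
  # extent size alongside the uncompressed "ram" size:
  btrfs inspect-internal dump-tree -t 257 /dev/sdb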



>
>>
>>
>> Moreover - if there were per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
>> data, and twice the data in the opposite scenario (like a "dup" profile
>> on a single-drive filesystem).
>
> Again, not possible with the current extent-based solution.

It's fine, I think it's unintuitive for DUP or raid1 profiles to cause
quota consumption to double. The underlying configuration of the array
is not the business of the user. They can only be expected to
understand file size. Underlying space consumed, whether compressed,
or duplicated, or compressed and duplicated, is out of scope for the
user. And we can't have quotas getting busted all of a sudden because
the sysadmin decides to do -dconvert -mconvert raid1, without
requiring the sysadmin to double everyone's quota before performing
the operation.





>
>>
>>
>> In short: values representing quotas are user-oriented ("the numbers one
>> bought"), not storage-oriented ("the numbers they actually occupy").
>
> Well, if something is not possible or brings such a big performance
> impact, there's no point arguing about how it should work in the first place.

Yep!

What about VFS disk quotas, and does Btrfs use them at all? If not, why
not? It seems to me there really should be a high-level, basic
per-directory quota implementation at the VFS layer, with a single kernel
interface as well as a single user space interface, regardless of the
file system. Additional file system specific quota features can of
course have their own tools, but all of this re-invention of the wheel
for basic directory quotas is a mystery to me.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10 11:39     ` Austin S. Hemmelgarn
@ 2018-08-10 18:21       ` Tomasz Pala
  2018-08-10 18:48         ` Austin S. Hemmelgarn
  2018-08-11  6:18         ` Andrei Borzenkov
  0 siblings, 2 replies; 26+ messages in thread
From: Tomasz Pala @ 2018-08-10 18:21 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Qu Wenruo, linux-btrfs

On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:

>> I.e.: every shared segment should be accounted within quota (at least once).
> I think what you mean to say here is that every shared extent should be 
> accounted to quotas for every location it is reflinked from.  IOW, that 
> if an extent is shared between two subvolumes each with its own quota,
> they should both have it accounted against their quota.

Yes.

>>> Moreover - if there were per-subvolume RAID levels someday, the data
>>> should be accounted in relation to the "default" (filesystem) RAID level,
>>> i.e. having a RAID0 subvolume on a RAID1 fs should account half of the
>>> data, and twice the data in the opposite scenario (like a "dup" profile
>>> on a single-drive filesystem).
>
> This is irrelevant to your point here.  In fact, it goes against it:
> you're arguing for quotas to report data like `du`, but all of the
> chunk-profile stuff is invisible to `du` (and everything else in
> userspace that doesn't look through BTRFS ioctls).

My point is the user's point of view, not some system tool like du. Consider this:
1. user wants higher (than default) protection of some data,
2. user wants more storage space with less protection.

Ad. 1 - requesting better redundancy is similar to cp --reflink=never
- there are functional differences, but the cost is similar: trading
  space for security,

Ad. 2 - many would like to have .cache, .ccache, tmp or some build
system directory with faster writes and no redundancy at all. This
requires per-file/directory data profile attrs though.

Since we agreed that transparent data compression is user's storage bonus,
gains from the reduced redundancy should also profit user.
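
To make the proposal concrete, the accounting rule I have in mind (a
hypothetical formula; nothing like this is implemented anywhere):

    # accounted = logical_bytes * raw_copies(subvolume profile)
    #                           / raw_copies(filesystem default profile)
    # RAID0 subvolume (1 copy) on a RAID1 filesystem (2 copies):
    #   10 GiB logical -> 10 * 1/2 = 5 GiB accounted
    # DUP subvolume (2 copies) on a "single" filesystem (1 copy):
    #   10 GiB logical -> 10 * 2/1 = 20 GiB accounted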


Disclaimer: all the above statements in relation to conception and
understanding of quotas, not to be confused with qgroups.

-- 
Tomasz Pala <gotar@pld-linux.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10 18:21       ` Tomasz Pala
@ 2018-08-10 18:48         ` Austin S. Hemmelgarn
  2018-08-11  6:18         ` Andrei Borzenkov
  1 sibling, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-10 18:48 UTC (permalink / raw)
  To: Tomasz Pala; +Cc: Qu Wenruo, linux-btrfs

On 2018-08-10 14:21, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:
> 
>>> I.e.: every shared segment should be accounted within quota (at least once).
>> I think what you mean to say here is that every shared extent should be
>> accounted to quotas for every location it is reflinked from.  IOW, that
>> if an extent is shared between two subvolumes each with its own quota, 
>> they should both have it accounted against their quota.
> 
> Yes.
> 
>>> Moreover - if there would be per-subvolume RAID levels someday, the data
>>> should be accounted in relation to "default" (filesystem) RAID level,
>>> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
>>> data, and twice the data in an opposite scenario (like "dup" profile on
>>> single-drive filesystem).
>>
>> This is irrelevant to your point here.  In fact, it goes against it,
>> you're arguing for quotas to report data like `du`, but all of
>> chunk-profile stuff is invisible to `du` (and everything else in
>> userspace that doesn't look through BTRFS ioctls).
> 
> My point is the user's point of view, not that of some system tool like
> du. Consider this:
> 1. the user wants higher (than default) protection for some data,
> 2. the user wants more storage space with less protection.
> 
> Ad. 1 - requesting better redundancy is similar to cp --reflink=never;
> there are functional differences, but the cost is similar: trading
> space for security.
> 
> Ad. 2 - many would like to have .cache, .ccache, tmp or some build
> system directory with faster writes and no redundancy at all. This
> requires per-file/directory data profile attrs, though.
> 
> Since we agreed that transparent data compression is the user's storage
> bonus, gains from reduced redundancy should also profit the user.
Do you actually know of any services that do this though?  I mean, 
Amazon S3 and similar services have the option of reduced redundancy 
(and other alternate storage tiers), but they charge 
per-unit-data-per-unit-time with no hard limit on how much space they 
use, and charge different rates for different storage tiers.  In 
comparison, what you appear to be talking about is something more 
similar to Dropbox or Google Drive, where you pay up front for a fixed 
amount of storage for a fixed amount of time and can't use more than 
that, and all the services I know of like that offer exactly one option 
for storage redundancy.

That aside, you seem to be overthinking this.  No sane provider is going 
to give their users the ability to create subvolumes themselves (there's 
too much opportunity for a tiny bug in your software to cost you a _lot_ 
of lost revenue, because creating subvolumes can let you escape qgroups). 
That in turn means that what you're trying to argue for is no different 
from the provider just selling units of storage at different redundancy 
levels separately, and charging different rates for each of them.  In 
fact, that approach is better, because it works independently of the 
underlying storage technology (it will work with hardware RAID, LVM2, 
MD, ZFS, and even distributed storage platforms like Ceph and Gluster), 
_and_ it lets them charge differently than the trivial case of N copies 
costing N times as much as one copy (which is not quite accurate in 
terms of actual management costs).

Now, if BTRFS were to have the ability to set profiles per-file, then 
this might be useful, albeit with the option to tune how it gets accounted.
> 
> Disclaimer: all the above statements in relation to conception and
> understanding of quotas, not to be confused with qgroups.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10 18:07       ` Chris Murphy
@ 2018-08-10 19:10         ` Austin S. Hemmelgarn
  2018-08-11  3:29         ` Duncan
  1 sibling, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-10 19:10 UTC (permalink / raw)
  To: Chris Murphy, Qu Wenruo; +Cc: Tomasz Pala, Btrfs BTRFS

On 2018-08-10 14:07, Chris Murphy wrote:
> On Thu, Aug 9, 2018 at 5:35 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 8/10/18 1:48 AM, Tomasz Pala wrote:
>>> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
>>>
>>>> 2) Different limitations on exclusive/shared bytes
>>>>     Btrfs can set different limit on exclusive/shared bytes, further
>>>>     complicating the problem.
>>>>
>>>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>>>     It lacks all the shared trees (mentioned below), and in fact such
>>>>     shared tree can be pretty large (especially for extent tree and csum
>>>>     tree).
>>>
>>> I'm not sure about the implications, but just to clarify some things:
>>>
>>> when limiting somebody's data space we usually don't care about the
>>> underlying "savings" coming from any deduplicating technique - these are
>>> purely bonuses for system owner, so he could do larger resource overbooking.
>>
>> In reality that's definitely not the case.
>>
>>  From what I see, most users would care more about exclusively used space
>> (excl), rather than the total space one subvolume is referring to (rfer).
> 
> I'm confused.
> 
> So what happens in the following case with quotas enabled on Btrfs:
> 
> 1. Provision a user with a directory, pre-populated with files, using
> snapshot. Let's say it's 1GiB of files.
> 2. Set a quota for this user's directory, 1GiB.
> 
> The way I'm reading the description of Btrfs quotas, the 1GiB quota
> applies to exclusive used space. So for starters, they have 1GiB of
> shared data that does not affect their 1GiB quota at all.
> 
> 3. User creates 500MiB worth of new files, this is exclusive usage.
> They are still within their quota limit.
> 4. The shared data becomes obsolete for all but this one user, and is deleted.
> 
> Suddenly, 1GiB of shared data for this user is no longer shared data,
> it instantly becomes exclusive data and their quota is busted. Now
> consider scaling this to 12TiB of storage, with hundreds of users, and
> dozens of abruptly busted quotas following this same scenario on a
> weekly basis.
> 
> I *might* buy off on the idea that an overlay2 based initial
> provisioning would not affect quotas. But whether data is shared or
> exclusive seems potentially ephemeral, and not something a sysadmin
> should even be able to anticipate let alone individual users.
> 
> Going back to the example, I'd expect to give the user a 2GiB quota,
> with 1GiB of initially provisioned data via snapshot, so right off the
> bat they are at 50% usage of their quota. If they were to modify every
> single provisioned file, they'd in effect go from 100% shared data to
> 100% exclusive data, but their quota usage would still be 50%. That's
> completely sane and easily understandable by a regular user. The idea
> that they'd start modifying shared files, and their quota usage climbs
> is weird to me. The state of files being shared or exclusive is not
> user domain terminology anyway.
And it's important to note that this is the _only_ way this can sanely 
work for actually partitioning resources, which is the primary classical 
use case for quotas.

Being able to see how much data is shared and exclusive in a subvolume 
is nice, but quota groups are the wrong name for it because the current 
implementation does not work at all like quotas and can trivially result 
in both users escaping quotas (multiple ways), and in quotas being 
overreached by very large amounts for potentially indefinite periods of 
time because of actions of individuals who _don't_ own the data the 
quota is for.
> 
> 
>>
>> The most common case is: you do a snapshot, and the user would only care
>> how much new space can be written into the subvolume, rather than the
>> total subvolume size.
> 
> I think that's expecting a lot of users.
> 
> I also wonder if it expects a lot from services like samba and NFS who
> have to communicate all of this in some sane way to remote clients? My
> expectation is that a remote client shows Free Space on a quota'd
> system to be based on the unused amount of the quota. I also expect if
> I delete a 1GiB file, that my quota consumption goes down. But you're
> saying it would be unchanged if I delete a 1GiB shared file, and would
> only go down if I delete a 1GiB exclusive file. Do samba and NFS know
> about shared and exclusive files? If samba and NFS don't understand
> this, then how is a user supposed to understand it?
It might be worth looking at how Samba and NFS work on top of ZFS on a 
platform like FreeNAS and trying to emulate that.

Behavior there is as follows (the ZFS-side knobs are sketched after the list):

* The total size of the 'disk' reported over SMB (shown on Windows only 
if you map the share as a drive) is equal to the quota for the 
underlying dataset.
* The reported space used on the 'disk' reported over SMB is based on 
physical space usage after compression, with a few caveats relating to 
deduplication:
     - Data which is shared across multiple datasets is accounted 
against _all_ datasets that reference it.
     - Data which is shared only within a given dataset is accounted 
only once.
* Free space is reported simply as the total size minus the used space.
* Usage reported by `du` equivalent tools shows numbers _before_ 
compression and deduplication (so, it shows you how much space you would 
need to store all the data elsewhere).
* Whether or not the files are transparently compressed is actually 
reported properly.
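
A rough sketch of the ZFS-side knobs behind that behavior (hypothetical
pool/dataset names; from memory, so treat as approximate):

    zfs set quota=100G tank/share       # becomes the 'disk' size shown over SMB
    zfs get used,available tank/share   # post-compression usage and derived free space
    zfs get compressratio tank/share    # lets clients see compression properly
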
> 
> And now I'm sufficiently confused I'm ready for the weekend!
> 
> 
>>> And the numbers accounted should reflect the uncompressed sizes.
>>
>> No way for the current extent-based solution.
> 
> I'm less concerned about this. But since the extent item shows both
> ram and disk byte values, why couldn't the quota and the space
> reporting be predicated on the ram value which is always uncompressed?
> 
> 
> 
>>
>>>
>>>
>>> Moreover - if there would be per-subvolume RAID levels someday, the data
>>> should be accounted in relation to "default" (filesystem) RAID level,
>>> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
>>> data, and twice the data in an opposite scenario (like "dup" profile on
>>> single-drive filesystem).
>>
>> Not possible again for the current extent-based solution.
> 
> That's fine; I think it's unintuitive for DUP or raid1 profiles to cause
> quota consumption to double. The underlying configuration of the array
> is not the business of the user. They can only be expected to
> understand file size. Underlying space consumed, whether compressed,
> duplicated, or both, is out of scope for the user. And we can't have
> quotas getting busted all of a sudden because the sysadmin decides to
> do -dconvert -mconvert raid1, without requiring the sysadmin to double
> everyone's quota before performing the operation.
It's not just unintuitive, it's broken unless you have per-object profiles.
> 
>>
>>>
>>> In short: values representing quotas are user-oriented ("the numbers one
>>> bought"), not storage-oriented ("the numbers they actually occupy").
>>
>> Well, if something is not possible or brings so big performance impact,
>> there will be no argument on how it should work in the first place.
> 
> Yep!
> 
> What are VFS disk quotas, and does Btrfs use them at all? If not, why
> not? It seems to me there really should be a high-level, basic
> per-directory quota implementation at the VFS layer, with a single
> kernel interface as well as a single user-space interface, regardless
> of the file system. Additional file-system-specific quota features can
> of course have their own tools, but all of this re-invention of the
> wheel for basic directory quotas is a mystery to me.
No, we don't use VFS disk quotas.  I don't know enough about the 
in-kernel API for it to be certain, but I believe that the way BTRFS 
handles data violates some of the constraints that are required by that 
API, which is why we don't use it.  It might be possible if we could 
have a way to get total data accounted for a given directory (in a way 
that behaves like the above-mentioned FreeNAS Samba handling for 
calculating 'disk' usage).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10 18:07       ` Chris Murphy
  2018-08-10 19:10         ` Austin S. Hemmelgarn
@ 2018-08-11  3:29         ` Duncan
  2018-08-12  3:16           ` Chris Murphy
  1 sibling, 1 reply; 26+ messages in thread
From: Duncan @ 2018-08-11  3:29 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:

> But whether data is shared or exclusive seems potentially ephemeral, and
> not something a sysadmin should even be able to anticipate let alone
> individual users.

Define "user(s)".

Arguably, in the context of btrfs tool usage, "user" /is/ the admin, the 
one who cares that it's btrfs in the first place, who should have chosen 
btrfs based on the best-case match for the use-case, and who continues to 
maintain the system's btrfs filesystems using btrfs tools.

Arguably, in this context "user" is /not/ the other users the admin is 
caring for the system on behalf of, who don't care /what/ is under the 
covers so long as it works, and to whom more appropriate-to-their-needs 
tools should be made available should they be found necessary or useful.

> Going back to the example, I'd expect to give the user a 2GiB quota,
> with 1GiB of initially provisioned data via snapshot, so right off the
> bat they are at 50% usage of their quota. If they were to modify every
> single provisioned file, they'd in effect go from 100% shared data to
> 100% exclusive data, but their quota usage would still be 50%. That's
> completely sane and easily understandable by a regular user. The idea
> that they'd start modifying shared files, and their quota usage climbs
> is weird to me. The state of files being shared or exclusive is not user
> domain terminology anyway.

It's user-domain terminology if the "user" is the admin, who will care 
about shared/exclusive usage in the context of how it affects the usage 
of available storage resources.

"Regular users" as you use the term, that is the non-admins who just need 
to know how close they are to running out of their allotted storage 
resources, shouldn't really need to care about btrfs tool usage in the 
first place, and btrfs commands in general, including btrfs quota related 
commands, really aren't targeted at them, and aren't designed to report 
the type of information they are likely to find useful.  Other tools will 
be more appropriate.

>> The most common case is: you do a snapshot, and the user would only care
>> how much new space can be written into the subvolume, rather than the
>> total subvolume size.
> 
> I think that's expecting a lot of users.

Not really.  Remember, "users" in this context are admins, those to whom 
the duty of maintaining their btrfs falls, and the ones at whom btrfs * 
commands are normally targeted, since these are the btrfs tools designed 
to help them with that job.

And said "users" will (presumably) be concerned about shared/exclusive if 
they're using btrfs quotas because they are trying to well manage the 
filesystem space utilization per subvolume.

(FWIW, "presumably" is thrown in there because here I don't use 
subvolumes /or/ sub-filesystem-level quotas as personally, I prefer to 
manage that at the filesystem level, with multiple independent 
filesystems and the size of individual filesystems enforcing limits on 
how much the stuff stored in them can grow.)

> I also wonder if it expects a lot from services like samba and NFS who
> have to communicate all of this in some sane way to remote clients? My
> expectation is that a remote client shows Free Space on a quota'd system
> to be based on the unused amount of the quota. I also expect if I delete
> a 1GiB file, that my quota consumption goes down. But you're saying it
> would be unchanged if I delete a 1GiB shared file, and would only go
> down if I delete a 1GiB exclusive file. Do samba and NFS know about
> shared and exclusive files? If samba and NFS don't understand this, then
> how is a user supposed to understand it?

There's a reason btrfs quotas don't work with standard VFS level quotas.  
They're managing two different things, and I'd assume the btrfs quota 
information isn't typically what samba/NFS information exporting is 
designed to deal with in the first place.  Just because a screwdriver 
/can/ be used as a hammer doesn't make it the appropriate tool for the 
job.

> And now I'm sufficiently confused I'm ready for the weekend!

LOL!

(I had today/Friday off, arguably why I'm even taking the time to reply, 
but my second day off this "week" is next Tuesday, the last day of the 
schedule-week.  I had actually forgotten that this was the last day of 
the work-week for most, until I saw that, but then, LOL!)

> And we can't have quotas getting busted all of a sudden because the
> sysadmin decides to do -dconvert -mconvert raid1, without requiring the
> sysadmin to double everyone's quota before performing the operation.

Not every_one's_, every-subvolume's.  "Everyone's" quotas shouldn't be 
affected, because that's not what btrfs quotas manage.  There are other 
(non-btrfs) tools for that.

>>> In short: values representing quotas are user-oriented ("the numbers
>>> one bought"), not storage-oriented ("the numbers they actually
>>> occupy").

Btrfs quotas are storage-oriented, and if you're using them, at least 
directly, for user-oriented purposes, you're using the proverbial 
screwdriver as a proverbial hammer.

> What are VFS disk quotas, and does Btrfs use them at all? If not, why not?
> It seems to me there really should be a high-level, basic per-directory
> quota implementation at the VFS layer, with a single kernel interface as
> well as a single user-space interface, regardless of the file system.
> Additional file-system-specific quota features can of course have their
> own tools, but all of this re-invention of the wheel for basic directory
> quotas is a mystery to me.

As mentioned above and by others, btrfs quotas don't use vfs quotas (or 
the reverse, really; it'd be vfs quotas using information exposed by 
btrfs quotas... if it worked that way), because there's an API mismatch: 
their intended usage and the information they convey and control are 
different, and (AFAIK) they were never intended or claimed to be the same.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10  7:33       ` Tomasz Pala
@ 2018-08-11  5:46         ` Andrei Borzenkov
  0 siblings, 0 replies; 26+ messages in thread
From: Andrei Borzenkov @ 2018-08-11  5:46 UTC (permalink / raw)
  To: Tomasz Pala; +Cc: Qu Wenruo, linux-btrfs

On 10.08.2018 10:33, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:03:18 +0300, Andrei Borzenkov wrote:
> 
>>> So - the limit set on any user
>>
>> Does btrfs support per-user quota at all? I am aware only of per-subvolume quotas.
> 
> Well, this is a kind of deceptive word usage in "post-truth" times.
> 
> In this case both "user" and "quota" are not valid...
> - by "user" I ment general word, not unix-user account; such user might
>   possess some container running full-blown guest OS,
> - by "quota" btrfs means - I guess, dataset-quotas?
> 
> 
> In fact: https://btrfs.wiki.kernel.org/index.php/Quota_support
> "Quota support in BTRFS is implemented at a subvolume level by the use of quota groups or qgroup"
> 
> - what the hell is a "quota group", and how does it differ from a qgroup? According to btrfs-quota(8):
> 
> "The quota groups (qgroups) are managed by the subcommand btrfs qgroup(8)"
> 
> - they are the same... just completely different from traditional "quotas".
> 
> 
> My suggestion would be to completely remove the standalone "quota" word
> from btrfs documentation - there is no "quota", just "subvolume quota"
> or "qgroup" supported.
> 

Well, a qgroup allows you to limit the amount of data that can be stored
in a subvolume (or under a quota group in general), so it behaves like a
traditional quota to me.
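
A minimal sketch of that usage (hypothetical mount point and subvolume):

    btrfs quota enable /mnt
    btrfs qgroup limit 1G /mnt/subvol      # limit referenced bytes, quota-like
    btrfs qgroup limit -e 1G /mnt/subvol   # or limit exclusive bytes instead
    btrfs qgroup show -re /mnt             # usage plus both limits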

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10 18:21       ` Tomasz Pala
  2018-08-10 18:48         ` Austin S. Hemmelgarn
@ 2018-08-11  6:18         ` Andrei Borzenkov
  1 sibling, 0 replies; 26+ messages in thread
From: Andrei Borzenkov @ 2018-08-11  6:18 UTC (permalink / raw)
  To: Tomasz Pala, Austin S. Hemmelgarn; +Cc: Qu Wenruo, linux-btrfs

On 10.08.2018 21:21, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:
> 
>>> I.e.: every shared segment should be accounted within quota (at least once).
>> I think what you mean to say here is that every shared extent should be 
>> accounted to quotas for every location it is reflinked from.  IOW, that 
>> if an extent is shared between two subvolumes each with its own quota, 
>> they should both have it accounted against their quota.
> 
> Yes.
> 

This is what "referenced" in quota group report is, is not it? What is
missing here?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-10  9:33           ` Tomasz Pala
@ 2018-08-11  6:54             ` Andrei Borzenkov
  0 siblings, 0 replies; 26+ messages in thread
From: Andrei Borzenkov @ 2018-08-11  6:54 UTC (permalink / raw)
  To: Tomasz Pala, Qu Wenruo; +Cc: linux-btrfs

On 10.08.2018 12:33, Tomasz Pala wrote:
> 
>> For 4 disk with 1T free space each, if you're using RAID5 for data, then
>> you can write 3T data.
>> But if you're also using RAID10 for metadata, and you're using default
>> inline, we can use small files to fill the free space, resulting 2T
>> available space.
>>
>> So in this case how would you calculate the free space? 3T or 2T or
>> anything between them?
> 
> The answer is pretty simple: 3T. Rationale:
> - this is the space I can put into a single data stream,
> - people are aware that there is metadata overhead with any object;
>   after all, metadata are also data,
> - while filling the fs with small files, the free space available would
>   self-adjust after every single file put, so after uploading 1T of such
>   files, df should report 1.5T free. There would be nothing weird(er
>   than now) about 1T of data actually eating 1.5T of storage.
> 
> No crystal-ball calculations, just KISS; since one _can_ put a 3T file
> (non-sparse, incompressible, bulk-written) on the filesystem, the free space is 3T.
> 

As far as I can tell, that is exactly what "df" reports now. "btrfs fi
us" will tell you both the max (reported by "df") and the worst-case min.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-11  3:29         ` Duncan
@ 2018-08-12  3:16           ` Chris Murphy
  2018-08-12  7:04             ` Andrei Borzenkov
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Murphy @ 2018-08-12  3:16 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>
>> But whether data is shared or exclusive seems potentially ephemeral, and
>> not something a sysadmin should even be able to anticipate let alone
>> individual users.
>
> Define "user(s)".

The person who is saving their document on a network share, and
they've never heard of Btrfs.


> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,

I'm not talking about btrfs tools. I'm talking about rational,
predictable behavior of a shared folder.

If I try to drop a 1GiB file into my share and I'm denied, not enough
free space, and behind the scenes it's because of a quota limit, I
expect I can delete *any* file(s) amounting to create 1GiB free space
and then I'll be able to drop that file successfully without error.

But if I'm unwittingly deleting shared files, my quota usage won't go
down, and I still can't save my file. So now I somehow need a secret
incantation to discover only my exclusive files and delete enough of
them in order to save this 1GiB file. It's weird, it's unexpected, I
think it's a use case failure. Maybe Btrfs quotas aren't meant to work
with samba or NFS shares. *shrug*



>
> "Regular users" as you use the term, that is the non-admins who just need
> to know how close they are to running out of their allotted storage
> resources, shouldn't really need to care about btrfs tool usage in the
> first place, and btrfs commands in general, including btrfs quota related
> commands, really aren't targeted at them, and aren't designed to report
> the type of information they are likely to find useful.  Other tools will
> be more appropriate.

I'm not talking about any btrfs commands or even the term quota for
regular users. I'm talking about saving a file, being denied, and how
does the user figure out how to free up space?

Anyway, it's a hypothetical scenario. While I have Samba running on a
Btrfs volume with various shares as subvolumes, I don't have quotas
enabled.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-12  3:16           ` Chris Murphy
@ 2018-08-12  7:04             ` Andrei Borzenkov
  2018-08-12 17:39               ` Andrei Borzenkov
  2018-08-13 11:23               ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 26+ messages in thread
From: Andrei Borzenkov @ 2018-08-12  7:04 UTC (permalink / raw)
  To: Chris Murphy, Duncan; +Cc: Btrfs BTRFS

On 12.08.2018 06:16, Chris Murphy wrote:
> On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>>
>>> But whether data is shared or exclusive seems potentially ephemeral, and
>>> not something a sysadmin should even be able to anticipate let alone
>>> individual users.
>>
>> Define "user(s)".
> 
> The person who is saving their document on a network share, and
> they've never heard of Btrfs.
> 
> 
>> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,
> 
> I'm not talking about btrfs tools. I'm talking about rational,
> predictable behavior of a shared folder.
> 
> If I try to drop a 1GiB file into my share and I'm denied, not enough
> free space, and behind the scenes it's because of a quota limit, I
> expect I can delete *any* file(s) amounting to create 1GiB free space
> and then I'll be able to drop that file successfully without error.
> 
> But if I'm unwittingly deleting shared files, my quota usage won't go
> down, and I still can't save my file. So now I somehow need a secret
> incantation to discover only my exclusive files and delete enough of
> them in order to save this 1GiB file. It's weird, it's unexpected, I
> think it's a use case failure. Maybe Btrfs quotas aren't meant to work
> with samba or NFS shares. *shrug*
> 

That's how both NetApp and ZFS work as well. I doubt anyone can
seriously call NetApp "not meant to work with NFS or CIFS shares".

On NetApp, the space available to an NFS/CIFS user is the volume size
minus the space frozen in snapshots. If a file captured in a snapshot is
deleted in the active file system, it does not make a single byte
available to the external user. That's what surprises almost every
first-time NetApp user.

On ZFS, snapshots are contained in the dataset, and you limit total
dataset space consumption including all snapshots. Thus the end effect is
the same: deleting data that is itself captured in a snapshot does not
make a single byte available. ZFS additionally allows you to restrict
the active file system's size (the "referenced" quota in ZFS) - this more
closely matches your expectation: deleting a file in the active file
system decreases its "referenced" size, thus allowing the user to write
more data (as long as the user does not exceed the total dataset quota).
This is different from btrfs "exclusive" and "shared". This should not
be hard to implement in btrfs, as "referenced" simply means all data in
the current subvolume, be it exclusive or shared.

IOW, ZFS allows placing restrictions on both how much data a user can
use and how much data the user is additionally allowed to protect
(snapshot).
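
A minimal sketch of the two knobs (hypothetical dataset name):

    zfs set quota=10G tank/home/alice     # caps the dataset plus its snapshots
    zfs set refquota=8G tank/home/alice   # caps only the active file system
    zfs get quota,refquota,used,referenced tank/home/alice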

> 
> 
>>
>> "Regular users" as you use the term, that is the non-admins who just need
>> to know how close they are to running out of their allotted storage
>> resources, shouldn't really need to care about btrfs tool usage in the
>> first place, and btrfs commands in general, including btrfs quota related
>> commands, really aren't targeted at them, and aren't designed to report
>> the type of information they are likely to find useful.  Other tools will
>> be more appropriate.
> 
> I'm not talking about any btrfs commands or even the term quota for
> regular users. I'm talking about saving a file, being denied, and how
> does the user figure out how to free up space?
> 

Users need to be educated, same as with NetApp and ZFS. There is no
magic; redirect-on-write filesystems work differently from traditional
ones, and users need to adapt.

Of course the devil is in the details, and the usability of btrfs quota
is far lower than NetApp's/ZFS's. In those, space consumption information
is a first-class citizen integrated into the very basic tools, not
something bolted on later and mostly incomprehensible to the end user.

> Anyway, it's a hypothetical scenario. While I have Samba running on a
> Btrfs volume with various shares as subvolumes, I don't have quotas
> enabled.
> 
> 
> 

Given all the performance issues with quota reported on this list, it is
probably just as well for you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-12  7:04             ` Andrei Borzenkov
@ 2018-08-12 17:39               ` Andrei Borzenkov
  2018-08-13 11:23               ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 26+ messages in thread
From: Andrei Borzenkov @ 2018-08-12 17:39 UTC (permalink / raw)
  To: Chris Murphy, Duncan; +Cc: Btrfs BTRFS

On 12.08.2018 10:04, Andrei Borzenkov wrote:
> 
> On ZFS, snapshots are contained in the dataset, and you limit total
> dataset space consumption including all snapshots. Thus the end effect is
> the same: deleting data that is itself captured in a snapshot does not
> make a single byte available. ZFS additionally allows you to restrict
> the active file system's size (the "referenced" quota in ZFS) - this more
> closely matches your expectation: deleting a file in the active file
> system decreases its "referenced" size, thus allowing the user to write
> more data (as long as the user does not exceed the total dataset quota).
> This is different from btrfs "exclusive" and "shared". This should not
> be hard to implement in btrfs, as "referenced" simply means all data in
> the current subvolume, be it exclusive or shared.
> 

Oops, actually this is exactly what the "referenced" quota is. Limiting
the total of a subvolume + its snapshots is more difficult, as there is
no inherent connection between the qgroups of the source and the
snapshot, nor any automatic way to include the snapshot's qgroup in some
common total qgroup when creating a snapshot.
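
As far as I can tell, the wiring has to be done by hand at snapshot
time; a sketch with hypothetical ids and paths:

    btrfs qgroup create 1/100 /mnt                        # a level-1 "total" qgroup
    btrfs qgroup assign 0/256 1/100 /mnt                  # put the source subvolume under it
    btrfs subvolume snapshot -i 1/100 /mnt/vol /mnt/snap  # new snapshot's qgroup joins too
    btrfs qgroup limit 10G 1/100 /mnt                     # cap subvolume + snapshots together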

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-12  7:04             ` Andrei Borzenkov
  2018-08-12 17:39               ` Andrei Borzenkov
@ 2018-08-13 11:23               ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-13 11:23 UTC (permalink / raw)
  To: Andrei Borzenkov, Chris Murphy, Duncan; +Cc: Btrfs BTRFS

On 2018-08-12 03:04, Andrei Borzenkov wrote:
> 12.08.2018 06:16, Chris Murphy пишет:
>> On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>>>
>>>> But whether data is shared or exclusive seems potentially ephemeral, and
>>>> not something a sysadmin should even be able to anticipate let alone
>>>> individual users.
>>>
>>> Define "user(s)".
>>
>> The person who is saving their document on a network share, and
>> they've never heard of Btrfs.
>>
>>
>>> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,
>>
>> I'm not talking about btrfs tools. I'm talking about rational,
>> predictable behavior of a shared folder.
>>
>> If I try to drop a 1GiB file into my share and I'm denied, not enough
>> free space, and behind the scenes it's because of a quota limit, I
>> expect I can delete *any* file(s) amounting to create 1GiB free space
>> and then I'll be able to drop that file successfully without error.
>>
>> But if I'm unwittingly deleting shared files, my quota usage won't go
>> down, and I still can't save my file. So now I somehow need a secret
>> incantation to discover only my exclusive files and delete enough of
>> them in order to save this 1GiB file. It's weird, it's unexpected, I
>> think it's a use case failure. Maybe Btrfs quotas aren't meant to work
>> with samba or NFS shares. *shrug*
>>
> 
> That's how both NetApp and ZFS work as well. I doubt anyone can
> seriously call NetApp "not meant to work with NFS or CIFS shares".
> 
> On NetApp, the space available to an NFS/CIFS user is the volume size
> minus the space frozen in snapshots. If a file captured in a snapshot is
> deleted in the active file system, it does not make a single byte
> available to the external user. That's what surprises almost every
> first-time NetApp user.
> 
> On ZFS, snapshots are contained in the dataset, and you limit total
> dataset space consumption including all snapshots. Thus the end effect is
> the same: deleting data that is itself captured in a snapshot does not
> make a single byte available. ZFS additionally allows you to restrict
> the active file system's size (the "referenced" quota in ZFS) - this more
> closely matches your expectation: deleting a file in the active file
> system decreases its "referenced" size, thus allowing the user to write
> more data (as long as the user does not exceed the total dataset quota).
> This is different from btrfs "exclusive" and "shared". This should not
> be hard to implement in btrfs, as "referenced" simply means all data in
> the current subvolume, be it exclusive or shared.
> 
> IOW, ZFS allows placing restrictions on both how much data a user can
> use and how much data the user is additionally allowed to protect
> (snapshot).
Except that user-created snapshots are kind of irrelevant here.  If we're 
talking about NFS/CIFS/SMB, there is no way for the user to create a 
snapshot (at least, not in-band), so provided the admin is sensible and 
only uses the referenced quota for limiting space usage by users, things 
behave no differently on ZFS than they do on ext4 or XFS using user quotas.

Note also that a lot of storage appliances that use ZFS as the 
underlying storage don't expose any way for the admin to use anything 
other than the referenced quota (and usually space reservations).  They 
do this because it makes the system behave as pretty much everyone 
intuitively expects, and it ensures that users don't have to go to an 
admin to remedy their free space issues.
> 
>>
>>>
>>> "Regular users" as you use the term, that is the non-admins who just need
>>> to know how close they are to running out of their allotted storage
>>> resources, shouldn't really need to care about btrfs tool usage in the
>>> first place, and btrfs commands in general, including btrfs quota related
>>> commands, really aren't targeted at them, and aren't designed to report
>>> the type of information they are likely to find useful.  Other tools will
>>> be more appropriate.
>>
>> I'm not talking about any btrfs commands or even the term quota for
>> regular users. I'm talking about saving a file, being denied, and how
>> does the user figure out how to free up space?
>>
> 
> Users need to be educated, same as with NetApp and ZFS. There is no
> magic; redirect-on-write filesystems work differently from traditional
> ones, and users need to adapt.
> 
> Of course the devil is in the details, and the usability of btrfs quota
> is far lower than NetApp's/ZFS's. In those, space consumption information
> is a first-class citizen integrated into the very basic tools, not
> something bolted on later and mostly incomprehensible to the end user.
Except that this _CAN_ be made to work and behave just like classic 
quotas.  Your example of ZFS above proves it (referenced quotas behave 
just like classic VFS quotas).  Yes, we need to educate users regarding 
qgroups, but we need a _WORKING_ alternative so they can do things the 
way they always have, like most stuff that uses ZFS as part of a 
pre-built system (FreeNAS, for example) does.
> 
>> Anyway, it's a hypothetical scenario. While I have Samba running on a
>> Btrfs volume with various shares as subvolumes, I don't have quotas
>> enabled.
> 
> Given all the performance issues with quota reported on this list, it is
> probably just as well for you.
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-07-31 13:49 Report correct filesystem usage / limits on BTRFS subvolumes with quota Thomas Leister
  2018-07-31 14:32 ` Qu Wenruo
@ 2018-08-14  2:49 ` Jeff Mahoney
  2018-08-15 11:22   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Mahoney @ 2018-08-14  2:49 UTC (permalink / raw)
  To: Thomas Leister, dsterba; +Cc: linux-btrfs, lxc-devel


[-- Attachment #1.1: Type: text/plain, Size: 2806 bytes --]

On 7/31/18 9:49 AM, Thomas Leister wrote:
> Dear David,
> hello everyone,
> 
> during a recent project of mine involving LXD and BTRFS I found out that
> quotas on BTRFS subvolumes are enforced, but file system usage and
> limits set via quotas are not reported correctly in LXC containers.
> 
> I've found this discussion regarding my problem:
> https://github.com/lxc/lxd/issues/2180
> 
> There was already a proposal to introduce subvolume quota support some
> time ago:
> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2
> 
> @David as I've seen your response on that topic on the mailing list,
> maybe you can tell me if there are any plans to support correct
> subvolume quota reporting e.g. for "df -h" calls from within a
> container? Maybe there's already something on your / SUSE's roadmap? :-)
> 
> As more and more container environments spin up these days, there might
> be a growing demand on that :-) Personally I'd really appreciate if I
> could read the current file system usage and limit from within a
> container using BTRFS as storage backend.

In addition to Qu's deeper dive into the qgroup internals, the bigger
issue is that statfs(), which df uses, only allows the kernel to report
two numbers to describe space usage: total blocks and free blocks[1].
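
For illustration, the whole picture statfs() can paint is what stat -f
shows; abridged output with made-up numbers (and see [1] below for the
Available column):

    $ stat -f /mnt
      File: "/mnt"
        ID: 0123456789abcdef Namelen: 255  Type: btrfs
    Block size: 4096    Fundamental block size: 4096
    Blocks: Total: 1048576  Free: 524288  Available: 524288
    Inodes: Total: 0        Free: 0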

Currently, the only space tracking we have at a subvolume level is
qgroups.  There are other trees than "file" (i.e. subvolume) trees in
btrfs and those aren't accounted using qgroups.

So what should these numbers contain to describe to the user what their
space usage looks like?  We can have subvolume qgroup limits, nested
qgroup limits, disk capacity limits.  Df on btrfs is already confusing
enough with the data/metadata allocation split combined with potentially
different allocation policies for each.
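
For example, with illustrative numbers, this is the split in question:

    $ btrfs filesystem df /mnt
    Data, single: total=100.00GiB, used=80.00GiB
    System, DUP: total=32.00MiB, used=16.00KiB
    Metadata, DUP: total=4.00GiB, used=2.50GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B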

That's the biggest hurdle to reporting per-subvolume information via df.
 It's a feature request that's been in my TODO inbox for a while.  I
started in on it again this year and came to the conclusion that wiring
qgroups into df would be easy[2] - but would ultimately raise more
questions than it solved for the user.  That's pretty much what I'm
seeing in this thread.

So, what I have mostly working is adding support for 'btrfs qgroup show'
to output in JSON format, so that tools can easily be written to use the
available numbers to provide exactly the information the user wants
(given the limits of what's available).

-Jeff

[1] There's f_bavail too, but that's not really relevant and runs into
the same issue as above in any case.
[2] Well, wiring it into statfs would be easy.  Publishing vfsmounts for
every subvolume so df actually calls it on unqualified 'df' executions
is rather a bit more involved.
-- 
Jeff Mahoney
SUSE Labs



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
  2018-08-14  2:49 ` Jeff Mahoney
@ 2018-08-15 11:22   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 26+ messages in thread
From: Austin S. Hemmelgarn @ 2018-08-15 11:22 UTC (permalink / raw)
  To: Jeff Mahoney, Thomas Leister, dsterba; +Cc: linux-btrfs, lxc-devel

On 2018-08-13 22:49, Jeff Mahoney wrote:
> On 7/31/18 9:49 AM, Thomas Leister wrote:
>> Dear David,
>> hello everyone,
>>
>> during a recent project of mine involving LXD and BTRFS I found out that
>> quotas on BTRFS subvolumes are enforced, but file system usage and
>> limits set via quotas are not reported correctly in LXC containers.
>>
>> I've found this discussion regarding my problem:
>> https://github.com/lxc/lxd/issues/2180
>>
>> There was already a proposal to introduce subvolume quota support some
>> time ago:
>> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2
>>
>> @David as I've seen your response on that topic on the mailing list,
>> maybe you can tell me if there are any plans to support correct
>> subvolume quota reporting e.g. for "df -h" calls from within a
>> container? Maybe there's already something on your / SUSE's roadmap? :-)
>>
>> As more and more container environments spin up these days, there might
>> be a growing demand on that :-) Personally I'd really appreciate if I
>> could read the current file system usage and limit from within a
>> container using BTRFS as storage backend.
> 
> In addition to Qu's deeper dive into the qgroup internals, the bigger
> issue is that statfs(), which df uses, only allows the kernel to report
> two numbers to describe space usage: total blocks and free blocks[1].
> 
> Currently, the only space tracking we have at a subvolume level is
> qgroups.  There are other trees than "file" (i.e. subvolume) trees in
> btrfs and those aren't accounted using qgroups.
Last I knew, ext4 and XFS didn't count metadata (other than xattrs) 
against quotas, so outside of accounting inlined files properly (I'm not 
sure if qgroups do so or not), we're about equivalent to them in terms 
of what we're counting.
> 
> So what should these numbers contain to describe to the user what their
> space usage looks like?  We can have subvolume qgroup limits, nested
> qgroup limits, disk capacity limits.  Df on btrfs is already confusing
> enough with the data/metadata allocation split combined with potentially
> different allocation policies for each.
This is _easy_ though.  You report whatever the lowest current limit is. 
99% of the time, if something or someone is calling `df`, they want to 
know how much space on that volume is currently available for use. 
Just reporting whichever of the limits they're going to hit first 
handles that case perfectly and, more importantly, is completely 
unambiguous.
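
Something like this (a hypothetical reporting rule; nothing of the sort
is implemented):

    # report the tightest of all applicable limits:
    # free = min(filesystem free space,
    #            subvolume qgroup limit - its usage,
    #            each nested qgroup limit - its usage)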
> 
> That's the biggest hurdle to reporting per-subvolume information via df.
>   It's a feature request that's been in my TODO inbox for a while.  I
> started in on it again this year and came to the conclusion that wiring
> qgroups into df would be easy[2] - but would ultimately raise more
> questions than it solved for the user.  That's pretty much what I'm
> seeing in this thread.
> 
> So, what I have mostly working is adding support for 'btrfs qgroup show'
> to output in JSON format so that tools can be easily written to use the
> numbers available to provide exactly what information the user wants
> (given the limit of what's available.)
> 
> -Jeff
> 
> [1] There's f_bavail too, but that's not really relevant and runs into
> the same issue as above in any case.
> [2] Well, wiring it into statfs would be easy.  Publishing vfsmounts for
> every subvolume so df actually calls it on unqualified 'df' executions
> is rather a bit more involved.
But probably worth it, because we could then expose in the mount options 
for each subvolume whether or not it's in a qgroup (and possibly which 
one if it is), which would help alleviate a decent percentage of the 
confusion.  Right now, it's a pain in the arse to figure out if a given 
subvolume has a quota or not, because _none_ of the conventional methods 
of determining if quotas are in effect work at all.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-08-15 14:14 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 13:49 Report correct filesystem usage / limits on BTRFS subvolumes with quota Thomas Leister
2018-07-31 14:32 ` Qu Wenruo
2018-07-31 16:03   ` Austin S. Hemmelgarn
2018-08-01  1:23     ` Qu Wenruo
2018-08-09 17:48   ` Tomasz Pala
2018-08-09 23:35     ` Qu Wenruo
2018-08-10  7:17       ` Tomasz Pala
2018-08-10  7:55         ` Qu Wenruo
2018-08-10  9:33           ` Tomasz Pala
2018-08-11  6:54             ` Andrei Borzenkov
2018-08-10 11:32       ` Austin S. Hemmelgarn
2018-08-10 18:07       ` Chris Murphy
2018-08-10 19:10         ` Austin S. Hemmelgarn
2018-08-11  3:29         ` Duncan
2018-08-12  3:16           ` Chris Murphy
2018-08-12  7:04             ` Andrei Borzenkov
2018-08-12 17:39               ` Andrei Borzenkov
2018-08-13 11:23               ` Austin S. Hemmelgarn
     [not found]     ` <f66b8ff3-d7ec-31ad-e9ca-e09c9eb76474@gmail.com>
2018-08-10  7:33       ` Tomasz Pala
2018-08-11  5:46         ` Andrei Borzenkov
2018-08-10 11:39     ` Austin S. Hemmelgarn
2018-08-10 18:21       ` Tomasz Pala
2018-08-10 18:48         ` Austin S. Hemmelgarn
2018-08-11  6:18         ` Andrei Borzenkov
2018-08-14  2:49 ` Jeff Mahoney
2018-08-15 11:22   ` Austin S. Hemmelgarn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).