linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Provide an estimation of (free/total) inodes in statfs
@ 2019-10-24 15:44 Johannes Thumshirn
  2019-10-24 15:44 ` [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs() Johannes Thumshirn
  2019-10-24 15:44 ` [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs Johannes Thumshirn
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2019-10-24 15:44 UTC (permalink / raw)
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

There is a user report on the BeeGFS mailing list, that it's impossible to run
BeeGFS on top of BTRFS.

This is because BeeGFS is storing it's meta-data either in the underlying
file-systems extended attributes or directly in an inode if it can make use of
inline inodes.

A more detailed description is in patch 2/2.

Without the patch applied:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/mnt/test      btrfs         0     0     0     - /mnt/test

With the patch applied on an empty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs      1.6K     0  1.6K    0% /mnt/test

With the patch applied on a dirty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs      1.6K  1.5K   197   88% /mnt/test

Johannes Thumshirn (2):
  btrfs: remove cached space_info in btrfs_statfs()
  btrfs: provide an estimated number of inodes for statfs

 fs/btrfs/super.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs()
  2019-10-24 15:44 [PATCH 0/2] Provide an estimation of (free/total) inodes in statfs Johannes Thumshirn
@ 2019-10-24 15:44 ` Johannes Thumshirn
  2019-10-25 16:41   ` David Sterba
  2019-10-24 15:44 ` [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs Johannes Thumshirn
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Thumshirn @ 2019-10-24 15:44 UTC (permalink / raw)
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

In btrfs_statfs() we cache fs_info::space_info in a local variable only
to use it once in a list_for_each_rcu() statement.

Not only is the local variable unnecessary it even makes the code harder
to follow as it's not clear which list it is iterating.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/super.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d5d15a19f51d..b818f764c1c9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2021,7 +2021,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
 	struct btrfs_super_block *disk_super = fs_info->super_copy;
-	struct list_head *head = &fs_info->space_info;
 	struct btrfs_space_info *found;
 	u64 total_used = 0;
 	u64 total_free_data = 0;
@@ -2035,7 +2034,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	int mixed = 0;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(found, head, list) {
+	list_for_each_entry_rcu(found, &fs_info->space_info, list) {
 		if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
 			int i;
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs
  2019-10-24 15:44 [PATCH 0/2] Provide an estimation of (free/total) inodes in statfs Johannes Thumshirn
  2019-10-24 15:44 ` [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs() Johannes Thumshirn
@ 2019-10-24 15:44 ` Johannes Thumshirn
  2019-10-25  0:56   ` Qu Wenruo
  2019-10-25 10:05   ` David Sterba
  1 sibling, 2 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2019-10-24 15:44 UTC (permalink / raw)
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
with BeeGFS, as BeeGFS is using statfs output to determine the number of
total and free inodes. BeeGFS needs the number of free inodes as it stores
its meta-data either in extended attributes of the underlying file-system
or directly in an inline inode. According to the BeeGFS Server Tuning
Guide:

"""
BeeGFS metadata is stored as extended attributes (EAs) on the underlying
file system to optimal performance. One metadata file will be created for
each file that a user creates. About extended attributes usage: BeeGFS
Metadata files have a size of 0 bytes (i.e. no normal file contents).

Access to extended attributes is possible with the getfattr tool.

If the inodes of the underlying file system are sufficiently large, EAs
can be inlined into the inode of the underlying file system.  Additional
data blocks are then not required anymore and metadata disk usage will be
reduced.  With EAs inlined into the inode, access latencies are reduced as
seeking to an extra data block is not required anymore.
"""

Provide some estimated numbers of total and free inodes in statfs by
dividing the number of blocks by the size of an inode-item for the total
number of possible inodes and for the number of free inodes divide the
number of free blocks by the size of an inode-item, similar to what other
file-systems without a fixed number of inodes do.

This of is just an estimation and should not be relied upon.

Without the patch applied:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/mnt/test      btrfs         0     0     0     - /mnt/test

With the patch applied on an empty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs      1.6K     0  1.6K    0% /mnt/test

With the patch applied on a dirty fs:
rapido1:/# df -hTi /mnt/test
Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
/dev/zram0     btrfs      1.6K  1.5K   197   88% /mnt/test

Link: https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b818f764c1c9..6f6f6a70eb1e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2068,6 +2068,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
 	buf->f_blocks >>= bits;
 	buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
+	buf->f_files = div_u64(buf->f_blocks, sizeof(struct btrfs_inode_item));
+	buf->f_ffree = div_u64(buf->f_bfree, sizeof(struct btrfs_inode_item));
 
 	/* Account global block reserve as used, it's in logical size already */
 	spin_lock(&block_rsv->lock);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs
  2019-10-24 15:44 ` [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs Johannes Thumshirn
@ 2019-10-25  0:56   ` Qu Wenruo
  2019-10-25  8:55     ` Johannes Thumshirn
  2019-10-25 10:05   ` David Sterba
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2019-10-25  0:56 UTC (permalink / raw)
  To: Johannes Thumshirn, David Sterba; +Cc: Linux BTRFS Mailinglist


[-- Attachment #1.1: Type: text/plain, Size: 4137 bytes --]



On 2019/10/24 下午11:44, Johannes Thumshirn wrote:
> On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
> with BeeGFS, as BeeGFS is using statfs output to determine the number of
> total and free inodes. BeeGFS needs the number of free inodes as it stores
> its meta-data either in extended attributes of the underlying file-system
> or directly in an inline inode. According to the BeeGFS Server Tuning
> Guide:
> 
> """
> BeeGFS metadata is stored as extended attributes (EAs) on the underlying
> file system to optimal performance. One metadata file will be created for
> each file that a user creates. About extended attributes usage: BeeGFS
> Metadata files have a size of 0 bytes (i.e. no normal file contents).
> 
> Access to extended attributes is possible with the getfattr tool.
> 
> If the inodes of the underlying file system are sufficiently large, EAs
> can be inlined into the inode of the underlying file system.  Additional
> data blocks are then not required anymore and metadata disk usage will be
> reduced.  With EAs inlined into the inode, access latencies are reduced as
> seeking to an extra data block is not required anymore.
> """

Personally speaking, reporting 0 used and 0 free should be the proper
way. User of the fs should be aware of dynamical fs which doesn't go
fixed inodes.

I really think it's BeeFS' job to change their behavior.

Since there are more thing to consider when faking the used/free inodes.

> 
> Provide some estimated numbers of total and free inodes in statfs by
> dividing the number of blocks by the size of an inode-item for the total
> number of possible inodes and for the number of free inodes divide the
> number of free blocks by the size of an inode-item, similar to what other
> file-systems without a fixed number of inodes do.
> 
> This of is just an estimation and should not be relied upon.
> 
> Without the patch applied:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /mnt/test      btrfs         0     0     0     - /mnt/test
> 
> With the patch applied on an empty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs      1.6K     0  1.6K    0% /mnt/test
> 
> With the patch applied on a dirty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs      1.6K  1.5K   197   88% /mnt/test
> 
> Link: https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  fs/btrfs/super.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b818f764c1c9..6f6f6a70eb1e 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2068,6 +2068,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
>  	buf->f_blocks >>= bits;
>  	buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
> +	buf->f_files = div_u64(buf->f_blocks, sizeof(struct btrfs_inode_item));

That's too optimistic. (I'd call it even beyond Elon Musk's schedule)

We have tree block header overhead, and with the increase of tree
blocks, the size of extent tree will also increase and bring overhead.

In long run, user will report that the ffiles increases more than they used.
It will be a hell to calculate such estimation, and we will never reach
a good enough point for that.

> +	buf->f_ffree = div_u64(buf->f_bfree, sizeof(struct btrfs_inode_item));

The same can be applied to ffree, it will decrease faster than real usage.

If whatever the distributed fs is using ffree/files as an indicator,
it's not reliable anyway. And if they accept such unreliable indicator,
they'd better double think before using that indicator.

Thanks,
Qu

>  
>  	/* Account global block reserve as used, it's in logical size already */
>  	spin_lock(&block_rsv->lock);
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs
  2019-10-25  0:56   ` Qu Wenruo
@ 2019-10-25  8:55     ` Johannes Thumshirn
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Thumshirn @ 2019-10-25  8:55 UTC (permalink / raw)
  To: Qu Wenruo, David Sterba; +Cc: Linux BTRFS Mailinglist

On 25/10/2019 02:56, Qu Wenruo wrote:
[...]
> Personally speaking, reporting 0 used and 0 free should be the proper
> way. User of the fs should be aware of dynamical fs which doesn't go
> fixed inodes.
> 
> I really think it's BeeFS' job to change their behavior.
> 
> Since there are more thing to consider when faking the used/free inodes.

I'm with you on this. It is something BeeGFS has to fix, but judging
from what other file-systems do, some do have a real fixed number of
inodes, some assign 0 or -1, some do not touch the variable at all and
some (i.e. xfs) fake a number. My role model was xfs here.

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs
  2019-10-24 15:44 ` [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs Johannes Thumshirn
  2019-10-25  0:56   ` Qu Wenruo
@ 2019-10-25 10:05   ` David Sterba
  1 sibling, 0 replies; 7+ messages in thread
From: David Sterba @ 2019-10-25 10:05 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: David Sterba, Linux BTRFS Mailinglist

On Thu, Oct 24, 2019 at 05:44:55PM +0200, Johannes Thumshirn wrote:
> On the BeeGFS Mailing list there is a report claiming BTRFS is not usable
> with BeeGFS, as BeeGFS is using statfs output to determine the number of
> total and free inodes. BeeGFS needs the number of free inodes as it stores
> its meta-data either in extended attributes of the underlying file-system
> or directly in an inline inode. According to the BeeGFS Server Tuning
> Guide:
> 
> """
> BeeGFS metadata is stored as extended attributes (EAs) on the underlying
> file system to optimal performance. One metadata file will be created for
> each file that a user creates. About extended attributes usage: BeeGFS
> Metadata files have a size of 0 bytes (i.e. no normal file contents).

That's not really typical use of a files and the 'optimal performance'
claim would need some clarifications.

> Access to extended attributes is possible with the getfattr tool.
> 
> If the inodes of the underlying file system are sufficiently large, EAs
> can be inlined into the inode of the underlying file system.  Additional
> data blocks are then not required anymore and metadata disk usage will be
> reduced.  With EAs inlined into the inode, access latencies are reduced as
> seeking to an extra data block is not required anymore.

So this describes how it's implemented in EXT4 and the BeeGFS is
probably tuned to work 'optimally' there.

> """
> 
> Provide some estimated numbers of total and free inodes in statfs by
> dividing the number of blocks by the size of an inode-item for the total
> number of possible inodes and for the number of free inodes divide the
> number of free blocks by the size of an inode-item, similar to what other
> file-systems without a fixed number of inodes do.
> 
> This of is just an estimation and should not be relied upon.

This is the most problematic part. The inode counts cannot be calculated
exactly on btrfs, because of the dynamic nature of the space usage. We
can only give rough estimates "how the rest of unallocated space would
be used if [assumptions]". We have this problem with explaining 'df'
values and now somebody is asking for the same with 'df -i'.

The Inode/IFree numbers are intentionally zero, to avoid confusion of
monitoring tools to report low inode counts. Though I can't find a
documented and standardized interpretation of the numbers, manual page
of statfs only says

	fsfilcnt_t f_files;   /* Total file nodes in filesystem */
	fsfilcnt_t f_ffree;   /* Free file nodes in filesystem */

for the respective fields. And nothing else.

For traditional filesystems, and EXT2/3/4 in particular, the inodes are
preallocated at creation time so calculating the numbers is easy.

I believe XFS does that too without the option inode64, so users are
used to see non-zero value and nowadays it has to be faked. That makes
sense from backward compatibility POV. But still the numbers are made up
and can change unexpectedly.

Btrfs has reported 0/0/0 since the beginning to not cofuse monitoring
tools, yet this is exactly what can be seen at

https://groups.google.com/forum/#!msg/fhgfs-user/IJqGS5o1UD0/8ftDdUI3AQAJ

I'd say fix your monitoring tool not to interpret 0% free inodes in case
there's also 0 in total. This is not even btrfs-specific fix, IMHO this
is interpreting the numbers in the wrong way.

> Without the patch applied:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /mnt/test      btrfs         0     0     0     - /mnt/test
> 
> With the patch applied on an empty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs      1.6K     0  1.6K    0% /mnt/test
> 
> With the patch applied on a dirty fs:
> rapido1:/# df -hTi /mnt/test
> Filesystem     Type     Inodes IUsed IFree IUse% Mounted on
> /dev/zram0     btrfs      1.6K  1.5K   197   88% /mnt/test

At the moment I object against conjuring up numbers like that. It's
perhaps going to silence some tools but would cause lots of questions
because the numbers otherwise don't reflect reality, not even close.

We try hard to make the regular Allocated/Free space numbers to match
users' expectations, but it's not perfect and can't be made much better.
And I'm glad we have a simple answer to the inode counts.

Should the discussion continue, it would be good to have interested
people from BeeGFS on CC.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs()
  2019-10-24 15:44 ` [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs() Johannes Thumshirn
@ 2019-10-25 16:41   ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2019-10-25 16:41 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: David Sterba, Linux BTRFS Mailinglist

On Thu, Oct 24, 2019 at 05:44:54PM +0200, Johannes Thumshirn wrote:
> In btrfs_statfs() we cache fs_info::space_info in a local variable only
> to use it once in a list_for_each_rcu() statement.
> 
> Not only is the local variable unnecessary it even makes the code harder
> to follow as it's not clear which list it is iterating.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

Added to misc-next as it's not related to the statfs change itself.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-10-25 16:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 15:44 [PATCH 0/2] Provide an estimation of (free/total) inodes in statfs Johannes Thumshirn
2019-10-24 15:44 ` [PATCH 1/2] btrfs: remove cached space_info in btrfs_statfs() Johannes Thumshirn
2019-10-25 16:41   ` David Sterba
2019-10-24 15:44 ` [PATCH 2/2] btrfs: provide an estimated number of inodes for statfs Johannes Thumshirn
2019-10-25  0:56   ` Qu Wenruo
2019-10-25  8:55     ` Johannes Thumshirn
2019-10-25 10:05   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).