* How to reliably measure fs usage with reflinks enabled?
@ 2018-05-14 20:02 Tarik Ceylan
  2018-05-14 22:02 ` Eric Sandeen
  0 siblings, 1 reply; 12+ messages in thread
From: Tarik Ceylan @ 2018-05-14 20:02 UTC (permalink / raw)
  To: linux-xfs

How can one reliably measure filesystem usage on filesystems that were
created with -m reflink=1?
Here are some numbers I am measuring with df -h (on different partitions
holding the same data):
7.7G of 36G  (-b size=512  -m crc=0 )
8.6G of 36G  (-b size=4096 -m crc=1 )
11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
32G  of 864G (-b size=4096 -m crc=1,reflink=1 )

I already ruled out fragmentation as a cause. The data does not contain
many duplicates (roughly 200 MB could be freed by deduplicating). Since
measuring fs usage on btrfs also isn't trivial, I suspect that similar
problems are happening here. But I could not find any information on how
to measure fs usage properly when using XFS with reflinks. The kernel in
use is 4.14.40.

Tarik Ceylan

(I am not subscribed to this list; I'd be grateful if you could CC me in
your reply.)

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-14 20:02 How to reliably measure fs usage with reflinks enabled? Tarik Ceylan
@ 2018-05-14 22:02 ` Eric Sandeen
  2018-05-14 22:57   ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2018-05-14 22:02 UTC (permalink / raw)
  To: Tarik Ceylan, linux-xfs



On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
> Here are some numbers i am measuring with df -h (on different partitions holding the same data):
> 7.7G of 36G  (-b size=512  -m crc=0 )
> 8.6G of 36G  (-b size=4096 -m crc=1 )

8x larger inodes will take 8x more space, but you didn't say how many
inodes you have allocated.

> 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )

In that last case, you have a wildly different total fs size, so probably
no fair comparison here either.

The reverse mapping btree also takes up space.  You're turning too many
knobs at once.  ;)

> I already ruled out fragmentation as a cause. The data does not contain many duplicates (roughly 200mb could be freed by deduplicating). Since measuring fs usage on btrfs also isn't trivial, i would suspect that there are similar problems happening here. But i could not find any information on how to measure fs usage properly when using xfs with reflinks. Kernel in use is 4.14.40.

Perhaps you can change only one variable at a time to make the experiment
more meaningful.

-Eric

> Tarik Ceylan
> 
> (I am not subscribed to this list, i'd be grateful if you could CC me in your reply)

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-14 22:02 ` Eric Sandeen
@ 2018-05-14 22:57   ` Dave Chinner
  2018-05-14 23:37     ` Tarik Ceylan
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2018-05-14 22:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Tarik Ceylan, linux-xfs

On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
> 
> 
> On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> > How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
> > Here are some numbers i am measuring with df -h (on different partitions holding the same data):
> > 7.7G of 36G  (-b size=512  -m crc=0 )
> > 8.6G of 36G  (-b size=4096 -m crc=1 )
> 
> 8x larger inodes will take 8x more space, but you didn't say how many
> inodes you have allocated.
> 
> > 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> > 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
> 
> In that last case, you have a wildly different total fs size, so probably
> no fair comparison here either.
> 
> The reverse mapping btree also takes up space.  You're turning too many
> knobs at once.  ;)

Also, we reserve a lot of space for reflink/rmapbt metadata that
isn't actually used, so you're not actually using any more space
than the "-b size=4096 -m crc=1" case. I have plans for hiding that
reservation from users so that we don't get questions like this....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-14 22:57   ` Dave Chinner
@ 2018-05-14 23:37     ` Tarik Ceylan
  2018-05-15  1:29       ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Tarik Ceylan @ 2018-05-14 23:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, sandeen

Am 2018-05-15 00:57, schrieb Dave Chinner:
> On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
>> 
>> 
>> On 5/14/18 3:02 PM, Tarik Ceylan wrote:
>> > How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
>> > Here are some numbers i am measuring with df -h (on different partitions holding the same data):
>> > 7.7G of 36G  (-b size=512  -m crc=0 )
>> > 8.6G of 36G  (-b size=4096 -m crc=1 )
>> 
>> 8x larger inodes will take 8x more space, but you didn't say how many
>> inodes you have allocated.
>> 
>> > 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
>> > 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
>> 
>> In that last case, you have a wildly different total fs size, so probably
>> no fair comparison here either.
>> 
>> The reverse mapping btree also takes up space.  You're turning too many
>> knobs at once.  ;)

Thanks,
here's a test in which I compare only reflink=0 to reflink=1, all other
variables being the same:

mkfs.xfs -f -m reflink=0 /dev/sdc4
meta-data=/dev/sdc4              isize=512    agcount=4, agsize=58687982 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=234751926, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=114624, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

"df -h" shows a usage of 8.8G of 896G

mkfs.xfs -f -m reflink=1 /dev/sdc4
[output same as before except the reflink parameter]
15G of 896G
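
As a cross-check of the total above, here is a minimal Python sketch that
converts the data-section geometry printed by mkfs.xfs into GiB. The
block count and block size are copied from the mkfs.xfs output; the
function name is hypothetical, and the value df derives from statvfs may
differ slightly from this raw figure.

```python
# Convert the data-section geometry printed by mkfs.xfs into GiB.
# Values are copied from the mkfs.xfs output above; the function name
# is hypothetical and exists only for this illustration.

GIB = 1024 ** 3


def fs_size_gib(blocks: int, bsize: int) -> float:
    """Return the size of the data section in GiB."""
    return blocks * bsize / GIB


size = fs_size_gib(blocks=234751926, bsize=4096)
print(f"{size:.1f} GiB")
```

The result is in line with the 896G total that df -h reports for both
filesystems, so only the "used" column differs between the two runs.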

> 
> Also, we reserve a lot of space for reflink/rmapbt metadata that
> isn't actually used, so you're not actually using any more space
> than the "-b size=4096 -m crc=1" case. I have plans for hiding that
> reservation from users so that we don't get questions like this....

That should resolve my confusion. Sorry to have bothered, but it's kind
of an obvious question.
To get back to my original question - can I assume "df" to be a reliable
way of measuring fs usage going forward (after the change you mention),
or will specialized tools be necessary, as is the case with btrfs?

Tarik

> 
> Cheers,
> 
> Dave.

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-14 23:37     ` Tarik Ceylan
@ 2018-05-15  1:29       ` Dave Chinner
  2018-05-15 13:52         ` Mike Fleetwood
  2018-05-18 14:58         ` Darrick J. Wong
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2018-05-15  1:29 UTC (permalink / raw)
  To: Tarik Ceylan; +Cc: linux-xfs, sandeen

On Tue, May 15, 2018 at 01:37:32AM +0200, Tarik Ceylan wrote:
> Am 2018-05-15 00:57, schrieb Dave Chinner:
> >On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
> >>
> >>
> >>On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> >>> How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
> >>> Here are some numbers i am measuring with df -h (on different partitions holding the same data):
> >>> 7.7G of 36G  (-b size=512  -m crc=0 )
> >>> 8.6G of 36G  (-b size=4096 -m crc=1 )
> >>
> >>8x larger inodes will take 8x more space, but you didn't say how many
> >>inodes you have allocated.
> >>
> >>> 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> >>> 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
> >>
> >>In that last case, you have a wildly different total fs size, so
> >>probably no fair comparison here either.
> >>
> >>The reverse mapping btree also takes up space.  You're turning
> >>too many knobs at once.  ;)
> 
> Thanks,
> here's a test in which i only compare reflink=0 to reflink=1, all other
> variables being the same:
> 
> mkfs.xfs -f -m reflink=0 /dev/sdc4
> meta-data=/dev/sdc4              isize=512    agcount=4, agsize=58687982 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> data     =                       bsize=4096   blocks=234751926, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=4096   blocks=114624, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> "df -h" shows a usage of 8.8G of 896G
> 
> mkfs.xfs -f -m reflink=1 /dev/sdc4
> [output same as before except the reflink parameter]
> 15G of 896G

So the reflink code reserved ~7GB of space in the filesystem (less
than 1%) for its own reflink-related metadata if it ever needs it.
It hasn't used it yet, but we need to make sure that it's available
when the filesystem is near ENOSPC. Hence it's considered used space,
because users cannot store user data in that space.

The change I plan to make is to reduce the user-reported filesystem
size rather than account for it as used space. IOWs, you'd see a
filesystem size of 889G instead of 896G, but have only 8.8GB used.
It means exactly the same thing and will behave exactly the same
way; it's just a different space accounting technique....
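
To illustrate why the two accounting techniques are equivalent, here is
a minimal Python sketch using the round numbers from this thread (in GB).
It is not XFS code; the class and variable names are hypothetical, and
the 7 GB reservation is the approximate figure quoted above.

```python
# Two ways of reporting the same reservation, using the round numbers
# from this thread (GB). Names are hypothetical, for illustration only.

from dataclasses import dataclass


@dataclass
class DfView:
    size: float  # total size reported to the user
    used: float  # space reported as used

    @property
    def avail(self) -> float:
        return self.size - self.used


RAW_SIZE = 896.0   # filesystem size as reported today
USER_DATA = 8.8    # space actually holding data
RESERVED = 7.0     # approximate reflink metadata reservation

# Current accounting: the reservation is counted as used space.
current = DfView(size=RAW_SIZE, used=USER_DATA + RESERVED)

# Planned accounting: the reservation is subtracted from the size.
planned = DfView(size=RAW_SIZE - RESERVED, used=USER_DATA)

print(current, current.avail)
print(planned, planned.avail)
```

Either way the space available for user data comes out the same; only
the "size" and "used" columns shift by the reserved amount.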

> >Also, we reserve a lot of space for reflink/rmapbt metadata that
> >isn't actually used, so you're not actually using any more space
> >than the "-b size=4096 -m crc=1" case. I have plans for hiding that
> >reservation from users so that we don't get questions like this....
> 
> That should resolve my confusion. Sorry to have bothered, but it's
> kind of an obvious question.

It's the sort of "obvious question" which almost no-one has asked us
about... :)

> To get back to my original question - can i assume  "df" to be a
> reliable
> way of measuring fs usage going forward (after the change you mention),

df is reliable now, regardless of any change we make in the future.

> or will specialized tools be necessary as is the case with btrfs?

No - df works and it should always work. We try to learn from other
people's mistakes, not just our own... :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-15  1:29       ` Dave Chinner
@ 2018-05-15 13:52         ` Mike Fleetwood
  2018-05-16  0:13           ` Dave Chinner
  2018-05-18 14:58         ` Darrick J. Wong
  1 sibling, 1 reply; 12+ messages in thread
From: Mike Fleetwood @ 2018-05-15 13:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Tarik Ceylan, linux-xfs, sandeen

On 15 May 2018 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> So the reflink code reserved ~7GB of space in the filesystem (less
> than 1%) for its own reflink related metadata if it ever needs it.
> It hasn't used it yet but we need to make sure that it's available
> when the filesystem is near ENOSPC. Hence it's considered used space
> because users cannot store user data in that space.
>
> The change I plan to make is to reduce the user reported filesystem
> size rather than account for it as used space. IOWs, you'd see a
> filesystem size of 889G instead of 896G, but have only 8.8GB used.
> It means exactly the same thing and will behave exactly the same
> way, it's just a different space accounting technique....

I'm one of the authors of GParted. It takes the reported file system
size [1] and compares it to the block device size to see whether the file
system fills the partition, and hence whether to show unallocated space
to the user and advise them to grow the file system to fill the block
device [2].  As such, we prefer that the reported size of the file system
match the highest offset that the file system can write to in the block
device.  Hence we would also prefer that space not free for storing data,
such as superblocks and other reserved metadata, be included in used
space.

[1] For mounted file systems it uses statvfs(), and for unmounted XFS
    file systems it uses:
      xfs_db -c 'sb 0' -c 'print blocksize' -c 'print dblocks' /dev/sda7
    to get the fs block size and number of blocks.

[2] For full disclosure, because tools for various FSs under-report
    their file system size, there is a heuristic that there must be at
    least a 2% difference before unallocated space is shown and the grow
    file system recommendation is generated, so under-reporting the FS
    size by less than 1% wouldn't actually be an issue for us.
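
The comparison described in [1] and [2] can be sketched roughly as
follows in Python. This is not GParted's actual code: the function names
and the way the 2% threshold is applied are assumptions made for
illustration; only os.statvfs() and its f_blocks/f_frsize fields are
real.

```python
import os


def reported_fs_size(path: str) -> int:
    """Size in bytes that statvfs() reports for the filesystem at path."""
    st = os.statvfs(path)
    # f_blocks is counted in units of f_frsize (the fragment size).
    return st.f_blocks * st.f_frsize


def looks_unallocated(fs_bytes: int, dev_bytes: int,
                      threshold: float = 0.02) -> bool:
    """True if the fs is smaller than the device by at least threshold.

    The default threshold mirrors the 2% heuristic mentioned in [2];
    how the real tool applies it is an assumption here.
    """
    return (dev_bytes - fs_bytes) / dev_bytes >= threshold


print(reported_fs_size("/"))
```

With a check of this shape, a reported size that is 1% short of the
device stays below the threshold, while a 10% shortfall would trigger
the recommendation to grow the file system.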

Just providing an app author's point of view.

Thanks,
Mike

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-15 13:52         ` Mike Fleetwood
@ 2018-05-16  0:13           ` Dave Chinner
  2018-05-18 14:43             ` Mike Fleetwood
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2018-05-16  0:13 UTC (permalink / raw)
  To: Mike Fleetwood; +Cc: Tarik Ceylan, linux-xfs, sandeen

On Tue, May 15, 2018 at 02:52:30PM +0100, Mike Fleetwood wrote:
> On 15 May 2018 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
> > So the reflink code reserved ~7GB of space in the filesystem (less
> > than 1%) for its own reflink related metadata if it ever needs it.
> > It hasn't used it yet but we need to make sure that it's available
> > when the filesystem is near ENOSPC. Hence it's considered used space
> > because users cannot store user data in that space.
> >
> > The change I plan to make is to reduce the user reported filesystem
> > size rather than account for it as used space. IOWs, you'd see a
> > filesystem size of 889G instead of 896G, but have only 8.8GB used.
> > It means exactly the same thing and will behave exactly the same
> > way, it's just a different space accounting technique....
> 
> I'm one of the authors of GParted and it uses the reported file system
> size [1] and compares it to the block device size to see if the file
> system fills the partition or not and whether to show unallocated space
> to the user and advise them to grow the file system to fill the block
> device [2].  As such we prefer that the reported size of the file system
> match the highest offset that the file system can write to in the block
> device.

I think that's a narrow, use case specific assumption. There is
absolutely no guarantee that the filesystem on a device fills the
entire device or that the filesystem space reported by df/statvfs
accurately reflects the size of the underlying block device.

Filesystems are moving towards a virtualised world where space usage
and capacity are kept separate from the capacity of the underlying
storage provider. That's a solid direction we are moving with xfs:

https://www.spinics.net/lists/linux-xfs/msg12216.html

so we can support subvolumes:

https://www.youtube.com/watch?v=wG8FUvSGROw

via a virtual block address space that remaps the filesystem space
accounting away from the underlying physical block device:

https://lwn.net/SubscriberLink/753650/32230c15f3453808/

This will completely break any assumption that the filesystem size
is related to the underlying storage device(s).

GParted deals very firmly with a specific aspect of disk based
storage - managing partitions on a physical block device.
Filesystems need to move beyond physical block devices - sanely
supporting sparse virtual block devices has been on everyone's
enterprise filesystem wish list for years.

GParted doesn't have to support these new features - it can simply
turn them off for filesystems it creates on physical disk
partitions, but we're doing stuff to support the storage models
needed for container hosting, virtualisation, efficient backups and
cloning, etc. If supporting those new features means we have to break
assumptions that legacy infrastructure makes, then so be it....

<snip>

> [2] For full disclosure, because tools for various FSs under report
>     their file system size, there is a heuristic that there must be at
>     least 2% difference before unallocated space and grow file system
>     recommendation is generated so under reporting the FS size by less
>     than 1% wouldn't actually be an issue for us.

So, an ext3 example on a small root filesystem:

$ grep sda1 /proc/partitions 
   8        1    9984366 sda1
$ df -k /
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        9696448 8615892    581340  94% /
$

Just under 3% difference between fs reported size and the block
device size, and obviously GParted has been fine with this sort of
discrepancy on ext3 for the past 15+ years. IIRC the XFS metadata
reservations max out at around 3% of total filesystem space, so
GParted should be just fine with us hiding them by reducing total
filesystem size...
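
For reference, the arithmetic behind the "just under 3%" figure can be
reproduced with a few lines of Python. The two numbers are copied from
the /proc/partitions and df -k output above (both in 1K blocks); the
variable names are only illustrative.

```python
# Figures copied from the /proc/partitions and df -k output above,
# both expressed in 1K blocks.
device_kib = 9984366
fs_kib = 9696448

# How much smaller the reported filesystem is than the block device.
shortfall = device_kib - fs_kib
percent = 100 * shortfall / device_kib

print(f"{shortfall} KiB, {percent:.2f}%")
```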

> Just providing an app authors point of view.

*nod*.

We're aware that we need to let existing apps continue to work on
existing formats and features. But we need to break from the old
ways to do what people are asking us to do, so we're not going to
lock ourselves in. If we're not breaking old things and making
people unhappy, then we're not making sufficient progress.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-16  0:13           ` Dave Chinner
@ 2018-05-18 14:43             ` Mike Fleetwood
  2018-05-18 14:56               ` Eric Sandeen
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Fleetwood @ 2018-05-18 14:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Tarik Ceylan, linux-xfs, Eric Sandeen

(Sorry for the late reply, work commitments)

On 16 May 2018 at 01:13, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, May 15, 2018 at 02:52:30PM +0100, Mike Fleetwood wrote:
>> On 15 May 2018 at 02:29, Dave Chinner <david@fromorbit.com> wrote:
>> > So the reflink code reserved ~7GB of space in the filesystem (less
>> > than 1%) for its own reflink related metadata if it ever needs it.
>> > It hasn't used it yet but we need to make sure that it's available
>> > when the filesystem is near ENOSPC. Hence it's considered used space
>> > because users cannot store user data in that space.
>> >
>> > The change I plan to make is to reduce the user reported filesystem
>> > size rather than account for it as used space. IOWs, you'd see a
>> > filesystem size of 889G instead of 896G, but have only 8.8GB used.
>> > It means exactly the same thing and will behave exactly the same
>> > way, it's just a different space accounting technique....
>>
>> I'm one of the authors of GParted and it uses the reported file system
>> size [1] and compares it to the block device size to see if the file
>> system fills the partition or not and whether to show unallocated space
>> to the user and advise them to grow the file system to fill the block
>> device [2].  As such we prefer that the reported size of the file system
>> match the highest offset that the file system can write to in the block
>> device.
>
> I think that's a narrow, use case specific assumption. There is
> absolutely no guarantee that the filesystem on a device fills the
> entire device or that the filesystem space reported by df/statvfs
> accurately reflects the size of the underlying block device.
>
> Filesystems are moving towards a virtualised world where space usage
> and capacity is kept separate from the capacity of the underlying
> storage provider. That's a solid direction we are moving with xfs:
>
> https://www.spinics.net/lists/linux-xfs/msg12216.html
>
> so we can support subvolumes:
>
> https://www.youtube.com/watch?v=wG8FUvSGROw
>
> via a virtual block address space that remaps the filesystem space
> accounting away from the underlying physical block device:
>
> https://lwn.net/SubscriberLink/753650/32230c15f3453808/
>
> This will completely break any assumption that the filesystem size
> is related to the underlying storage device(s).
>
> GParted deals very firmly with a specific aspect of disk based
> storage - managing partitions on a physical block device.
> Filesystems need to move beyond physical block devices - sanely
> supporting sparse virtual block devices has been on everyone's
> enterprise filesystem wish list for years.

Agreed that GParted is a tool for simple storage setups with current
full fat block devices and file systems.  As such, enterprise users with
multiple levels in their storage stack are not its target audience.

> GParted doesn't have to support these new features - it can simply
> turn them off for filesystems it creates on physical disk
> partitions, but we're doing stuff to support the storage models
> needed for container hosting, virtualisation, efficient backups and
> cloning, etc. If that means we have to break assumptions that legacy
> infrastructure make to support those new features, then so be it....
>
> <snip>
>
>> [2] For full disclosure, because tools for various FSs under report
>>     their file system size, there is a heuristic that there must be at
>>     least 2% difference before unallocated space and grow file system
>>     recommendation is generated so under reporting the FS size by less
>>     than 1% wouldn't actually be an issue for us.
>
> So, an ext3 example on a small root filesystem:
>
> $ grep sda1 /proc/partitions
>    8        1    9984366 sda1
> $ df -k /
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/root        9696448 8615892    581340  94% /
> $
>
> Just under 3% difference between fs reported size and the block
> device size, and obviously GParted has been fine with this sort of
> discrepancy on ext3 for the past 15+years. IIRC the XFS metadata
> reservations max out at around 3% of total filesystem space, so
> GParted should be just fine with us hiding them by reducing total
> filesystem size...

(I assume you are aware, but for completeness ...)
By default ext2/4 kernel code subtracts some overhead blocks from the
statvfs reported f_blocks figure.  This is documented in mount(8)
against the bsddf/minixdf options.

So after checking, GParted was modified to use the dumpe2fs command to
read the superblock to get the file system size for mounted ext* file
systems too.

https://marc.info/?l=linux-ext4&m=134706477618732&w=2

I see that xfs_db doesn't allow reading the super block of mounted XFS
file systems.  So for the case of a mounted XFS on a full fat block device,
I guess I'll wait and see how much overhead is subtracted from the
statvfs f_blocks figure and make sure GParted accounts for that.

>> Just providing an app authors point of view.
>
> *nod*.
>
> We're aware that we need to let existing apps continue to work on
> existing formats and features. But we need to break from the old
> ways to do what people are asking us to do, so we're not going to
> lock ourselves in. If we're not breaking old things and making
> people unhappy, then we're not making sufficient progress.

Mike

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-18 14:43             ` Mike Fleetwood
@ 2018-05-18 14:56               ` Eric Sandeen
  2018-05-19  8:36                 ` Mike Fleetwood
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2018-05-18 14:56 UTC (permalink / raw)
  To: Mike Fleetwood, Dave Chinner; +Cc: Tarik Ceylan, linux-xfs

On 5/18/18 9:43 AM, Mike Fleetwood wrote:
> (Sorry for the late reply, work commitments)
> 
...

> So after checking, GParted was modified to use the dumpe2fs command to
> read the superblock to get the file system size for mounted ext* file
> systems too.
> 
> https://marc.info/?l=linux-ext4&m=134706477618732&w=2
> 
> I see that xfs_db doesn't allow reading the super block of mounted XFS
> file systems.  So for the case of a mounted XFS on full fat block device
> I guess I'll wait and see how much overhead is subtracted from the
> statvfs f_blocks figure and make sure GParted accounts for that.

Actually you can, with -r:

# mount /dev/sda1 /mnt/test
# xfs_db -r -c "sb 0" -c "p dblocks" /dev/sda1
dblocks = 229771264

though I may be giving you rope to hang yourself here ;)

It's generally a bit dicey to be reading a mounted block device for any
filesystem, as there's no coordination with changes the filesystem may
be making while it's mounted.

The XFS_IOC_FSGEOMETRY ioctl would be a better choice for gathering geometry
information for a mounted xfs filesystem.
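
If a tool does go the xfs_db route, turning the printed fields into a
size in bytes could look roughly like the Python sketch below. It only
parses text in the "name = value" form shown above; the function name is
hypothetical, the dblocks value is the one from the example, and the
blocksize of 4096 is an assumed value, not something reported in this
thread.

```python
import re


def parse_xfs_db_fields(output: str) -> dict[str, int]:
    """Parse 'name = value' lines as printed by xfs_db's print command."""
    fields = {}
    for line in output.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*=\s*(\d+)\s*", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


# Sample text: dblocks is the value shown above; the blocksize line is
# an assumed value for illustration, not taken from the thread.
sample = "blocksize = 4096\ndblocks = 229771264\n"
sb = parse_xfs_db_fields(sample)
size_bytes = sb["blocksize"] * sb["dblocks"]
print(size_bytes)
```

The same caveat applies as to the command itself: values read from a
mounted block device are not coordinated with the running filesystem.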

-Eric

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-15  1:29       ` Dave Chinner
  2018-05-15 13:52         ` Mike Fleetwood
@ 2018-05-18 14:58         ` Darrick J. Wong
  2018-05-20  0:10           ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2018-05-18 14:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Tarik Ceylan, linux-xfs, sandeen

On Tue, May 15, 2018 at 11:29:26AM +1000, Dave Chinner wrote:
> On Tue, May 15, 2018 at 01:37:32AM +0200, Tarik Ceylan wrote:
> > Am 2018-05-15 00:57, schrieb Dave Chinner:
> > >On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
> > >>
> > >>
> > >>On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> > >>> How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
> > >>> Here are some numbers i am measuring with df -h (on different partitions holding the same data):
> > >>> 7.7G of 36G  (-b size=512  -m crc=0 )
> > >>> 8.6G of 36G  (-b size=4096 -m crc=1 )
> > >>
> > >>8x larger inodes will take 8x more space, but you didn't say how many
> > >>inodes you have allocated.
> > >>
> > >>> 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> > >>> 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
> > >>
> > >>In that last case, you have a wildly different total fs size, so
> > >>probably no fair comparison here either.
> > >>
> > >>The reverse mapping btree also takes up space.  You're turning
> > >>too many knobs at once.  ;)
> > 
> > Thanks,
> > here's a test in which i only compare reflink=0 to reflink=1, all other
> > variables being the same:
> > 
> > mkfs.xfs -f -m reflink=0 /dev/sdc4
> > meta-data=/dev/sdc4              isize=512    agcount=4, agsize=58687982 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=4096   blocks=234751926, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=4096   blocks=114624, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > "df -h" shows a usage of 8.8G of 896G
> > 
> > mkfs.xfs -f -m reflink=1 /dev/sdc4
> > [output same as before except the reflink parameter]
> > 15G of 896G
> 
> So the reflink code reserved ~7GB of space in the filesystem (less
> > than 1%) for its own reflink related metadata if it ever needs it.
> It hasn't used it yet but we need to make sure that it's available
> when the filesystem is near ENOSPC. Hence it's considered used space
> because users cannot store user data in that space.
> 
> The change I plan to make is to reduce the user reported filesystem
> size rather than account for it as used space. IOWs, you'd see a
> filesystem size of 889G instead of 896G, but have only 8.8GB used.
> > It means exactly the same thing and will behave exactly the same
> way, it's just a different space accounting technique....

FWIW generic/260 also assumes that f_blocks reflects the size of the
device and stumbles when we tell it to fstrim (0..ULLONG_MAX) and the
number of bytes returned is greater than the f_blocks size of the fs,
which is what (I think) will happen if we start reducing f_blocks by the
size of the per-AG reservations.

I think the underlying problem is confusion over the definition of the
address space that fstrim's range parameters run over.  The current
usage in ext4/xfs suggests that the units are byte offsets into the main
block device, but there's no uniform way to find out the maximum
physical address that the filesystem uses, is there?  And what of
multi-device filesystems like btrfs and xfs+realtime?  Do we just
concatenate the block devices in a virtual address space?

ext4: reports physical size of fs via f_blocks

xfs: reports physical size of fs via f_blocks, but soon will start
decreasing f_blocks by the size of per-ag metadata reservations since it
is never possible for users to get at those blocks

btrfs: iirc internally they create a virtual address space out of all
the devices attached, but I've no idea how to find the size

Looking over xfs_ioc_trim, it seems to me that we do not ever try to
trim the realtime device?

I /hope/ the common caller case is (0..ULLONG_MAX)...

--D

> > >Also, we reserve a lot of space for reflink/rmapbt metadata that
> > >isn't actually used, so you're not actually using any more space
> > >than the "-b size=4096 -m crc=1" case. I have plans for hiding that
> > >reservation from users so that we don't get questions like this....
> > 
> > That should resolve my confusion. Sorry to have bothered, but it's
> > kind of an obvious question.
> 
> It's the sort of "obvious question" which almost no-one has asked us
> about... :)
> 
> > To get back to my original question - can i assume  "df" to be a
> > reliable
> > way of measuring fs usage going forward (after the change you mention),
> 
> df is reliable now, regardless of any change we make in the future.
> 
> > or will specialized tools be necessary as is the case with btrfs?
> 
> No - df works and it should always work. We try to learn from other
> people's mistakes, not just our own... :)
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-18 14:56               ` Eric Sandeen
@ 2018-05-19  8:36                 ` Mike Fleetwood
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Fleetwood @ 2018-05-19  8:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Tarik Ceylan, linux-xfs

On 18 May 2018 at 15:56, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 5/18/18 9:43 AM, Mike Fleetwood wrote:
>>
>> (Sorry for the late reply, work commitments)
>>
> ...
>
>> So after checking, GParted was modified to use the dumpe2fs command to
>> read the superblock to get the file system size for mounted ext* file
>> systems too.
>>
>> https://marc.info/?l=linux-ext4&m=134706477618732&w=2
>>
>> I see that xfs_db doesn't allow reading the super block of mounted XFS
>> file systems.  So for the case of a mounted XFS on full fat block device
>> I guess I'll wait and see how much overhead is subtracted from the
>> statvfs f_blocks figure and make sure GParted accounts for that.
>
>
> Actually you can, with -r:
>
> # mount /dev/sda1 /mnt/test
> # xfs_db -r -c "sb 0" -c "p dblocks" /dev/sda1
> dblocks = 229771264
>
> though I may be giving you rope to hang yourself here ;)
>
> It's generally a bit dicey to be reading a mounted block device for any
> filesystem, as there's no coordination with changes the filesystem may
> be making while it's mounted.
>
> The XFS_IOC_FSGEOMETRY would be a better choice for gathering geometry
> information for a mounted xfs filesystem.

Thanks, the xfs_db -r option is exactly what I need.  (Should have
checked the man page myself for that).

Mike
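
For anyone scripting around the xfs_db output quoted above, here is a
minimal sketch of turning the "dblocks = ..." line into a byte count.
The helper names are made up for illustration, and the 4096-byte block
size is an assumption (the quoted output does not state it); in practice
it would have to be read from the filesystem as well.

```python
import re


def parse_xfs_db_field(output: str, field: str) -> int:
    """Return the integer value of a 'name = value' line as printed by xfs_db."""
    pattern = rf"^{re.escape(field)}\s*=\s*(\d+)\s*$"
    match = re.search(pattern, output, re.MULTILINE)
    if match is None:
        raise ValueError(f"field {field!r} not found in output")
    return int(match.group(1))


def data_size_bytes(dblocks: int, block_size: int) -> int:
    """Size of the data section in bytes: block count times block size."""
    return dblocks * block_size


if __name__ == "__main__":
    # Output line copied from the message above.
    sample = "dblocks = 229771264\n"
    dblocks = parse_xfs_db_field(sample, "dblocks")
    # ASSUMPTION: 4096-byte filesystem blocks; not stated in the thread.
    print(data_size_bytes(dblocks, 4096))
```

As noted above, reading a mounted block device is racy, so this only
illustrates the arithmetic, not a recommended way to collect geometry.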


* Re: How to reliably measure fs usage with reflinks enabled?
  2018-05-18 14:58         ` Darrick J. Wong
@ 2018-05-20  0:10           ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2018-05-20  0:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Tarik Ceylan, linux-xfs, sandeen

On Fri, May 18, 2018 at 07:58:04AM -0700, Darrick J. Wong wrote:
> On Tue, May 15, 2018 at 11:29:26AM +1000, Dave Chinner wrote:
> > On Tue, May 15, 2018 at 01:37:32AM +0200, Tarik Ceylan wrote:
> > > Am 2018-05-15 00:57, schrieb Dave Chinner:
> > > >On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
> > > >>
> > > >>
> > > >>On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> > > >>> How can one reliably measure filesystem usage on partitions that were compiled with -m reflink=1 ?
> > > >>> Here are some numbers i am measuring with df -h (on different partitions holding the same data):
> > > >>> 7.7G of 36G  (-b size=512  -m crc=0 )
> > > >>> 8.6G of 36G  (-b size=4096 -m crc=1 )
> > > >>
> > > >>8x larger inodes will take 8x more space, but you didn't say how many
> > > >>inodes you have allocated.
> > > >>
> > > >>> 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> > > >>> 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
> > > >>
> > > >>In that last case, you have a wildly different total fs size, so probably
> > > >>no fair comparison here either.
> > > >>
> > > >>The reverse mapping btree also takes up space.  You're turning too many
> > > >>knobs at once.  ;)
> > > 
> > > Thanks,
> > > here's a test in which i only compare reflink=0 to reflink=1, all other
> > > variables being the same:
> > > 
> > > mkfs.xfs -f -m reflink=0 /dev/sdc4
> > > meta-data=/dev/sdc4              isize=512    agcount=4, agsize=58687982 blks
> > >          =                       sectsz=512   attr=2, projid32bit=1
> > >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > > data     =                       bsize=4096   blocks=234751926, imaxpct=25
> > >          =                       sunit=0      swidth=0 blks
> > > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > > log      =internal log           bsize=4096   blocks=114624, version=2
> > >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > 
> > > "df -h" shows a usage of 8.8G of 896G
> > > 
> > > mkfs.xfs -f -m reflink=1 /dev/sdc4
> > > [output same as before except the reflink parameter]
> > > 15G of 896G
> > 
> > So the reflink code reserved ~7GB of space in the filesystem (less
> > than 1%) for its own reflink-related metadata if it ever needs it.
> > It hasn't used it yet but we need to make sure that it's available
> > when the filesystem is near ENOSPC. Hence it's considered used space
> > because users cannot store user data in that space.
> > 
> > The change I plan to make is to reduce the user reported filesystem
> > size rather than account for it as used space. IOWs, you'd see a
> > filesystem size of 889G instead of 896G, but have only 8.8GB used.
> > It means exactly the same thing and will behave exactly the same
> > way, it's just a different space accounting technique....
> 
> FWIW generic/260 also assumes that f_blocks reflects the size of the
> device and stumbles when we tell it to fstrim (0..ULLONG_MAX) and the
> number of bytes returned is greater than the f_blocks size of the fs,
> which is what (I think) will happen if we start reducing f_blocks by the
> size of the per-AG reservations.

That's trivial to fix, though. Just clamp the return bytes to the
size reported to userspace via statfs().
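
One possible reading of that clamp, as a userspace sketch only: cap the
reported trimmed byte count at the capacity statfs() reports. The
function names are hypothetical, and where the clamp would really live
(kernel or test) is not settled here.

```python
import os


def statfs_capacity_bytes(f_blocks: int, f_frsize: int) -> int:
    """Capacity as reported to userspace: f_blocks is counted in f_frsize units."""
    return f_blocks * f_frsize


def clamp_trimmed_bytes(trimmed: int, f_blocks: int, f_frsize: int) -> int:
    """Never report more trimmed bytes than the statfs-visible capacity."""
    return min(trimmed, statfs_capacity_bytes(f_blocks, f_frsize))


if __name__ == "__main__":
    st = os.statvfs("/")  # any mounted path works; "/" is just an example
    # Pretend the trim covered one block more than the reported capacity.
    raw = (st.f_blocks + 1) * st.f_frsize
    print(clamp_trimmed_bytes(raw, st.f_blocks, st.f_frsize))
```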

> I think the underlying problem is confusion over the definition of the
> address space that fstrim's range parameters run over.

I think it's pretty clear in the man page by the offset and length
parameters. Specifically, the length parameter:

	[....] If the specified value extends past the end of the
	filesystem, fstrim will stop at the filesystem size boundary

IOWs, the filesystem decides what the filesystem size is, not the
caller. That means if the fs is smaller than the block device it
sits on, it will not discard the region of the block device beyond
the end of the filesystem....

Really, I think you're conflating /filesystem storage capacity/ with
/device address space/.  They are two different things.  statfs()
reports filesystem capacity, not the size of the underlying device
(because the filesystem may not have an "underlying device").
/proc/partitions reports the size of the underlying device for block
based filesystems, but tells us nothing about how much of that space
the filesystem will present to the user as available storage.
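
To make the distinction concrete, a rough sketch that reads both
numbers: capacity from statvfs() and device size from text in the
/proc/partitions format (whose #blocks column is in 1 KiB units). The
function names and the sample figures are invented for illustration.

```python
import os


def filesystem_capacity_bytes(path: str) -> int:
    """Filesystem capacity as statvfs() reports it for the given mount path."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize


def device_size_bytes(proc_partitions_text: str, name: str) -> int:
    """Size of block device 'name' from /proc/partitions-style text."""
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[3] == name and fields[2].isdigit():
            return int(fields[2]) * 1024  # #blocks is in 1 KiB units
    raise KeyError(name)


# HYPOTHETICAL sample in /proc/partitions layout; the numbers are made up.
SAMPLE = """major minor  #blocks  name

   8        0  976762584 sda
   8        1  918970368 sda1
"""

if __name__ == "__main__":
    print("device bytes:  ", device_size_bytes(SAMPLE, "sda1"))
    print("capacity bytes:", filesystem_capacity_bytes("/"))
```

The two values answer different questions, so nothing requires them to
match, and the second exists even when there is no block device at all.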

And when we get subvolumes on XFS what, exactly, is the "underlying
device"? It's not a block device....

Filesystem storage capacity is not the same thing as the size of
the linear block address space it sits on. That association was
broken a long, long time ago, so can we please stop acting as though
they are one-and-the-same?

> The current
> usage in ext4/xfs suggests that the units are byte offsets into the main
> block device, but there's no uniform way to find out the maximum
> physical address that the filesystem uses, is there?

For XFS: XFS_IOC_FSGEOMETRY. Even after we change the accounting, we
will still be able to get the physical address space size the
filesystem is using from this.

> And what of
> multi-device filesystems like btrfs and xfs+realtime?  Do we just
> concatenate the block devices in a virtual address space?

IMO, hacks like that are a path to certain insanity. :/

Unfortunately, fstrim was not written with multidevice filesystems
in mind, so if we want to support them, we need a new syscall/ioctl
to make these work sanely.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2018-05-20  0:10 UTC | newest]

Thread overview: 12+ messages
2018-05-14 20:02 How to reliably measure fs usage with reflinks enabled? Tarik Ceylan
2018-05-14 22:02 ` Eric Sandeen
2018-05-14 22:57   ` Dave Chinner
2018-05-14 23:37     ` Tarik Ceylan
2018-05-15  1:29       ` Dave Chinner
2018-05-15 13:52         ` Mike Fleetwood
2018-05-16  0:13           ` Dave Chinner
2018-05-18 14:43             ` Mike Fleetwood
2018-05-18 14:56               ` Eric Sandeen
2018-05-19  8:36                 ` Mike Fleetwood
2018-05-18 14:58         ` Darrick J. Wong
2018-05-20  0:10           ` Dave Chinner
