From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Tarik Ceylan <Tarik.Ceylan@ruhr-uni-bochum.de>,
	linux-xfs@vger.kernel.org, sandeen@sandeen.net
Subject: Re: How to reliably measure fs usage with reflinks enabled?
Date: Fri, 18 May 2018 07:58:04 -0700
Message-ID: <20180518145713.GF23858@magnolia>
In-Reply-To: <20180515012926.GC10363@dastard>

On Tue, May 15, 2018 at 11:29:26AM +1000, Dave Chinner wrote:
> On Tue, May 15, 2018 at 01:37:32AM +0200, Tarik Ceylan wrote:
> > On 2018-05-15 00:57, Dave Chinner wrote:
> > >On Mon, May 14, 2018 at 05:02:53PM -0500, Eric Sandeen wrote:
> > >>
> > >>
> > >>On 5/14/18 3:02 PM, Tarik Ceylan wrote:
> > >>> How can one reliably measure filesystem usage on partitions that were formatted with -m reflink=1?
> > >>> Here are some numbers I am measuring with df -h (on different partitions holding the same data):
> > >>> 7.7G of 36G  (-b size=512  -m crc=0 )
> > >>> 8.6G of 36G  (-b size=4096 -m crc=1 )
> > >>
> > >>8x larger inodes will take 8x more space, but you didn't say how many
> > >>inodes you have allocated.
> > >>
> > >>> 11G  of 36G  (-b size=1024 -m crc=1,reflink=1,rmapbt=1 -i sparse=1 )
> > >>> 32G  of 864G (-b size=4096 -m crc=1,reflink=1 )
> > >>
> > >>In that last case, you have a wildly different total fs size, so
> > >>probably no fair comparison here either.
> > >>
> > >>The reverse mapping btree also takes up space.  You're turning
> > >>too many knobs at once.  ;)
> > 
> > Thanks,
> > here's a test in which I only compare reflink=0 to reflink=1, all other
> > variables being the same:
> > 
> > mkfs.xfs -f -m reflink=0 /dev/sdc4
> > meta-data=/dev/sdc4              isize=512    agcount=4, agsize=58687982 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=4096   blocks=234751926, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=4096   blocks=114624, version=2
> >          =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > 
> > "df -h" shows a usage of 8.8G of 896G
> > 
> > mkfs.xfs -f -m reflink=1 /dev/sdc4
> > [output same as before except the reflink parameter]
> > 15G of 896G
> 
> So the reflink code reserved ~7GB of space in the filesystem (less
> than 1%) for its own reflink-related metadata if it ever needs it.
> It hasn't used it yet but we need to make sure that it's available
> when the filesystem is near ENOSPC. Hence it's considered used space
> because users cannot store user data in that space.
> 
> The change I plan to make is to reduce the user-reported filesystem
> size rather than account for it as used space. IOWs, you'd see a
> filesystem size of 889G instead of 896G, but have only 8.8GB used.
> It means exactly the same thing and will behave exactly the same
> way, it's just a different space accounting technique....

FWIW generic/260 also assumes that f_blocks reflects the size of the
device and stumbles when we tell it to fstrim (0..ULLONG_MAX) and the
number of bytes returned is greater than the f_blocks size of the fs,
which is what (I think) will happen if we start reducing f_blocks by the
size of the per-AG reservations.
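
To make the two accounting schemes concrete, here is a rough sketch of
how df-style numbers fall out of the statvfs fields under each one. This
is not XFS code: the function names are made up for illustration, the
block count is taken from the mkfs output quoted above, and the 8.8G
data figure and ~7G reservation are rounded values from this thread.

```python
from collections import namedtuple

DfView = namedtuple("DfView", "size used avail")

BLOCK_SIZE = 4096                      # bsize from the mkfs output above
BLOCKS_PER_GIB = (1 << 30) // BLOCK_SIZE


def df_view(f_blocks, f_bfree, f_bavail, f_frsize=BLOCK_SIZE):
    """Derive df-style byte counts from statvfs-like fields."""
    return DfView(
        size=f_blocks * f_frsize,
        used=(f_blocks - f_bfree) * f_frsize,
        avail=f_bavail * f_frsize,
    )


def reservation_as_used(total, data, reserved):
    """Current scheme: reserved blocks are simply not reported as free."""
    free = total - data - reserved
    return df_view(total, free, free)


def reservation_hidden(total, data, reserved):
    """Proposed scheme: reserved blocks are subtracted from f_blocks."""
    visible = total - reserved
    free = visible - data
    return df_view(visible, free, free)


# Illustrative inputs (assumptions, rounded from the thread).
TOTAL = 234751926                      # blocks= from the mkfs output
DATA = int(8.8 * BLOCKS_PER_GIB)       # roughly the 8.8G of user data
RESERVED = 7 * BLOCKS_PER_GIB          # roughly the ~7G reservation

now = reservation_as_used(TOTAL, DATA, RESERVED)
planned = reservation_hidden(TOTAL, DATA, RESERVED)

for name, view in (("as used", now), ("hidden", planned)):
    print(name, [round(v / (1 << 30), 1) for v in view])
```

Both views report the same available space; only the size and used
columns shift by the reservation. The second view is also the one where
f_blocks no longer matches the device size.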

I think the underlying problem is confusion over the definition of the
address space that fstrim's range parameters run over.  The current
usage in ext4/xfs suggests that the units are byte offsets into the main
block device, but there's no uniform way to find out the maximum
physical address that the filesystem uses, is there?  And what of
multi-device filesystems like btrfs and xfs+realtime?  Do we just
concatenate the block devices in a virtual address space?

ext4: reports physical size of fs via f_blocks

xfs: reports physical size of fs via f_blocks, but soon will start
decreasing f_blocks by the size of per-AG metadata reservations since it
is never possible for users to get at those blocks

btrfs: iirc internally they create a virtual address space out of all
the devices attached, but I've no idea how to find the size

Looking over xfs_ioc_trim, it seems to me that we do not ever try to
trim the realtime device?

I /hope/ the common caller case is (0..ULLONG_MAX)...
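
For reference, a minimal sketch of the kind of comparison described
above: issue FITRIM over (0..ULLONG_MAX) and compare the byte count the
kernel hands back with the size implied by f_blocks. This is not the
generic/260 test itself, and the helper names are made up. The ioctl
number is built by hand using the common _IOC bit layout (x86, arm64,
...); some architectures encode it differently, so treat that constant
as an assumption. Running the trim needs root and a device that
supports discard.

```python
import fcntl
import os
import struct

ULLONG_MAX = (1 << 64) - 1

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
RANGE_FMT = "=QQQ"

# FITRIM is _IOWR('X', 121, struct fstrim_range) in <linux/fs.h>.
# Assumes the generic _IOC layout: dir << 30 | size << 16 | type << 8 | nr.
FITRIM = (3 << 30) | (struct.calcsize(RANGE_FMT) << 16) | (ord("X") << 8) | 121


def pack_range(start=0, length=ULLONG_MAX, minlen=0):
    """Build the fstrim_range argument; the default covers everything."""
    return struct.pack(RANGE_FMT, start, length, minlen)


def trim_whole_fs(mountpoint):
    """Trim the full range and return the number of bytes the fs reports."""
    buf = bytearray(pack_range())      # mutable, so the kernel can update len
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)
    finally:
        os.close(fd)
    _start, trimmed, _minlen = struct.unpack(RANGE_FMT, buf)
    return trimmed


def reported_size(mountpoint):
    """Filesystem size in bytes as statvfs reports it."""
    st = os.statvfs(mountpoint)
    return st.f_blocks * st.f_frsize


def trim_exceeds_size(mountpoint):
    """True if more bytes were trimmed than f_blocks says the fs holds."""
    return trim_whole_fs(mountpoint) > reported_size(mountpoint)
```

Calling trim_exceeds_size() on a mount point is the check that would
start returning True if f_blocks shrinks while the trimmed byte count
still covers the whole device.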

--D

> > >Also, we reserve a lot of space for reflink/rmapbt metadata that
> > >isn't actually used, so you're not actually using any more space
> > >than the "-b size=4096 -m crc=1" case. I have plans for hiding that
> > >reservation from users so that we don't get questions like this....
> > 
> > That should resolve my confusion. Sorry to have bothered, but it's
> > kind of an obvious question.
> 
> It's the sort of "obvious question" which almost no-one has asked us
> about... :)
> 
> > > To get back to my original question - can I assume "df" to be a reliable
> > > way of measuring fs usage going forward (after the change you mention),
> 
> df is reliable now, regardless of any change we make in the future.
> 
> > or will specialized tools be necessary as is the case with btrfs?
> 
> No - df works and it should always work. We try to learn from other
> people's mistakes, not just our own... :)
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
