From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems
Date: Mon, 6 Nov 2017 08:07:08 -0500	[thread overview]
Message-ID: <20171106130708.GB30884@bfoster.bfoster> (raw)
In-Reply-To: <20171105235104.GF4094@dastard>

On Mon, Nov 06, 2017 at 10:51:04AM +1100, Dave Chinner wrote:
> On Fri, Nov 03, 2017 at 07:26:27AM -0400, Brian Foster wrote:
> > On Fri, Nov 03, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> > > On Thu, Nov 02, 2017 at 07:25:33AM -0400, Brian Foster wrote:
> > > > On Thu, Nov 02, 2017 at 10:53:00AM +1100, Dave Chinner wrote:
> > > > > On Wed, Nov 01, 2017 at 10:17:21AM -0400, Brian Foster wrote:
> > > > > > On Wed, Nov 01, 2017 at 11:45:13AM +1100, Dave Chinner wrote:
> > > > > > > On Tue, Oct 31, 2017 at 07:24:32AM -0400, Brian Foster wrote:
> > > > > > > > On Tue, Oct 31, 2017 at 08:09:41AM +1100, Dave Chinner wrote:
> > > > > > > > > On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> > > > > > > > > > On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
> > ...
> > > > > > BTW, was there ever any kind of solution to the metadata block
> > > > > > reservation issue in the thin case? We now hide metadata reservation
> > > > > > from the user via the m_usable_blocks account. If m_phys_blocks
> > > > > > represents a thin volume, how exactly do we prevent those metadata
> > > > > > allocations/writes from overrunning what the admin has specified as
> > > > > > "usable" with respect to the thin volume?
> > > > > 
> > > > > The reserved metadata blocks are not accounted from free space when
> > > > > they are allocated - they are pulled from the reserved space that
> > > > > has already been removed from the free space.
> > > > > 
> > > > 
> > > > Ok, so the user can set a usable blocks value of something less than the
> > > > fs geometry, then the reservation is pulled from that, reducing the
> > > > reported "usable" value further. Hence, what ends up reported to the
> > > > user is actually something less than the value set by the user, which
> > > > means that the filesystem overall respects how much space the admin says
> > > > it can use in the underlying volume.
> > > > 
> > > > For example, the user creates a 100T thin volume with 10T of usable
> > > > space. The fs reserves a further 2T out of that for metadata, so then
> > > > what the user sees is 8T of writeable space.  The filesystem itself
> > > > cannot use more than 10T out of the volume, as instructed. Am I
> > > > following that correctly? If so, that sounds reasonable to me from the
> > > > "don't overflow my thin volume" perspective.
> > > 
> > > No, that's not what happens. For thick filesystems, the 100TB volume
> > > gets 2TB pulled from it so it appears as a 98TB filesystem. This is
> > > done by modifying the free block counts and m_usable_space when the
> > > reservations are made.
> > > 
> > 
> > Ok..
> > 
> > > For thin filesystems, we've already got 90TB of space "reserved",
> > > and so the metadata reservations and allocations come from that.
> > > i.e. we skip the modification of free block counts and m_usable
> > > space in the case of a thinspace filesystem, and so the user still
> > > sees 10TB of usable space that they asked to have.
> > > 
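To make sure I have the two accounting modes straight, here is a toy model of what you describe, purely for illustration -- this is not XFS code, and `visible_space` and its parameters are names I made up for the sketch:

```python
# Toy model of the thick vs. thin usable-space accounting described
# above. Not actual XFS code; all names here are invented.

def visible_space(volume_size, usable, meta_reservation, thin):
    """Usable space reported to the user, in TB.

    Thick: the metadata reservation is deducted from the free block
    counts (and m_usable_space), so the user sees less than the volume.

    Thin: the reservation comes out of the space already excluded by
    the usable-blocks limit, so the configured value is reported
    unchanged.
    """
    if thin:
        # The reservation must fit inside the already-reserved region.
        assert meta_reservation <= volume_size - usable
        return usable
    return volume_size - meta_reservation

# Thick 100TB filesystem with a 2TB metadata reservation: appears as 98TB.
print(visible_space(100, usable=100, meta_reservation=2, thin=False))  # 98

# Thin 100TB volume limited to 10TB usable: the 2TB reservation comes
# out of the 90TB of already-reserved space, so the user still sees the
# 10TB they asked for.
print(visible_space(100, usable=10, meta_reservation=2, thin=True))    # 10
```

If that matches, then the thin case never shrinks the reported value below what the admin configured, which answers my earlier question.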
> > 
> > Hmm.. so then I'm slightly confused about the thin use case with
> > respect to preventing pool depletion. The usable blocks value that the
> > user settles on is likely based on how much space the filesystem should
> > use to safely avoid pool depletion.
> 
> I did say up front that the user data thinspace accounting would not
> be an exact reflection of underlying storage pool usage. Things like
> partially written blocks in the underlying storage pool mean write
> amplification factors would need to be considered, but that's
> something the admin already has to deal with in thinly provisioned
> storage.
> 

Ok, I recall this coming up one way or another. For some reason I
thought something might have changed in the implementation since then
and/or managed to confuse myself over the current behavior.

> > If a usable value of 10T means the
> > filesystem can write to the usable 10T + some amount of metadata
> > reservation, how does the user determine a sane usable value based on
> > the current pool geometry?
> 
> From an admin POV it's damn easy to document in admin guides that
> actual space usage of a thinspace filesystem is going to be on the
> order of 2% greater than the space given to the filesystem for user
> data. Use an overhead of 2-5% for internal management and the "small
> amount of extra space for internal metadata" issue can be ignored.
> 

It's easy to document whatever we want. :) I'm not convinced that is as
effective as a hard limit based on the fs features, but the latter is
more complex and may be overkill in most cases. So, documentation works
for me until/unless testing or real usage shows otherwise.

If it does come up, perhaps a script or userspace tool that somehow
presents the current internal reservation calculations (combined with
whatever geometry information is relevant) as something consumable for
the user (whether it be a simple dump of the active reservations, the
worst case consumption of a thin fs, etc.) might be a nice compromise.
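
Something along these lines, say -- a hypothetical sketch only, where the reservation names and sizes are made up and a real tool would need to pull the actual values from the kernel somehow (there's no such export today):

```python
# Hypothetical sketch of a helper that dumps a filesystem's internal
# metadata reservations in a user-consumable form. The reservation
# names and byte counts below are invented for illustration; a real
# tool would have to query them from the kernel.

reservations = {
    "finobt": 512 * 1024 * 1024,      # bytes, illustrative only
    "rmapbt": 1024 * 1024 * 1024,
    "refcountbt": 256 * 1024 * 1024,
}

def report(reservations):
    """Print each reservation and the total in GiB; return total bytes."""
    total = sum(reservations.values())
    for name, size in sorted(reservations.items()):
        print(f"{name:12s} {size / (1024 ** 3):8.2f} GiB")
    print(f"{'total':12s} {total / (1024 ** 3):8.2f} GiB")
    return total

report(reservations)
```

That total (or a worst-case variant of it) is the number an admin would fold into their "how much can this fs safely use" calculation.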

> > > > The best I can read into the response here is that you think physical
> > > > shrink is unlikely enough to not need to care very much what kind of
> > > > interface confusion could result from needing to rev the current growfs
> > > > interface to support physical shrink on thin filesystems in the future.
> > > > Is that a fair assessment..?
> > > 
> > > Not really. I understand just how complex a physical shrink
> > > implementation is going to be, and have a fair idea of the sorts of
> > > craziness we'll need to add to xfs_growfs to support/co-ordinate a
> > > physical shrink operation.  From that perspective, I don't see a
> > > physical shrink working with an unchanged growfs interface. The
> > > discussion about whether or not we should physically shrink
> > > thinspace filesystems is almost completely irrelevant to the
> > > interface requirements of a physical shrink....
> > 
> > So it's not so much about the likelihood of realizing physical shrink,
> > but rather the likelihood that physical shrink would require revving the
> > growfs structure anyway (regardless of this feature).
> 
> Yup, pretty much.
> 

Ok. I don't agree, but at least I understand your perspective. ;)

Brian

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 47+ messages
2017-10-26  8:33 [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems Dave Chinner
2017-10-26  8:33 ` [PATCH 01/14] xfs: factor out AG header initialisation from growfs core Dave Chinner
2017-10-26  8:33 ` [PATCH 02/14] xfs: convert growfs AG header init to use buffer lists Dave Chinner
2017-10-26  8:33 ` [PATCH 03/14] xfs: factor ag btree reoot block initialisation Dave Chinner
2017-10-26  8:33 ` [PATCH 04/14] xfs: turn ag header initialisation into a table driven operation Dave Chinner
2017-10-26  8:33 ` [PATCH 05/14] xfs: make imaxpct changes in growfs separate Dave Chinner
2017-10-26  8:33 ` [PATCH 06/14] xfs: separate secondary sb update in growfs Dave Chinner
2017-10-26  8:33 ` [PATCH 07/14] xfs: rework secondary superblock updates " Dave Chinner
2017-10-26  8:33 ` [PATCH 08/14] xfs: move various type verifiers to common file Dave Chinner
2017-10-26  8:33 ` [PATCH 09/14] xfs: split usable space from block device size Dave Chinner
2017-10-26  8:33 ` [PATCH 10/14] xfs: hide reserved metadata space from users Dave Chinner
2017-10-26  8:33 ` [PATCH 11/14] xfs: bump XFS_IOC_FSGEOMETRY to v5 structures Dave Chinner
2017-10-26  8:33 ` [PATCH 12/14] xfs: convert remaingin xfs_sb_version_... checks to bool Dave Chinner
2017-10-26 16:03   ` Darrick J. Wong
2017-10-26  8:33 ` [PATCH 13/14] xfs: add suport for "thin space" filesystems Dave Chinner
2017-10-26  8:33 ` [PATCH 14/14] xfs: add growfs support for changing usable blocks Dave Chinner
2017-10-26 11:30   ` Amir Goldstein
2017-10-26 12:48     ` Dave Chinner
2017-10-26 13:32       ` Amir Goldstein
2017-10-27 10:26         ` Amir Goldstein
2017-10-26 11:09 ` [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems Amir Goldstein
2017-10-26 12:35   ` Dave Chinner
2017-11-01 22:31     ` Darrick J. Wong
2017-10-30 13:31 ` Brian Foster
2017-10-30 21:09   ` Dave Chinner
2017-10-31  4:49     ` Amir Goldstein
2017-10-31 22:40       ` Dave Chinner
2017-10-31 11:24     ` Brian Foster
2017-11-01  0:45       ` Dave Chinner
2017-11-01 14:17         ` Brian Foster
2017-11-01 23:53           ` Dave Chinner
2017-11-02 11:25             ` Brian Foster
2017-11-02 23:30               ` Dave Chinner
2017-11-03  2:47                 ` Darrick J. Wong
2017-11-03 11:36                   ` Brian Foster
2017-11-05 22:50                     ` Dave Chinner
2017-11-06 13:01                       ` Brian Foster
2017-11-06 21:20                         ` Dave Chinner
2017-11-07 11:28                           ` Brian Foster
2017-11-03 11:26                 ` Brian Foster
2017-11-03 12:19                   ` Amir Goldstein
2017-11-06  1:16                     ` Dave Chinner
2017-11-06  9:48                       ` Amir Goldstein
2017-11-06 21:46                         ` Dave Chinner
2017-11-07  5:30                           ` Amir Goldstein
2017-11-05 23:51                   ` Dave Chinner
2017-11-06 13:07                     ` Brian Foster [this message]
