From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs@oss.sgi.com
Subject: Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
Date: Tue, 5 Jan 2016 07:42:26 -0500
Message-ID: <20160105124226.GA38749@bfoster.bfoster>
In-Reply-To: <20160104235951.GE28330@birch.djwong.org>

On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> On Sun, Dec 20, 2015 at 09:02:54AM -0500, Brian Foster wrote:
> > On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > ...
> > > Fixed since RFCv3:
> > > 
> > >  * The reflink and dedupe ioctls are being hoisted to the VFS, as
> > >    provided in the first few patches.  Patch 81 connects to this
> > >    functionality.
> > > 
> > >  * Copy on write has been rewritten for v4.  We now use the existing
> > >    delayed allocation mechanism to coalesce writes together, deferring
> > >    allocation until writeout time.  This enables CoW to make better
> > >    block placement decisions and significantly reduces overhead.
> > >    CoW is still pretty slow, but not as slow as before.
> > > 
> > >  * Direct IO CoW has been implemented using the same mechanism as
> > >    above, but modified to perform the allocation and remapping right
> > >    then and there.  Throughput is much higher than pushing data
> > >    through the page cache CoW.  (It's the same mechanism, but we're
> > >    playing with chunks bigger than a single memory page.)
> > > 
> > >  * CoW ENOSPC works correctly now, except in the pathological case
> > >    that the AG fills up and the rmap btree cannot expand.  That will
> > >    be addressed for v5.
> > > 
> > >  * fallocate will now unshare blocks to prevent future ENOSPC, as
> > >    you'd expect.
> > > 
> > >  * refcount btree blocks are preallocated at mount time to prevent
> > >    ENOSPC while trying to expand the tree.  This also has the effect
> > >    of grouping the btree blocks together, which can speed up CoW
> > >    remapping.
> > > 
> > 
> > Can you elaborate on how these blocks are preallocated? E.g., is the
> > tree "preconstructed" in some sense? However that is done, is this the
> > anticipated solution or a temporary workaround..?
> > 
> > Also, shouldn't the enospc condition be handled by the agfl? I take it
> > there is something going on here that renders that solution flawed, so
> > I'm just curious what it is.
> > 
> > (Sorry if this is all explained elsewhere, but I haven't yet had a
> > chance to take a close enough look at this feature..).
> 
> Reference count btree blocks aren't allocated from the AGFL; they're allocated
> from the free space in the same manner as the inobt, per a review comment from
> Dave a looong time ago. :) 
> 

Ah, Ok.

> As such, we can get ourselves into the nasty situation where every block in the
> AG has been allocated to file data.  If we then see a bunch of reference count
> changes that are scattered around the AG, the reference count btree has to
> expand to hold all the new records... but there isn't space, and the operation
> fails.  Given that we know the maximum possible size of the refcount btree
> (it's 0.3% of the AG size with 4k blocks), I figured it was easy enough to
> avoid ENOSPC for reflink operations.
> 

Sounds reasonable.
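
To illustrate the arithmetic behind the 0.3% figure quoted above, here is a
rough sketch (not code from the patch series). btree_blocks_needed() is a
hypothetical helper, and the geometry is an assumption: 12-byte refcount
records, 4-byte keys and pointers, and a 56-byte short-form btree block
header, which with 4k blocks gives 336 records per leaf and 505 pointers per
node when every block is completely full.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of blocks in a btree that holds nr_records,
 * given how many records fit in a leaf and how many child pointers fit in
 * an interior node.  The caller chooses the geometry; the values used in
 * the note below are assumptions, not taken from the patches.
 */
static uint64_t btree_blocks_needed(uint64_t nr_records,
				    uint64_t recs_per_leaf,
				    uint64_t ptrs_per_node)
{
	uint64_t level_blocks;
	uint64_t total;

	if (nr_records == 0 || recs_per_leaf == 0 || ptrs_per_node < 2)
		return 0;

	/* Leaf level: ceil(nr_records / recs_per_leaf). */
	level_blocks = (nr_records + recs_per_leaf - 1) / recs_per_leaf;
	total = level_blocks;

	/* Add interior levels until a single root block remains. */
	while (level_blocks > 1) {
		level_blocks = (level_blocks + ptrs_per_node - 1) / ptrs_per_node;
		total += level_blocks;
	}
	return total;
}
```

With one record per block in a 1048576-block AG (4GB at 4k blocks),
btree_blocks_needed(1048576, 336, 505) comes to 3129 blocks, or roughly 0.3%
of the AG. A tree whose blocks are only half full would need about twice
that.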

> I've temporarily fixed this by adding code that figures out how many blocks we
> need if the reference count btree has to have a unique record for every block
> in the AG and holding that many blocks until either they're allocated to the
> refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> FS to make a permanent reservation that's recorded on disk somehow.  But that
> involves writing things to disk + making xfsprogs understand the reservation;
> let's see what people say about the reserved pool idea at all.
> 
> Does that make sense? :)
> 

Yep, it sounds sort of like the reserve pool mechanism used to protect
against ENOSPC when freeing blocks. Curious... why are the reserved
blocks lost on fs crash? Wouldn't they be reserved again on the
subsequent mount?
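
For concreteness, the lifecycle described above (hold blocks back at mount,
hand them to the refcount btree on demand, return the rest at umount) could
be modelled like this. It is only a sketch under stated assumptions: struct
ag_pool and the ag_pool_*() functions are hypothetical names, the state
lives purely in memory, and nothing here reflects how the patches actually
track the blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory accounting for one allocation group (AG). */
struct ag_pool {
	uint64_t free_blocks;	/* blocks available to ordinary allocations */
	uint64_t reserved;	/* blocks held back for refcount btree growth */
};

/* Mount time: move up to 'want' blocks out of the free count. */
static uint64_t ag_pool_reserve(struct ag_pool *ag, uint64_t want)
{
	uint64_t got = want < ag->free_blocks ? want : ag->free_blocks;

	ag->free_blocks -= got;
	ag->reserved += got;
	return got;
}

/* Btree expansion: consume one reserved block; false stands in for ENOSPC. */
static bool ag_pool_take(struct ag_pool *ag)
{
	if (ag->reserved == 0)
		return false;
	ag->reserved--;
	return true;
}

/* Unmount time: return whatever was not consumed to the free count. */
static void ag_pool_release(struct ag_pool *ag)
{
	ag->free_blocks += ag->reserved;
	ag->reserved = 0;
}
```

In this model the reservation exists only in the counters, so it has to be
set up again on every mount.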

Thanks for the explanation...

Brian

> --D
> 
> > 
> > Brian
> > 
> > > Issues: 
> > > 
> > >  * The extent swapping ioctl still allocates a bigger fixed-size
> > >    transaction.  That's most likely a stupid thing to do, so getting a
> > >    better grip on how the journalling code works and auditing all the
> > >    new transaction users will have to happen.  Right now it mostly
> > >    gets lucky.
> > > 
> > >  * EFI tracking for the allocated-but-not-yet-mapped blocks is
> > >    nonexistent.  A crash will leak them.
> > > 
> > >  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
> > >    work around this problem by making the AGFL as big as possible,
> > >    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
> > >    available, and hoping that doesn't actually happen.
> > > 
> > > If you're going to start using this mess, you probably ought to just
> > > pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> > > There are also updates for xfs-docs[4] and man-pages[5].
> > > 
> > > The patches have been xfstested with x64, i386, and ppc64; while in
> > > general the tests run to completion, there are still periodic bugs
> > > that will be addressed by the next RFC.  There's a persistent crash on
> > > arm64 and ppc64el that I haven't been able to triage.
> > > 
> > > This is an extraordinary way to eat your data.  Enjoy! 
> > > Comments and questions are, as always, welcome.
> > > 
> > > --D
> > > 
> > > [1] https://github.com/djwong/linux/tree/for-dave
> > > [2] https://github.com/djwong/xfsprogs/tree/for-dave
> > > [3] https://github.com/djwong/xfstests/tree/for-dave
> > > [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> > > [5] https://github.com/djwong/man-pages/commits/for-mtk
> > > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs

