All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 04/11] xfs: CoW fork operations should only update quota reservations
Date: Thu, 25 Jan 2018 09:52:57 -0800	[thread overview]
Message-ID: <20180125175257.GG9068@magnolia> (raw)
In-Reply-To: <20180125130123.GA43198@bfoster.bfoster>

On Thu, Jan 25, 2018 at 08:01:24AM -0500, Brian Foster wrote:
> On Wed, Jan 24, 2018 at 11:14:25AM -0800, Darrick J. Wong wrote:
> > On Wed, Jan 24, 2018 at 09:22:16AM -0500, Brian Foster wrote:
> > > On Tue, Jan 23, 2018 at 06:18:23PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Since the CoW fork only exists in memory, it is incorrect to update the
> > > > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > > > fork, even real extents in the CoW fork are only reservations (on-disk
> > > > they're owned by the refcountbt) so they must not be tracked in the on
> > > > disk quota info.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_bmap.c |  203 ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  fs/xfs/xfs_reflink.c     |    8 +-
> > > >  2 files changed, 196 insertions(+), 15 deletions(-)
> > > > 
> > > > 
> ...
> > > > + * - sb_fdblocks: Number of free blocks recorded in the superblock on disk.
> > > > + * - fdblocks: Number of free blocks recorded in the superblock minus any
> > > > + *           in-core reservations made in anticipation of future writes.
> > > > + *
> > > > + * - t_blk_res: Number of blocks reserved out of fdblocks for a transaction.
> > > > + *           When the transaction commits, t_blk_res - t_blk_res_used is given
> > > > + *           back to fdblocks.
> > > > + * - t_blk_res_used: Number of blocks used by this transaction that were
> > > > + *           reserved for this transaction.
> > > > + * - t_fdblocks_del: Number of blocks by which fdblocks and sb_fdblocks will
> > > > + *           have to decrease at commit.
> > > > + * - t_res_fdblocks_delta: Number of blocks by which sb_fdblocks will have to
> > > > + *           decrease at commit.  We assume that fdblocks was decreased
> > > > + *           prior to the transaction.
> > > > + *
> > > > + * Data fork block mappings have four logical states:
> > > > + *
> > > > + *    +--------> UNWRITTEN <------+
> > > > + *    |              ^            |
> > > > + *    |              v            v
> > > > + * DELALLOC <----> HOLE <------> REAL
> > > > + *    |                           ^
> > > > + *    |                           |
> > > > + *    +---------------------------+
> > > > + *
> > > 
> > > I'm not sure we need a graphic for the extent states. Non-hole
> > > conversions to delalloc is the only transition that doesn't make any
> > > sense.
> > 
> > First of all, Dave keeps asking for ASCII art 'when appropriate'. :)
> > 
> 
> I'm not against ASCII art in general...
> 
> > But on a more serious note, I thought the state diagram would be useful
> > for anyone who isn't so familiar with how blocks get mapped into files
> > in xfs, particularly to compare the data fork diagram against the
> > corresponding picture for the cow fork.
> > 
> 
> ... but TBH I didn't really notice they were different until you pointed
> it out. :/ It still seems like a rather verbose means to point out that
> COW fork apparently doesn't have the DELALLOC -> REAL transition.

It also doesn't have REAL -> UNWRITTEN transitions either.  Clearly the
diagrams aren't helping, point made. ;)

Heh, it also has HOLE -> REAL transitions even though the diagram does
not so indicate, so ... yeah.

> ...
> > 
> > > I'm also wondering how much of the whole picture really needs to be
> > > described here to cover quota accounting with respect to block mapping
> > > (sufficiently to differentiate data fork from cow fork, which I take is
> > > the purpose of this section). For example, do we really need the
> > > internal transaction details for how quota deltas are carried?
> > > 
> > > Instead, I think it might be sufficient to explain that the quota system
> > > works in two "levels" (for lack of a better term :/), one for
> > > reservation and another for real block usage. The reservation is
> > > associated with transaction reservation and/or delayed block allocation
> > > (no tx). In either case, quota reservation is converted to real quota
> > > block usage when a transaction commits that maps real/physical blocks.
> > > If the transaction held extra reservation that went unused, that quota
> > > reservation is released. The primary difference is that transactions to
> > > convert delalloc -> real do not reserve quota blocks in the first place,
> > > since that has already occurred, so they just need to make sure to
> > > convert/persist the quota res for the blocks that were converted to
> > > real.
> > 
> > I thought about cutting this whole comment down to a simple sentence
> > about how quota accounting is different between the data & cow forks (as
> > you figured out, we use delalloc quota reservations for everything in
> > the cow fork and only turn them into real ones when we go to remap) but
> > then worried that doing so would presuppose the reader knew anything
> > about how the extent lifecycles work... and that's how I end up with a
> > gigantic manual.
> > 
> 
> Which is probably fine for xfs-docs or something..
> 
> The more I think about it, the more I think this whole thing is better
> off focused on explaining the unique quota management of COW fork
> blocks. We can add additional comments about extent lifecycles, how
> quota is tracked in general, etc., but perhaps it's best to make those a
> separate patch rather than attempt to document a ground-up "COW fork
> quotas for dummies" in a single comment. :P Anyways, I'll reserve
> further judgement for the new patch..

Yeah.  I think I'll go update ch3 ("Sharing Data Blocks") in the design
document instead of dumping everything into a code comment that seems
somewhat misplaced.

Onto the next reply! :)

--D

> Brian
> 
> > > > + * Copy on Write Fork Mapping Lifecycle
> > > > + *
> > > > + * The CoW fork handles things differently from the data fork because its
> > > > + * mappings only exist in memory-- the refcount btree is the on-disk owner of
> > > > + * the extents until they're remapped into the data fork.  Therefore,
> > > > + * unwritten and real extents in the CoW fork are treated the same way as
> > > > + * delayed allocation extents.  Quota and fdblock changes only exist in
> > > > + * memory, which requires some twists in the bmap functions.
> > > > + *
> > > 
> > > Ok, but perhaps this should point out what happens when cow blocks are
> > > reserved with respect to quotas..? IIUC, a delalloc quota reservation
> > > occurs just the same as above, the blocks simply reside in another fork.
> > 
> > Yes.
> > 
> > > > + * The CoW fork extent state diagram looks like this:
> > > > + *
> > > > + *    +--------> UNWRITTEN -------+
> > > > + *    |              ^            |
> > > > + *    |              v            v
> > > > + * DELALLOC <----> HOLE <------- REAL
> > > > + *
> > > > + * Holes are still holes.  Delayed allocation extents reserve blocks for
> > > > + * landing future writes, just like they do in the data fork.  However, unlike
> > > > + * the data fork, unwritten extents signal an extent that has been allocated
> > > > + * but is not currently undergoing writeback.  Real extents are undergoing
> > > > + * writeback, and when that writeback finishes the corresponding data fork
> > > > + * extent will be punched out and the CoW fork counterpart moved to the new
> > > > + * hole in the data fork.
> > > > + *
> > > 
> > > Ok, so the difference is that for the COW fork, the _extent state_
> > > conversion is not the appropriate trigger event to convert quota
> > > reservation to real quota usage. Instead, the blocks being remapped from
> > > the COW fork to the data fork is when that should occur.
> > 
> > Yes.
> > 
> > > > + * The state transitions and required metadata updates are as follows:
> > > > + *
> > > > + * - HOLE to DELALLOC: Increase i_cow_blocks and q_res_bcount, and decrease
> > > > + *           fdblocks.
> > > > + * - HOLE to UNWRITTEN: Same as above, but since we reserved quota via
> > > > + *           qt_blk_res (which increased q_res_bcount) when we allocate the
> > > > + *           extent we have to decrease qt_blk_res so that the commit doesn't
> > > > + *           give the allocated CoW blocks back.
> > > > + *
> > > 
> > > Hmm, this is a little confusing. Looking at the code change and comment
> > > below, I think I get what this is trying to do, which is essentially
> > > make a real block cow fork alloc behave like a delalloc reservation
> > > (with respect to quota). FWIW, I think what confuses me is the assertion
> > > that the blocks would be "given back" otherwise. The only reference I
> > > have to compare is data fork alloc behavior, which implies that used
> > > reservation would not be given back, but rather converted to real quota
> > > usage on block allocation (and excess reservation would still be given
> > > back, which afaict we still want to happen). So the trickery is required
> > > to prevent conversion of quota reservation for the allocated cow blocks,
> > > let that res sit around until the cow blocks are remapped, and release
> > > unused reservation from the tx as normal. Am I following that correctly?
> > 
> > Yes.
> > 
> > > > + * - DELALLOC to UNWRITTEN: No change.
> > > > + * - DELALLOC to HOLE: Decrease i_cow_blocks and q_res_bcount, and increase
> > > > + *           fdblocks.
> > > > + *
> > > > + * - UNWRITTEN to HOLE: Same as DELALLOC to HOLE.
> > > > + * - UNWRITTEN to REAL: No change.
> > > > + *
> > > > + * - REAL to HOLE: This transition happens when we've finished a write
> > > > + *           operation and need to move the mapping to the data fork.  We
> > > > + *           punch the correspond data fork mappings, which decreases
> > > > + *           qt_bcount.  Then we map the CoW fork mapping into the hole we
> > > > + *           just cleared out of the data fork, which increases qt_bcount.
> > > > + *           There's a subtlety here -- if we promoted a write over a hole to
> > > > + *           CoW, there will be a net increase in qt_bcount, which is fine
> > > > + *           because we already reserved the quota when we filled the CoW
> > > > + *           fork.  Finally, we punch the CoW fork mapping, which decreases
> > > > + *           q_res_bcount.
> > > > + *
> > > > + * Notice how all CoW fork extents use transactionless quota reservations and
> > > > + * the in-core fdblocks to maintain state, and we avoid updating any on-disk
> > > > + * metadata.  This is essential to maintain metadata correctness if the system
> > > > + * goes down.
> > > > + */
> > > >  
> > > >  kmem_zone_t		*xfs_bmap_free_item_zone;
> > > >  
> > > > @@ -3337,6 +3476,39 @@ xfs_bmap_btalloc_filestreams(
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +/* Deal with CoW fork accounting when we allocate a block. */
> > > > +static void
> > > > +xfs_bmap_btalloc_cow(
> > > > +	struct xfs_bmalloca	*ap,
> > > > +	struct xfs_alloc_arg	*args)
> > > > +{
> > > > +	/* Filling a previously reserved extent; nothing to do here. */
> > > > +	if (ap->wasdel)
> > > > +		return;
> > > > +
> > > > +	/*
> > > > +	 * The CoW fork only exists in memory, so the on-disk quota accounting
> > > > +	 * must not incude any CoW fork extents.  Therefore, CoW blocks are
> > > > +	 * only tracked in the in-core dquot block count (q_res_bcount).
> > > > +	 *
> > > > +	 * If we get here, we're filling a CoW hole with a real (non-delalloc)
> > > > +	 * CoW extent having reserved enough blocks from both q_res_bcount and
> > > > +	 * qt_blk_res to guarantee that we won't run out of space.  The unused
> > > > +	 * qt_blk_res is given back to q_res_bcount when the transaction
> > > > +	 * commits.
> > > > +	 *
> > > > +	 * We don't want the quota accounting for our newly allocated blocks
> > > > +	 * to be given back, so we must decrease qt_blk_res without decreasing
> > > > +	 * q_res_bcount.
> > > > +	 *
> > > > +	 * Note: If we're allocating a delalloc extent, we already reserved
> > > > +	 * the q_res_bcount blocks, so no quota accounting update is needed
> > > > +	 * here.
> > > > +	 */
> > > > +	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, XFS_TRANS_DQ_RES_BLKS,
> > > > +			-(long)args->len);
> > > > +}
> > > 
> > > Factoring nit.. if we're going to refactor bits of xfs_bmap_btalloc()
> > > out, it might be cleaner to factor out all of the quota logic rather
> > > than just the cow bits (which is basically just a simple check and
> > > function call). E.g., refactor into an xfs_bmap_btalloc_quota() helper
> > > that does the right thing based on the fork, with comments as to why,
> > > etc. (and perhaps just leave the unrelated di_nblocks change behind).
> > 
> > I thought about factoring the data/attr fork stuff into its own
> > xfs_bmap_btalloc_quota() function too, since this function is already
> > eyewateringly long.  I think I'll do that as a separate refactor at the
> > end of the series, though...
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > +
> > > >  STATIC int
> > > >  xfs_bmap_btalloc(
> > > >  	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
> > > > @@ -3571,19 +3743,22 @@ xfs_bmap_btalloc(
> > > >  			*ap->firstblock = args.fsbno;
> > > >  		ASSERT(nullfb || fb_agno <= args.agno);
> > > >  		ap->length = args.len;
> > > > -		if (!(ap->flags & XFS_BMAPI_COWFORK))
> > > > -			ap->ip->i_d.di_nblocks += args.len;
> > > > -		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> > > >  		if (ap->wasdel)
> > > >  			ap->ip->i_delayed_blks -= args.len;
> > > > -		/*
> > > > -		 * Adjust the disk quota also. This was reserved
> > > > -		 * earlier.
> > > > -		 */
> > > > -		xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> > > > -			ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> > > > -					XFS_TRANS_DQ_BCOUNT,
> > > > -			(long) args.len);
> > > > +		if (ap->flags & XFS_BMAPI_COWFORK) {
> > > > +			xfs_bmap_btalloc_cow(ap, &args);
> > > > +		} else {
> > > > +			ap->ip->i_d.di_nblocks += args.len;
> > > > +			xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
> > > > +			/*
> > > > +			 * Adjust the disk quota also. This was reserved
> > > > +			 * earlier.
> > > > +			 */
> > > > +			xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
> > > > +				ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT :
> > > > +						XFS_TRANS_DQ_BCOUNT,
> > > > +				(long) args.len);
> > > > +		}
> > > >  	} else {
> > > >  		ap->blkno = NULLFSBLOCK;
> > > >  		ap->length = 0;
> > > > @@ -4776,6 +4951,7 @@ xfs_bmap_del_extent_cow(
> > > >  	struct xfs_bmbt_irec	new;
> > > >  	xfs_fileoff_t		del_endoff, got_endoff;
> > > >  	int			state = BMAP_COWFORK;
> > > > +	int			error;
> > > >  
> > > >  	XFS_STATS_INC(mp, xs_del_exlist);
> > > >  
> > > > @@ -4832,6 +5008,11 @@ xfs_bmap_del_extent_cow(
> > > >  		xfs_iext_insert(ip, icur, &new, state);
> > > >  		break;
> > > >  	}
> > > > +
> > > > +	/* Remove the quota reservation */
> > > > +	error = xfs_trans_reserve_quota_nblks(NULL, ip,
> > > > +			-(long)del->br_blockcount, 0, XFS_QMOPT_RES_REGBLKS);
> > > > +	ASSERT(error == 0);
> > > >  }
> > > >  
> > > >  /*
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index 82abff6..e367351 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> > > >  					del.br_startblock, del.br_blockcount,
> > > >  					NULL);
> > > >  
> > > > -			/* Update quota accounting */
> > > > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > -					-(long)del.br_blockcount);
> > > > -
> > > >  			/* Roll the transaction */
> > > >  			xfs_defer_ijoin(&dfops, ip);
> > > >  			error = xfs_defer_finish(tpp, &dfops);
> > > > @@ -795,6 +791,10 @@ xfs_reflink_end_cow(
> > > >  		if (error)
> > > >  			goto out_defer;
> > > >  
> > > > +		/* Charge this new data fork mapping to the on-disk quota. */
> > > > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > +				(long)del.br_blockcount);
> > > > +
> > > >  		/* Remove the mapping from the CoW fork. */
> > > >  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > > >  
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-01-25 17:53 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-24  2:17 [PATCH 00/11] xfs: reflink/scrub/quota fixes Darrick J. Wong
2018-01-24  2:18 ` [PATCH 01/11] xfs: reflink should break pnfs leases before sharing blocks Darrick J. Wong
2018-01-24 14:16   ` Brian Foster
2018-01-26  9:06   ` Christoph Hellwig
2018-01-26 18:26     ` Darrick J. Wong
2018-01-24  2:18 ` [PATCH 02/11] xfs: only grab shared inode locks for source file during reflink Darrick J. Wong
2018-01-24 14:18   ` Brian Foster
2018-01-24 18:40     ` Darrick J. Wong
2018-01-26 12:07   ` Christoph Hellwig
2018-01-26 18:48     ` Darrick J. Wong
2018-01-27  3:32     ` Dave Chinner
2018-01-24  2:18 ` [PATCH 03/11] xfs: call xfs_qm_dqattach before performing reflink operations Darrick J. Wong
2018-01-24 14:18   ` Brian Foster
2018-01-26  9:07   ` Christoph Hellwig
2018-01-24  2:18 ` [PATCH 04/11] xfs: CoW fork operations should only update quota reservations Darrick J. Wong
2018-01-24 14:22   ` Brian Foster
2018-01-24 19:14     ` Darrick J. Wong
2018-01-25 13:01       ` Brian Foster
2018-01-25 17:52         ` Darrick J. Wong [this message]
2018-01-25  1:20   ` [PATCH v2 " Darrick J. Wong
2018-01-25 13:03     ` Brian Foster
2018-01-25 18:20       ` Darrick J. Wong
2018-01-26 13:02         ` Brian Foster
2018-01-26 18:40           ` Darrick J. Wong
2018-01-26 12:12     ` Christoph Hellwig
2018-01-24  2:18 ` [PATCH 05/11] xfs: track CoW blocks separately in the inode Darrick J. Wong
2018-01-25 13:06   ` Brian Foster
2018-01-25 19:21     ` Darrick J. Wong
2018-01-26 13:04       ` Brian Foster
2018-01-26 19:08         ` Darrick J. Wong
2018-01-26 12:15   ` Christoph Hellwig
2018-01-26 19:00     ` Darrick J. Wong
2018-01-26 23:51       ` Darrick J. Wong
2018-01-24  2:18 ` [PATCH 06/11] xfs: fix up cowextsz allocation shortfalls Darrick J. Wong
2018-01-25 17:31   ` Brian Foster
2018-01-25 20:20     ` Darrick J. Wong
2018-01-26 13:06       ` Brian Foster
2018-01-26 19:12         ` Darrick J. Wong
2018-01-26  9:11   ` Christoph Hellwig
2018-01-24  2:18 ` [PATCH 07/11] xfs: always zero di_flags2 when we free the inode Darrick J. Wong
2018-01-25 17:31   ` Brian Foster
2018-01-25 18:36     ` Darrick J. Wong
2018-01-26  9:08   ` Christoph Hellwig
2018-01-24  2:18 ` [PATCH 08/11] xfs: fix tracepoint %p formats Darrick J. Wong
2018-01-25 17:31   ` Brian Foster
2018-01-25 18:47     ` Darrick J. Wong
2018-01-26  0:19       ` Darrick J. Wong
2018-01-26  9:09         ` Christoph Hellwig
2018-01-24  2:18 ` [PATCH 09/11] xfs: make tracepoint inode number format consistent Darrick J. Wong
2018-01-25 17:31   ` Brian Foster
2018-01-26  9:09   ` Christoph Hellwig
2018-01-24  2:19 ` [PATCH 10/11] xfs: refactor inode verifier corruption error printing Darrick J. Wong
2018-01-25 17:31   ` Brian Foster
2018-01-25 18:23     ` Darrick J. Wong
2018-01-26  9:10   ` Christoph Hellwig
2018-01-24  2:19 ` [PATCH 11/11] xfs: don't clobber inobt/finobt cursors when xref with rmap Darrick J. Wong
2018-01-26  9:10   ` Christoph Hellwig
2018-01-25  5:26 ` [PATCH 12/11] xfs: refactor quota code in xfs_bmap_btalloc Darrick J. Wong
2018-01-26 12:17   ` Christoph Hellwig
2018-01-26 21:46     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180125175257.GG9068@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.