All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH 1/3] xfs: proper replay of deferred ops queued during log recovery
Date: Wed, 23 Sep 2020 08:48:16 +0100	[thread overview]
Message-ID: <20200923074816.GA31918@infradead.org> (raw)
In-Reply-To: <160031334683.3624461.7162765986332930241.stgit@magnolia>

On Wed, Sep 16, 2020 at 08:29:07PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we replay unfinished intent items that have been recovered from the
> log, it's possible that the replay will cause the creation of more
> deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> recovery should replay deferred ops in order"), later work items have an
> implicit ordering dependency on earlier work items.  Therefore, recovery
> must replay the items (both recovered and created) in the same order
> that they would have been during normal operation.
> 
> For log recovery, we enforce this ordering by using an empty transaction
> to collect deferred ops that get created in the process of recovering a
> log intent item to prevent them from being committed before the rest of
> the recovered intent items.  After we finish committing all the
> recovered log items, we allocate a transaction with an enormous block
> reservation, splice our huge list of created deferred ops into that
> transaction, and commit it, thereby finishing all those ops.
> 
> This is /really/ hokey -- it's the one place in XFS where we allow
> nested transactions; the splicing of the defer ops list is is inelegant
> and has to be done twice per recovery function; and the broken way we
> handle inode pointers and block reservations cause subtle use-after-free
> and allocator problems that will be fixed by this patch and the two
> patches after it.
> 
> Therefore, replace the hokey empty transaction with a structure designed
> to capture each chain of deferred ops that are created as part of
> recovering a single unfinished log intent.  Finally, refactor the loop
> that replays those chains to do so using one transaction per chain.

Wow, that is a massive change, but I really like the direction.


> +int
>  xfs_defer_capture(
> +	struct xfs_trans		*tp,
> +	struct xfs_defer_capture	**dfcp)
>  {
> +	struct xfs_defer_capture	*dfc = NULL;
> +
> +	if (!list_empty(&tp->t_dfops)) {
> +		dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
> +
> +		xfs_defer_create_intents(tp);
> +
> +		INIT_LIST_HEAD(&dfc->dfc_list);
> +		INIT_LIST_HEAD(&dfc->dfc_dfops);
> +
> +		/* Move the dfops chain and transaction state to the freezer. */
> +		list_splice_init(&tp->t_dfops, &dfc->dfc_dfops);
> +		dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
> +		xfs_defer_reset(tp);
> +	}
> +
> +	*dfcp = dfc;
> +	return xfs_trans_commit(tp);

Don't we need to free the structure if xfs_trans_commit fails?

Wouldn't a saner calling convention would to pass the list_head into
->iop_recover and this function, and just add it to the list here
instead of passing it around as an output pointer argument.

> +/*
> + * Freeze any deferred ops and commit the transaction.  This is the last step
> + * needed to finish a log intent item that we recovered from the log.
> + */
> +int
> +xlog_recover_trans_commit(
> +	struct xfs_trans		*tp,
> +	struct xfs_defer_capture	**dfcp)
> +{
> +	return xfs_defer_capture(tp, dfcp);
> +}

I don't think this wrapper is very helpful.  I think a direct call
to xfs_defer_capture captures the intent better than pretending it just
is another commit.

> +
>  /* Take all the collected deferred ops and finish them in order. */
>  static int
>  xlog_finish_defer_ops(
> +	struct xfs_mount	*mp,
> +	struct list_head	*dfops_freezers)
>  {
> +	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
>  	int64_t			freeblks;
> +	uint64_t		resblks;
> +	int			error = 0;
>  
> +	list_for_each_entry_safe(dfc, next, dfops_freezers, dfc_list) {
> +		/*
> +		 * We're finishing the defer_ops that accumulated as a result
> +		 * of recovering unfinished intent items during log recovery.
> +		 * We reserve an itruncate transaction because it is the
> +		 * largest permanent transaction type.  Since we're the only
> +		 * user of the fs right now, take 93% (15/16) of the available
> +		 * free blocks.  Use weird math to avoid a 64-bit division.
> +		 */
> +		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> +		if (freeblks <= 0) {
> +			error = -ENOSPC;
> +			break;
> +		}
>  
> +		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> +		resblks = (resblks * 15) >> 4;
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> +				0, XFS_TRANS_RESERVE, &tp);

This isn't actually new, but how did we end up with the truncate
reservation here?

> +		if (error)
> +			break;
> +
> +		/* transfer all collected dfops to this transaction */
> +		list_del_init(&dfc->dfc_list);
> +		xfs_defer_continue(dfc, tp);
> +
> +		error = xfs_trans_commit(tp);
> +		xfs_defer_capture_free(mp, dfc);
> +		if (error)
> +			break;

I'd just do direct returns and return 0 outside the loop, but that is
just nitpicking.

>  /* Is this log item a deferred action intent? */
> @@ -2528,28 +2566,16 @@ STATIC int
>  xlog_recover_process_intents(
>  	struct xlog		*log)
>  {
> +	LIST_HEAD(dfops_freezers);
>  	struct xfs_ail_cursor	cur;
> +	struct xfs_defer_capture *freezer = NULL;
>  	struct xfs_log_item	*lip;
>  	struct xfs_ail		*ailp;
> +	int			error = 0;
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	xfs_lsn_t		last_lsn;
>  #endif
>  
>  	ailp = log->l_ailp;
>  	spin_lock(&ailp->ail_lock);
>  	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> @@ -2578,26 +2604,31 @@ xlog_recover_process_intents(
>  
>  		/*
>  		 * NOTE: If your intent processing routine can create more
> +		 * deferred ops, you /must/ attach them to the freezer in
>  		 * this routine or else those subsequent intents will get
>  		 * replayed in the wrong order!
>  		 */
>  		if (!test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags)) {
>  			spin_unlock(&ailp->ail_lock);
> +			error = lip->li_ops->iop_recover(lip, &freezer);
>  			spin_lock(&ailp->ail_lock);
>  		}
> +		if (freezer) {
> +			list_add_tail(&freezer->dfc_list, &dfops_freezers);
> +			freezer = NULL;
> +		}
>  		if (error)
> +			break;
> +
>  		lip = xfs_trans_ail_cursor_next(ailp, &cur);

The code flow looks really very odd (though correct) to me, what do you
think of something like:

  	spin_lock(&ailp->ail_lock);
	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
	     lip;
	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
			continue;

		spin_unlock(&ailp->ail_lock);
		error = lip->li_ops->iop_recover(lip, &dfops_freezers);
		spin_lock(&ailp->ail_lock);
		if (error)
			break;
	}
  	xfs_trans_ail_cursor_done(&cur);
  	spin_unlock(&ailp->ail_lock);

  reply	other threads:[~2020-09-23  7:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17  3:29 [PATCH 0/3] xfs: fix how we deal with new intents during recovery Darrick J. Wong
2020-09-17  3:29 ` [PATCH 1/3] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
2020-09-23  7:48   ` Christoph Hellwig [this message]
2020-09-23 15:46     ` Darrick J. Wong
2020-09-24  6:02   ` [PATCH v2 " Darrick J. Wong
2020-09-25 12:21     ` Brian Foster
2020-09-25 19:19       ` Darrick J. Wong
2020-09-17  3:29 ` [PATCH 2/3] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
2020-09-23  7:49   ` Christoph Hellwig
2020-09-24  6:04   ` [PATCH v2 " Darrick J. Wong
2020-09-17  3:29 ` [PATCH 3/3] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
2020-09-23  7:49   ` Christoph Hellwig
2020-09-23 15:22     ` Darrick J. Wong
2020-09-24  6:03   ` [PATCH v2 " Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2020-05-05  1:13 [PATCH v2 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
2020-05-05  1:13 ` [PATCH 1/3] xfs: proper replay of deferred ops queued " Darrick J. Wong
2020-05-05  2:33   ` Dave Chinner
2020-05-05  3:06     ` Darrick J. Wong
2020-05-05  5:10       ` Dave Chinner
2020-05-05 15:08         ` Darrick J. Wong
2020-04-22  2:08 [PATCH 0/3] xfs: fix inode use-after-free " Darrick J. Wong
2020-04-22  2:08 ` [PATCH 1/3] xfs: proper replay of deferred ops queued " Darrick J. Wong
2020-04-24 14:02   ` Brian Foster
2020-04-28 22:28     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200923074816.GA31918@infradead.org \
    --to=hch@infradead.org \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.