All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: proper replay of deferred ops queued during log recovery
Date: Tue, 5 May 2020 08:08:55 -0700	[thread overview]
Message-ID: <20200505150855.GQ5703@magnolia> (raw)
In-Reply-To: <20200505051029.GN2040@dread.disaster.area>

On Tue, May 05, 2020 at 03:10:29PM +1000, Dave Chinner wrote:
> On Mon, May 04, 2020 at 08:06:51PM -0700, Darrick J. Wong wrote:
> > On Tue, May 05, 2020 at 12:33:05PM +1000, Dave Chinner wrote:
> > > On Mon, May 04, 2020 at 06:13:39PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > When we replay unfinished intent items that have been recovered from the
> > > > log, it's possible that the replay will cause the creation of more
> > > > deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> > > > recovery should replay deferred ops in order"), later work items have an
> > > > implicit ordering dependency on earlier work items.  Therefore, recovery
> > > > must replay the items (both recovered and created) in the same order
> > > > that they would have been during normal operation.
> > > > 
> > > > For log recovery, we enforce this ordering by using an empty transaction
> > > > to collect deferred ops that get created in the process of recovering a
> > > > log intent item to prevent them from being committed before the rest of
> > > > the recovered intent items.  After we finish committing all the
> > > > recovered log items, we allocate a transaction with an enormous block
> > > > reservation, splice our huge list of created deferred ops into that
> > > > transaction, and commit it, thereby finishing all those ops.
> > > > 
> > > > This is /really/ hokey -- it's the one place in XFS where we allow
> > > > nested transactions; the splicing of the defer ops list is is inelegant
> > > > and has to be done twice per recovery function; and the broken way we
> > > > handle inode pointers and block reservations cause subtle use-after-free
> > > > and allocator problems that will be fixed by this patch and the two
> > > > patches after it.
> > > > 
> > > > Therefore, replace the hokey empty transaction with a structure designed
> > > > to capture each chain of deferred ops that are created as part of
> > > > recovering a single unfinished log intent.  Finally, refactor the loop
> > > > that replays those chains to do so using one transaction per chain.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > FWIW, I don't like the "freezer" based naming here. It's too easily
> > > confused with freezing and thawing the filesystem....
> > > 
> > > I know, "delayed deferred ops" isn't much better, but at least it
> > > won't get confused with existing unrelated functionality.
> > 
> > xfs_defer_{freeze,thaw} -> xfs_defer_{capture,relink} ?
> 
> Yeah, capture seems appropriate, maybe relink -> continue? i.e.
> capture the remaining defer_ops to be run, then continue running
> them?

I think I like capture/continue.

> /me shrugs and thinks "naming is hard"....

:)

> > > I've barely looked at the code, so no real comments on that yet,
> > > but I did notice this:
> > > 
> > > > @@ -2495,35 +2515,59 @@ xlog_recover_process_data(
> > > >  /* Take all the collected deferred ops and finish them in order. */
> > > >  static int
> > > >  xlog_finish_defer_ops(
> > > > -	struct xfs_trans	*parent_tp)
> > > > +	struct xfs_mount	*mp,
> > > > +	struct list_head	*dfops_freezers)
> > > >  {
> > > > -	struct xfs_mount	*mp = parent_tp->t_mountp;
> > > > +	struct xfs_defer_freezer *dff, *next;
> > > >  	struct xfs_trans	*tp;
> > > >  	int64_t			freeblks;
> > > >  	uint			resblks;
> > > ....
> > > > +		resblks = min_t(int64_t, UINT_MAX, freeblks);
> > > > +		resblks = (resblks * 15) >> 4;
> > > 
> > > Can overflow when freeblks > (UINT_MAX / 15).
> > 
> > D'oh.  Ugh, I hate this whole fugly hack.
> > 
> > TBH I've been thinking that perhaps the freezer function should be
> > capturing the unused transaction block reservation when we capture the
> > dfops chain from the transaction.
> 
> Exactly what problem is this hack supposed to avoid? having the
> filesystem ENOSPC before all the deferops have been completed?
>
> if so, can that even happen? Because the fact that the intents are
> in the log means that when they were started there was enough space
> in the fs for them to run, so ENOSPC should not be an issue, right?

We're probably not going to run out of space, seeing as we had enough
space to run the first time.  However, there's various parts of the
filesystem that either behave differently or ENOSPC early if the
transaction has no block reservation, so we need to avoid them.

The outcome of the recovery work ought to be as close as possible to
what would have happened if the fs hadn't gone down.

> > When we set up the second transaction, we then set t_blk_res to the
> > captured block reservation.  So long as the recovery function is smart
> > enough to set up sufficient reservation we should avoid hitting ENOSPC,
> > right?
> 
> I'm not sure ENOSPC is really a problem for recovery of deferred ops
> given the above...

<nod>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2020-05-05 15:09 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-05  1:13 [PATCH v2 0/3] xfs: fix inode use-after-free during log recovery Darrick J. Wong
2020-05-05  1:13 ` [PATCH 1/3] xfs: proper replay of deferred ops queued " Darrick J. Wong
2020-05-05  2:33   ` Dave Chinner
2020-05-05  3:06     ` Darrick J. Wong
2020-05-05  5:10       ` Dave Chinner
2020-05-05 15:08         ` Darrick J. Wong [this message]
2020-05-05  1:13 ` [PATCH 2/3] xfs: reduce log recovery transaction block reservations Darrick J. Wong
2020-05-05  1:13 ` [PATCH 3/3] xfs: fix an incore inode UAF in xfs_bui_recover Darrick J. Wong
2020-05-05 14:11   ` Brian Foster
2020-05-06  0:34     ` Darrick J. Wong
2020-05-06 13:56       ` Brian Foster
2020-05-06 17:01         ` Darrick J. Wong
2020-05-07  9:53           ` Brian Foster
2020-05-07 15:09             ` Darrick J. Wong
2020-05-07 16:58               ` Brian Foster
  -- strict thread matches above, loose matches on Subject: below --
2020-09-17  3:29 [PATCH 0/3] xfs: fix how we deal with new intents during recovery Darrick J. Wong
2020-09-17  3:29 ` [PATCH 1/3] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
2020-09-23  7:48   ` Christoph Hellwig
2020-09-23 15:46     ` Darrick J. Wong
2020-04-22  2:08 [PATCH 0/3] xfs: fix inode use-after-free " Darrick J. Wong
2020-04-22  2:08 ` [PATCH 1/3] xfs: proper replay of deferred ops queued " Darrick J. Wong
2020-04-24 14:02   ` Brian Foster
2020-04-28 22:28     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200505150855.GQ5703@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.