All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation
Date: Thu, 10 Sep 2020 07:44:55 +1000	[thread overview]
Message-ID: <20200909214455.GQ12131@dread.disaster.area> (raw)
In-Reply-To: <20200909133111.GA765129@bfoster>

On Wed, Sep 09, 2020 at 09:31:11AM -0400, Brian Foster wrote:
> On Wed, Sep 09, 2020 at 06:19:10PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Recovering an EFI currently uses a itruncate reservation, which is
> > designed for a rolling transaction that modifies the BMBT and
> > logs the EFI in one commit, then frees the space and logs the EFD in
> > the second commit.
> > 
> > Recovering the EFI only requires the second transaction in this
> > pair, and hence has a smaller log space requirement than a truncate
> > operation. Hence when the extent free is being processed at runtime,
> > the log reservation that is held by the filesystem is only enough to
> > complete the extent free, not the entire truncate operation.
> > 
> > Hence if the EFI pins the tail of the log and the log fills up while
> > the extent is being freed, the amount of reserved free space in the
> > log is not enough to start another entire truncate operation. Hence
> > if we crash at this point, log recovery will deadlock with the EFI
> > pinning the tail of the log and the log not having enough free space
> > to reserve an itruncate transaction.
> > 
> > As such, EFI recovery needs it's own log space reservation separate
> > to the itruncate reservation. We only need what is required free the
> > extent, and this matches the space we have reserved at runtime for
> > this operation and hence should prevent the recovery deadlock from
> > occurring.
> > 
> > This patch adds the new reservation in a way that minimises the
> > change such that it should be back-portable to older kernels easily.
> > Follow up patches will factor and rework the reservations to be more
> > correct and more tightly defined.
> > 
> > Note: this would appear to be a generic problem with intent
> > recovery; we use the entire operation reservation for recovery,
> > not the reservation that was held at runtime after the intent was
> > logged. I suspect all intents are going to require their own
> > reservation as a result.
> > 
> 
> It might be worth explicitly pointing out that support for EFI/EFD
> intents goes farther back than the various intents associated with newer
> features, hence the value of a targeted fix.

Ok.

> > @@ -916,6 +916,16 @@ xfs_trans_resv_calc(
> >  		resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
> >  	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> >  
> > +	/*
> > +	 * Log recovery reservations for intent replay
> > +	 *
> > +	 * EFI recovery is itruncate minus the initial transaction that logs
> > +	 * logs the EFI.
> > +	 */
> > +	resp->tr_efi.tr_logres = resp->tr_itruncate.tr_logres;
> > +	resp->tr_efi.tr_logcount = resp->tr_itruncate.tr_logcount - 1;
> 
> tr_itruncate.tr_logcount looks like it's either 2 or 8 depending on
> whether reflink is enabled. On one hand this seems conservative enough,
> but do we know exactly what those extra unit counts are accounted for in
> the reflink case?

Right, in the reflink case we may have to roll the transaction many more
times to do the refcount btree and reverse map btree modifications.
Those are done under separate intents and so we have to roll and
commit the defered ops more times on a reflink/rmap based
filesystem. Hence the logcount is higher so that the initial
reservation can roll more times before regrant during a roll has to
go and physically reserve more write space in the log to continue
rolling the transaction.

> It looks like extents are only freed when the last
> reference is dropped (otherwise we log a refcount intent), which makes
> me wonder whether we really need 7 log count units if recovery
> encounters an EFI.

I don't know if the numbers are correct, and it really is out of
scope for this patch to audit/fix that. I really think we need to
map this whole thing out in a diagram at this point because I now
suspect that the allocfree log count calculation is not correct,
either...

> Also, while looking through the code I noticed that truncate does the
> following:
> 
> 		...
>                 error = xfs_defer_finish(&tp);
>                 if (error)
>                         goto out;
> 
>                 error = xfs_trans_roll_inode(&tp, ip);
>                 if (error)
>                         goto out;
> 		...
> 
> ... which looks like it rolls the transaction an extra time per-extent.
> I don't think that contributes to this problem vs just being a runtime
> inefficiency, so maybe I'll fling a patch up for that separately.

Yeah, I'm not sure if this is correct/needed or not. 

> >  	 * The following transactions are logged in logical format with
> >  	 * a default log count.
> > diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
> > index 7241ab28cf84..13173b3eaac9 100644
> > --- a/fs/xfs/libxfs/xfs_trans_resv.h
> > +++ b/fs/xfs/libxfs/xfs_trans_resv.h
> > @@ -50,6 +50,8 @@ struct xfs_trans_resv {
> >  	struct xfs_trans_res	tr_qm_equotaoff;/* end of turn quota off */
> >  	struct xfs_trans_res	tr_sb;		/* modify superblock */
> >  	struct xfs_trans_res	tr_fsyncts;	/* update timestamps on fsync */
> > +	struct xfs_trans_res	tr_efi;		/* EFI log item recovery */
> > +
> 
> Extra whitespace line.

Will fix.

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2020-09-09 21:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-09  8:19 [PATCH 0/3] [RFC] xfs: intent recovery reservation changes Dave Chinner
2020-09-09  8:19 ` [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation Dave Chinner
2020-09-09 13:31   ` Brian Foster
2020-09-09 21:44     ` Dave Chinner [this message]
2020-09-10 13:18       ` Brian Foster
2020-09-10 21:29         ` Dave Chinner
2020-09-11 12:37           ` Brian Foster
2020-09-09  8:19 ` [PATCH 2/3] xfs: add a free space extent change reservation Dave Chinner
2020-09-09 13:35   ` Brian Foster
2020-09-09 22:51     ` Dave Chinner
2020-09-10 13:19       ` Brian Foster
2020-09-09  8:19 ` [PATCH 3/3] xfs: factor free space tree transaciton reservations Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200909214455.GQ12131@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.