linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Allison Henderson <allison.henderson@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags
Date: Tue, 8 Dec 2020 06:19:06 -0500	[thread overview]
Message-ID: <20201208111906.GA1679681@bfoster> (raw)
In-Reply-To: <20201208004028.GU629293@magnolia>

On Mon, Dec 07, 2020 at 04:40:28PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Log incompat feature flags in the superblock exist for one purpose: to
> protect the contents of a dirty log from replay on a kernel that isn't
> prepared to handle those dirty contents.  This means that they can be
> cleared if (a) we know the log is clean and (b) we know that there
> aren't any other threads in the system that might be setting or relying
> upon a log incompat flag.
> 
> Therefore, clear the log incompat flags when we've finished recovering
> the log, when we're unmounting cleanly, remounting read-only, or
> freezing; and provide a function so that subsequent patches can start
> using this.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> Note: I wrote this so that we could turn on log incompat flags for
> atomic extent swapping and Allison could probably use it for the delayed
> logged xattr patchset.  Not gonna try to land this in 5.11, FWIW...
> ---
>  fs/xfs/libxfs/xfs_format.h |   15 ++++++
>  fs/xfs/xfs_log.c           |    8 +++
>  fs/xfs/xfs_mount.c         |  119 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.h         |    2 +
>  fs/xfs/xfs_super.c         |    9 +++
>  5 files changed, 153 insertions(+)
> 
...
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 669a9cf43582..68a5c90ec65b 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
...
> @@ -1361,6 +1369,117 @@ xfs_force_summary_recalc(
>  	xfs_fs_mark_checked(mp, XFS_SICK_FS_COUNTERS);
>  }
>  
> +/*
> + * Enable a log incompat feature flag in the primary superblock.  The caller
> + * cannot have any other transactions in progress.
> + */
> +int
> +xfs_add_incompat_log_feature(
> +	struct xfs_mount	*mp,
> +	uint32_t		feature)
> +{
> +	struct xfs_dsb		*dsb;
> +	int			error;
> +
> +	ASSERT(hweight32(feature) == 1);
> +	ASSERT(!(feature & XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN));
> +
> +	/*
> +	 * Force the log to disk and kick the background AIL thread to reduce
> +	 * the chances that the bwrite will stall waiting for the AIL to unpin
> +	 * the primary superblock buffer.  This isn't a data integrity
> +	 * operation, so we don't need a synchronous push.
> +	 */
> +	error = xfs_log_force(mp, XFS_LOG_SYNC);
> +	if (error)
> +		return error;
> +	xfs_ail_push_all(mp->m_ail);
> +
> +	/*
> +	 * Lock the primary superblock buffer to serialize all callers that
> +	 * are trying to set feature bits.
> +	 */
> +	xfs_buf_lock(mp->m_sb_bp);
> +	xfs_buf_hold(mp->m_sb_bp);
> +
> +	if (XFS_FORCED_SHUTDOWN(mp)) {
> +		error = -EIO;
> +		goto rele;
> +	}
> +
> +	if (xfs_sb_has_incompat_log_feature(&mp->m_sb, feature))
> +		goto rele;
> +
> +	/*
> +	 * Write the primary superblock to disk immediately, because we need
> +	 * the log_incompat bit to be set in the primary super now to protect
> +	 * the log items that we're going to commit later.
> +	 */
> +	dsb = mp->m_sb_bp->b_addr;
> +	xfs_sb_to_disk(dsb, &mp->m_sb);
> +	dsb->sb_features_log_incompat |= cpu_to_be32(feature);
> +	error = xfs_bwrite(mp->m_sb_bp);
> +	if (error)
> +		goto shutdown;
> +

Do we need to do all of this serialization and whatnot to set the
incompat bit? Can we just modify and log the in-core sb when or before
an incompatible item is logged, similar to how xfs_sbversion_add_attr2()
is implemented for example?

> +	/*
> +	 * Add the feature bits to the incore superblock before we unlock the
> +	 * buffer.
> +	 */
> +	xfs_sb_add_incompat_log_features(&mp->m_sb, feature);
> +	xfs_buf_relse(mp->m_sb_bp);
> +
> +	/* Log the superblock to disk. */
> +	return xfs_sync_sb(mp, false);
> +shutdown:
> +	xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> +rele:
> +	xfs_buf_relse(mp->m_sb_bp);
> +	return error;
> +}
> +
> +/*
> + * Clear all the log incompat flags from the superblock.
> + *
> + * The caller must have freeze protection, cannot have any other transactions
> + * in progress, must ensure that the log does not contain any log items
> + * protected by any log incompat bit, and must ensure that there are no other
> + * threads that could be reading or writing the log incompat feature flag in
> + * the primary super.  In other words, we can only turn features off as one of
> + * the final steps of quiescing or unmounting the log.
> + */
> +int
> +xfs_clear_incompat_log_features(
> +	struct xfs_mount	*mp)
> +{
> +	if (!xfs_sb_version_hascrc(&mp->m_sb) ||
> +	    !xfs_sb_has_incompat_log_feature(&mp->m_sb,
> +				XFS_SB_FEAT_INCOMPAT_LOG_ALL) ||
> +	    XFS_FORCED_SHUTDOWN(mp))
> +		return 0;
> +
> +	/*
> +	 * Update the incore superblock.  We synchronize on the primary super
> +	 * buffer lock to be consistent with the add function, though at least
> +	 * in theory this shouldn't be necessary.
> +	 */
> +	xfs_buf_lock(mp->m_sb_bp);
> +	xfs_buf_hold(mp->m_sb_bp);
> +
> +	if (!xfs_sb_has_incompat_log_feature(&mp->m_sb,
> +				XFS_SB_FEAT_INCOMPAT_LOG_ALL))
> +		goto rele;
> +
> +	xfs_sb_remove_incompat_log_features(&mp->m_sb);
> +	xfs_buf_relse(mp->m_sb_bp);
> +
> +	/* Log the superblock to disk. */
> +	return xfs_sync_sb(mp, false);

Ok, so here it looks like we log the superblock change and presumably
rely on the unmount sequence to write it back...

> +rele:
> +	xfs_buf_relse(mp->m_sb_bp);
> +	return 0;
> +}
> +
>  /*
>   * Update the in-core delayed block counter.
>   *
...
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 343435a677f9..71e304c15e6b 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -919,6 +919,15 @@ xfs_quiesce_attr(
>  	/* force the log to unpin objects from the now complete transactions */
>  	xfs_log_force(mp, XFS_LOG_SYNC);
>  
> +	/*
> +	 * Clear log incompat features since we're freezing or remounting ro.
> +	 * Report failures, though it's not fatal to have a higher log feature
> +	 * protection level than the log contents actually require.
> +	 */
> +	error = xfs_clear_incompat_log_features(mp);
> +	if (error)
> +		xfs_warn(mp, "Unable to clear superblock log incompat flags. "
> +				"Frozen image may not be consistent.");
>  

What happens if the modified superblock is written back and then we
crash during a quiesce but before the log is fully clean? ISTM we could
end up in recovery of a log with potentially incompatible log items
where the feature bit has already been cleared. Not sure that's a
problem with intents, but seems risky in general.

Perhaps this needs to be pushed further down such that similar to
unmount, we ensure a full/sync AIL push completes (and moves the in-core
tail) before we log the feature bit change. I do wonder if it's worth
complicating the log quiesce path to clear feature bits at all, but I
suppose it could be a little inconsistent to clean the log on freeze yet
leave an incompat bit around. Perhaps we should push the clear bit
sequence down into the log quiesce path between completing the AIL push
and writing the unmount record. We may have to commit a sync transaction
and then push the AIL again, but that would cover the unmount and freeze
cases and I think we could probably do away with the post-recovery bit
clearing case entirely. A current/recovered mount should clear the
associated bits on the next log quiesce anyways. Hm?

Brian

>  	/* Push the superblock and write an unmount record */
>  	error = xfs_log_sbcount(mp);
> 


  reply	other threads:[~2020-12-08 11:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-08  0:40 [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags Darrick J. Wong
2020-12-08 11:19 ` Brian Foster [this message]
2020-12-08 18:10   ` Darrick J. Wong
2020-12-08 19:19     ` Brian Foster
2020-12-09  3:26       ` Darrick J. Wong
2020-12-09  4:19         ` Dave Chinner
2020-12-09 15:52           ` Brian Foster
2020-12-09 17:04             ` Brian Foster
2020-12-09 20:51               ` Dave Chinner
2020-12-10 14:23                 ` Brian Foster
2020-12-10 21:50                   ` Dave Chinner
2020-12-11 13:39                     ` Brian Foster
2020-12-11 23:35                       ` Darrick J. Wong
2020-12-14 15:25                         ` Brian Foster
2020-12-15 14:56                         ` Eric Sandeen
2020-12-12 21:14                       ` Dave Chinner
2020-12-14 15:58                         ` Brian Foster
2020-12-14 20:54                           ` Dave Chinner
2020-12-15 13:50                             ` Brian Foster
2021-01-07 23:28                               ` Darrick J. Wong
2021-01-13 21:31                                 ` Dave Chinner
2021-01-14  2:25                                   ` Darrick J. Wong
2021-01-14 10:30                                     ` Brian Foster
2021-01-21  0:12                                       ` Darrick J. Wong
2020-12-09 15:50         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201208111906.GA1679681@bfoster \
    --to=bfoster@redhat.com \
    --cc=allison.henderson@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).