Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags

From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Allison Henderson <allison.henderson@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags
Date: Fri, 11 Dec 2020 08:50:04 +1100	[thread overview]
Message-ID: <20201210215004.GC3913616@dread.disaster.area> (raw)
In-Reply-To: <20201210142358.GB1912831@bfoster>

On Thu, Dec 10, 2020 at 09:23:58AM -0500, Brian Foster wrote:
> On Thu, Dec 10, 2020 at 07:51:32AM +1100, Dave Chinner wrote:
> > For changing incompat log flags, covering also provides exactly what
> > we need - an empty log with only a dirty superblock in the journal
> > to recover. All that we need then is to recheck feature flags after
> > recovery (not just clear log incompat flags) because we might need
> > to use this to also set incompat feature flags dynamically.
> > 
> 
> I'd love it if we use a better term for a log isolated to the superblock
> buffer. So far we've discussed an empty log with a dummy superblock, an
> empty log with a dirty superblock, and I suppose we could also have an
> actual empty log. :P "Covered log," perhaps?

On terminology: "covered log" has been around for 25 years
in XFS, and it's very definition is "an empty, up-to-date log except
for a dummy object we logged without any changes to update the log
tail". And, by definition, that dummy object is dirty in the log
even if it contains no actual modifications when it was logged.
So in this case, "dummy" is directly interchangable with "dirty"
when looking at the log contents w.r.t. recovery behaviour.

It's worth looking at a historical change, too. Log covering
originally used the root inode and not the superblock. That caused
problems on linux dirtying VFS inode state, so we changed it to the
superblock in commit 1a387d3be2b3 ("xfs: dummy transactions should
not dirty VFS state") about a decade ago.  Note the name of the
function at the time (xfs_fs_log_dummy()) and that it's description
explicitly mentions that a dummy transaction used for updating the
log tail when covering it.

The only difference in this discussion is fact that we may be
replacing the "dummy" with a modified superblock. Both are still
dirty superblock objects in the log, and serve the same purpose for
covering the log, but instead of using a dummy object we update the
log incompat flags so taht the only thing that gets replayed if we
crash is the modification to the superblock flags...

Finally, no, we can't have a truly empty log while the filesystem is
mounted because log transaction records must not be empty. Further,
we can only bring the log down to an "empty but not quite clean"
state while mounted, because if the log is actually marked clean and
we crash then log recovery will not run and so we will not clean up
files that were open-but-unlinked when the system crashed.

> > > > but didn't necessarily clean the log. I wonder if we can fit that state
> > > > into the existing log covering mechanism by logging the update earlier
> > > > (or maybe via an intermediate state..)? I'll need to go stare at that
> > > > code a bit..
> > 
> > That's kinda what I'd like to see done - it gives us a generic way
> > of altering superblock incompat feature fields and filesystem
> > structures dynamically with no risk of log recovery needing to parse
> > structures it may not know about.
> > 
> 
> We might have to think about if we want to clear such feature bits in
> all log covering scenarios (idle, freeze) vs. just unmount.

I think clearing bits can probably be lazy. Set a state flag to tell
log covering that it needs to do an update, and so when the covering
code finally runs the feature bit modification just happens.

> My previous suggestion in the other sub-thread was to set bits on
> mount and clear on unmount because that (hopefully) simplifies the
> broader implementation.  Otherwise we need to manage dynamic
> setting of bits when the associated log items are active, and that
> still isn't truly representative because the bits would remain set
> long after active items fall out of the log, until the log is
> covered. Either way is possible, but I'm curious what others
> think.

I think that unless we are setting a log incompat flag, log covering
should unconditionally clear the log incompat flags. Because the log
is empty, and recovery of a superblock buffer should -always- be
possible, then by definition we have no log incompat state
present....

As for a mechanism for dynamically adding log incompat flags?
Perhaps we just do that in xfs_trans_alloc() - add an log incompat
flags field into the transaction reservation structure, and if
xfs_trans_alloc() sees an incompat field set and the superblock
doesn't have it set, the first thing it does is run a "set log
incompat flag" transaction before then doing it's normal work...

This should be rare enough it doesn't have any measurable
performance overhead, and it's flexible enough to support any log
incompat feature we might need to implement...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com