All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Allison Henderson <allison.henderson@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags
Date: Wed, 20 Jan 2021 16:12:22 -0800	[thread overview]
Message-ID: <20210121001222.GU3134581@magnolia> (raw)
In-Reply-To: <20210114103047.GE1333929@bfoster>

On Thu, Jan 14, 2021 at 05:30:47AM -0500, Brian Foster wrote:
> On Wed, Jan 13, 2021 at 06:25:47PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 14, 2021 at 08:31:05AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 07, 2021 at 03:28:21PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Dec 15, 2020 at 08:50:03AM -0500, Brian Foster wrote:
> > > > > On Tue, Dec 15, 2020 at 07:54:56AM +1100, Dave Chinner wrote:
> > > > > > On Mon, Dec 14, 2020 at 10:58:31AM -0500, Brian Foster wrote:
> > > > > > > On Sun, Dec 13, 2020 at 08:14:39AM +1100, Dave Chinner wrote:
> > > > > > > > On Fri, Dec 11, 2020 at 08:39:01AM -0500, Brian Foster wrote:
> > > > > > > > > On Fri, Dec 11, 2020 at 08:50:04AM +1100, Dave Chinner wrote:
> > > > > > > > > > As for a mechanism for dynamically adding log incompat flags?
> > > > > > > > > > Perhaps we just do that in xfs_trans_alloc() - add an log incompat
> > > > > > > > > > flags field into the transaction reservation structure, and if
> > > > > > > > > > xfs_trans_alloc() sees an incompat field set and the superblock
> > > > > > > > > > doesn't have it set, the first thing it does is run a "set log
> > > > > > > > > > incompat flag" transaction before then doing it's normal work...
> > > > > > > > > > 
> > > > > > > > > > This should be rare enough it doesn't have any measurable
> > > > > > > > > > performance overhead, and it's flexible enough to support any log
> > > > > > > > > > incompat feature we might need to implement...
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > But I don't think that is sufficient. As Darrick pointed out up-thread,
> > > > > > > > > the updated superblock has to be written back before we're allowed to
> > > > > > > > > commit transactions with incompatible items. Otherwise, an older kernel
> > > > > > > > > can attempt log recovery with incompatible items present if the
> > > > > > > > > filesystem crashes before the superblock is written back.
> > > > > > > > 
> > > > > > > > Sure, that's what the hook in xfs_trans_alloc() would do. It can do
> > > > > > > > the work in the context that is going to need it, and set a wait
> > > > > > > > flag for all incoming transactions that need a log incompat flag to
> > > > > > > > wait for it do it's work.  Once it's done and the flag is set, it
> > > > > > > > can continue and wake all the waiters now that the log incompat flag
> > > > > > > > has been set. Anything that doesn't need a log incompat flag can
> > > > > > > > just keep going and doesn't ever get blocked....
> > > > > > > > 
> > > > > > > 
> > > > > > > It would have to be a sync transaction plus sync AIL force in
> > > > > > > transaction allocation context if we were to log the superblock change,
> > > > > > > which sounds a bit hairy...
> > > > > > 
> > > > > > Well, we already do sync AIL forces in transaction reservation when
> > > > > > we run out of log space, so there's no technical reason for this
> > > > > > being a problem at all. xfs_trans_alloc() is expected to block
> > > > > > waiting on AIL tail pushing....
> > > > > > 
> > > > > > > > I suspect this is one of the rare occasions where an unlogged
> > > > > > > > modification makes an awful lot of sense: we don't even log that we
> > > > > > > > are adding a log incompat flag, we just do an atomic synchronous
> > > > > > > > write straight to the superblock to set the incompat flag(s). The
> > > > > > > > entire modification can be done under the superblock buffer lock to
> > > > > > > > serialise multiple transactions all trying to set incompat bits, and
> > > > > > > > we don't set the in-memory superblock incompat bit until after it
> > > > > > > > has been set and written to disk. Hence multiple waits can check the
> > > > > > > > flag after they've got the sb buffer lock, and they'll see that it's
> > > > > > > > already been set and just continue...
> > > > > > > > 
> > > > > > > 
> > > > > > > Agreed. That is a notable simplification and I think much more
> > > > > > > preferable than the above for the dynamic approach.
> > > > > > > 
> > > > > > > That said, note that dynamic feature bits might introduce complexity in
> > > > > > > more subtle ways. For example, nothing that I can see currently
> > > > > > > serializes idle log covering with an active transaction (that may have
> > > > > > > just set an incompat bit via some hook yet not committed anything to the
> > > > > > > log subsystem), so it might not be as simple as just adding a hook
> > > > > > > somewhere.
> > > > > > 
> > > > > > Right, we had to make log covering away of the CIL to prevent it
> > > > > > from idling while there were multiple active committed transactions
> > > > > > in memory. So the state machine only progresses if both the CIL and
> > > > > > AIL are empty. If we had some way of knowing that a transaction is
> > > > > > in progress, we could check that in xfs_log_need_covered() and we'd
> > > > > > stop the state machine progress at that point. But we got rid of the
> > > > > > active transaction counter that we could use for that....
> > > > > > 
> > > > > > [Hmmm, didn't I recently have a patch that re-introduced that
> > > > > > counter to fix some other "we need to know if there's an active
> > > > > > transaction running" issue? Can't remember what that was now...]
> > > > > > 
> > > > > 
> > > > > I think you removed it, actually, via commit b41b46c20c0bd ("xfs: remove
> > > > > the m_active_trans counter"). We subsequently discussed reintroducing
> > > > > the same concept for the quotaoff rework [1], which might be what you're
> > > > > thinking of. That uses a percpu rwsem since we don't really need a
> > > > > counter, but I suspect could be reused for serialization in this use
> > > > > case as well (assuming I can get some reviews on it.. ;).
> > > > > 
> > > > > FWIW, I was considering putting those quotaoff patches ahead of the log
> > > > > covering work so we could reuse that code again in attr quiesce, but I
> > > > > think I'm pretty close to being able to remove that particular usage
> > > > > entirely.
> > > > 
> > > > I was thinking about using a rwsem to protect the log incompat flags --
> > > > code that thinks it might use a protected feature takes the lock in
> > > > read mode until commit; and the log covering code only clears the
> > > > flags if down_write_trylock succeeds.  That constrains the overhead to
> > > > threads that are trying to use the feature, instead of making all
> > > > threads pay the cost of bumping the counter.
> > > 
> > > If you are going to do that, make it a per-cpu rwsem, because we
> > > really only care about the global shared read overhead in the hot
> > > paths and not the overhead of taking it in write mode if
> > > it is only the log covering code that does that...
> > > 
> > > > > I'm more approaching this from a "what are the requirements and how/why
> > > > > do they justify the associated complexity?" angle. That's why I'm asking
> > > > > things like how much difference does a dynamic bit really make for
> > > > > something like xattrs. But I agree that's less of a concern when
> > > > > associated with more obscure or rarely used operations, so on balance I
> > > > > think that's a fair approach to this mechanism provided we consider
> > > > > suitability on a per feature basis.
> > > > 
> > > > Hm.  If I had to peer into my crystal ball I'd guess that the current
> > > > xattr logging scheme works fine for most xattr users, so I wouldn't
> > > > worry much about the dynamic bit.
> > > > 
> > > > However, I could see things like atomic range exchange being more
> > > > popular, in which case people might notice the overhead of tracking when
> > > > we can turn off the feature bit...
> > > 
> > > Hence a per-cpu rwsem... :)
> > 
> > Yup, it seems to work fine, though now I'm distracted over the posteof
> > cleanup serieses... :)

Well ok, it works with the /huge/ exception that there is no
percpu_rwsem_down_write_trylock, which means that we can't have the log
opportunistically try to grab it while covering.  That's kind of a
bummer, since I don't think that's at all desirable.

> FWIW, I have a patch on the list from a few months ago that introduces a
> transaction percpu rwsem for the quotaoff rework:
> 
> https://lore.kernel.org/linux-xfs/20201001150310.141467-3-bfoster@redhat.com/
> 
> Perhaps I should repost?

Ok?  Though /me is busy collecting things for 5.12 ATM... :)

--D

> 
> Brian
> 
> > --D
> > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > 
> 

  reply	other threads:[~2021-01-21  1:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-08  0:40 [RFC[RAP] PATCH] xfs: allow setting and clearing of log incompat feature flags Darrick J. Wong
2020-12-08 11:19 ` Brian Foster
2020-12-08 18:10   ` Darrick J. Wong
2020-12-08 19:19     ` Brian Foster
2020-12-09  3:26       ` Darrick J. Wong
2020-12-09  4:19         ` Dave Chinner
2020-12-09 15:52           ` Brian Foster
2020-12-09 17:04             ` Brian Foster
2020-12-09 20:51               ` Dave Chinner
2020-12-10 14:23                 ` Brian Foster
2020-12-10 21:50                   ` Dave Chinner
2020-12-11 13:39                     ` Brian Foster
2020-12-11 23:35                       ` Darrick J. Wong
2020-12-14 15:25                         ` Brian Foster
2020-12-15 14:56                         ` Eric Sandeen
2020-12-12 21:14                       ` Dave Chinner
2020-12-14 15:58                         ` Brian Foster
2020-12-14 20:54                           ` Dave Chinner
2020-12-15 13:50                             ` Brian Foster
2021-01-07 23:28                               ` Darrick J. Wong
2021-01-13 21:31                                 ` Dave Chinner
2021-01-14  2:25                                   ` Darrick J. Wong
2021-01-14 10:30                                     ` Brian Foster
2021-01-21  0:12                                       ` Darrick J. Wong [this message]
2020-12-09 15:50         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210121001222.GU3134581@magnolia \
    --to=djwong@kernel.org \
    --cc=allison.henderson@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.