All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Alli <allison.henderson@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 13/16] xfs: switch attr remove to xfs_attri_set_iter
Date: Tue, 3 May 2022 16:43:22 +1000	[thread overview]
Message-ID: <20220503064322.GX1098723@dread.disaster.area> (raw)
In-Reply-To: <982e97f354d087b12e72e4a0c158fac30420fdf3.camel@oracle.com>

On Tue, Apr 26, 2022 at 05:34:36PM -0700, Alli wrote:
> On Thu, 2022-04-14 at 19:44 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that xfs_attri_set_iter() has initial states for removing
> > attributes, switch the pure attribute removal code over to using it.
> > This requires attrs being removed to always be marked as INCOMPLETE
> > before we start the removal due to the fact we look up the attr to
> > remove again in xfs_attr_node_remove_attr().
> > 
> > Note: this drops the fillstate/refillstate optimisations from
> > the remove path that avoid having to look up the path again after
> > setting the incomplete flag and removeing remote attrs. Restoring
> > that optimisation to this path is future Dave's problem.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c | 26 +++++++++++++++-----------
> >  fs/xfs/xfs_attr_item.c   | 27 ++++++---------------------
> >  2 files changed, 21 insertions(+), 32 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 8665b74ddfaf..ccc72c0c3cf5 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -507,13 +507,11 @@ int xfs_attr_node_removename_setup(
> >  	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> >  		XFS_ATTR_LEAF_MAGIC);
> >  
> > -	if (args->rmtblkno > 0) {
> > -		error = xfs_attr_leaf_mark_incomplete(args, *state);
> > -		if (error)
> > -			goto out;
> > -
> > +	error = xfs_attr_leaf_mark_incomplete(args, *state);
> > +	if (error)
> > +		goto out;
> > +	if (args->rmtblkno > 0)
> >  		error = xfs_attr_rmtval_invalidate(args);
> > -	}
> >  out:
> >  	if (error)
> >  		xfs_da_state_free(*state);
> > @@ -954,6 +952,13 @@ xfs_attr_remove_deferred(
> >  	if (error)
> >  		return error;
> >  
> > +	if (xfs_attr_is_shortform(args->dp))
> > +		new->xattri_dela_state = XFS_DAS_SF_REMOVE;
> > +	else if (xfs_attr_is_leaf(args->dp))
> > +		new->xattri_dela_state = XFS_DAS_LEAF_REMOVE;
> > +	else
> > +		new->xattri_dela_state = XFS_DAS_NODE_REMOVE;
> > +
> Mmmm, same issue here as in patch 4, this initial state configs would
> get missed during a replay since these routines are only used in the
> delayed attr code path, not the replay code path.

*nod*

I'm working on fixing this right now.

> > @@ -311,20 +308,9 @@ xfs_xattri_finish_update(
> >  		goto out;
> >  	}
> >  
> > -	switch (op) {
> > -	case XFS_ATTR_OP_FLAGS_SET:
> > -		error = xfs_attr_set_iter(attr);
> > -		if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
> > -			error = -EAGAIN;
> > -		break;
> > -	case XFS_ATTR_OP_FLAGS_REMOVE:
> > -		ASSERT(XFS_IFORK_Q(args->dp));
> > -		error = xfs_attr_remove_iter(attr);
> > -		break;
> > -	default:
> > -		error = -EFSCORRUPTED;
> > -		break;
> > -	}
> > +	error = xfs_attr_set_iter(attr);
> > +	if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
> > +		error = -EAGAIN;
> >  
> 
> The concern I have here is that op_flags is recorded and recovered from
> the log item (see xfs_attri_item_recover).  Where as xattri_dela_state
> is not.  What ever value was set in attr before the shut down would not
> be there upon recovery, and with out op_flag we wont know how to
> configure it.

Right, that's the real problem with the existing operational order
and intent contents - what is on disk and/or in the journal is not
consistent and is dependent on the state at which the operation
failed. hence to recover from this, we'd need to push the current
state into the intents as well.

That's pretty messy and nasty, and I'm trying to avoid that. This is
the reason why I've reworking the way the logged attribute is logged
and recovered in this series.

When we are doing a pure attr create, we have to consider:

- shortform create is a single transaction operation and so
  never leaves anything behind for recovery to clean up.
- leaf and node insertion of inline attributes are single
  transaction operations and never leave anything for
  recovery to clean up
- short-form to leaf and leaf to node operations roll the
  transaction without changing the attr contents, so if we
  crash after those conversions are committed, recovery only
  needs to create the new attr. IOWs, nothing to clean up
  before we run the create replay.
- creating a remote attr sets the INCOMPLETE flag on the new attr in
  when the name is inserted into the btree, and it is removed when
  the remote attr creation is complete. Hence there's a transient
  state in the journal where the attr is present but INCOMPLETE.

This last state is the problem - if recovery does not remove this
INCOMPLETE xattr, or it does not restart the recovery from the exact
point it failed, we will leave stale INCOMPLETE xattrs in the btree
whenever we recover from a crash. That leaves us with two choices;
either we:

- put a whole lot more state into the intent to enable exact
continuation of the operation (logging state and remote attr extent
target); or
- we just remove the INCOMPLETE xattr and run a normal create
  operation to recreate the entire xattr.


When we are doing a replace, we have similar state based problems -
did we crash during create or removal? IF we are doing the create
first, then we can crash with both a new incomplete and an old valid
xattr of the given name. ANd after we flip the flags over, we have
a new valid and an old incomplete old xattr, and like the create
state we can't tell which is which.

Now what do we do in recovery - which one is the one we have to
remove? Or do we have to remove both? We ahve the same problem as a
pure create - we don't know what to do without knowing the exact
state of the operation.

However, if we change the replace operation to do things in a
different order because we have an intent that stores the new attr
{name, value} pair, we can avoid this whole problem. THat isL

- log the new attr intent atomically with marking the existing attr
  INCOMPLETE.
- remove the old INCOMPLETE attr
- insert the new attr as per the pure create above

If we crash at any point during this operation, we will only ever
see a single INCOMPLETE entry under the name of the new attr,
regardless of whether we failed during the remove of the old xattr
of the create of the new xattr. And because of the intents, we'll
always have the new xattr name+val at recovery time.

Pure remove has the same problem - partial remove still needs
recovery to clean up, but we don't want partially removed xattrs to
ever be visible to userspace. Hence the logged remove operation is:

- log the new attr intent atomically with marking the existing attr
  INCOMPLETE.
- remove the old INCOMPLETE attr

And now the recovery operation is simply "remove the INCOMPLETE
xattr under the logged name".

IOWs, we now have the same recovery path for all three logged xattr
operations:

- remove the INCOMPLETE xattr under the logged name if it exists
  to return the attr fork to a known good state.
- for set/replace, create the new xattr that was logged in the
  intent.

To return to the original question, this means all recovery needs to
know is whether we are recovering a SET/REPLACE or a REMOVE
operation along with the logged xattr name/val pair from the intent.
We don't need to know what state we crashed in, nor do we need to
know partial setup/teardown target/state in the journal because all
we ever need to do to get back to a known good state is teardown
the existing INCOMPLETE xattr....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2022-05-03  6:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14  9:44 [PATCH 00/10 v2] xfs: LARP - clean up xfs_attr_set_iter state machine Dave Chinner
2022-04-14  9:44 ` [PATCH 01/16] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
2022-04-22  0:37   ` Alli
2022-04-14  9:44 ` [PATCH 02/16] xfs: initialise attrd item to zero Dave Chinner
2022-04-22  0:37   ` Alli
2022-04-14  9:44 ` [PATCH 03/16] xfs: make xattri_leaf_bp more useful Dave Chinner
2022-04-22  0:37   ` Alli
2022-04-14  9:44 ` [PATCH 04/16] xfs: separate out initial attr_set states Dave Chinner
2022-04-22  0:38   ` Alli
2022-04-26 23:50     ` Dave Chinner
2022-04-14  9:44 ` [PATCH 05/16] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
2022-04-22  0:38   ` Alli
2022-04-14  9:44 ` [PATCH 06/16] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
2022-04-23  1:05   ` Alli
2022-04-14  9:44 ` [PATCH 07/16] xfs: split remote attr setting out from replace path Dave Chinner
2022-04-23  1:06   ` Alli
2022-04-14  9:44 ` [PATCH 08/16] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
2022-04-23  1:06   ` Alli
2022-04-14  9:44 ` [PATCH 09/16] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
2022-04-23  1:07   ` Alli
2022-04-14  9:44 ` [PATCH 10/16] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
2022-04-23  1:06   ` Alli
2022-04-27  0:32   ` Alli
2022-04-14  9:44 ` [PATCH 11/16] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
2022-04-27  0:32   ` Alli
2022-04-14  9:44 ` [PATCH 12/16] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
2022-04-27  0:33   ` Alli
2022-04-14  9:44 ` [PATCH 13/16] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
2022-04-27  0:34   ` Alli
2022-05-03  6:43     ` Dave Chinner [this message]
2022-04-14  9:44 ` [PATCH 14/16] xfs: remove xfs_attri_remove_iter Dave Chinner
2022-04-27  0:34   ` Alli
2022-05-03  6:53     ` Dave Chinner
2022-04-14  9:44 ` [PATCH 15/16] xfs: split attr replace op setup from create op setup Dave Chinner
2022-04-27  1:10   ` Alli
2022-04-14  9:44 ` [PATCH 16/16] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
2022-04-28  7:02   ` Alli
2022-05-03  7:40     ` Dave Chinner
2022-05-04  1:30       ` Alli
2022-05-04  5:49         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220503064322.GX1098723@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=allison.henderson@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.