From: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>,
"Zorro Lang" <zlang@redhat.com>,
"Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
Subject: Re: [PATCH] xfs: preserve i_mode if __xfs_set_acl() fails
Date: Tue, 15 Aug 2017 16:29:39 -0300 [thread overview]
Message-ID: <20170815192938.GA1247@debian.home> (raw)
In-Reply-To: <20170815084430.GK21024@dastard>
On Tue, Aug 15, 2017 at 06:44:30PM +1000, Dave Chinner wrote:
> On Tue, Aug 15, 2017 at 03:18:58AM -0300, Ernesto A. Fernández wrote:
> > Hi Dave, thanks for the review!
> >
> > On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote:
> > > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote:
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index de7b9bd..c3193e1 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -395,6 +395,7 @@ xfs_attr_remove(
> > > > struct xfs_defer_ops dfops;
> > > > xfs_fsblock_t firstblock;
> > > > int error;
> > > > + int noattr = 0;
> > >
> > > bool.
> > >
> > > And make it "bool has_attr = true" to avoid the nasty double
> > > negative of "!noattr" later on.
> >
> > I called it noattr because the error was -ENOATTR, but you are right,
> > has_attr is better. I assume you would prefer mode_changed to be a bool
> > as well?
> >
> > >
> > > >
> > > > XFS_STATS_INC(mp, xs_attr_remove);
> > > >
> > > > @@ -438,7 +439,12 @@ xfs_attr_remove(
> > > > xfs_trans_ijoin(args.trans, dp, 0);
> > > >
> > > > if (!xfs_inode_hasattr(dp)) {
> > > > - error = -ENOATTR;
> > > > + /*
> > > > + * Commit even if there is no attr to remove, because when
> > > > + * setting ACLs the caller may be changing the mode in the
> > > > + * same transaction.
> > > > + */
> > > > + noattr = 1;
> > > > } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) {
> > > > ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > > > error = xfs_attr_shortform_remove(&args);
> > > > @@ -458,7 +464,7 @@ xfs_attr_remove(
> > > > if (mp->m_flags & XFS_MOUNT_WSYNC)
> > > > xfs_trans_set_sync(args.trans);
> > > >
> > > > - if ((flags & ATTR_KERNOTIME) == 0)
> > > > + if ((flags & ATTR_KERNOTIME) == 0 && !noattr)
> > > > xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG);
> > >
> > > Why?
> >
> > Because we don't want the ctime to change when you try to delete an
> > attribute that didn't exist in the first place. At least that's how
> > it went before, so I tried to keep it the same.
>
> Which means the logic is convoluted and no longer obvious as to what
> it does. i.e. the noattr case is running a transaction that dirties
> an inode that no modifications were made to. Alarm bells right
> there.
>
> > > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > > > return error;
> > > >
> > > > if (type == ACL_TYPE_ACCESS) {
> > > > - umode_t mode;
> > > > -
> > > > - error = posix_acl_update_mode(inode, &mode, &acl);
> > > > - if (error)
> > > > - return error;
> > > > - error = xfs_set_mode(inode, mode);
> > > > + error = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> > > > if (error)
> > > > return error;
> > > > + inode->i_ctime = current_time(inode);
> > > > + XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > + mode_changed = 1;
> > > > }
> > >
> > > And this doesn't.
> > >
> > > So this breaks all our rules for inode metadata updates, both for
> > > runtime access serialisation and for crash consistency.
> >
> > I'm only making changes to the vfs inode, and xfs_set_acl() should always
> > be called with i_rwsem held for writing. I think that should be enough to
> > serialize access to the vfs inode fields.
>
> It's not. The i_rwsem does not protect XFS inode metadata, which is
> what needs changing here. i.e. we change both the VFS inode and
> the corresponding XFS inode metadata at the same time, under the
> same XFS_ILOCK_EXCL lock, only inside a transaction context.
>
> That's the rules, no exceptions.
>
> > I'm sorry, I'm not sure I follow the crash consistency part. This is not
> > writing anything to disk.
>
> xfs_set_mode() runs a transaction that is journalled to change the
> inode mode. You break in-memory consistency by modifying the VFS
> inode state directly without holding the correct locks (see
> xfs_setattr_mode()), and you break the transactional change model
> (i.e. on-disk journal consistency and on-disk metadata change
> ordering) by making metadata changes this outside transaction
> contexts and without holding the correct locks.
>
> > My expectation was that the changes I made to
> > the vfs inode would be attached to the transaction later, when calling
> > xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that
> > not work as intended?
>
> No, it wouldn't. You have to follow the filesystem specific update
> rules to change inode metadata, not make up your own....
>
> > I was mostly imitating how btrfs sets an ACL, so I
> > could be misunderstanding how xfs does journaling.
>
> ... and btrfs has a completely different transaction model and
> implementation to XFS. IOWs, trying to do what works for btrfs in
> XFS code will break XFS. And vice versa.
>
> > > > set_acl:
> > > > - return __xfs_set_acl(inode, acl, type);
> > > > + error = __xfs_set_acl(inode, acl, type);
> > > > + if (error && mode_changed) {
> > > > + inode->i_mode = old_mode;
> > > > + inode->i_ctime = old_ctime;
> > > > + XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg);
> > > > + }
> > >
> > > Same here.
> >
> > i_rwsem would still be held here. My reasoning is that, since there was
> > an error, no changes were actually committed; so there would be no need
> > to commit the restoration either, just restore the old in-memory vfs inode
> > fields. Once again, this is inspired by btrfs, so I suppose it may fail
> > here in some way I can't see.
> >
> > (I should probably clarify I did run tests before submitting)
>
> When you treat the code as a black box for testing, then end result
> might appear to be what you want. That doesn't mean the
> implementation is correct or that hidden transient states within the
> block box are incorrect or invalid.
>
> > > IOWs, you can't just remove a transactional modification of inode
> > > state and replace it with in-memory VFS inode level changes. You
> > > need to do the changes in transaction context and with the correct
> > > locks held.
> > >
> > > IOWs, if you need to undo the mode changes xfs_set_mode() did, you
> > > need to call xfs_set_mode() again. And because modifications have
> >
> > But that could fail as well, and how would we handle that?
>
> If the xattr change was not a fatal error (which will shut down the
> filesystem), then calling xfs_set_mode() should work just fine as it
> doesn't require any new space to be allocated (XFS != BTRFS - we can
> run overwriting transactions even at ENOSPC).
>
> And you need to run the mode undo as a separate transaction
> because....
>
> > If this bug
> > is to be fixed the whole thing needs to be a single transaction.
>
> .... this is simply not possible. Why? Because __xfs_set_acl() can
> run up three separate transactions internally in adding the new
> xattr. That means your unlogged vfs indoe change has already been
> logged to the journal and so we *require* a transaction to overwrite
> the new mode that has already been committed to the journal.
It's clear that I don't understand xfs, and I'm causing more trouble than
I'm helping. I should leave this issue to you guys.
> But I have to ask - why do we even need to modify the mode first?
> Why not change the ACL first, then modify the mode+timestamp? If
> setting the ACL fails, then we don't have anything to undo and all
> is good....
I intended for the mode to be committed as part of the same transaction
that sets or removes the ACL. In my mind making the changes later, as part
of a separate transaction, would have meant that a crash between the two
left the filesystem in an inconsistent state, with a new ACL but without
the corresponding mode bits.
Of course that was based on a poor understanding of xfs journaling, so now
I don't know.
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Thanks,
Ernest
next prev parent reply other threads:[~2017-08-15 19:29 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-14 8:18 [PATCH] xfs: preserve i_mode if __xfs_set_acl() fails Ernesto A. Fernández
2017-08-14 22:28 ` Darrick J. Wong
2017-08-15 0:58 ` Ernesto A. Fernández
2017-08-15 2:00 ` Dave Chinner
2017-08-15 6:18 ` Ernesto A. Fernández
2017-08-15 8:44 ` Dave Chinner
2017-08-15 19:29 ` Ernesto A. Fernández [this message]
2017-08-16 0:25 ` Dave Chinner
2017-08-16 7:16 ` Ernesto A. Fernández
2017-08-16 13:35 ` Dave Chinner
2017-08-16 19:31 ` Ernesto A. Fernández
2017-08-17 0:00 ` Dave Chinner
2017-08-17 5:34 ` Ernesto A. Fernández
2017-08-17 6:34 ` Dave Chinner
2017-12-07 17:31 ` Bill O'Donnell
2017-12-07 17:38 ` Bill O'Donnell
2017-12-07 17:43 ` Darrick J. Wong
2017-12-07 17:51 ` Bill O'Donnell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170815192938.GA1247@debian.home \
--to=ernesto.mnd.fernandez@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).