All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.com>, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hch@lst.de, neilb@suse.de,
	bfields@fieldses.org, amir73il@gmail.com, jack@suse.de,
	viro@zeniv.linux.org.uk
Subject: Re: [PATCH 01/19] fs: new API for handling inode->i_version
Date: Mon, 18 Dec 2017 09:03:45 -0500	[thread overview]
Message-ID: <1513605825.3422.7.camel@kernel.org> (raw)
In-Reply-To: <87mv2jmhmm.fsf@notabene.neil.brown.name>

On Sat, 2017-12-16 at 15:17 +1100, NeilBrown wrote:
> On Wed, Dec 13 2017, Jeff Layton wrote:
> 
> > On Thu, 2017-12-14 at 09:04 +1100, NeilBrown wrote:
> > > On Wed, Dec 13 2017, Jeff Layton wrote:
> > > 
> > > > +/*
> > > > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> > > > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> > > > + * appear different to observers if there was a change to the inode's data or
> > > > + * metadata since it was last queried.
> > > > + *
> > > > + * It should be considered an opaque value by observers. If it remains the same
> > > > + * since it was last checked, then nothing has changed in the inode. If it's
> > > > + * different then something has changed. Observers cannot infer anything about
> > > > + * the nature or magnitude of the changes from the value, only that the inode
> > > > + * has changed in some fashion.
> > > 
> > > I agree that it "should be" considered opaque, but I have a suspicion
> > > that NFSv4 doesn't consider it opaque.
> > > There is something about write delegations and the server performing a
> > > GETATTR callback to the delegated client so that it can answer GETATTR
> > > from other clients without recalling the delegation.
> > > 
> > > Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains
> > > the text:
> > > 
> > >    o  The client will create a value greater than c that will be used
> > >       for communicating that modified data is held at the client.  Let
> > >       this value be represented by d.
> > > 
> > > "c" here is a 'change' attribute.
> > > 
> > > Then:
> > > 
> > >    While the change attribute is opaque to the client in the sense that
> > >    it has no idea what units of time, if any, the server is counting
> > >    change with, it is not opaque in that the client has to treat it as
> > >    an unsigned integer, and the server has to be able to see the results
> > >    of the client's changes to that integer.  Therefore, the server MUST
> > >    encode the change attribute in network order when sending it to the
> > >    client.  The client MUST decode it from network order to its native
> > >    order when receiving it, and the client MUST encode it in network
> > >    order when sending it to the server.  For this reason, change is
> > >    defined as an unsigned integer rather than an opaque array of bytes.
> > > 
> > > This all suggests that nfsd needs to be certain that "incrementing" the
> > > change id will produce a new changeid, which has not been used before,
> > > and also suggests that nfsd needs to be able to control the changeid
> > > stored after writes that result from a delegation being returned.
> > > 
> > > I'd just like to say that this is one of the most annoying dumb features
> > > of NFSv4, because it is trivial to fix and I suggested a fix before
> > > NFSv4.0 was finalized.  Grumble.
> > > 
> > > Otherwise the patch set looks good.  I haven't gone over the code
> > > closely, the but approach is spot-on.
> > 
> > I don't think we have to do that. There are really only two states with
> > a client holding a write delegation, as far as the server is concerned.
> > Either:
> > 
> > a) the client has done no writes to the file, in which case it'll return
> > the same i_version that the server has when issued a CB_GETATTR
> > 
> > ...or...
> > 
> > b) it has written to the file while holding the delegation, in which
> > case it'll return a different CB_GETATTR to the server
> > 
> > The simplest thing for the server to do is to just increment the change
> > attribute _once_ when it gets back a CB_GETATTR with a different change
> > attr than it has.
> > 
> > That's sufficient to tell another client issuing a a GETATTR that the
> > file has changed without needing to recall the delegation.
> > 
> > Prior to the delegation being returned, the client will send at least
> > one WRITE RPC, and that's enough to ensure that the the next stat will
> > see the thing increase.
> 
> "increment" and "increase" are not words that mean anything for an
> "opaque value".
> NFSd is, presumably, an "observer" of i_version (as it isn't the
> filesytem that controls it), so your text says it must treat i_version as
> opaque.  That means it cannot detect an "increase" (only a change), and
> it certainly cannot "increment" the value.
> 
> I think you need to allow observers to treat i_version as a 64 bit number
> which will monotonically increase.  Any change to the file will result
> in an increment of at least '1'.
> 

One thing here...

I'm currently doing this:

static inline s64                                                               
inode_cmp_iversion(const struct inode *inode, const u64 old)                    
{                                                                               
        return (s64)inode_peek_iversion(inode) - (s64)old;                      
}                                                                               

But I don't think that'll handle wraparound correctly if we want to 
allow people to determine whether it's older or newer. I'll probably
change this to shift the old value left by one bit, and mask off the low
bit of the current inode->i_version.

That'll always give you a difference of 2 or more if they're different,
but it should return the correct sign, which is really all we care about
anyway.

Granted, we're unlikely to wrap around with a 64 bit value, but it's
hard to know for sure what values might be stored on disk on existing
filesystems.
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2017-12-18 14:03 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-13 14:19 [PATCH 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2017-12-13 14:19 ` [PATCH 01/19] fs: new API for handling inode->i_version Jeff Layton
2017-12-13 22:04   ` NeilBrown
2017-12-14  0:27     ` Jeff Layton
2017-12-16  4:17       ` NeilBrown
2017-12-17 13:01         ` Jeff Layton
2017-12-18 14:03         ` Jeff Layton [this message]
2017-12-13 14:20 ` [PATCH 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2017-12-13 21:52   ` Jeff Layton
2017-12-13 22:07     ` NeilBrown
2017-12-13 14:20 ` [PATCH 03/19] fat: convert to new i_version API Jeff Layton
2017-12-13 14:20 ` [PATCH 04/19] affs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 05/19] afs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 06/19] btrfs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 07/19] exofs: switch " Jeff Layton
2017-12-13 14:20 ` [PATCH 08/19] ext2: convert " Jeff Layton
2017-12-18 12:47   ` Jan Kara
2017-12-13 14:20 ` [PATCH 09/19] ext4: " Jeff Layton
2017-12-14 21:52   ` Theodore Ts'o
2017-12-13 14:20 ` [PATCH 10/19] nfs: " Jeff Layton
2017-12-13 14:20 ` [PATCH 11/19] nfsd: " Jeff Layton
2017-12-13 14:20 ` [PATCH 12/19] ocfs2: " Jeff Layton
2017-12-18 12:49   ` Jan Kara
2017-12-13 14:20 ` [PATCH 13/19] ufs: use " Jeff Layton
2017-12-13 14:20 ` [PATCH 14/19] xfs: convert to " Jeff Layton
2017-12-13 22:48   ` Dave Chinner
2017-12-13 23:25     ` Dave Chinner
2017-12-14  0:10       ` Jeff Layton
2017-12-14  2:17         ` Dave Chinner
2017-12-14 11:16           ` Jeff Layton
2017-12-13 14:20 ` [PATCH 15/19] IMA: switch IMA over " Jeff Layton
2017-12-13 14:20 ` [PATCH 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
2017-12-15 12:59   ` Jeff Layton
2017-12-13 14:20 ` [PATCH 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2017-12-13 14:20 ` [PATCH 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2017-12-15 13:03   ` Jeff Layton
2017-12-13 14:20 ` [PATCH 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2017-12-13 15:05 ` [PATCH 00/19] fs: rework and optimize i_version handling in filesystems J. Bruce Fields
2017-12-13 20:14   ` Jeff Layton
2017-12-13 22:10     ` Jeff Layton
2017-12-13 23:03     ` Dave Chinner
2017-12-14  0:02       ` Jeff Layton
2017-12-14 14:14         ` Jeff Layton
2017-12-14 15:14           ` J. Bruce Fields
2017-12-15 15:15             ` Jeff Layton
2017-12-15 15:26               ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1513605825.3422.7.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=hch@lst.de \
    --cc=jack@suse.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.com \
    --cc=neilb@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.