From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757338AbdLQNBt (ORCPT ); Sun, 17 Dec 2017 08:01:49 -0500 Received: from mail.kernel.org ([198.145.29.99]:45940 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757207AbdLQNBo (ORCPT ); Sun, 17 Dec 2017 08:01:44 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76A2A21877 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=jlayton@kernel.org Message-ID: <1513515701.3529.18.camel@kernel.org> Subject: Re: [PATCH 01/19] fs: new API for handling inode->i_version From: Jeff Layton To: NeilBrown , linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, hch@lst.de, neilb@suse.de, bfields@fieldses.org, amir73il@gmail.com, jack@suse.de, viro@zeniv.linux.org.uk Date: Sun, 17 Dec 2017 08:01:41 -0500 In-Reply-To: <87mv2jmhmm.fsf@notabene.neil.brown.name> References: <20171213142017.23653-1-jlayton@kernel.org> <20171213142017.23653-2-jlayton@kernel.org> <87indanv2m.fsf@notabene.neil.brown.name> <1513211220.3498.79.camel@kernel.org> <87mv2jmhmm.fsf@notabene.neil.brown.name> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.26.2 (3.26.2-1.fc27) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, 2017-12-16 at 15:17 +1100, NeilBrown wrote: > On Wed, Dec 13 2017, Jeff Layton wrote: > > > On Thu, 2017-12-14 at 09:04 +1100, NeilBrown wrote: > > > On Wed, Dec 13 2017, Jeff Layton wrote: > > > > > > > +/* > > > > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for > > > > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must > > > > + * appear different to observers if there was a change to the inode's data or > > > > + * metadata since it was last queried. > > > > + * > > > > + * It should be considered an opaque value by observers. If it remains the same > > > > + * since it was last checked, then nothing has changed in the inode. If it's > > > > + * different then something has changed. Observers cannot infer anything about > > > > + * the nature or magnitude of the changes from the value, only that the inode > > > > + * has changed in some fashion. > > > > > > I agree that it "should be" considered opaque, but I have a suspicion > > > that NFSv4 doesn't consider it opaque. > > > There is something about write delegations and the server performing a > > > GETATTR callback to the delegated client so that it can answer GETATTR > > > from other clients without recalling the delegation. > > > > > > Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains > > > the text: > > > > > > o The client will create a value greater than c that will be used > > > for communicating that modified data is held at the client. Let > > > this value be represented by d. > > > > > > "c" here is a 'change' attribute. > > > > > > Then: > > > > > > While the change attribute is opaque to the client in the sense that > > > it has no idea what units of time, if any, the server is counting > > > change with, it is not opaque in that the client has to treat it as > > > an unsigned integer, and the server has to be able to see the results > > > of the client's changes to that integer. Therefore, the server MUST > > > encode the change attribute in network order when sending it to the > > > client. The client MUST decode it from network order to its native > > > order when receiving it, and the client MUST encode it in network > > > order when sending it to the server. For this reason, change is > > > defined as an unsigned integer rather than an opaque array of bytes. > > > > > > This all suggests that nfsd needs to be certain that "incrementing" the > > > change id will produce a new changeid, which has not been used before, > > > and also suggests that nfsd needs to be able to control the changeid > > > stored after writes that result from a delegation being returned. > > > > > > I'd just like to say that this is one of the most annoying dumb features > > > of NFSv4, because it is trivial to fix and I suggested a fix before > > > NFSv4.0 was finalized. Grumble. > > > > > > Otherwise the patch set looks good. I haven't gone over the code > > > closely, the but approach is spot-on. > > > > I don't think we have to do that. There are really only two states with > > a client holding a write delegation, as far as the server is concerned. > > Either: > > > > a) the client has done no writes to the file, in which case it'll return > > the same i_version that the server has when issued a CB_GETATTR > > > > ...or... > > > > b) it has written to the file while holding the delegation, in which > > case it'll return a different CB_GETATTR to the server > > > > The simplest thing for the server to do is to just increment the change > > attribute _once_ when it gets back a CB_GETATTR with a different change > > attr than it has. > > > > That's sufficient to tell another client issuing a a GETATTR that the > > file has changed without needing to recall the delegation. > > > > Prior to the delegation being returned, the client will send at least > > one WRITE RPC, and that's enough to ensure that the the next stat will > > see the thing increase. > > "increment" and "increase" are not words that mean anything for an > "opaque value". > NFSd is, presumably, an "observer" of i_version (as it isn't the > filesytem that controls it), so your text says it must treat i_version as > opaque. That means it cannot detect an "increase" (only a change), and > it certainly cannot "increment" the value. > > I think you need to allow observers to treat i_version as a 64 bit number > which will monotonically increase. Any change to the file will result > in an increment of at least '1'. Here, I was mostly speaking about NFS in general. I think the above method is the cheapest/best way to ensure that you don't end up with reused change attributes, within the confines of the protocol. With this implementation, it's probably safe enough to make a guarantee that the value will increase wrt a previously sampled value if there was a change. I'll have to think about how best to document that. Thanks, -- Jeff Layton