linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
@ 2011-07-18 22:23 J. Bruce Fields
  0 siblings, 0 replies; 14+ messages in thread
From: J. Bruce Fields @ 2011-07-18 22:23 UTC (permalink / raw)
  To: trond; +Cc: linux-nfs

From: "J. Bruce Fields" <bfields@redhat.com>

We recently noticed that NFSv4 iozone read tests that should have been
performing at local-cache speeds were running at wire speeds, and found
that currently,

	open(O_RDWR|O_TRUNC)
	write()
	close()
	open(O_RDONLY)
	read()

results in an over-the-wire read (due to a setattr triggered by
O_TRUNC).  Even non-O_TRUNC opens send setattrs (of timestamps) in some
cases, causing the same problem.

In discussion with Trond, he blames a VFS bug for breaking what should
be a single open into an open followed by a setattr.

However, it may make sense even in a case like

	open(O_RDWR)
	write()
	setattr()
	close()
	open(O_RDONLY)
	read()

to treat the setattr similarly to the write, and not invalidate the
client's own cache as long as the setattr is performed under a write
open.

(That said, I don't know if this is the correct place in the code to
implement that behavior.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/inode.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6f4850d..d4eed06 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -438,8 +438,13 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr)
 	if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
 		nfs_inode_return_delegation(inode);
 	error = NFS_PROTO(inode)->setattr(dentry, fattr, attr);
-	if (error == 0)
+	if (error)
+		goto out_free;
+	if (attr->ia_valid & ATTR_FILE)
+		nfs_post_op_update_inode_force_wcc(inode, fattr);
+	else
 		nfs_refresh_inode(inode, fattr);
+out_free:
 	nfs_free_fattr(fattr);
 out:
 	return error;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-03 21:20                   ` J. Bruce Fields
@ 2011-11-03 21:39                     ` Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2011-11-03 21:39 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, J. Bruce Fields, linux-nfs

On Thu, 2011-11-03 at 17:20 -0400, J. Bruce Fields wrote: 
> On Thu, Nov 03, 2011 at 05:03:05PM -0400, Trond Myklebust wrote:
> > On Thu, 2011-11-03 at 16:44 -0400, J. Bruce Fields wrote: 
> > > I'm feeling dense.
> > > 
> > > On Wed, Nov 02, 2011 at 11:54:53AM -0400, Jeff Layton wrote:
> > > > Yes, I think it is different. When we truncate the file to 0 then we
> > > > don't care if another write races in.
> > > 
> > > 	Client 1		Client 2
> > > 	--------		--------
> > > 
> > > 	setattr size to 0
> > > 				write to file
> > > 	get change attribute
> > > 
> > > If we decide not to invalidate the cache here, then we miss Client 2's
> > > write.  The same is true if we set the size to something other than 0.
> > 
> > ?????????
> > How can we "decide not to invalidate the cache here"? the call to
> > nfs_vmtruncate() will _always_ call truncate_pagecache().
> > The only case where we don't call nfs_vmtruncate() is if the client has
> > already determined that the file size was zero, and optimised away the
> > setattr call altogether.
> 
> Sorry, I'm probably using the wrong language.
> 
> I'm using the word "cache" to refer not to the page cache, but more
> generally to the client's idea of the file state.  Thus if the client,
> after the above operation, is left believing the file has size 0, I
> wouldn't say it has invalidated its cache, I'd say it's cached the fact
> that the file has size zero....
> 
> Does what I said make sense, given that?

Nope; I'm still lost.

So if the client is caching a file size of 0, why would it "miss a
write"? It will soon see that the file size on the server is non-zero.
In fact, your 'get change attribute' will always be accompanied by a
'get file size' that will detect the changed size right there...

> > > There's a second possible race when Client 2's write comes before the
> > > setattr, and it's true that that race doesn't matter in the size-0 case
> > > and does in the other.
> > > 
> > > But if we were doing a write instead of a setattr, we'd ignore that
> > > second race.
> > > 
> > > And I don't understand why something like ftruncate should be treated
> > > differently from write.  In either case we're modifying a file while
> > > holding a write open, and I'd expect us to assume in both cases that
> > > nobody else is doing the same.
> > 
> > When did we move to the topic of multiple clients, and why do you think
> > it is relevant?
> 
> We're seeing the client decide that it can no longer trust its idea of
> the file, presumably because it no longer believes it safe to assume
> that nobody else has modified it.

Did you look into the race I described yesterday? That doesn't require a
client 2.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-03 21:03                 ` Trond Myklebust
@ 2011-11-03 21:20                   ` J. Bruce Fields
  2011-11-03 21:39                     ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2011-11-03 21:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jeff Layton, J. Bruce Fields, linux-nfs

On Thu, Nov 03, 2011 at 05:03:05PM -0400, Trond Myklebust wrote:
> On Thu, 2011-11-03 at 16:44 -0400, J. Bruce Fields wrote: 
> > I'm feeling dense.
> > 
> > On Wed, Nov 02, 2011 at 11:54:53AM -0400, Jeff Layton wrote:
> > > Yes, I think it is different. When we truncate the file to 0 then we
> > > don't care if another write races in.
> > 
> > 	Client 1		Client 2
> > 	--------		--------
> > 
> > 	setattr size to 0
> > 				write to file
> > 	get change attribute
> > 
> > If we decide not to invalidate the cache here, then we miss Client 2's
> > write.  The same is true if we set the size to something other than 0.
> 
> ?????????
> How can we "decide not to invalidate the cache here"? the call to
> nfs_vmtruncate() will _always_ call truncate_pagecache().
> The only case where we don't call nfs_vmtruncate() is if the client has
> already determined that the file size was zero, and optimised away the
> setattr call altogether.

Sorry, I'm probably using the wrong language.

I'm using the word "cache" to refer not to the page cache, but more
generally to the client's idea of the file state.  Thus if the client,
after the above operation, is left believing the file has size 0, I
wouldn't say it has invalidated its cache, I'd say it's cached the fact
that the file has size zero....

Does what I said make sense, given that?

> > There's a second possible race when Client 2's write comes before the
> > setattr, and it's true that that race doesn't matter in the size-0 case
> > and does in the other.
> > 
> > But if we were doing a write instead of a setattr, we'd ignore that
> > second race.
> > 
> > And I don't understand why something like ftruncate should be treated
> > differently from write.  In either case we're modifying a file while
> > holding a write open, and I'd expect us to assume in both cases that
> > nobody else is doing the same.
> 
> When did we move to the topic of multiple clients, and why do you think
> it is relevant?

We're seeing the client decide that it can no longer trust its idea of
the file, presumably because it no longer believes it safe to assume
that nobody else has modified it.

--b.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-03 20:44               ` J. Bruce Fields
@ 2011-11-03 21:03                 ` Trond Myklebust
  2011-11-03 21:20                   ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2011-11-03 21:03 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jeff Layton, J. Bruce Fields, linux-nfs

On Thu, 2011-11-03 at 16:44 -0400, J. Bruce Fields wrote: 
> I'm feeling dense.
> 
> On Wed, Nov 02, 2011 at 11:54:53AM -0400, Jeff Layton wrote:
> > Yes, I think it is different. When we truncate the file to 0 then we
> > don't care if another write races in.
> 
> 	Client 1		Client 2
> 	--------		--------
> 
> 	setattr size to 0
> 				write to file
> 	get change attribute
> 
> If we decide not to invalidate the cache here, then we miss Client 2's
> write.  The same is true if we set the size to something other than 0.

?????????

How can we "decide not to invalidate the cache here"? the call to
nfs_vmtruncate() will _always_ call truncate_pagecache().
The only case where we don't call nfs_vmtruncate() is if the client has
already determined that the file size was zero, and optimised away the
setattr call altogether.

> There's a second possible race when Client 2's write comes before the
> setattr, and it's true that that race doesn't matter in the size-0 case
> and does in the other.
> 
> But if we were doing a write instead of a setattr, we'd ignore that
> second race.
> 
> And I don't understand why something like ftruncate should be treated
> differently from write.  In either case we're modifying a file while
> holding a write open, and I'd expect us to assume in both cases that
> nobody else is doing the same.

When did we move to the topic of multiple clients, and why do you think
it is relevant?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02 15:54             ` Jeff Layton
@ 2011-11-03 20:44               ` J. Bruce Fields
  2011-11-03 21:03                 ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2011-11-03 20:44 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs

I'm feeling dense.

On Wed, Nov 02, 2011 at 11:54:53AM -0400, Jeff Layton wrote:
> Yes, I think it is different. When we truncate the file to 0 then we
> don't care if another write races in.

	Client 1		Client 2
	--------		--------

	setattr size to 0
				write to file
	get change attribute

If we decide not to invalidate the cache here, then we miss Client 2's
write.  The same is true if we set the size to something other than 0.

There's a second possible race when Client 2's write comes before the
setattr, and it's true that that race doesn't matter in the size-0 case
and does in the other.

But if we were doing a write instead of a setattr, we'd ignore that
second race.

And I don't understand why something like ftruncate should be treated
differently from write.  In either case we're modifying a file while
holding a write open, and I'd expect us to assume in both cases that
nobody else is doing the same.

> We're invalidating all of the
> pages we have cached for the file at that point anyway. If we try to
> reread the file afterward we're going to have to fetch the data from
> the server regardless.
> 
> I guess my main point is that we won't have any cached data after a
> truncate to 0, so there's no need to worry about ensuring that the
> cache is eventually invalidated after that. The VFS-level truncate on
> the client is going to take care of that for us anyway. That allows us
> to optimize for this special (but common) case.
> 
> A truncate to a non-zero size is different however because we may still
> have lingering cached pages. We will need to invalidate those if the
> server sends pre-op attrs

There are no pre-op attrs: the Linux v4 client's write compound is
putfh, write, getattr, with no getattr before the write, and the
nfs_post_op_update_inode_force_wcc() call in nfs4_write_done_cb() makes
the client pretends there were pre-op attributes the same as whatever it
currently has cached in the inode--if I understand it right.

--b.

> that indicate that something else happened
> prior to the truncate, or if the server doesn't send pre-op attrs at
> all.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02 14:46           ` J. Bruce Fields
@ 2011-11-02 15:54             ` Jeff Layton
  2011-11-03 20:44               ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2011-11-02 15:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs

On Wed, 2 Nov 2011 10:46:57 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Wed, Nov 02, 2011 at 07:07:42AM -0400, Jeff Layton wrote:
> > On Tue, 1 Nov 2011 21:23:15 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Tue, Nov 01, 2011 at 08:43:25PM -0400, Jeff Layton wrote:
> > > > On Tue, 1 Nov 2011 16:07:27 -0700
> > > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > > Is the problem perhaps that we should be clearing the
> > > > > NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
> > > > > zero?
> > > > > 
> > > > 
> > > > That was my thinking too. Whenever we truncate the i_size to 0, we
> > > > can safely assume that the pagecache is now valid, and should be able
> > > > to clear NFS_INO_INVALID_DATA no matter when it was set, right?
> > > 
> > > I don't understand why 0 is a special case: why should my setting the
> > > size ever mean that I have to go reread data from the server?
> > > 
> > 
> > If the server doesn't send pre-op attrs (and linux servers don't) then
> > you have no way to know whether someone raced in and did writes to the
> > remaining pages just prior to your truncate.
> > 
> > Size 0 is a special case because there are no remaining pages.
> 
> Those 3rd-party writes could also arrive between the truncate and the
> read of the post-op attrs.  Is this setattr/write conflict really any
> different than write/write conflicts?  In both cases you've got a file
> open for write on two different clients.
> 
> And we live with the same sort of race in the write/write case.  (Hence
> the nfs_post_op_update_inode_force_wcc in nfs4_write_done_cb.)
> 


Yes, I think it is different. When we truncate the file to 0 then we
don't care if another write races in. We're invalidating all of the
pages we have cached for the file at that point anyway. If we try to
reread the file afterward we're going to have to fetch the data from
the server regardless.

I guess my main point is that we won't have any cached data after a
truncate to 0, so there's no need to worry about ensuring that the
cache is eventually invalidated after that. The VFS-level truncate on
the client is going to take care of that for us anyway. That allows us
to optimize for this special (but common) case.

A truncate to a non-zero size is different however because we may still
have lingering cached pages. We will need to invalidate those if the
server sends pre-op attrs that indicate that something else happened
prior to the truncate, or if the server doesn't send pre-op attrs at
all.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02 11:07         ` Jeff Layton
@ 2011-11-02 14:46           ` J. Bruce Fields
  2011-11-02 15:54             ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2011-11-02 14:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs

On Wed, Nov 02, 2011 at 07:07:42AM -0400, Jeff Layton wrote:
> On Tue, 1 Nov 2011 21:23:15 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Nov 01, 2011 at 08:43:25PM -0400, Jeff Layton wrote:
> > > On Tue, 1 Nov 2011 16:07:27 -0700
> > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > > > Is the problem perhaps that we should be clearing the
> > > > NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
> > > > zero?
> > > > 
> > > 
> > > That was my thinking too. Whenever we truncate the i_size to 0, we
> > > can safely assume that the pagecache is now valid, and should be able
> > > to clear NFS_INO_INVALID_DATA no matter when it was set, right?
> > 
> > I don't understand why 0 is a special case: why should my setting the
> > size ever mean that I have to go reread data from the server?
> > 
> 
> If the server doesn't send pre-op attrs (and linux servers don't) then
> you have no way to know whether someone raced in and did writes to the
> remaining pages just prior to your truncate.
> 
> Size 0 is a special case because there are no remaining pages.

Those 3rd-party writes could also arrive between the truncate and the
read of the post-op attrs.  Is this setattr/write conflict really any
different than write/write conflicts?  In both cases you've got a file
open for write on two different clients.

And we live with the same sort of race in the write/write case.  (Hence
the nfs_post_op_update_inode_force_wcc in nfs4_write_done_cb.)

--b.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02  1:23       ` J. Bruce Fields
  2011-11-02  1:36         ` Myklebust, Trond
@ 2011-11-02 11:07         ` Jeff Layton
  2011-11-02 14:46           ` J. Bruce Fields
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2011-11-02 11:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs

On Tue, 1 Nov 2011 21:23:15 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Nov 01, 2011 at 08:43:25PM -0400, Jeff Layton wrote:
> > On Tue, 1 Nov 2011 16:07:27 -0700
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> > 
> > > > -----Original Message-----
> > > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > > Sent: Tuesday, November 01, 2011 4:27 PM
> > > > To: Myklebust, Trond
> > > > Cc: J. Bruce Fields; Myklebust, Trond; linux-nfs@vger.kernel.org
> > > > Subject: Re: [PATCH] nfs: open-associated setattr shouldn't invalidate
> > > own
> > > > cache
> > > > 
> > > > On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond wrote:
> > > > > We should already optimize away the unnecessary setting of the size.
> > > > 
> > > > Do you remember what commit fixed that?  (Was it an nfs change or a
> > > vfs
> > > > change?)
> > > 
> > > It predates the git repository. See the comment about "Optimization:" in
> > > nfs_setattr().
> > > 
> > > > > The problem is that truncate() still requires you to set the ctime,
> > > whereas
> > > > ftruncate() does not iirc.
> > > > 
> > > > Staring at the code....  I think you mean the opposite?  I notice
> > > > do_sys_ftruncate() calling
> > > > 
> > > > 	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
> > > > 
> > > > and do_sys_truncate() calling
> > > > 
> > > > 	do_truncate(path.dentry, length, 0, NULL);
> > > > 
> > > > where the third argument is getting OR'd with ATTR_FILE to pass into
> > > > notify_change().
> > > 
> > > Sorry, yes. ftruncate() is the one that unconditionally sets the
> > > mtime/ctime on success according to the POSIX spec.
> > > 
> > 
> > Even when it's a noop? Blech.
> > 
> > > > Also even when a setattr does get through, I don't understand why it
> > > should
> > > > be invalidating our data cache.  Is there some reason it needs to, or
> > > is this just
> > > > a case that hasn't seemed worth fixing?
> > > 
> > > Is the problem perhaps that we should be clearing the
> > > NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
> > > zero?
> > > 
> > 
> > That was my thinking too. Whenever we truncate the i_size to 0, we
> > can safely assume that the pagecache is now valid, and should be able
> > to clear NFS_INO_INVALID_DATA no matter when it was set, right?
> 
> I don't understand why 0 is a special case: why should my setting the
> size ever mean that I have to go reread data from the server?
> 

If the server doesn't send pre-op attrs (and linux servers don't) then
you have no way to know whether someone raced in and did writes to the
remaining pages just prior to your truncate.

Size 0 is a special case because there are no remaining pages.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02  1:23       ` J. Bruce Fields
@ 2011-11-02  1:36         ` Myklebust, Trond
  2011-11-02 11:07         ` Jeff Layton
  1 sibling, 0 replies; 14+ messages in thread
From: Myklebust, Trond @ 2011-11-02  1:36 UTC (permalink / raw)
  To: J. Bruce Fields, Jeff Layton; +Cc: J. Bruce Fields, linux-nfs

> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Tuesday, November 01, 2011 9:23 PM
> To: Jeff Layton
> Cc: Myklebust, Trond; J. Bruce Fields; linux-nfs@vger.kernel.org
> Subject: Re: [PATCH] nfs: open-associated setattr shouldn't invalidate
own
> cache
> 
> On Tue, Nov 01, 2011 at 08:43:25PM -0400, Jeff Layton wrote:
> > On Tue, 1 Nov 2011 16:07:27 -0700
> > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > > Sent: Tuesday, November 01, 2011 4:27 PM
> > > > To: Myklebust, Trond
> > > > Cc: J. Bruce Fields; Myklebust, Trond; linux-nfs@vger.kernel.org
> > > > Subject: Re: [PATCH] nfs: open-associated setattr shouldn't
> > > > invalidate
> > > own
> > > > cache
> > > >
> > > > On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond
wrote:
> > > > > We should already optimize away the unnecessary setting of the
size.
> > > >
> > > > Do you remember what commit fixed that?  (Was it an nfs change
or
> > > > a
> > > vfs
> > > > change?)
> > >
> > > It predates the git repository. See the comment about
> > > "Optimization:" in nfs_setattr().
> > >
> > > > > The problem is that truncate() still requires you to set the
> > > > > ctime,
> > > whereas
> > > > ftruncate() does not iirc.
> > > >
> > > > Staring at the code....  I think you mean the opposite?  I
notice
> > > > do_sys_ftruncate() calling
> > > >
> > > > 	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME,
file);
> > > >
> > > > and do_sys_truncate() calling
> > > >
> > > > 	do_truncate(path.dentry, length, 0, NULL);
> > > >
> > > > where the third argument is getting OR'd with ATTR_FILE to pass
> > > > into notify_change().
> > >
> > > Sorry, yes. ftruncate() is the one that unconditionally sets the
> > > mtime/ctime on success according to the POSIX spec.
> > >
> >
> > Even when it's a noop? Blech.
> >
> > > > Also even when a setattr does get through, I don't understand
why
> > > > it
> > > should
> > > > be invalidating our data cache.  Is there some reason it needs
to,
> > > > or
> > > is this just
> > > > a case that hasn't seemed worth fixing?
> > >
> > > Is the problem perhaps that we should be clearing the
> > > NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets
set
> > > to zero?
> > >
> >
> > That was my thinking too. Whenever we truncate the i_size to 0, we
can
> > safely assume that the pagecache is now valid, and should be able to
> > clear NFS_INO_INVALID_DATA no matter when it was set, right?
> 
> I don't understand why 0 is a special case: why should my setting the
size
> ever mean that I have to go reread data from the server?

Your test case was:

open(O_RDWR|O_TRUNCATE)
write()
close()
open(O_RDONLY)
read()
...

That truncates _all_ data, so there is nothing left in the data cache
when we get to the write().

The question therefore is not "Why does setattr clear my cache?".

The question is "Why isn't that write() cached?".

That's why I'm exploring reasons why the cache might get cleared after
that write(). The other obvious one would be if the server is failing to
return post-op attributes in the WRITE reply.

Trond

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-02  0:43     ` Jeff Layton
@ 2011-11-02  1:23       ` J. Bruce Fields
  2011-11-02  1:36         ` Myklebust, Trond
  2011-11-02 11:07         ` Jeff Layton
  0 siblings, 2 replies; 14+ messages in thread
From: J. Bruce Fields @ 2011-11-02  1:23 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Myklebust, Trond, J. Bruce Fields, linux-nfs

On Tue, Nov 01, 2011 at 08:43:25PM -0400, Jeff Layton wrote:
> On Tue, 1 Nov 2011 16:07:27 -0700
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, November 01, 2011 4:27 PM
> > > To: Myklebust, Trond
> > > Cc: J. Bruce Fields; Myklebust, Trond; linux-nfs@vger.kernel.org
> > > Subject: Re: [PATCH] nfs: open-associated setattr shouldn't invalidate
> > own
> > > cache
> > > 
> > > On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond wrote:
> > > > We should already optimize away the unnecessary setting of the size.
> > > 
> > > Do you remember what commit fixed that?  (Was it an nfs change or a
> > vfs
> > > change?)
> > 
> > It predates the git repository. See the comment about "Optimization:" in
> > nfs_setattr().
> > 
> > > > The problem is that truncate() still requires you to set the ctime,
> > whereas
> > > ftruncate() does not iirc.
> > > 
> > > Staring at the code....  I think you mean the opposite?  I notice
> > > do_sys_ftruncate() calling
> > > 
> > > 	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
> > > 
> > > and do_sys_truncate() calling
> > > 
> > > 	do_truncate(path.dentry, length, 0, NULL);
> > > 
> > > where the third argument is getting OR'd with ATTR_FILE to pass into
> > > notify_change().
> > 
> > Sorry, yes. ftruncate() is the one that unconditionally sets the
> > mtime/ctime on success according to the POSIX spec.
> > 
> 
> Even when it's a noop? Blech.
> 
> > > Also even when a setattr does get through, I don't understand why it
> > should
> > > be invalidating our data cache.  Is there some reason it needs to, or
> > is this just
> > > a case that hasn't seemed worth fixing?
> > 
> > Is the problem perhaps that we should be clearing the
> > NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
> > zero?
> > 
> 
> That was my thinking too. Whenever we truncate the i_size to 0, we
> can safely assume that the pagecache is now valid, and should be able
> to clear NFS_INO_INVALID_DATA no matter when it was set, right?

I don't understand why 0 is a special case: why should my setting the
size ever mean that I have to go reread data from the server?

--b.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-01 23:07   ` Myklebust, Trond
@ 2011-11-02  0:43     ` Jeff Layton
  2011-11-02  1:23       ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2011-11-02  0:43 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: J. Bruce Fields, J. Bruce Fields, linux-nfs

On Tue, 1 Nov 2011 16:07:27 -0700
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Tuesday, November 01, 2011 4:27 PM
> > To: Myklebust, Trond
> > Cc: J. Bruce Fields; Myklebust, Trond; linux-nfs@vger.kernel.org
> > Subject: Re: [PATCH] nfs: open-associated setattr shouldn't invalidate
> own
> > cache
> > 
> > On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond wrote:
> > > We should already optimize away the unnecessary setting of the size.
> > 
> > Do you remember what commit fixed that?  (Was it an nfs change or a
> vfs
> > change?)
> 
> It predates the git repository. See the comment about "Optimization:" in
> nfs_setattr().
> 
> > > The problem is that truncate() still requires you to set the ctime,
> whereas
> > ftruncate() does not iirc.
> > 
> > Staring at the code....  I think you mean the opposite?  I notice
> > do_sys_ftruncate() calling
> > 
> > 	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
> > 
> > and do_sys_truncate() calling
> > 
> > 	do_truncate(path.dentry, length, 0, NULL);
> > 
> > where the third argument is getting OR'd with ATTR_FILE to pass into
> > notify_change().
> 
> Sorry, yes. ftruncate() is the one that unconditionally sets the
> mtime/ctime on success according to the POSIX spec.
> 

Even when it's a noop? Blech.

> > Also even when a setattr does get through, I don't understand why it
> should
> > be invalidating our data cache.  Is there some reason it needs to, or
> is this just
> > a case that hasn't seemed worth fixing?
> 
> Is the problem perhaps that we should be clearing the
> NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
> zero?
> 

That was my thinking too. Whenever we truncate the i_size to 0, we
can safely assume that the pagecache is now valid, and should be able
to clear NFS_INO_INVALID_DATA no matter when it was set, right?

> The other micro-optimisation that we might want to consider is to not
> set NFS_INO_INVALID_DATA in nfs_update_inode() if inode->i_data.nrpages
> == 0.
> 

This I'm a little less clear on...

If we find that another host has truncated the i_size to 0, don't we
still need to call invalidate_inode_pages2() somehow?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-11-01 20:27 ` J. Bruce Fields
@ 2011-11-01 23:07   ` Myklebust, Trond
  2011-11-02  0:43     ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Myklebust, Trond @ 2011-11-01 23:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs

> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Tuesday, November 01, 2011 4:27 PM
> To: Myklebust, Trond
> Cc: J. Bruce Fields; Myklebust, Trond; linux-nfs@vger.kernel.org
> Subject: Re: [PATCH] nfs: open-associated setattr shouldn't invalidate
own
> cache
> 
> On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond wrote:
> > We should already optimize away the unnecessary setting of the size.
> 
> Do you remember what commit fixed that?  (Was it an nfs change or a
vfs
> change?)

It predates the git repository. See the comment about "Optimization:" in
nfs_setattr().

> > The problem is that truncate() still requires you to set the ctime,
whereas
> ftruncate() does not iirc.
> 
> Staring at the code....  I think you mean the opposite?  I notice
> do_sys_ftruncate() calling
> 
> 	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
> 
> and do_sys_truncate() calling
> 
> 	do_truncate(path.dentry, length, 0, NULL);
> 
> where the third argument is getting OR'd with ATTR_FILE to pass into
> notify_change().

Sorry, yes. ftruncate() is the one that unconditionally sets the
mtime/ctime on success according to the POSIX spec.

> Also even when a setattr does get through, I don't understand why it
should
> be invalidating our data cache.  Is there some reason it needs to, or
is this just
> a case that hasn't seemed worth fixing?

Is the problem perhaps that we should be clearing the
NFS_INO_INVALID_DATA flag in nfs_vmtruncate() when the size gets set to
zero?

The other micro-optimisation that we might want to consider is to not
set NFS_INO_INVALID_DATA in nfs_update_inode() if inode->i_data.nrpages
== 0.

Cheers
  Trond 

> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> >
> > From: "J. Bruce Fields" <bfields@redhat.com>
> >
> > We recently noticed that NFSv4 iozone read tests that should have
been
> > performing at local-cache speeds were running at wire speeds, and
> > found that currently,
> >
> > 	open(O_RDWR|O_TRUNC)
> > 	write()
> > 	close()
> > 	open(O_RDONLY)
> > 	read()
> >
> > results in an over-the-wire read (due to a setattr triggered by
> > O_TRUNC).  Even non-O_TRUNC opens send setattrs (of timestamps) in
> > some cases, causing the same problem.
> >
> > In discussion with Trond, he blames a VFS bug for breaking what
should
> > be a single open into an open followed by a setattr.
> >
> > However, it may make sense even in a case like
> >
> > 	open(O_RDWR)
> > 	write()
> > 	setattr()
> > 	close()
> > 	open(O_RDONLY)
> > 	read()
> >
> > to treat the setattr similarly to the write, and not invalidate the
> > client's own cache as long as the setattr is performed under a write
> > open.
> >
> > (That said, I don't know if this is the correct place in the code to
> > implement that behavior.)
> >
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/nfs/inode.c |    7 ++++++-
> >  1 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 6f4850d..d4eed06
> > 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -438,8 +438,13 @@ nfs_setattr(struct dentry *dentry, struct iattr
> *attr)
> >  	if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
> >  		nfs_inode_return_delegation(inode);
> >  	error = NFS_PROTO(inode)->setattr(dentry, fattr, attr);
> > -	if (error == 0)
> > +	if (error)
> > +		goto out_free;
> > +	if (attr->ia_valid & ATTR_FILE)
> > +		nfs_post_op_update_inode_force_wcc(inode, fattr);
> > +	else
> >  		nfs_refresh_inode(inode, fattr);
> > +out_free:
> >  	nfs_free_fattr(fattr);
> >  out:
> >  	return error;
> > --
> > 1.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
  2011-07-19  0:09 Myklebust, Trond
@ 2011-11-01 20:27 ` J. Bruce Fields
  2011-11-01 23:07   ` Myklebust, Trond
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2011-11-01 20:27 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: J. Bruce Fields, trond, linux-nfs

On Mon, Jul 18, 2011 at 08:09:15PM -0400, Myklebust, Trond wrote:
> We should already optimize away the unnecessary setting of the size.

Do you remember what commit fixed that?  (Was it an nfs change or a vfs
change?)

> The problem is that truncate() still requires you to set the ctime, whereas ftruncate() does not iirc.

Staring at the code....  I think you mean the opposite?  I notice
do_sys_ftruncate() calling

	do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);

and do_sys_truncate() calling

	do_truncate(path.dentry, length, 0, NULL);

where the third argument is getting OR'd with ATTR_FILE to pass into
notify_change().

Also even when a setattr does get through, I don't understand why it
should be invalidating our data cache.  Is there some reason it needs
to, or is this just a case that hasn't seemed worth fixing?

--b.

> 
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We recently noticed that NFSv4 iozone read tests that should have been
> performing at local-cache speeds were running at wire speeds, and found
> that currently,
> 
> 	open(O_RDWR|O_TRUNC)
> 	write()
> 	close()
> 	open(O_RDONLY)
> 	read()
> 
> results in an over-the-wire read (due to a setattr triggered by
> O_TRUNC).  Even non-O_TRUNC opens send setattrs (of timestamps) in some
> cases, causing the same problem.
> 
> In discussion with Trond, he blames a VFS bug for breaking what should
> be a single open into an open followed by a setattr.
> 
> However, it may make sense even in a case like
> 
> 	open(O_RDWR)
> 	write()
> 	setattr()
> 	close()
> 	open(O_RDONLY)
> 	read()
> 
> to treat the setattr similarly to the write, and not invalidate the
> client's own cache as long as the setattr is performed under a write
> open.
> 
> (That said, I don't know if this is the correct place in the code to
> implement that behavior.)
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/nfs/inode.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 6f4850d..d4eed06 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -438,8 +438,13 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr)
>  	if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
>  		nfs_inode_return_delegation(inode);
>  	error = NFS_PROTO(inode)->setattr(dentry, fattr, attr);
> -	if (error == 0)
> +	if (error)
> +		goto out_free;
> +	if (attr->ia_valid & ATTR_FILE)
> +		nfs_post_op_update_inode_force_wcc(inode, fattr);
> +	else
>  		nfs_refresh_inode(inode, fattr);
> +out_free:
>  	nfs_free_fattr(fattr);
>  out:
>  	return error;
> -- 
> 1.7.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] nfs: open-associated setattr shouldn't invalidate own cache
@ 2011-07-19  0:09 Myklebust, Trond
  2011-11-01 20:27 ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Myklebust, Trond @ 2011-07-19  0:09 UTC (permalink / raw)
  To: J. Bruce Fields, trond; +Cc: linux-nfs

We should already optimize away the unnecessary setting of the size. The problem is that truncate() still requires you to set the ctime, whereas ftruncate() does not iirc.

"J. Bruce Fields" <bfields@redhat.com> wrote:

From: "J. Bruce Fields" <bfields@redhat.com>

We recently noticed that NFSv4 iozone read tests that should have been
performing at local-cache speeds were running at wire speeds, and found
that currently,

	open(O_RDWR|O_TRUNC)
	write()
	close()
	open(O_RDONLY)
	read()

results in an over-the-wire read (due to a setattr triggered by
O_TRUNC).  Even non-O_TRUNC opens send setattrs (of timestamps) in some
cases, causing the same problem.

In discussion with Trond, he blames a VFS bug for breaking what should
be a single open into an open followed by a setattr.

However, it may make sense even in a case like

	open(O_RDWR)
	write()
	setattr()
	close()
	open(O_RDONLY)
	read()

to treat the setattr similarly to the write, and not invalidate the
client's own cache as long as the setattr is performed under a write
open.

(That said, I don't know if this is the correct place in the code to
implement that behavior.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfs/inode.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6f4850d..d4eed06 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -438,8 +438,13 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr)
 	if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
 		nfs_inode_return_delegation(inode);
 	error = NFS_PROTO(inode)->setattr(dentry, fattr, attr);
-	if (error == 0)
+	if (error)
+		goto out_free;
+	if (attr->ia_valid & ATTR_FILE)
+		nfs_post_op_update_inode_force_wcc(inode, fattr);
+	else
 		nfs_refresh_inode(inode, fattr);
+out_free:
 	nfs_free_fattr(fattr);
 out:
 	return error;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-11-03 21:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18 22:23 [PATCH] nfs: open-associated setattr shouldn't invalidate own cache J. Bruce Fields
2011-07-19  0:09 Myklebust, Trond
2011-11-01 20:27 ` J. Bruce Fields
2011-11-01 23:07   ` Myklebust, Trond
2011-11-02  0:43     ` Jeff Layton
2011-11-02  1:23       ` J. Bruce Fields
2011-11-02  1:36         ` Myklebust, Trond
2011-11-02 11:07         ` Jeff Layton
2011-11-02 14:46           ` J. Bruce Fields
2011-11-02 15:54             ` Jeff Layton
2011-11-03 20:44               ` J. Bruce Fields
2011-11-03 21:03                 ` Trond Myklebust
2011-11-03 21:20                   ` J. Bruce Fields
2011-11-03 21:39                     ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).