linux-kernel.vger.kernel.org archive mirror
From: Chuck Lever <cel@citi.umich.edu>
To: Daniel Phillips <phillips@arcor.de>
Cc: Andrew Morton <akpm@digeo.com>,
	Rik van Riel <riel@conectiva.com.br>,
	<trond.myklebust@fys.uio.no>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: invalidate_inode_pages in 2.5.32/3
Date: Tue, 10 Sep 2002 15:04:28 -0400 (EDT)
Message-ID: <Pine.BSO.4.33.0209101412300.5368-100000@citi.umich.edu>
In-Reply-To: <E17oneD-0007Dk-00@starship>

On Tue, 10 Sep 2002, Daniel Phillips wrote:

> On Tuesday 10 September 2002 17:09, Chuck Lever wrote:
> > i can answer the question "when does the NFS client purge a file's cached
> > data?"
> >
> > there are four major categories:
> >
> > a.  directory changes require any cached readdir results be purged.
>
> That is, the client changes the directory itself?  I suppose an NFS
> server is incapable of reporting directory changes caused by other
> clients, because of being stateless.

when a client itself changes a directory, it must purge its own readdir
cache.  this is because the server is really in control of the contents,
which are treated as more or less opaque by the client.  the next time any
application on the client wants to read the directory, the client will go
back to the server to get the updated contents.
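
The rule above (a client that changes a directory drops its own readdir cache instead of patching it) can be sketched as follows. This is a minimal model under stated assumptions: the struct and function names are hypothetical and do not correspond to the actual fs/nfs code, and the cached entries themselves are elided.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified client-side readdir cache. The server owns the
 * real directory contents; the client treats them as opaque. */
struct dir_cache {
    bool   valid;     /* true if cached readdir results may be reused */
    size_t nentries;  /* number of cached entries (contents elided) */
};

/* Drop the cached listing entirely rather than trying to edit it locally. */
static void dir_cache_purge(struct dir_cache *dc)
{
    dc->valid = false;
    dc->nentries = 0;
}

/* true: readdir can be served from cache; false: go back to the server. */
static bool dir_cache_usable(const struct dir_cache *dc)
{
    return dc->valid;
}

/* Hypothetical directory-modifying operation issued by this client
 * (create, remove, rename, ...). */
static void client_create_in_dir(struct dir_cache *dc)
{
    /* ... send the CREATE request to the server (elided) ... */
    dir_cache_purge(dc);  /* next readdir must refetch from the server */
}
```

Purging instead of patching keeps the client from guessing at server-side details such as entry ordering or readdir cookies.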

the NFS server doesn't report changes, the client must infer them from
changes in a file's size and mtime.  such changes are detected this way
for directories as well as files.  Trond may have added some code recently
that also "flushes" the dentry cache for a directory when such a change is
detected for a directory.
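
Because the server never announces changes, detection comes down to comparing freshly fetched attributes with the cached ones. The sketch below assumes, as described above, that only size and mtime are compared. Real clients track more attributes, and the names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified snapshot of cached attributes for a file or
 * directory (real clients also track ctime and other fields). */
struct cached_attrs {
    uint64_t size;
    int64_t  mtime_sec;
    uint32_t mtime_nsec;
};

/* A difference in size or mtime is the client's only signal that
 * another client modified the object on the server. */
static bool attrs_changed(const struct cached_attrs *cached,
                          const struct cached_attrs *fresh)
{
    return cached->size != fresh->size ||
           cached->mtime_sec != fresh->mtime_sec ||
           cached->mtime_nsec != fresh->mtime_nsec;
}

/* If the object changed, mark cached data (pages or readdir results)
 * stale and adopt the new attributes. Returns true if a purge is needed. */
static bool revalidate(struct cached_attrs *cached,
                       const struct cached_attrs *fresh,
                       bool *data_cache_valid)
{
    if (!attrs_changed(cached, fresh))
        return false;
    *data_cache_valid = false;
    *cached = *fresh;
    return true;
}
```

The same check applies to files and directories. Only what gets invalidated differs: page cache data or readdir results.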

> > c.  when a file is locked or unlocked via lockf/fcntl, all pending writes
> >     are pushed back to the server and any cached data in the page cache is
> >     purged before the lock/unlock call returns.
>
> Do you mean, the client locks/unlocks the file, or some other client?

when a client locks or unlocks a file, it flushes pending writes and
purges its read cache.

> I'm trying to fit this into my model of how the server must work.  It
> must be that the locked/unlocked state is recorded at the server, in
> the underlying file, and that the server reports the locked/unlocked
> state of the file to every client via attr results.

no, lock management is handled by an entirely separate protocol.  the
server maintains lock state separately from the actual files, as i
understand it.

> So now, why purge at *both* lock and unlock time?

this is a little long.

clients use what's called "close-to-open" cache consistency to try to
maintain a single-system image of the file.  this means when a file is
opened, the client checks with the server to get the latest version of the
file data and metadata (or simply to verify that the client can continue
using whatever it has cached).  when a file is closed, the client always
flushes any pending writes to the file back to the server.

in this way, when client A closes a file and subsequently client B opens
the same file, client B will see all changes made by client A before it
closed the file.  note that while A still has the file open, B may not see
all changes made by A, unless an application on A explicitly flushes the
file.  this is a compromise between tight cache coherency and good
performance.
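
The close-to-open rules above (check with the server at open, flush at close, and no guarantee in between) can be modeled roughly as follows. This is a toy sketch. A single "version" counter stands in for whatever the client actually compares (for example, size and mtime). All names are hypothetical, and the close path assumes no other client writes concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical server: version is bumped whenever a client's writes land. */
struct fake_server {
    uint64_t version;
};

/* Hypothetical per-client view of one file. */
struct c2o_file {
    uint64_t cached_version;  /* server version the cache corresponds to */
    unsigned cached_pages;    /* pages held in the local page cache */
    unsigned dirty_pages;     /* writes not yet sent to the server */
};

/* open(): always check with the server; keep the cache only if unchanged. */
static void c2o_open(struct c2o_file *f, const struct fake_server *srv)
{
    if (f->cached_version != srv->version) {
        f->cached_pages = 0;              /* purge stale data */
        f->cached_version = srv->version;
    }
}

/* write(): data stays local until close (or an explicit flush). */
static void c2o_write(struct c2o_file *f)
{
    f->dirty_pages++;
    f->cached_pages++;
}

/* close(): always push pending writes back to the server.
 * Assumes no concurrent writer on another client. */
static void c2o_close(struct c2o_file *f, struct fake_server *srv)
{
    if (f->dirty_pages) {
        f->dirty_pages = 0;
        srv->version++;
        f->cached_version = srv->version;
    }
}
```

For example, if client B opens the file while A still holds dirty data, B keeps its old cache. Once A closes, B's next open sees the new version and purges.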

when locking or unlocking a file, the idea is to make sure that other
clients can see all changes to the file that were made while it was
locked.  locking and unlocking provide tighter cache coherency than simple
everyday close-to-open, because tighter coherency is exactly why
applications go to the trouble of locking a file: they expect to share the
contents of the file with applications on other clients.

when a file is locked, the client wants to be certain it has the latest
version of the file for an application to play with.  the cache is purged
to cause the client to read any recent changes from the server.  when a
file is unlocked, the client wants to share its changes with other clients
so it flushes all pending writes before allowing the unlocking application
to proceed.
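
The lock/unlock behavior described above (flush all pending writes, then purge the read cache, before the call returns) might look like the sketch below. This is a minimal model with hypothetical names. The lock request itself travels over the separate lock protocol and is elided.

```c
#include <stdbool.h>

/* Hypothetical per-file client state. */
struct nfs_file_state {
    unsigned dirty_pages;   /* pending writes not yet on the server */
    unsigned cached_pages;  /* pages in the local page cache */
    unsigned writes_sent;   /* counter for illustration only */
};

/* Push every pending write to the server (and wait for completion). */
static void flush_pending_writes(struct nfs_file_state *f)
{
    f->writes_sent += f->dirty_pages;
    f->dirty_pages = 0;
}

/* Drop cached file data so the next read goes to the server. */
static void purge_read_cache(struct nfs_file_state *f)
{
    f->cached_pages = 0;
}

/* Run on both lock and unlock, before returning to the application.
 * On lock, the purge matters: the application must see the server's latest
 * data. On unlock, the flush matters: other clients must see this client's
 * changes. Flushing first ensures the purge never discards unwritten data. */
static void on_lock_or_unlock(struct nfs_file_state *f)
{
    flush_pending_writes(f);
    purge_read_cache(f);
}
```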

> > i don't want to speculate too much without Trond around to keep me honest.
> > however, i think what we want here is behavior that is closer to category
> > c., with as few negative performance implications as possible.
>
> Actually, this is really, really useful and gives me lots of pointers I
> can follow for more details.

i'm very happy to be able to include more brains in the dialog!

> > i think one way to accomplish this is to create two separate revalidation
> > functions -- one that can be used by synchronous code in the NFS client
> > that uses the 100% bug killer, and one that can be used from async RPC
> > tasks that simply marks that a purge is necessary, and next time through
> > the sync one, the purge actually occurs.
>
> That would certainly be easy from the VM side, then we could simply
> use a derivative of vmtruncate that leaves the file size alone, as
> Andrew suggested.

the idea is that only the NFS client would have to worry about getting
this right, and would invoke the proper VM hammer when it is safe to do
so.  that way, NFS-specific weirdness can be kept in fs/nfs/*.c, and not
ooze into the VM layer.
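
The split between the two revalidation paths proposed above could look roughly like this. The async path, used from RPC completion where it is unsafe to purge pages, only records that a purge is needed. The sync path performs it. This is a sketch under assumptions: all names are hypothetical, and a real kernel implementation would likely use atomic bit operations (for example, set_bit/test_and_clear_bit) instead of a plain bool.

```c
#include <stdbool.h>

/* Hypothetical inode state (not the real fs/nfs structures). */
struct nfs_inode_sketch {
    bool     purge_pending;  /* set from async context, consumed in sync */
    unsigned cached_pages;
    unsigned purges_done;
};

/* Stand-in for the VM "hammer": only safe from process context. */
static void purge_page_cache(struct nfs_inode_sketch *ni)
{
    ni->cached_pages = 0;
    ni->purges_done++;
}

/* Async RPC completion: must not sleep or lock pages, so just mark. */
static void revalidate_async(struct nfs_inode_sketch *ni, bool server_changed)
{
    if (server_changed)
        ni->purge_pending = true;
}

/* Synchronous NFS client paths (open, read, lock, ...): perform any
 * deferred purge, plus any purge this call discovers itself. */
static void revalidate_sync(struct nfs_inode_sketch *ni, bool server_changed)
{
    if (server_changed || ni->purge_pending) {
        ni->purge_pending = false;
        purge_page_cache(ni);
    }
}
```

All of this logic stays in the NFS client. The VM layer only sees a purge request issued from a context where it is safe.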

> > the only outstanding issue then is how to handle pages that are dirtied
> > via mmap'd files, since they are touched without going through the NFS
> > client.

[ ... various ideas snipped ... ]

> You want to know about the dirty pages only so you can send them
> to the server, correct?  Not because the client needs to purge
> anything.

i'm thinking only at the level of what behavior we want, not how to
implement it, simply because i'm not familiar enough with how it works
today.  i've researched how this behaves (more or less) on a reference
implementation of an NFS client by asking someone who worked on the
Solaris client.

the thinking is that if applications running on two separate clients have
a file mmap'd, the application designers already know well enough that
dirtying the same regions of the file on separate clients will have
nondeterministic results.  thus the only good reason that two or more
clients would mmap the same file and dirty some pages would be to modify
different regions of the file, or there is some application-level
serialization scheme in place to keep writes to the file in a
deterministic order.

thus, when a client "revalidates" an mmap'd file and discovers that the
file was changed on the server by another client, the reference
implementation says "go ahead and flush all the writes you know about,
then purge the read cache."

so the only problem is finding the dirty pages so that the client can
schedule the writes.  i think this needs to happen after a server change
is detected but before the client schedules any new I/O against the file.
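
The reference behavior for mmap'd files (on detecting a server change, flush every dirty page the client knows about, then purge the read cache, all before new I/O is scheduled) can be sketched as follows. This is a toy model with a fixed-size page array and hypothetical names. The loop over dirty pages stands in for calling writepage on each one and does not describe an existing VM interface.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical model of one file's cached pages on one client. */
struct mapped_file {
    bool     cached[NPAGES];
    bool     dirty[NPAGES];   /* dirtied through mmap, not write() */
    unsigned pages_written;   /* pages sent to the server */
};

/* Find every dirty page and schedule a write for it. */
static void flush_dirty_pages(struct mapped_file *mf)
{
    for (size_t i = 0; i < NPAGES; i++) {
        if (mf->dirty[i]) {
            mf->pages_written++;
            mf->dirty[i] = false;
        }
    }
}

/* Called after a server-side change is detected and before any new I/O.
 * Flushing first ensures no locally dirtied data is lost by the purge. */
static void revalidate_mapped_file(struct mapped_file *mf, bool server_changed)
{
    if (!server_changed)
        return;
    flush_dirty_pages(mf);
    for (size_t i = 0; i < NPAGES; i++)
        mf->cached[i] = false;
}
```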

today, dirty mmap'd pages are passed to the NFS client via the writepage
address space operation.  what more needs to be done here?  is there a
mechanism today to tell the VM layer to "call writepage for all dirty
mmap'd pages?"

	- Chuck Lever
--
corporate:	<cel at netapp dot com>
personal:	<chucklever at bigfoot dot com>



Thread overview: 77+ messages
2002-09-05 14:25 invalidate_inode_pages in 2.5.32/3 Chuck Lever
2002-09-05 18:27 ` Andrew Morton
2002-09-05 18:53   ` Chuck Lever
2002-09-05 19:17     ` Andrew Morton
2002-09-05 20:00       ` Trond Myklebust
2002-09-05 20:15         ` Andrew Morton
2002-09-05 20:27           ` Trond Myklebust
2002-09-05 20:37             ` Andrew Morton
2002-09-05 20:51               ` Trond Myklebust
2002-09-05 21:12                 ` Andrew Morton
2002-09-05 21:31                   ` Trond Myklebust
2002-09-05 22:19                     ` Andrew Morton
2002-09-06  0:48                       ` Trond Myklebust
2002-09-06  1:08                         ` Andrew Morton
2002-09-06  6:49                           ` Trond Myklebust
2002-09-07  8:37                           ` Daniel Phillips
2002-09-07 16:09                             ` Andrew Morton
2002-09-07 17:02                               ` Andrew Morton
2002-09-07  8:24                       ` Daniel Phillips
2002-09-07 16:06                         ` Andrew Morton
2002-09-09 21:08                           ` Daniel Phillips
2002-09-09 21:36                             ` Andrew Morton
2002-09-09 22:12                               ` Daniel Phillips
2002-09-07 18:47                         ` Rik van Riel
2002-09-07 23:09                           ` Andrew Morton
2002-09-09 21:44                             ` Daniel Phillips
2002-09-09 22:03                               ` Andrew Morton
2002-09-09 22:19                                 ` Daniel Phillips
2002-09-09 22:32                                   ` Andrew Morton
2002-09-10 16:57                                     ` Daniel Phillips
2002-09-09 23:51                                   ` Chuck Lever
2002-09-10  1:07                                     ` Daniel Phillips
2002-09-10 15:09                                       ` Chuck Lever
2002-09-10 16:13                                         ` Daniel Phillips
2002-09-10 19:04                                           ` Chuck Lever [this message]
2002-09-10 20:52                                             ` Daniel Phillips
2002-09-11  0:07                                               ` Andrew Morton
2002-09-11  0:27                                                 ` Daniel Phillips
2002-09-11  0:38                                                   ` Andrew Morton
2002-09-11  0:53                                                     ` Daniel Phillips
2002-09-11  1:49                                                       ` Andrew Morton
2002-09-11  2:14                                                         ` Daniel Phillips
2002-09-11 16:18                                                 ` Rik van Riel
2002-09-11 17:14                                                   ` Daniel Phillips
2002-09-12 19:06                                             ` Daniel Phillips
2002-09-12 22:05                                         ` Urban Widmark
2002-09-12 22:21                                           ` Andrew Morton
2002-09-12 22:30                                             ` Rik van Riel
2002-09-12 22:43                                               ` Daniel Phillips
2002-09-12 22:51                                               ` Andrew Morton
2002-09-12 23:05                                                 ` Randy.Dunlap
2002-09-12 23:23                                                 ` Rik van Riel
2002-09-12 23:53                                                   ` Daniel Phillips
2002-09-23 16:38                                                 ` Trond Myklebust
2002-09-23 17:16                                                   ` Daniel Phillips
2002-09-23 18:57                                                   ` Andrew Morton
2002-09-23 20:41                                                     ` Trond Myklebust
2002-09-23 20:49                                                       ` Daniel Phillips
2002-09-23 22:43                                                         ` Trond Myklebust
2002-09-24  5:09                                                           ` Daniel Phillips
2002-09-24 16:40                                                             ` Trond Myklebust
2002-09-23 19:13                                                   ` Daniel Phillips
2002-09-13  4:19                                               ` Daniel Phillips
2002-09-13  4:52                                               ` Daniel Phillips
2002-09-14  9:58                                             ` Urban Widmark
2002-09-12 13:04                                     ` Trond Myklebust
2002-09-12 18:21                                       ` Andrew Morton
2002-09-12 21:15                                         ` Daniel Phillips
2002-09-12 21:38                                           ` Andrew Morton
2002-09-12 21:52                                             ` Daniel Phillips
2002-09-05 22:01                   ` Chuck Lever
2002-09-05 22:23                     ` Andrew Morton
2002-09-05 21:41           ` Chuck Lever
2002-09-06  9:35     ` Helge Hafting
2002-09-06 16:16       ` Chuck Lever
2002-09-07  8:01   ` Daniel Phillips
2002-09-07 10:01     ` Daniel Phillips
