From: Chuck Lever <cel@citi.umich.edu>
To: Steve Dickson <SteveD@redhat.com>
Cc: nfs@lists.sourceforge.net, linux-fsdevel@vger.kernel.org
Subject: Re: [NFS] Re: [PATCH][RFC] NFS: Improving the access cache
Date: Wed, 03 May 2006 00:42:51 -0400
Message-ID: <445834CB.4050408@citi.umich.edu>
In-Reply-To: <44572B33.4070100@RedHat.com>

Steve Dickson wrote:
> Talking with Trond, he would like to do something slightly different
> which I'll outline here to make sure we are all on the same page....
>
> Basically we would maintain one global hlist (i.e. linked list) that
> would contain all of the cached entries; then each nfs_inode would
> have its own LRU hlist that would contain entries that are associated
> with that nfs_inode. So each entry would be on two lists, the
> global hlist and hlist in the nfs_inode.
>
> We would govern memory consumption by only allowing 30 entries
> on any one hlist in the nfs_inode and by registering the global
> hlist with the VFS shrinker, which will cause the list to be pruned
> when memory is needed. So this means, when the 31st entry is added
> to the hlist in the nfs_inode, the least recently used entry would
> be removed.
>
> Locking might be a bit tricky, but doable... To make this scalable,
> I would think we would need a global read/write spin_lock. The read_lock()
> would be taken when the hlist in the inode was searched and the
> write_lock() would be taken when the hlist in the inode was changed
> and when the global list was pruned.
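
For reference, the per-inode half of the scheme quoted above (at most 30 entries, evict the least recently used on the 31st) might look roughly like the userspace sketch below. This is only an illustration: struct access_entry, struct access_lru and the lru_* helpers are made-up names, not anything in the NFS client, and the global list, the shrinker hook and all locking are left out.

```c
#include <stdlib.h>

#define ACCESS_CACHE_MAX 30   /* per-inode cap from the proposal */

/* Hypothetical entry: one cached ACCESS result for one uid. */
struct access_entry {
    unsigned int uid;
    unsigned int mask;
    struct access_entry *prev, *next;
};

/* Hypothetical per-inode list: head is most recent, tail is least recent. */
struct access_lru {
    struct access_entry *head, *tail;
    int count;
};

static void lru_unlink(struct access_lru *l, struct access_entry *e)
{
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
    l->count--;
}

static void lru_push_front(struct access_lru *l, struct access_entry *e)
{
    e->prev = NULL;
    e->next = l->head;
    if (l->head) l->head->prev = e; else l->tail = e;
    l->head = e;
    l->count++;
}

/* Find the entry for uid and mark it most recently used; NULL on a miss. */
struct access_entry *lru_lookup(struct access_lru *l, unsigned int uid)
{
    for (struct access_entry *e = l->head; e; e = e->next) {
        if (e->uid == uid) {
            lru_unlink(l, e);
            lru_push_front(l, e);
            return e;
        }
    }
    return NULL;
}

/* Insert or update an entry; evict the tail once the cap is exceeded.
 * Returns 0 on success, -1 if allocation fails. */
int lru_add(struct access_lru *l, unsigned int uid, unsigned int mask)
{
    struct access_entry *e = lru_lookup(l, uid);
    if (e) {
        e->mask = mask;
        return 0;
    }
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->uid = uid;
    e->mask = mask;
    lru_push_front(l, e);
    if (l->count > ACCESS_CACHE_MAX) {
        struct access_entry *victim = l->tail;
        lru_unlink(l, victim);
        free(victim);
    }
    return 0;
}
```

Every lookup here rewrites the list, so even a "read" needs exclusive access to it.
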

For the sake of discussion, let me propose some design alternatives.

1. We already have cache shrinkage built in: when an inode is purged
due to cache shrinkage, the access cache for that inode is purged as
well. In other words, there is already a mechanism for external memory
pressure to shrink this cache. I don't see a strong need to complicate
matters by adding more cache shrinkage than already exists with normal
inode and dentry cache shrinkage.

Without a separate shrinker, you won't need to hold a global lock to
serialize normal accesses with purging and cache garbage collection.
Eliminating global serialization is a Good Thing (tm).

2. Use a radix tree per inode. The radix tree key is a uid or gid, and
each node in a tree stores the access mask for that {inode, uid} tuple.
This seems a lot simpler to implement than a dual hlist, and will
scale automatically with a large number of uids accessing the same
inode. The nodes are small, and you don't need to allocate a big chunk
of contiguous memory for a hash table.
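
To make point 2 concrete, here is a rough userspace sketch of a per-inode tree keyed by uid. It is not the kernel's radix tree code; struct access_tree, access_tree_store() and access_tree_lookup() are names invented for this example, and the fixed 4 x 8-bit layout is an assumption chosen for brevity (each interior node is 256 pointers, which is larger than what the kernel implementation uses).

```c
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS   8
#define RADIX_SLOTS  (1u << RADIX_BITS)
#define RADIX_LEVELS 4               /* 4 x 8 bits covers a 32-bit uid */

/* Interior node: each slot points to a child node, or to a leaf
 * when the node sits at the last level. */
struct radix_node {
    void *slot[RADIX_SLOTS];
};

/* Leaf: the cached access mask for one {inode, uid} pair. */
struct access_leaf {
    uint32_t mask;
};

/* One of these would hang off each inode (hypothetical). */
struct access_tree {
    struct radix_node *root;
};

/* Slot index for this uid at the given level, most significant byte first. */
static unsigned int radix_index(uint32_t uid, int level)
{
    int shift = (RADIX_LEVELS - 1 - level) * RADIX_BITS;
    return (uid >> shift) & (RADIX_SLOTS - 1);
}

/* Store or overwrite the mask for uid. Returns 0, or -1 if out of memory. */
int access_tree_store(struct access_tree *t, uint32_t uid, uint32_t mask)
{
    if (!t->root && !(t->root = calloc(1, sizeof(*t->root))))
        return -1;

    struct radix_node *n = t->root;
    for (int level = 0; level < RADIX_LEVELS - 1; level++) {
        void **slot = &n->slot[radix_index(uid, level)];
        if (!*slot && !(*slot = calloc(1, sizeof(struct radix_node))))
            return -1;
        n = *slot;
    }

    void **leaf = &n->slot[radix_index(uid, RADIX_LEVELS - 1)];
    if (!*leaf && !(*leaf = malloc(sizeof(struct access_leaf))))
        return -1;
    ((struct access_leaf *)*leaf)->mask = mask;
    return 0;
}

/* Returns 1 and fills *mask on a hit, 0 on a miss. */
int access_tree_lookup(const struct access_tree *t, uint32_t uid, uint32_t *mask)
{
    const struct radix_node *n = t->root;
    for (int level = 0; n && level < RADIX_LEVELS - 1; level++)
        n = n->slot[radix_index(uid, level)];
    if (!n)
        return 0;

    const struct access_leaf *leaf = n->slot[radix_index(uid, RADIX_LEVELS - 1)];
    if (!leaf)
        return 0;
    *mask = leaf->mask;
    return 1;
}
```

Nodes are allocated only along the paths that are actually used, so an inode touched by a single uid pays for one path and nothing more.
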

3. Instead of serializing by spinning, you should use a semaphore. The
reason for this is that when multiple processes owned by the same uid
access the same inode concurrently, only the first process should be
allowed to generate a real ACCESS request; otherwise they will race, and
potentially all of them could generate the same ACCESS request concurrently.
You will need to serialize on-the-wire requests with accesses to the
cache, and such wire requests will need the waiting processes to sleep,
not spin.
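
The pattern in point 3 can be sketched in userspace as follows. A pthread mutex stands in for the kernel semaphore (both put waiters to sleep instead of spinning), and fake_access_rpc() is a placeholder for the real on-the-wire ACCESS call; struct access_slot and every other name here are assumptions made for the example only.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical single-slot cache for one {inode, uid} pair. */
struct access_slot {
    pthread_mutex_t lock;   /* sleeping lock, standing in for a semaphore */
    int valid;              /* nonzero once mask holds a server reply */
    uint32_t mask;
    int wire_requests;      /* counts simulated ACCESS calls */
};

#define ACCESS_SLOT_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 }

/* Placeholder for the on-the-wire ACCESS request; it always "grants"
 * 0x3f and bumps a counter so callers can see how often it ran. */
static uint32_t fake_access_rpc(struct access_slot *s)
{
    s->wire_requests++;
    return 0x3f;
}

/* The first caller performs the request while holding the lock; anyone
 * arriving meanwhile sleeps on the lock and then finds the cached result. */
uint32_t access_slot_get(struct access_slot *s)
{
    pthread_mutex_lock(&s->lock);
    if (!s->valid) {
        s->mask = fake_access_rpc(s);
        s->valid = 1;
    }
    uint32_t mask = s->mask;
    pthread_mutex_unlock(&s->lock);
    return mask;
}
```

Holding the lock across the request is the point of the exercise: it is what keeps the second and later callers from sending duplicates, and it is also why the lock has to be one that sleeps.
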

4. You will need some mechanism for ensuring that the contents of the
access cache are "up to date". You will need some way of deciding when
to revalidate each {inode, uid} tuple. Based on what Peter said, I
think you are going to check the inode's ctime, and purge the whole
access cache for an inode if its ctime changes. But you may need
something like an nfs_revalidate_inode() before you proceed to examine
an inode's access cache. It might be more efficient to generate just an
ACCESS request instead of a GETATTR followed by an ACCESS, but I don't
see an easy way to do this given the current inode revalidation
architecture of the client.
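
The ctime check in point 4 could be sketched like this. It assumes the caller has already refreshed the inode's attributes and passes in the current ctime; struct access_cache, its fixed eight-slot array and the access_cache_* functions are hypothetical and only illustrate the "purge everything when ctime moves" rule.

```c
#include <stdint.h>

#define ACCESS_SLOTS 8   /* arbitrary small size for the example */

/* Hypothetical per-inode access cache tagged with the ctime under
 * which its entries were obtained. */
struct access_cache {
    int64_t ctime;
    int nr;
    struct { uint32_t uid, mask; } e[ACCESS_SLOTS];
};

/* Drop every entry if the inode's ctime differs from the cached one. */
void access_cache_revalidate(struct access_cache *c, int64_t inode_ctime)
{
    if (c->ctime != inode_ctime) {
        c->nr = 0;
        c->ctime = inode_ctime;
    }
}

/* Returns 1 and fills *mask on a hit, 0 on a miss (including a purge). */
int access_cache_lookup(struct access_cache *c, int64_t inode_ctime,
                        uint32_t uid, uint32_t *mask)
{
    access_cache_revalidate(c, inode_ctime);
    for (int i = 0; i < c->nr; i++) {
        if (c->e[i].uid == uid) {
            *mask = c->e[i].mask;
            return 1;
        }
    }
    return 0;
}

/* Returns 0 on success, -1 if the example's fixed array is full. */
int access_cache_store(struct access_cache *c, int64_t inode_ctime,
                       uint32_t uid, uint32_t mask)
{
    access_cache_revalidate(c, inode_ctime);
    for (int i = 0; i < c->nr; i++) {
        if (c->e[i].uid == uid) {
            c->e[i].mask = mask;
            return 0;
        }
    }
    if (c->nr == ACCESS_SLOTS)
        return -1;
    c->e[c->nr].uid = uid;
    c->e[c->nr].mask = mask;
    c->nr++;
    return 0;
}
```

The sketch says nothing about how the current ctime is obtained, which is exactly the GETATTR-versus-ACCESS question raised above.
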

5. You need to handle ESTALE. Often, ->permission is the first thing
the VFS will do before a lookup or open, and that is when the NFS client
first notices that a cached file handle is stale. Should an ESTALE
returned on an ACCESS request always mean "permission denied", or
should it mean purge the access cache and grant access, so that the next
VFS step sees the ESTALE and can recover appropriately?

--
corporate: <cel at netapp dot com>
personal: <chucklever at bigfoot dot com>