From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756209AbXLHAwq (ORCPT );
	Fri, 7 Dec 2007 19:52:46 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753258AbXLHAwi (ORCPT );
	Fri, 7 Dec 2007 19:52:38 -0500
Received: from mx1.redhat.com ([66.187.233.31]:34258 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753020AbXLHAwh (ORCPT );
	Fri, 7 Dec 2007 19:52:37 -0500
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
	Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United
	Kingdom. Registered in England and Wales under Company Registration
	No. 3798903
From: David Howells 
In-Reply-To: 
References: <26C82FDD-C778-4034-A3CF-CB1C83A0C90C@oracle.com>
	<6306.1196874660@redhat.com> <25619.1196904168@redhat.com>
	<21053.1196971251@redhat.com>
To: Chuck Lever 
Cc: dhowells@redhat.com, Peter Staubach, Trond Myklebust,
	nfsv4@linux-nfs.org, linux-kernel@vger.kernel.org
Subject: Re: How to manage shared persistent local caching (FS-Cache) with NFS?
X-Mailer: MH-E 8.0.3+cvs; nmh 1.2-20070115cvs; GNU Emacs 23.0.50
Date: Sat, 08 Dec 2007 00:52:15 +0000
Message-ID: <16896.1197075135@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Chuck Lever wrote:

> Why not encode the local mounted-on directory in the key?

Can't.  Namespaces.  chroot.

> Meaning your cache is at quota all the time, and to continue operation it
> must eject items constantly.

I've thought about that, thank you.  Go and read the documentation.  There's
configurable hysteresis in the culling algorithm.

> This is a scenario where it pays to cache the read-mostly items on disk,
> and leave the frequently changing items in memory.

Currently any file which is opened for writing is automatically ejected from
the cache.

> The economics of disk caches is different than memory caches.
> Disk caches are much larger and cheaper, but their performance tanks when
> they have to track frequently changing files.  Memory caches are smaller,
> but tracking frequently changing data is only a little more expensive than
> tracking data that doesn't change often.

I'm aware of all that.  My OLS slides and paper can be found here:

	http://people.redhat.com/~dhowells/fscache/fscache-ols2006.odp
	http://people.redhat.com/~dhowells/fscache/FS-Cache.pdf

Lots of small files also hurt worse than fewer big files in some ways: lots
more metadata in the cache.  On the other hand, fragmentation is less of a
problem.  Anyway, this is straying off the main topic.

> I think it's key to preventing FS-Cache from making performance worse in
> many common scenarios.

Perhaps.  The problem is that NFS doesn't know what the access pattern on a
file is expected to be.  I've been asked to provide fine-grained cache
controls (perhaps at directory level), but Al Viro was, erm, lukewarm in his
reception of that idea.  Gathering statistical data dynamically has
performance penalties of its own :-/

> Disconnected operation for NFS is fraught with challenges.  Access to data
> on servers is traditionally gated by the client's IP address, for example.
> The client may disconnect from the network, then reconnect using a
> different address where suddenly all of its accesses are rebuffed.

Agreed, but isn't that one of the design goals for NFSv4?  It's also
something of interest to other netfs's that might want to use FS-Cache.
This isn't an NFS-only facility.

> NFS servers, not clients, traditionally determine the file's mtime and
> ctime, and its file handle.  So file updates and file creation become
> problematic.  The client has to reconcile the server's file handle, for
> files created offline, with its own when reconnecting.

Yes.  Basically it's a major can of major worms.  Doesn't stop people
wanting it, though.
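For what it's worth, the sort of thing I mean by keying on server-side
identity rather than the local path can be sketched like this.  It's purely
illustrative: the key layout, names and function are invented here, not
FS-Cache's real key format.  The point is just that the (server, port, file
handle) tuple stays meaningful across namespaces and chroots, where a local
mounted-on path does not:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: build a cache key from server-side
 * identity (server address, port, file handle in hex) instead of the
 * local mounted-on path, which is ambiguous across namespaces and
 * chroots.  This layout is invented, not FS-Cache's actual format. */
size_t make_cache_key(char *buf, size_t buflen,
		      const char *server_addr, unsigned int port,
		      const char *fh_hex)
{
	int n = snprintf(buf, buflen, "nfs:%s:%u:%s",
			 server_addr, port, fh_hex);
	return n < 0 ? 0 : (size_t)n;
}
```

Two clients chrooted into different trees that mount the same export would
then still derive the same key, which is what lets the cache be shared.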
> And, for disconnected operation, the cache is required to contain every
> item from the remote.  You can't just drop items from the cache because
> they are inconvenient.

Yes.  That's what pinning and reservations are for.  Currently, disconnected
operation is something I'd like to support, but that support is otherwise
mostly non-existent.

> That something might be the pathname of the mounted-on directory or of the
> file itself.

See above.

> Yes, they do.  The combination of mount options and mounted-on directory
> (or local pathname to the file) gives you a unique identity for that view.

See above.

> So an item is cached in memory until space becomes available in the disk
> cache?

The item isn't considered for caching until space becomes available in the
disk cache.  It's put on a queue for potential caching, but won't actually
be cached if it gets discarded from the icache or pagecache before being
cached.

It's unfortunate, but with a fast network you can download data faster than
you can make space in the cache.  unlink() and rmdir() are (a) slow and (b)
synchronous.  Each unlink() or rmdir() operation requires a task to perform
it, and that task is committed until the op finishes.

I could actually improve cachefilesd (the userspace cache culler) by giving
it multiple threads.  However, I've noticed that having cachefilesd do lots
of parallel synchronous, journalled disk ops hurts performance in other
ways :-/

Again, hysteresis is available.  We stop writing stuff into the cache beyond
a limit until the free space drops sufficiently below that limit that we've
got a good go at writing a load of new stuff, rather than just a block here
and a block there.

It's all very icky, and depends as much on the filesystem underlying the
cache (ext3 for example) and *its* configuration as on the characteristics
of the netfs and the network link.  It's all about compromise.

David
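As a footnote, the culling hysteresis described above amounts to starting
and stopping at different free-space thresholds, so the culler reclaims a
decent batch of space in one go rather than one object per incoming block.
A rough sketch, with invented names and numbers (the real limits live in
cachefilesd's configuration, not in constants like these):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch only: culling begins when free space falls
 * below CULL_START and continues until it climbs back above
 * CULL_STOP.  The gap between the two thresholds is the hysteresis. */
#define CULL_START  7	/* % free space below which culling begins */
#define CULL_STOP  10	/* % free space above which culling stops  */

struct cull_state {
	bool culling;	/* currently ejecting objects from the cache? */
};

/* Decide whether to (keep) culling given the current free-space %. */
bool should_cull(struct cull_state *s, int free_pct)
{
	if (s->culling) {
		if (free_pct >= CULL_STOP)
			s->culling = false;	/* headroom regained */
	} else {
		if (free_pct < CULL_START)
			s->culling = true;	/* space tight: start */
	}
	return s->culling;
}
```

With a single threshold the culler would flap on and off around it; the
two-level scheme is what gives you "a good go at writing a load of new
stuff" once space has been reclaimed.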