From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754107Ab2BUTxI (ORCPT ); Tue, 21 Feb 2012 14:53:08 -0500 Received: from mx1.redhat.com ([209.132.183.28]:11417 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753170Ab2BUTw4 (ORCPT ); Tue, 21 Feb 2012 14:52:56 -0500 Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells Subject: [PATCH 18/73] union-mount: Union mounts documentation [ver #2] To: linux-fsdevel@vger.kernel.org, viro@ZenIV.linux.org.uk, valerie.aurora@gmail.com Cc: linux-kernel@vger.kernel.org, David Howells Date: Tue, 21 Feb 2012 17:59:47 +0000 Message-ID: <20120221175947.25235.58759.stgit@warthog.procyon.org.uk> In-Reply-To: <20120221175721.25235.8901.stgit@warthog.procyon.org.uk> References: <20120221175721.25235.8901.stgit@warthog.procyon.org.uk> User-Agent: StGIT/0.14.3 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Valerie Aurora Document design and implementation of union mounts (a.k.a. writable overlays). With corrections from Andreas Gruenbacher . Original-author: Valerie Aurora Signed-off-by: David Howells --- Documentation/filesystems/union-mounts.txt | 712 ++++++++++++++++++++++++++++ 1 files changed, 712 insertions(+), 0 deletions(-) create mode 100644 Documentation/filesystems/union-mounts.txt diff --git a/Documentation/filesystems/union-mounts.txt b/Documentation/filesystems/union-mounts.txt new file mode 100644 index 0000000..596bfe6 --- /dev/null +++ b/Documentation/filesystems/union-mounts.txt @@ -0,0 +1,712 @@ +Union mounts (a.k.a. 
writable overlays) +======================================= + +This document describes the architecture and current status of union mounts, +also known as writable overlays. + +In this document: + - Overview of union mounts + - Terminology + - VFS implementation + - Locking strategy + - VFS/file system interface + - Userland interface + - NFS interaction + - Status + - Contributing to union mounts + +Overview +======== + +A union mount layers one read-write file system over one or more read-only file +systems, with all writes going to the writable file system. The namespace of +both file systems appears as a combined whole to userland, with files and +directories on the writable file system covering up any files or directories +with matching pathnames on the read-only file system. The read-write file +system is the "topmost" or "upper" file system and the read-only file systems +are the "lower" file systems. A few use cases: + +- Root file system on CD with writes saved to hard drive (LiveCD) +- Multiple virtual machines with the same starting root file system +- Cluster with NFS mounted root on clients + +Most if not all of these problems could be solved with a COW block device or a +clustered file system (including NFS mounts). However, for some use cases, +sharing is more efficient and better performing if done at the file system +namespace level. COW block devices only increase their divergence as time goes +on, and a fully coherent writable file system is unnecessary synchronization +overhead if no other client needs to see the writes. + +What union mounts are not +------------------------- + +Union mounts are not a general-purpose unioning file system. They do not +provide a generic "union of namespaces" operation for an arbitrary number of +file systems. Many interesting features can be implemented with a generic +unioning facility: dynamic insertion and removal of branches, write policies +based on space available, online upgrade, etc. 
Some unioning file systems that +do this are UnionFS and AUFS. + +Terminology +=========== + +The main physical metaphor for union mounts is that a writable file system is +mounted "on top" of a read-only file system. Lookups start at the "topmost" +read-write file system and travel "down" to the "bottom" read-only file system +only if no blocking entry exists on the top layer. + +Topmost layer: The read-write file system. Lookups begin here. + +Bottom layer: The read-only file system. Lookups end here. + +Path: Combination of the vfsmount and dentry structure. + +Follow down: Given a path from the top layer, find the corresponding path on +the bottom layer. + +Follow up: Given a path from the bottom layer, find the corresponding path on +the top layer. + +Whiteout: A directory entry in the top layer that prevents lookups from +travelling down to the bottom layer. Created on unlink()/rmdir() if a +corresponding directory entry exists in the bottom layer. + +Opaque flag: A flag on a directory in the top layer that prevents lookups of +entries in this directory from travelling down to the bottom layer (unless +there is an explicit fallthru entry allowing that for a particular entry). Set +on creation of any new directory in the topmost layer (that is, a directory +that does not have any matching visible directory below it). + +Fallthru: A directory entry which allows lookups to "fall through" to the +bottom layer for that exact directory entry. This serves as a placeholder for +directory entries from the bottom layer during readdir(). Fallthrus override +opaque flags. + +File copyup: Create a file on the top layer that has the same metadata and +contents as the file with the same pathname on the bottom layer. + +Directory copyup: Copy up the visible directory entries from the bottom layer +as fallthrus in the matching top layer directory. Mark the directory opaque to +avoid unnecessary negative lookups on the bottom layer. 
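The directory copyup definition above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel implementation: the types struct entry and struct top_dir, the
MODEL_DIR_MAX limit and the function model_dir_copyup() are hypothetical
names invented for this example. It adds a fallthru for every lower-layer
name that the top layer does not already cover (with a real entry, a whiteout
or an existing fallthru) and then marks the top directory opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds for a top-layer directory (not kernel names). */
enum entry_kind { ENT_FILE, ENT_DIR, ENT_WHITEOUT, ENT_FALLTHRU };

struct entry {
	const char *name;
	enum entry_kind kind;
};

#define MODEL_DIR_MAX 16	/* arbitrary capacity for this model */

struct top_dir {
	struct entry entries[MODEL_DIR_MAX];
	size_t count;
	int opaque;
};

/* Return 1 if the top directory already has any entry with this name. */
static int top_has(const struct top_dir *top, const char *name)
{
	size_t i;

	for (i = 0; i < top->count; i++)
		if (strcmp(top->entries[i].name, name) == 0)
			return 1;
	return 0;
}

/*
 * Model of directory copyup: create a fallthru in the top directory for
 * each lower-layer name not already covered, then mark the directory
 * opaque.  Returns the number of fallthrus created, or -1 if the model
 * directory is full (a stand-in for ENOSPC).
 */
int model_dir_copyup(struct top_dir *top, const char *const *lower_names,
		     size_t nlower)
{
	int created = 0;
	size_t i;

	assert(top != NULL);
	if (top->opaque)
		return 0;	/* nothing below is consulted any more */

	for (i = 0; i < nlower; i++) {
		if (top_has(top, lower_names[i]))
			continue;	/* covered by file, whiteout or fallthru */
		if (top->count == MODEL_DIR_MAX)
			return -1;
		top->entries[top->count].name = lower_names[i];
		top->entries[top->count].kind = ENT_FALLTHRU;
		top->count++;
		created++;
	}
	top->opaque = 1;
	return created;
}
```

In this model a second call on the same directory is a no-op, which mirrors
the stated purpose of the opaque flag: once the fallthrus exist, the bottom
layer no longer needs to be searched for names that are not there.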
+ +Examples +======== + +What happens when I... + +- creat() /newfile -> creates on topmost layer +- unlink() /oldfile -> creates a whiteout on topmost layer +- Edit /existingfile -> copies up to top layer at open(O_WR) time +- truncate /existingfile -> copies up to topmost layer + N bytes if specified +- touch()/chmod()/chown()/etc. -> copies up to topmost layer +- mkdir() /newdir -> creates opaque dir on topmost layer +- rmdir() /olddir -> creates a whiteout on topmost layer +- mkdir() /olddir after above -> creates opaque dir on topmost layer +- readdir() /shareddir -> copies up entries from bottom layer as + fallthrus, processes duplicates and whiteouts +- link() /oldfile /newlink -> copies up /oldfile, creates /newlink on + topmost layer +- symlink() /oldfile /symlink -> nothing special +- rename() /oldfile /newfile -> copies up /oldfile to /newfile on top layer, + whiteouts /oldfile +- rename() /olddir /newdir -> EXDEV +- rename() /topmost_only_dir /topmost_only_dir2 -> success +- stat() /oldfile - inode & dev from lower layer +- stat() /newfile - inode & dev from topmost layer +- readdir() /shareddir - d_ino & d_type from lower layer on fallthrus + +Getting to a root file system with union mounts: + +- Mount the base read-only file system as the root file system +- Mount the read-only file system again on /newroot +- Mount the read-write layer on /newroot: + # mount -o union /dev/sda /newroot +- pivot_root to /newroot +- Start init + +See scripts/pivot.sh in the UML devkit linked to from: + +http://valerieaurora.org/union/ + +VFS implementation +================== + +Union mounts are implemented as an integral part of the VFS, rather than as a +VFS client file system (i.e., a stacked file system like unionfs or ecryptfs). +Implementing unioning inside the VFS eliminates the need for duplicate copies +of VFS data structures, unnecessary indirection, and code duplication, but +requires very maintainable, low overhead code. 
Union mounts require no change +to file systems serving as the read-only layer, and require some minor support +from file systems serving as the read-write layer. File systems that want to +be the writable layer must implement the new ->whiteout() and ->fallthru() +inode operations, which create special dummy directory entries. + +The union mounts code must accomplish the following major tasks: + +1) Pass lookups through to the lower level file system. +2) Copy files and directories up to the topmost layer when written. +3) Create whiteouts and fallthrus as necessary. + +VFS objects and union mounts +---------------------------- + +First, some VFS basics: + +The VFS allows multiple mounts of the same file system. For example, /dev/sda +can be mounted at /usr and also at /mnt. The same file system can be mounted +read-only at one point and read-write at another. Each of these mounts has its +own vfsmount data structure in the kernel. However, each underlying file +system has exactly one in-kernel superblock structure no matter how many times +it is mounted. All the separate vfsmounts for the same file system reference +the same superblock data structure. + +Directory entries are cached by the VFS in dentry structures. The VFS keeps +one dentry structure for each file or directory in a file system, no matter how +many times it is mounted. Each dentry represents only one element of a path +name. When the VFS looks up a pathname (e.g., "/sbin/init"), the result is a +combination of vfsmount and dentry. This pair is usually stored +in a kernel structure named "path", which is simply two pointers, one to the +vfsmount and one to the dentry. A "struct path" is this structure; a pathname +is a string like "/etc/fstab". + +In union mounts, a file system can only be the topmost layer for one union +mount. A file system can be part of multiple union mounts if it is a read-only +layer. 
So dentries in the read-only layers can be part of multiple unions, +while a dentry in the read-write layer can only be part of one union. + +union_dir structure +--------------------- + +The first job of union mounts is to map directories from the topmost layer to +directories with the same pathname in the lower layer. That is, given the +(mnt, dentry) pair for a directory pathname in the topmost layer, we need to +find all the (mnt, dentry) pairs for the directory with the same pathname in the +lower layer. We do this with the union_dir structure, which is an array +containing struct paths (mnt, dentry pointer pairs) for each directory unioned +with the topmost union. The array is pointed to from the new d_union_stack +member of struct dentry. + +/* + * The union_stack structure. It is an array of struct paths of + * directories below the topmost directory in a unioned directory. The + * topmost dentry has a pointer to this structure. The topmost dentry + * can only be part of one union, so we can reference it from the + * dentry, but lower dentries can be part of multiple union stacks. + * + * The number of dirs actually allocated is kept in the superblock, + * s_union_count. + */ +struct union_stack { + struct path u_dirs[0]; +}; + +This structure is flexible enough to support an arbitrary number of layers of +unioned file systems. Since there can be more than two layers, this section +will talk about mapping "upper" directories to "lower" directories, instead of +"topmost" directories to "bottom" directories. + +Traversing the union stack +-------------------------- + +The set of union_dir structures referring to a particular pathname are called +collectively the union stack for that directory. To traverse the union stack, +iterate through the number of layers in the union (stored in sb->s_union_count) +with union_find_dir(). 
Example: freeing the union stack: + +void d_free_unions(struct dentry *topmost) +{ + struct path *path; + unsigned int i, layers = topmost->d_sb->s_union_count; + + if (!IS_DIR_UNIONED(topmost)) + return; + + for (i = 0; i < layers; i++) { + path = union_find_dir(topmost, i); + if (path->mnt) + path_put(path); + } + kfree(topmost->d_union_stack); + topmost->d_union_stack = NULL; +} + +Code paths +---------- + +Union mounts modify the following key code paths in the VFS: + +- mount()/umount() +- Pathname lookup +- Any path that modifies an existing file + +Mount +----- + +Union mounts are created in two steps: + +1. Mount the read-only layer file systems read-only in the usual manner, all on +the same mountpoint. Submounts are permitted as long as they are also +read-only and not shared (part of a mount propagation group). + +2. Mount the top layer with the "-o union" option at the same mountpoint. All +read-only file systems mounted at this mountpoint will be included in the union +mount. + +The bottom layers must be read-only and the top layer must be read-write and +support whiteouts and fallthrus. A file system that supports whiteouts and +fallthrus indicates this by setting the MS_WHITEOUT and MS_FALLTHRU flags in +the superblock. Currently, the top layer is forced to "noatime" to avoid a +copyup on every access of a file. Supporting atime with the current +infrastructure would require a copyup on every open(). The "relatime" option +would be equally efficient if the atime is the same or more recent than the +mtime/ctime for every object on the read-only file system, and if the 24-hour +timeout on relatime was disabled. However, this is probably not worthwhile for +the majority of union mount use cases. + +File systems can only be union mounted at their root directories, for +simplicity and performance. + +pivot_root() to a union mounted file system is supported. 
The recommended way +to get to a union mounted root file system is to boot with the read-only mount +as the root file system, construct the union mount on an entirely new mount, +and pivot_root() to the new union mount root. Attempting to union mount the +root file system later in boot will result in covering other file systems, +e.g., /proc, which isn't permitted in the current code and is a bad idea +anyway. + +Hard read-only file systems +--------------------------- + +Union mounts require the lower layer of the file system to be read-only. +However, in Linux, any individual file system may be mounted at multiple places +in the namespace, and a file system can be changed from read-only to read-write +while still mounted. Thus, simply checking that the bottom layer is read-only +at the time the writable overlay is mounted over it is pointless, since at any +time the bottom layer may become read-write. + +We have to guarantee that a file system will be read-only for as long as it is +the bottom layer of a union mount. To do this, we track the number of hard +read-only users of a file system in its VFS superblock structure. When we +union mount a writable overlay over a file system, we increment its read-only +user count. The file system can only be mounted read-write if its read-only +users count is zero. + +Todo: + +- Support hard read-only NFS mounts. See discussion here: + + http://markmail.org/message/3mkgnvo4pswxd7lp + +Pathname lookup +--------------- + +Pathname lookup in a unioned directory traverses down the union stack for the +parent directory, looking up each pathname element in each layer of the file +system (according to the rules of whiteouts, fallthrus, and opaque flags). At +mount time, the union stack for the root directory of the file system is +created, and the union stack creation for every other unioned directory in the +file system is boot-strapped using the already-existing union stack of the +directory's parent. 
In order to simplify the code greatly, every visible +directory on the lower file system is required to have a matching directory on +the upper file system. If this matching directory does not already exist, it +is created during pathname lookup. Therefore, each unioned directory is the +child of another unioned directory (or is the root directory of the file +system). + +The actual union lookup function is called in the following code paths: + +do_lookup()->do_union_lookup()->lookup_union()->__lookup_union() +lookup_hash()->lookup_union()->__lookup_union() + +__lookup_union() is where the rules of whiteouts, fallthrus, and opaque flags +are actually implemented. __lookup_union() returns either the first visible +dentry, or a negative dentry from the topmost file system if no matching dentry +exists. If it finds a directory, it looks up any potential matching lower +layer directories. If it finds a lower layer directory, it first creates the +topmost dir if necessary via union_create_topmost_dir(), and then calls +union_add_dir() to append the lower directory to the end of the union stack. + +Note that not all directories in a union mount are unioned, only those with +matching directories on the lower layer. The macro IS_DIR_UNIONED() is a +cheap, constant time way to check if a directory is unioned, while +IS_MNT_UNION() checks if the entire mount is unioned (and therefore whether the +directory in question is potentially unioned). + +Currently, lookup of a negative dentry or a directory with no matching +directories below it requires a lookup in every directory in the union stack +every time it is looked up. We could avoid subsequent lookups by adding the +equivalent of a negative dcache entry. 
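The interaction of whiteouts, fallthrus and opaque flags during lookup can be
summarised with a self-contained user-space model. This is a sketch under
stated assumptions and not the code of __lookup_union(): the types
struct entry and struct layer_dir and the function model_lookup() are
hypothetical names invented here, and the model returns the index of the layer
where a name is visible rather than a dentry. Layer 0 stands for the topmost
layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry kinds; ENT_NONE means "no entry with that name". */
enum entry_kind { ENT_NONE, ENT_FILE, ENT_DIR, ENT_WHITEOUT, ENT_FALLTHRU };

struct entry {
	const char *name;
	enum entry_kind kind;
};

/* One directory as seen in one layer of the union stack. */
struct layer_dir {
	const struct entry *entries;
	size_t count;
	int opaque;		/* only consulted for the topmost layer */
};

static enum entry_kind find_entry(const struct layer_dir *dir,
				  const char *name)
{
	size_t i;

	for (i = 0; i < dir->count; i++)
		if (strcmp(dir->entries[i].name, name) == 0)
			return dir->entries[i].kind;
	return ENT_NONE;
}

/*
 * Return the index of the layer in which "name" is visible (0 is the
 * topmost layer), or -1 if the name is not visible in the union.
 */
int model_lookup(const struct layer_dir *layers, size_t nlayers,
		 const char *name)
{
	enum entry_kind top;
	size_t i;

	assert(nlayers > 0);
	top = find_entry(&layers[0], name);

	if (top == ENT_FILE || top == ENT_DIR)
		return 0;		/* the top layer covers lower entries */
	if (top == ENT_WHITEOUT)
		return -1;		/* whiteout blocks the lookup */
	if (top == ENT_NONE && layers[0].opaque)
		return -1;		/* opaque dir and no fallthru */

	/* Fallthru, or no top entry in a non-opaque directory: go down. */
	for (i = 1; i < nlayers; i++) {
		enum entry_kind k = find_entry(&layers[i], name);

		if (k == ENT_FILE || k == ENT_DIR)
			return (int)i;
	}
	return -1;
}
```

Note how a fallthru is the only way past an opaque flag in this model, which
is the sense in which "fallthrus override opaque flags". The final loop also
shows why a name that exists nowhere costs one search per layer on every
lookup, as described in the last paragraph above.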
+ +File copyup +----------- + +Any system call that alters the data or metadata of a file on the bottom layer, +or creates or changes a hard link to it will trigger a copyup of the target +file from the lower layer to the topmost layer: + + - open(O_WRONLY | O_RDWR | O_APPEND) + - truncate()/open(O_TRUNC) + - link() + - rename() + - chmod() + - chown()/lchown() + - utimes() + - setxattr()/lsetxattr() + +Copyup of a file due to open(O_WRONLY | O_RDWR) has already occurred when: + + - write() + - ftruncate() + - writable mmap() + +The following system calls will fail on an fd opened O_RDONLY: + + - fchmod() + - fchown() + - fsetxattr() + - futimens() + +Contrary to common sense, the above system calls are defined to succeed on +O_RDONLY fds. The idea seems to be that the O_RDONLY/O_RDWR/O_WRONLY flags only +apply to the actual file data, not to any form of metadata (times, owner, mode, +or even extended attributes). Applications making these system calls on +O_RDONLY fds are correct according to the standard and work on non-union +mounts. They will need to be rewritten (O_RDONLY -> O_RDWR) to work on union +mounts. We suspect this usage is uncommon. + +This deviation from the standard is due to technical limitations of the union mount +implementation. Specifically, we would need to replace an open file descriptor +from the lower layer with an open file descriptor for a file with matching +pathname and contents on the upper layer, which is difficult to do. We avoid +this in other system calls by doing the copyup before the file is opened. +Unionfs doesn't encounter this problem because it creates a dummy file struct +which redirects or fans out operations to the struct files for the underlying +file systems. 
+ +From an application's point of view, the result of an in-kernel file copyup is +the logical equivalent of another application updating the file via the +rename() pattern: creat() a new file, copy the data over, make changes to the +copy, and rename() over the old version. 
Any existing open file descriptors +for that file (including those in the same application) refer to a now +invisible object that used to have the same pathname. Only opens that occur +after the copyup will see updates to the file. + +Permission checks +----------------- + +We want to be sure we have the correct permissions to actually succeed in a +system call before copying a file up to avoid unnecessary IO. At present, the +permission check for a single system call may be spread out over many hundreds +of lines of code (e.g., open()). In order to check permissions, we +occasionally need to determine if there is a writable overlay on top of this +inode. This requires a full path, but often we only have the inode at this +point. In particular, inode_permission() returns EROFS if the inode is on a +read-only file system, which is the wrong answer if there is a writable overlay +mounted on top of it. + +The current solution is to split out the file-system-wide permission checks +from the per-inode permission checks. inode_permission() becomes: + +sb_permission() +__inode_permission() + +inode_permission() calls sb_permission() and __inode_permission() on the same +path. We create path_permission() which calls sb_permission() on the parent +directory from the top layer, and __inode_permission() on the target on the +lower layer. This gets us the correct write permissions considering that the +file will be copied up. + +Todo: + + - Currently, we don't deal with differing directory permissions at + different levels of the stack. This is a bug. + +Impact on non-union kernels and mounts +-------------------------------------- + +Union-related data structures, extra fields, and function calls are #ifdef'd +out at the function/macro level with CONFIG_UNION_MOUNT in nearly all cases +(see fs/union.h). When CONFIG_UNION_MOUNT is enabled, struct dentry has one +more pointer, reducing the size of dentry names stored in the dentry itself by +4 to 8 bytes. 
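The "#ifdef'd out at the function/macro level" approach can be pictured with
the usual pattern of a real helper when the option is enabled and a
constant stub when it is not, so that callers need no #ifdef of their own.
The following is a user-space sketch of that pattern only. struct model_dentry
and model_is_dir_unioned() are hypothetical stand-ins invented for this
example; they are not the contents of fs/union.h and not the real
IS_DIR_UNIONED() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct dentry; the field layout is made up. */
struct model_dentry {
	const char *name;
#ifdef CONFIG_UNION_MOUNT
	void *d_union_stack;	/* the one extra pointer per dentry */
#endif
};

#ifdef CONFIG_UNION_MOUNT
/* Union mounts enabled: a directory is unioned if it has a union stack. */
static inline int model_is_dir_unioned(const struct model_dentry *d)
{
	assert(d != NULL);
	return d->d_union_stack != NULL;
}
#else
/* Union mounts disabled: the check is a constant the compiler can fold. */
static inline int model_is_dir_unioned(const struct model_dentry *d)
{
	(void)d;
	return 0;
}
#endif
```

Built without -DCONFIG_UNION_MOUNT, the structure carries no extra pointer and
every caller of the helper sees a constant 0. Built with it, the structure
grows by one pointer, which corresponds to the 4 to 8 bytes mentioned above.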
+ +Todo: + + - Do performance tests + +Locking strategy +================ + +The current union mount locking strategy is based on the following +rules: + +* The lower layer file system is always read-only +* The topmost file system is always read-write + => A file system can never be a topmost and lower layer at the same time + +Additionally, the topmost layer may only be mounted exactly once. Don't think +of the topmost layer as a separate independent file system; when it is part of +a union mount, it is only a file system in conjunction with the read-only +bottom layer. The read-only bottom layer is an independent file system in and +of itself and can be mounted elsewhere, including as the bottom layer for +another union mount. + +Thus, we may define a stable locking order in terms of top layer and bottom +layer locks, since a top layer is never a bottom layer and a bottom layer is +never a top layer. Another simplifying assumption is that all directories in a +pathname exist on the top layer, as they are created step-by-step during +lookup. This prevents us from ever having to walk backwards up the path +creating directory entries, which can get complicated. By implication, parent +directory paths during any operation (rename(), unlink(), etc.) are from the +top layer. Dentries for directories from the bottom layer are only ever seen +or used by the lookup code. + +The two major problems we avoid with the above rules are: + +Lock ordering: Imagine two union stacks with the same two file systems: A +mounted over B, and B mounted over A. Sometimes locks on objects in both A and +B will have to be held simultaneously. What order should they be acquired in? +Simply acquiring them from top to bottom will create a lock-ordering problem - +one thread acquires lock on object from A and then tries for a lock on object +from B, while another thread grabs the lock on object from B and then waits for +the lock on object from A. Some other lock ordering must be defined. 
+ +Movement/change/disappearance of objects on multiple layers: A variety of nasty +corner cases arise when more than one layer is changing at the same time. +Changes in the directory topology and their effect on inheritance are of +special concern. Al Viro's canonical email on the subject: + +http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html + +We don't try to solve any of these cases, just avoid them in the first place. + +Todo: Prevent top layer from being mounted more than once. + +Cross-layer interactions +------------------------ + +The VFS code simultaneously holds references to and/or modifies objects from +both the top and bottom layers in the following cases: + +Path lookup: + +Grabs i_mutex on bottom layer while holding i_mutex on top layer directory +inode. + +File copyup: + +Holds i_mutex on the parent directory from the top layer while copying up file +from lower layer. + +link(): + +File copyup of target while holding i_mutex on parent directory on top layer. +Followed by a normal link() operation. + +rename(): + +Holds s_vfs_rename_mutex on the top layer, i_mutex of the source's parent dir +(top layer), and i_mutex of the target's parent dir (also top layer) while +looking up and copying the bottom layer target and also creating the whiteout. + +Notes on rename(): + +First, renaming of directories returns EXDEV. It's not at all reasonable to +recursively copy directory trees and userspace has to handle this case anyway. +An exception is rename() of directories that exist only on the topmost layer; +this succeeds. + +Rename involves three steps on a union mount: (1) copyup of the file from the +bottom layer, (2) rename of the new top-layer copy to the target in the usual +manner, (3) creation of a whiteout covering the source of the rename. + +Directory copyup: + +Directory entries are copied up on the first readdir(). 
We hold the top layer +directory i_mutex throughout and sequentially acquire and drop the i_mutex for +each lower layer directory. + +VFS-fs interface +================ + +Read-only layer: No support necessary other than enforcement of really really +read-only semantics (done by VFS for local file systems). + +Writable layer: Must implement two new inode operations: + +int (*whiteout) (struct inode *, struct dentry *, struct dentry *); +int (*fallthru) (struct inode *, struct dentry *); + +And set the MS_WHITEOUT and MS_FALLTHRU flags to indicate support of +these operations. + +Todo: + +- Implement whiteouts and fallthrus in ext3 +- Implement whiteouts and fallthrus in btrfs + +Supported file systems +---------------------- + +Any file system can be a read-only layer. File systems must explicitly support +whiteouts and fallthrus in order to be a read-write layer. This patch set +implements whiteouts for ext2, tmpfs, and jffs2. We have tested ext2, tmpfs, +and iso9660 as the read-only layer. + +Todo: + - Test corner cases of case-insensitive/oversensitive file systems + +NFS interaction +=============== + +NFS is currently not supported as either type of layer. NFS as read-only layer +requires support from the server to honor the read-only guarantee needed for +the bottom layer. To do this, the server needs to revoke access to clients +requesting read-only file systems if the exported file system is remounted +read-write or unmounted (during which arbitrary changes can occur). Some +recent discussion: + +http://markmail.org/message/3mkgnvo4pswxd7lp + +NFS as the read-write layer would require implementation of the ->whiteout() +and ->fallthru() methods. DT_WHT directory entries are theoretically already +supported. 
+ +Also, technically the requirement for a readdir() cookie that is stable across +reboots comes only from file systems exported via NFSv2: + +http://oss.oracle.com/pipermail/btrfs-devel/2008-January/000463.html + +Todo: + +- Guarantee really really read-only on NFS exports +- Implement whiteout()/fallthru() for NFS + +Userland support +================ + +The mount command must support the "-o union" mount option and pass the +corresponding MS_UNION flag to the kernel. A util-linux git tree with union +mount support is here: + +git://git.kernel.org/pub/scm/utils/util-linux-ng/val/util-linux-ng.git + +File system utilities must support whiteouts and fallthrus. An e2fsprogs git +tree with union mount support is here: + +git://git.kernel.org/pub/scm/fs/ext2/val/e2fsprogs.git + +Currently, whiteout directory entries are not returned to userland. While the +directory type for whiteouts, DT_WHT, has been defined for many years, very +little userland code handles them. Userland will never see fallthru directory +entries. + +Known non-POSIX behaviors +------------------------- + +- Any writing system call (unlink()/chmod()/etc.) can return ENOSPC or EIO + + Most programs are not tested and don't work well under conditions of ENOSPC. + The solution is to add more disk space. + +- Link count may be wrong for files on bottom layer with > 1 link count + + A file may have more than one hard link to it. When a file with multiple + hard links is copied up, any other hard links pointing to the same inode will + remain unchanged. If the file is looked up via one of the hard links on the + read-only layer, it will have the original link count (which is off by one at + this point). An example: + + /bin/link1 -> inode 100 + /etc/link2 -> inode 100 + + inode 100 will have link count 2. + + # echo "blah" > /bin/link1 + + Now /bin/link1 will be copied up to the topmost layer. But /etc/link2 will + still point to the original inode 100, and its link count will still be 2. 
+ +- Link count on directories will be wrong before readdir() (fixable) +- File copyup is the logical equivalent of an update via copy + + rename(). Any existing open file descriptors will continue to refer + to the read-only copy on the bottom layer and will not see any + changes that occur after the copy-up. +- rename() of directory may fail with EXDEV +- fchmod()/fchown()/futimens()/fsetxattr() fail on O_RDONLY fds + +Status +====== + +The current union mounts implementation is feature-complete on local file +systems and passes an extensive union mounts test suite, available in the union +mounts Usermode Linux-based development kit: + +http://valerieaurora.org/union/union_mount_devkit.tar.gz + +The whiteout code has had some non-trivial level of review and testing, but +much of the code has had no external review or testing outside the authors' +machines. + +The latest version is available at: + +git://git.kernel.org/pub/scm/linux/kernel/git/val/linux-2.6.git + +Check the union mounts web page for the name of the latest branch: + +http://valerieaurora.org/union/ + +Todo: + +- Run more tests (e.g., XFS test suite) +- Get review from VFS maintainers + +Non-features +------------ + +Features we do not currently plan to support in union mounts: + +Online upgrade: E.g., installing software on a file system NFS exported to +clients while the clients are still up and running. Allowing the read-only +bottom layer of a union mount to change invalidates our locking strategy. + +Recursive copying of directories: E.g., implementing rename() across layers for +directories. Doing an in-kernel copy of a single file is bad enough. +Recursively copying a directory is a big no-no. + +Read-only top layer: The readdir() strategy fundamentally requires the ability +to create persistent directory entries on the top layer file system (which may +be tmpfs). 
However, you can union two read-only file systems by union mounting +a third file system (such as tmpfs) over the two read-only file systems. +Numerous alternatives to this readdir() strategy (including in-kernel or +in-application caching) exist and are compatible with union mounts with its +writing-readdir() implementation disabled. Creating a readdir() cookie that is +stable across multiple readdir()s requires one of: + +- Write to stable storage (e.g., fallthru dentries) +- Non-evictable kernel memory cache (doesn't handle NFS server reboot) +- Per-application caching by glibc readdir() + +Often these features are supported by other unioning file systems or by other +versions of union mounts. + +Contributing to union mounts +============================ + +The union mounts web page is here: + +http://valerieaurora.org/union/ + +It links to: + + - All git repositories + - Documentation + - An entire self-contained UML-based dev kit with README, etc. + +The best mailing list for discussing union mounts is: + +linux-fsdevel@vger.kernel.org + +http://vger.kernel.org/vger-lists.html#linux-fsdevel + +Thank you for reading!