* [PATCH] lazy umount (1/4) @ 2001-09-14 19:01 Alexander Viro 2001-09-14 19:02 ` [PATCH] lazy umount (2/4) Alexander Viro ` (5 more replies) 0 siblings, 6 replies; 30+ messages in thread From: Alexander Viro @ 2001-09-14 19:01 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel

The patch below (and 3 incrementals to it) implements lazy umount. Some background: right now umount() does the following:
* find the vfsmount
* check if it's busy
* detach it from the mountpoint and drop a reference
* mntput() the sucker and return, letting garbage collection do its job - free the vfsmount and possibly deactivate and free the superblock.

As a matter of fact, the kernel will be quite happy to do the same for busy vfsmounts - they will simply float around (not attached to anything) until they become non-busy. At that point they will be freed, etc. The situation is analogous to unlinked but still busy files - detaching the tree is the analog of removing a link, and deactivation (put_super()) is the analog of finally destroying the file. There are only two things to take care of:
a) if we detach a parent we should do it for all children
b) we should not mount anything on "floating" vfsmounts.
Both are obviously satisfied by the current code (presence of children means that the vfsmount is busy, and we can't mount on something that doesn't exist).

NOTE: the default behaviour of umount(2) is not changed. We have a new flag (MNT_DETACH) that tells umount() to be lazy. If it is absent, everything works as usual. It's _very_ useful in a lot of situations - basically, that's what umount -f should have been. E.g. suppose that /usr is kept busy by something (NFS hard mount/hung process/fs bug/whatever). Right now we can't do anything about that - it will keep the mountpoint busy. umount("/usr", MNT_DETACH) will do the following:
a) detach the damned thing from /usr. Nothing is mounted there anymore.
b) umount /usr/local, etc. - no matter what state /usr is in and how badly it's b0rken.
c) as soon as that fs becomes not busy it will be deactivated (put_super(), etc.) d) if /usr/local wasn't busy - fine, it gets deactivated immediately. If it was - no problem, it will be deactivated as soon as it isn't busy anymore. Code got a lot of beating here during the last 4 months - it's very convenient when you are doing fs hacking ;-) Actually I've got into a habit of using that instead of normal umount in all cases except the shutdown scripts - works just fine (for obvious reasons in case of shutdown non-lazy behaviour is precisely what we want). It had been in -ac since 2.4.8-ac8 (more than three weeks). Also no problems. Please, apply. Patch is split into 4 pieces, incremental to each other. Part 1/4: Killed move_vfsmnt(). change_root() does detach_mnt() and attach_mnt() by hands. diff -urN S10-pre9-inode/fs/super.c S10-pre9-move_vfsmnt/fs/super.c --- S10-pre9-inode/fs/super.c Fri Sep 14 12:58:45 2001 +++ S10-pre9-move_vfsmnt/fs/super.c Fri Sep 14 14:02:32 2001 @@ -447,37 +447,6 @@ return -ENOENT; } -#ifdef CONFIG_BLK_DEV_INITRD -static void move_vfsmnt(struct vfsmount *mnt, - struct nameidata *nd, - const char *dev_name) -{ - struct nameidata parent_nd; - char *new_devname = NULL; - - if (dev_name) { - new_devname = kmalloc(strlen(dev_name)+1, GFP_KERNEL); - if (new_devname) - strcpy(new_devname, dev_name); - } - - spin_lock(&dcache_lock); - detach_mnt(mnt, &parent_nd); - attach_mnt(mnt, nd); - - if (new_devname) { - if (mnt->mnt_devname) - kfree(mnt->mnt_devname); - mnt->mnt_devname = new_devname; - } - spin_unlock(&dcache_lock); - - /* put the old stuff */ - if (parent_nd.mnt != mnt) - path_release(&parent_nd); -} -#endif - static void kill_super(struct super_block *); void __mntput(struct vfsmount *mnt) @@ -1941,8 +1910,13 @@ { struct vfsmount *old_rootmnt; struct nameidata devfs_nd, nd; + struct nameidata parent_nd; + char *new_devname = kmalloc(strlen("/dev/root.old")+1, GFP_KERNEL); int error = 0; + if (new_devname) + strcpy(new_devname, 
"/dev/root.old"); + read_lock(¤t->fs->lock); old_rootmnt = mntget(current->fs->rootmnt); read_unlock(¤t->fs->lock); @@ -1959,6 +1933,9 @@ } else path_release(&devfs_nd); } + spin_lock(&dcache_lock); + detach_mnt(old_rootmnt, &parent_nd); + spin_unlock(&dcache_lock); ROOT_DEV = new_root_dev; mount_root(); #if 1 @@ -1980,9 +1957,18 @@ blivet = blkdev_get(ramdisk, FMODE_READ, 0, BDEV_FS); printk(KERN_NOTICE "Trying to unmount old root ... "); if (!blivet) { - blivet = do_umount(old_rootmnt, 0); - mntput(old_rootmnt); - if (!blivet) { + spin_lock(&dcache_lock); + list_del(&old_rootmnt->mnt_list); + if (atomic_read(&old_rootmnt->mnt_count) > 2) { + spin_unlock(&dcache_lock); + mntput(old_rootmnt); + blivet = -EBUSY; + } else { + spin_unlock(&dcache_lock); + mntput(old_rootmnt); + if (parent_nd.mnt != old_rootmnt) + path_release(&parent_nd); + mntput(old_rootmnt); ioctl_by_bdev(ramdisk, BLKFLSBUF, 0); printk("okay\n"); error = 0; @@ -1991,10 +1977,22 @@ } if (blivet) printk(KERN_ERR "error %d\n", blivet); + kfree(new_devname); return error; } - /* FIXME: we should hold i_zombie on nd.dentry */ - move_vfsmnt(old_rootmnt, &nd, "/dev/root.old"); + + spin_lock(&dcache_lock); + attach_mnt(old_rootmnt, &nd); + if (new_devname) { + if (old_rootmnt->mnt_devname) + kfree(old_rootmnt->mnt_devname); + old_rootmnt->mnt_devname = new_devname; + } + spin_unlock(&dcache_lock); + + /* put the old stuff */ + if (parent_nd.mnt != old_rootmnt) + path_release(&parent_nd); mntput(old_rootmnt); path_release(&nd); return 0; ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] lazy umount (2/4) 2001-09-14 19:01 [PATCH] lazy umount (1/4) Alexander Viro @ 2001-09-14 19:02 ` Alexander Viro 2001-09-14 19:03 ` [PATCH] lazy umount (3/4) Alexander Viro 2001-09-14 20:43 ` [PATCH] lazy umount (1/4) Linus Torvalds ` (4 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: Alexander Viro @ 2001-09-14 19:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Part 2/4: Added minimal rootfs (read-only root directory - absolute minimum, just to have something to go with root vfsmount), Now we have a fixed root vfsmount and that allows to kill a lot of special-casing, diff -urN S10-pre9-move_vfsmnt/fs/super.c S10-pre9-root_vfsmnt/fs/super.c --- S10-pre9-move_vfsmnt/fs/super.c Fri Sep 14 14:02:32 2001 +++ S10-pre9-root_vfsmnt/fs/super.c Fri Sep 14 14:03:34 2001 @@ -280,6 +280,7 @@ } static LIST_HEAD(vfsmntlist); +static struct vfsmount *root_vfsmnt; static struct list_head *mount_hashtable; static int hash_mask, hash_bits; @@ -346,52 +347,6 @@ nd->dentry->d_mounted++; } -/** - * add_vfsmnt - add a new mount node - * @nd: location of mountpoint or %NULL if we want a root node - * @root: root of (sub)tree to be mounted - * @dev_name: device name to show in /proc/mounts or %NULL (for "none"). - * - * This is VFS idea of mount. New node is allocated, bound to a tree - * we are mounting and optionally (OK, usually) registered as mounted - * on a given mountpoint. Returns a pointer to new node or %NULL in - * case of failure. - * - * Potential reason for failure (aside of trivial lack of memory) is a - * deleted mountpoint. Caller must hold ->i_zombie on mountpoint - * dentry (if any). - */ - -static struct vfsmount *add_vfsmnt(struct dentry *root, const char *dev_name) -{ - struct vfsmount *mnt; - struct super_block *sb = root->d_inode->i_sb; - char *name; - - mnt = alloc_vfsmnt(); - if (!mnt) - goto out; - - /* It may be NULL, but who cares? 
*/ - if (dev_name) { - name = kmalloc(strlen(dev_name)+1, GFP_KERNEL); - if (name) { - strcpy(name, dev_name); - mnt->mnt_devname = name; - } - } - mnt->mnt_sb = sb; - mnt->mnt_root = dget(root); - mnt->mnt_mountpoint = mnt->mnt_root; - mnt->mnt_parent = mnt; - - spin_lock(&dcache_lock); - list_add(&mnt->mnt_list, vfsmntlist.prev); - spin_unlock(&dcache_lock); -out: - return mnt; -} - static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root) { char *name = old->mnt_devname; @@ -1200,8 +1155,7 @@ list_del(&mnt->mnt_list); spin_unlock(&dcache_lock); mntput(mnt); - if (parent_nd.mnt != mnt) - path_release(&parent_nd); + path_release(&parent_nd); return 0; } spin_unlock(&dcache_lock); @@ -1241,8 +1195,7 @@ list_del(&mnt->mnt_list); spin_unlock(&dcache_lock); mntput(mnt); - if (parent_nd.mnt != mnt) - path_release(&parent_nd); + path_release(&parent_nd); return 0; } @@ -1601,6 +1554,7 @@ void __init mount_root(void) { + struct nameidata root_nd; struct file_system_type * fs_type; struct super_block * sb; struct vfsmount *vfsmnt; @@ -1610,6 +1564,7 @@ void *handle; char path[64]; int path_start = -1; + char *name = "/dev/root"; #ifdef CONFIG_ROOT_NFS void *data; @@ -1746,13 +1701,26 @@ fs_type->name, (sb->s_flags & MS_RDONLY) ? 
" readonly" : ""); if (path_start >= 0) { + name = path + path_start; devfs_mk_symlink (NULL, "root", DEVFS_FL_DEFAULT, - path + 5 + path_start, NULL, NULL); - memcpy (path + path_start, "/dev/", 5); - vfsmnt = add_vfsmnt(sb->s_root, path + path_start); + name + 5, NULL, NULL); + memcpy (name, "/dev/", 5); } - else - vfsmnt = add_vfsmnt(sb->s_root, "/dev/root"); + vfsmnt = alloc_vfsmnt(); + if (!vfsmnt) + panic("VFS: alloc_vfsmnt failed for root fs"); + + vfsmnt->mnt_devname = kmalloc(strlen(name)+1, GFP_KERNEL); + if (vfsmnt->mnt_devname) + strcpy(vfsmnt->mnt_devname, name); + vfsmnt->mnt_sb = sb; + vfsmnt->mnt_root = dget(sb->s_root); + + root_nd.mnt = root_vfsmnt; + root_nd.dentry = root_vfsmnt->mnt_sb->s_root; + graft_tree(vfsmnt, &root_nd); + mntput(vfsmnt); + /* FIXME: if something will try to umount us right now... */ if (vfsmnt) { set_fs_root(current->fs, vfsmnt, sb->s_root); @@ -1761,10 +1729,8 @@ bdput(bdev); /* sb holds a reference */ return; } - panic("VFS: add_vfsmnt failed for root fs"); } - static void chroot_fs_refs(struct dentry *old_root, struct vfsmount *old_rootmnt, struct dentry *new_root, @@ -1878,15 +1844,12 @@ detach_mnt(new_nd.mnt, &parent_nd); detach_mnt(root_mnt, &root_parent); attach_mnt(root_mnt, &old_nd); - if (root_parent.mnt != root_mnt) - attach_mnt(new_nd.mnt, &root_parent); + attach_mnt(new_nd.mnt, &root_parent); spin_unlock(&dcache_lock); chroot_fs_refs(root,root_mnt,new_nd.dentry,new_nd.mnt); error = 0; - if (root_parent.mnt != root_mnt) - path_release(&root_parent); - if (parent_nd.mnt != new_nd.mnt) - path_release(&parent_nd); + path_release(&root_parent); + path_release(&parent_nd); out2: up(&old_nd.dentry->d_inode->i_zombie); up(&mount_sem); @@ -1959,24 +1922,19 @@ if (!blivet) { spin_lock(&dcache_lock); list_del(&old_rootmnt->mnt_list); - if (atomic_read(&old_rootmnt->mnt_count) > 2) { - spin_unlock(&dcache_lock); - mntput(old_rootmnt); - blivet = -EBUSY; - } else { - spin_unlock(&dcache_lock); - mntput(old_rootmnt); - if 
(parent_nd.mnt != old_rootmnt) - path_release(&parent_nd); - mntput(old_rootmnt); - ioctl_by_bdev(ramdisk, BLKFLSBUF, 0); - printk("okay\n"); - error = 0; - } + spin_unlock(&dcache_lock); + mntput(old_rootmnt); + mntput(old_rootmnt); + blivet = ioctl_by_bdev(ramdisk, BLKFLSBUF, 0); + path_release(&parent_nd); blkdev_put(ramdisk, BDEV_FS); } - if (blivet) + if (blivet) { printk(KERN_ERR "error %d\n", blivet); + } else { + printk("okay\n"); + error = 0; + } kfree(new_devname); return error; } @@ -1991,8 +1949,7 @@ spin_unlock(&dcache_lock); /* put the old stuff */ - if (parent_nd.mnt != old_rootmnt) - path_release(&parent_nd); + path_release(&parent_nd); mntput(old_rootmnt); path_release(&nd); return 0; @@ -2000,6 +1957,54 @@ #endif +/* + * Absolutely minimal fake fs - only empty root directory and nothing else. + * In 2.5 we'll use ramfs or tmpfs, but for now it's all we need - just + * something to go with root vfsmount. + */ +static struct dentry *rootfs_lookup(struct inode *dir, struct dentry *dentry) +{ + d_add(dentry, NULL); + return NULL; +} +static struct file_operations rootfs_dir_operations = { + read: generic_read_dir, + readdir: dcache_readdir, +}; +static struct inode_operations rootfs_dir_inode_operations = { + lookup: rootfs_lookup, +}; +static struct super_block *rootfs_read_super(struct super_block * sb, void * data, int silent) +{ + struct inode * inode; + struct dentry * root; + static struct super_operations s_ops = {}; + sb->s_op = &s_ops; + inode = new_inode(sb); + if (!inode) + return NULL; + inode->i_mode = S_IFDIR|0555; + inode->i_uid = inode->i_gid = 0; + inode->i_op = &rootfs_dir_inode_operations; + inode->i_fop = &rootfs_dir_operations; + root = d_alloc_root(inode); + if (!root) { + iput(inode); + return NULL; + } + sb->s_root = root; + return sb; +} +static DECLARE_FSTYPE(root_fs_type, "rootfs", rootfs_read_super, FS_NOMOUNT); + +static void __init init_mount_tree(void) +{ + register_filesystem(&root_fs_type); + root_vfsmnt = 
do_kern_mount("rootfs", 0, "rootfs", NULL); + if (IS_ERR(root_vfsmnt)) + panic("can't allocate root vfsmount"); +} + void __init mnt_init(unsigned long mempages) { struct list_head *d; @@ -2055,4 +2060,5 @@ d++; i--; } while (i); + init_mount_tree(); } ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] lazy umount (3/4) 2001-09-14 19:02 ` [PATCH] lazy umount (2/4) Alexander Viro @ 2001-09-14 19:03 ` Alexander Viro 2001-09-14 19:03 ` [PATCH] lazy umount (4/4) Alexander Viro 0 siblings, 1 reply; 30+ messages in thread From: Alexander Viro @ 2001-09-14 19:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Part 3/4: New function - check_mnt(). Checks that vfsmount is an ancestor of root_vfsmnt. Calls added - since we are going to do lazy umount we should take care of mounting on something that is busy, but already umounted. Fixed leak in pivot_root(2) diff -urN S10-pre9-root_vfsmnt/fs/super.c S10-pre9-check_mnt/fs/super.c --- S10-pre9-root_vfsmnt/fs/super.c Fri Sep 14 14:03:34 2001 +++ S10-pre9-check_mnt/fs/super.c Fri Sep 14 14:04:01 2001 @@ -327,6 +327,15 @@ return p; } +static int check_mnt(struct vfsmount *mnt) +{ + spin_lock(&dcache_lock); + while (mnt->mnt_parent != mnt) + mnt = mnt->mnt_parent; + spin_unlock(&dcache_lock); + return mnt == root_vfsmnt; +} + static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd) { old_nd->dentry = mnt->mnt_mountpoint; @@ -1226,6 +1235,8 @@ retval = -EINVAL; if (nd.dentry != nd.mnt->mnt_root) goto dput_and_out; + if (!check_mnt(nd.mnt)) + goto dput_and_out; retval = -EPERM; if (!capable(CAP_SYS_ADMIN) && current->uid!=nd.mnt->mnt_owner) @@ -1277,7 +1288,7 @@ static int do_loopback(struct nameidata *nd, char *old_name) { struct nameidata old_nd; - struct vfsmount *mnt; + struct vfsmount *mnt = NULL; int err; err = mount_is_safe(nd); @@ -1293,12 +1304,16 @@ return err; down(&mount_sem); - err = -ENOMEM; - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + err = -EINVAL; + if (check_mnt(nd->mnt)) { + err = -ENOMEM; + mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + } if (mnt) { err = graft_tree(mnt, nd); mntput(mnt); } + up(&mount_sem); path_release(&old_nd); return err; @@ -1318,6 +1333,9 @@ if (!capable(CAP_SYS_ADMIN)) return -EPERM; + if (!check_mnt(nd->mnt)) + return -EINVAL; + if (nd->dentry != 
nd->mnt->mnt_root) return -EINVAL; @@ -1396,27 +1414,31 @@ int mnt_flags, char *name, void *data) { struct vfsmount *mnt = do_kern_mount(type, flags, name, data); - int retval = PTR_ERR(mnt); + int err = PTR_ERR(mnt); if (IS_ERR(mnt)) goto out; - mnt->mnt_flags = mnt_flags; - down(&mount_sem); /* Something was mounted here while we slept */ while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry)) ; + err = -EINVAL; + if (!check_mnt(nd->mnt)) + goto unlock; /* Refuse the same filesystem on the same mount point */ + err = -EBUSY; if (nd->mnt->mnt_sb == mnt->mnt_sb && nd->mnt->mnt_root == nd->dentry) - retval = -EBUSY; - else - retval = graft_tree(mnt, nd); + goto unlock; + + mnt->mnt_flags = mnt_flags; + err = graft_tree(mnt, nd); +unlock: up(&mount_sem); mntput(mnt); out: - return retval; + return err; } static int copy_mount_options (const void *data, unsigned long *where) @@ -1772,10 +1794,8 @@ asmlinkage long sys_pivot_root(const char *new_root, const char *put_old) { - struct dentry *root; - struct vfsmount *root_mnt; struct vfsmount *tmp; - struct nameidata new_nd, old_nd, parent_nd, root_parent; + struct nameidata new_nd, old_nd, parent_nd, root_parent, user_nd; char *name; int error; @@ -1794,11 +1814,14 @@ putname(name); if (error) goto out0; + error = -EINVAL; + if (!check_mnt(new_nd.mnt)) + goto out1; name = getname(put_old); error = PTR_ERR(name); if (IS_ERR(name)) - goto out0; + goto out1; error = 0; if (path_init(name, LOOKUP_POSITIVE|LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &old_nd)) error = path_walk(name, &old_nd); @@ -1807,11 +1830,14 @@ goto out1; read_lock(&current->fs->lock); - root_mnt = mntget(current->fs->rootmnt); - root = dget(current->fs->root); + user_nd.mnt = mntget(current->fs->rootmnt); + user_nd.dentry = dget(current->fs->root); read_unlock(&current->fs->lock); down(&mount_sem); down(&old_nd.dentry->d_inode->i_zombie); + error = -EINVAL; + if (!check_mnt(user_nd.mnt)) + goto out2; error = -ENOENT; if (IS_DEADDIR(new_nd.dentry->d_inode)) goto
out2; @@ -1820,10 +1846,10 @@ if (d_unhashed(old_nd.dentry) && !IS_ROOT(old_nd.dentry)) goto out2; error = -EBUSY; - if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt) + if (new_nd.mnt == user_nd.mnt || old_nd.mnt == user_nd.mnt) goto out2; /* loop */ error = -EINVAL; - if (root_mnt->mnt_root != root) + if (user_nd.mnt->mnt_root != user_nd.dentry) goto out2; if (new_nd.mnt->mnt_root != new_nd.dentry) goto out2; /* not a mountpoint */ @@ -1842,19 +1868,18 @@ } else if (!is_subdir(old_nd.dentry, new_nd.dentry)) goto out3; detach_mnt(new_nd.mnt, &parent_nd); - detach_mnt(root_mnt, &root_parent); - attach_mnt(root_mnt, &old_nd); + detach_mnt(user_nd.mnt, &root_parent); + attach_mnt(user_nd.mnt, &old_nd); attach_mnt(new_nd.mnt, &root_parent); spin_unlock(&dcache_lock); - chroot_fs_refs(root,root_mnt,new_nd.dentry,new_nd.mnt); + chroot_fs_refs(user_nd.dentry,user_nd.mnt,new_nd.dentry,new_nd.mnt); error = 0; path_release(&root_parent); path_release(&parent_nd); out2: up(&old_nd.dentry->d_inode->i_zombie); up(&mount_sem); - dput(root); - mntput(root_mnt); + path_release(&user_nd); path_release(&old_nd); out1: path_release(&new_nd); ^ permalink raw reply [flat|nested] 30+ messages in thread
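The check_mnt() logic in part 3 is just a walk up mnt_parent pointers until it reaches a self-parented topmost mount, then a comparison against root_vfsmnt. A detached ("floating") tree is self-parented at its top but is not root_vfsmnt, so mounting on it is refused. Here is a hedged Python sketch of that walk (the class and names are illustrative, not the kernel's):

```python
class Mount:
    def __init__(self, parent=None):
        # A topmost mount is its own parent, like mnt->mnt_parent == mnt.
        self.parent = parent if parent is not None else self

root_vfsmnt = Mount()

def check_mnt(mnt):
    """Mirror of check_mnt(): is mnt still reachable from root_vfsmnt?"""
    while mnt.parent is not mnt:
        mnt = mnt.parent
    return mnt is root_vfsmnt

usr = Mount(parent=root_vfsmnt)
local = Mount(parent=usr)
assert check_mnt(local)        # still hangs off the root vfsmount

usr.parent = usr               # lazy umount detached /usr: now self-parented
assert not check_mnt(local)    # mounting under the floating tree is refused
```

This is exactly the guard that answers Linus's objection later in the thread: a process whose cwd keeps a detached tree alive can no longer mount new filesystems inside it.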
* [PATCH] lazy umount (4/4) 2001-09-14 19:03 ` [PATCH] lazy umount (3/4) Alexander Viro @ 2001-09-14 19:03 ` Alexander Viro 0 siblings, 0 replies; 30+ messages in thread From: Alexander Viro @ 2001-09-14 19:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Part 4/4: Lazy umount itself. New functions - next_mnt() (walks the vfsmount tree) and umount_tree() (detach all mounts in a subtree and do mntput() on each vfsmount there). do_umount() reorganized to use umount_tree() (if vfsmount is not busy, it can't have anything mounted on it, so umount_tree() does the right thing). New flag to umount(2) - MNT_DETACH. With that flag we call umount_tree() even if vfsmount is busy - i.e. we undo all mounts in a subtree and drop the references that pin vfsmounts down. As soon as they are not busy they will be deactivated (by mntput()). diff -urN S10-pre9-check_mnt/fs/super.c S10-pre9-lazy_umount/fs/super.c --- S10-pre9-check_mnt/fs/super.c Fri Sep 14 14:04:01 2001 +++ S10-pre9-lazy_umount/fs/super.c Fri Sep 14 14:04:28 2001 @@ -356,6 +356,22 @@ nd->dentry->d_mounted++; } +static struct vfsmount *next_mnt(struct vfsmount *p, struct vfsmount *root) +{ + struct list_head *next = p->mnt_mounts.next; + if (next == &p->mnt_mounts) { + while (1) { + if (p == root) + return NULL; + next = p->mnt_child.next; + if (next != &p->mnt_parent->mnt_mounts) + break; + p = p->mnt_parent; + } + } + return list_entry(next, struct vfsmount, mnt_child); +} + static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root) { char *name = old->mnt_devname; @@ -1124,10 +1140,52 @@ return 0; } +void umount_tree(struct vfsmount *mnt) +{ + struct vfsmount *p; + LIST_HEAD(kill); + + if (list_empty(&mnt->mnt_list)) + return; + + for (p = mnt; p; p = next_mnt(p, mnt)) { + list_del(&p->mnt_list); + list_add(&p->mnt_list, &kill); + } + + while (!list_empty(&kill)) { + mnt = list_entry(kill.next, struct vfsmount, mnt_list); + list_del_init(&mnt->mnt_list); + if (mnt->mnt_parent == mnt)
{ + spin_unlock(&dcache_lock); + } else { + struct nameidata old_nd; + detach_mnt(mnt, &old_nd); + spin_unlock(&dcache_lock); + path_release(&old_nd); + } + mntput(mnt); + spin_lock(&dcache_lock); + } +} + static int do_umount(struct vfsmount *mnt, int flags) { struct super_block * sb = mnt->mnt_sb; - struct nameidata parent_nd; + int retval = 0; + + /* + * If we may have to abort operations to get out of this + * mount, and they will themselves hold resources we must + * allow the fs to do things. In the Unix tradition of + * 'Gee thats tricky lets do it in userspace' the umount_begin + * might fail to complete on the first run through as other tasks + * must return, and the like. Thats for the mount program to worry + * about for the moment. + */ + + if( (flags&MNT_FORCE) && sb->s_op->umount_begin) + sb->s_op->umount_begin(sb); /* * No sense to grab the lock for this test, but test itself looks @@ -1139,7 +1197,7 @@ * /reboot - static binary that would close all descriptors and * call reboot(9). Then init(8) could umount root and exec /reboot. */ - if (mnt == current->fs->rootmnt) { + if (mnt == current->fs->rootmnt && !(flags & MNT_DETACH)) { int retval = 0; /* * Special case for "unmounting" root ... @@ -1155,57 +1213,20 @@ spin_lock(&dcache_lock); - if (atomic_read(&sb->s_active) > 1) { - if (atomic_read(&mnt->mnt_count) > 2) { - spin_unlock(&dcache_lock); - return -EBUSY; - } - detach_mnt(mnt, &parent_nd); - list_del(&mnt->mnt_list); + if (atomic_read(&sb->s_active) == 1) { + /* last instance - try to be smart */ spin_unlock(&dcache_lock); - mntput(mnt); - path_release(&parent_nd); - return 0; + DQUOT_OFF(sb); + acct_auto_close(sb->s_dev); + spin_lock(&dcache_lock); } - spin_unlock(&dcache_lock); - - /* - * Before checking whether the filesystem is still busy, - * make sure the kernel doesn't hold any quota files open - * on the device. If the umount fails, too bad -- there - * are no quotas running any more. Just turn them on again. 
- */ - DQUOT_OFF(sb); - acct_auto_close(sb->s_dev); - - /* - * If we may have to abort operations to get out of this - * mount, and they will themselves hold resources we must - * allow the fs to do things. In the Unix tradition of - * 'Gee thats tricky lets do it in userspace' the umount_begin - * might fail to complete on the first run through as other tasks - * must return, and the like. Thats for the mount program to worry - * about for the moment. - */ - - if( (flags&MNT_FORCE) && sb->s_op->umount_begin) - sb->s_op->umount_begin(sb); - - /* Something might grab it again - redo checks */ - - spin_lock(&dcache_lock); - if (atomic_read(&mnt->mnt_count) > 2) { - spin_unlock(&dcache_lock); - return -EBUSY; + retval = -EBUSY; + if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) { + umount_tree(mnt); + retval = 0; } - - /* OK, that's the point of no return */ - detach_mnt(mnt, &parent_nd); - list_del(&mnt->mnt_list); spin_unlock(&dcache_lock); - mntput(mnt); - path_release(&parent_nd); - return 0; + return retval; } /* diff -urN S10-pre9-check_mnt/include/linux/fs.h S10-pre9-lazy_umount/include/linux/fs.h --- S10-pre9-check_mnt/include/linux/fs.h Fri Sep 14 12:58:46 2001 +++ S10-pre9-lazy_umount/include/linux/fs.h Fri Sep 14 14:04:28 2001 @@ -635,6 +635,7 @@ */ #define MNT_FORCE 0x00000001 /* Attempt to forcibily umount */ +#define MNT_DETACH 0x00000002 /* Just detach from the tree */ #include <linux/minix_fs_sb.h> #include <linux/ext2_fs_sb.h> ^ permalink raw reply [flat|nested] 30+ messages in thread
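The next_mnt() walk in the patch above is a pre-order traversal of the mount tree implemented over intrusive lists: descend to the first child if there is one, otherwise take the next sibling, otherwise climb toward the subtree root looking for an unvisited sibling. A Python sketch of the same walk, using explicit child lists instead of the kernel's list_head linkage (all names illustrative):

```python
class Mount:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent or self
        self.children = []
        if parent:
            parent.children.append(self)

def next_mnt(p, root):
    """Python rendering of next_mnt(): first child, else next sibling,
    else climb toward root looking for an ancestor's next sibling."""
    if p.children:
        return p.children[0]
    while True:
        if p is root:
            return None
        sibs = p.parent.children
        i = sibs.index(p)
        if i + 1 < len(sibs):
            return sibs[i + 1]
        p = p.parent

root = Mount("/")
usr = Mount("usr", root)
local = Mount("local", usr)
share = Mount("share", usr)
home = Mount("home", root)

order, p = [], root
while p:
    order.append(p.name)
    p = next_mnt(p, root)
assert order == ["/", "usr", "local", "share", "home"]  # pre-order walk
```

umount_tree() relies on this order only to collect every mount in the subtree onto a kill list; it then detaches and mntput()s each one, which is why detaching a parent automatically takes care of all children.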
* Re: [PATCH] lazy umount (1/4) 2001-09-14 19:01 [PATCH] lazy umount (1/4) Alexander Viro 2001-09-14 19:02 ` [PATCH] lazy umount (2/4) Alexander Viro @ 2001-09-14 20:43 ` Linus Torvalds 2001-09-14 20:54 ` Alexander Viro 2001-09-15 12:32 ` jlnance ` (3 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2001-09-14 20:43 UTC (permalink / raw) To: Alexander Viro; +Cc: linux-kernel On Fri, 14 Sep 2001, Alexander Viro wrote: > > There are only two things to take care of - > a) if we detach a parent we should do it for all children > b) we should not mount anything on "floating" vfsmounts. > Both are obviously staisfied for current code (presence of children > means that vfsmount is busy and we can't mount on something that > doesn't exist). I disagree about the "we can't mount on something that doesn't exist" part. If the detached mount is busy, it might be busy exactly because somebody has his working directory in it. Which means that mount /dev/hda ./xxxx by such a process could cause a mount within the "nonexisting" mount. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] lazy umount (1/4) 2001-09-14 20:43 ` [PATCH] lazy umount (1/4) Linus Torvalds @ 2001-09-14 20:54 ` Alexander Viro 0 siblings, 0 replies; 30+ messages in thread From: Alexander Viro @ 2001-09-14 20:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Fri, 14 Sep 2001, Linus Torvalds wrote: > > On Fri, 14 Sep 2001, Alexander Viro wrote: > > > > There are only two things to take care of - > > a) if we detach a parent we should do it for all children > > b) we should not mount anything on "floating" vfsmounts. > > Both are obviously staisfied for current code (presence of children > > means that vfsmount is busy and we can't mount on something that > > doesn't exist). > > I disagree about the "we can't mount on something that doesn't exist" > part. > > If the detached mount is busy, it might be busy exactly because somebody > has his working directory in it. Which means that > > mount /dev/hda ./xxxx > > by such a process could cause a mount within the "nonexisting" mount. Sure, which is exactly why we need to add checks. See part 3 - calls of check_mnt() prevent precisely that kind of situations. What I mean is that adding these checks is backwards-compatible - in absence of lazy umounts they are never triggered. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] lazy umount (1/4) 2001-09-14 19:01 [PATCH] lazy umount (1/4) Alexander Viro 2001-09-14 19:02 ` [PATCH] lazy umount (2/4) Alexander Viro 2001-09-14 20:43 ` [PATCH] lazy umount (1/4) Linus Torvalds @ 2001-09-15 12:32 ` jlnance 2001-09-15 20:51 ` Mike Fedyk 2001-09-16 16:37 ` Alex Stewart ` (2 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: jlnance @ 2001-09-15 12:32 UTC (permalink / raw) To: linux-kernel On Fri, Sep 14, 2001 at 03:01:26PM -0400, Alexander Viro wrote: > convenient when you are doing fs hacking ;-) Actually I've got into > a habit of using that instead of normal umount in all cases except > the shutdown scripts - works just fine (for obvious reasons in case > of shutdown non-lazy behaviour is precisely what we want). Why not shutdown? This is the place I think it would help me the most. Thanks, Jim ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] lazy umount (1/4) 2001-09-15 12:32 ` jlnance @ 2001-09-15 20:51 ` Mike Fedyk 2001-09-17 10:06 ` Matthias Andree 0 siblings, 1 reply; 30+ messages in thread From: Mike Fedyk @ 2001-09-15 20:51 UTC (permalink / raw) To: linux-kernel On Sat, Sep 15, 2001 at 08:32:36AM -0400, jlnance@intrex.net wrote: > On Fri, Sep 14, 2001 at 03:01:26PM -0400, Alexander Viro wrote: > > > convenient when you are doing fs hacking ;-) Actually I've got into > > a habit of using that instead of normal umount in all cases except > > the shutdown scripts - works just fine (for obvious reasons in case > > of shutdown non-lazy behaviour is precisely what we want). > > Why not shutdown? This is the place I think it would help me the most. > > Thanks, > > Jim If you have a FS with a process stuck in D state, and you shut down with an umount that *always* does lazy unmounting, you get the same effect, because you'd want the kernel to pause the shutdown until the FS was properly unmounted. Either way, you'd have a system you can't reboot without hardware reset if you have a process stuck in D state on a rw FS. I have a system with badblocks and shutdown stuck in D state. Kernel is 2.2.19 on PPC with the freeswan1.9 patch. It has been stuck for about two weeks, but operating normally otherwise. I'm going to have to sync; sync; and power off, as I need to update the kernel anyway. I too would like to see a way to force umount, but I don't see a safe way. OTOH, I'm also not a kernel hacker. Does anyone see a solution? Mike ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] lazy umount (1/4) 2001-09-15 20:51 ` Mike Fedyk @ 2001-09-17 10:06 ` Matthias Andree 0 siblings, 0 replies; 30+ messages in thread From: Matthias Andree @ 2001-09-17 10:06 UTC (permalink / raw) To: linux-kernel On Sat, 15 Sep 2001, Mike Fedyk wrote: > Either way, you'd have a system you can't reboot without hardware reset if > you have a process stuck in D state on a rw FS. That is EVIL, so eventually, the umount must succeed at all costs. -- Matthias Andree "Those who give up essential liberties for temporary safety deserve neither liberty nor safety." - Benjamin Franklin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] lazy umount (1/4) 2001-09-14 19:01 [PATCH] lazy umount (1/4) Alexander Viro ` (2 preceding siblings ...) 2001-09-15 12:32 ` jlnance @ 2001-09-16 16:37 ` Alex Stewart 2001-09-17 6:57 ` Forced umount (was lazy umount) Ville Herva 2001-09-17 8:29 ` Xavier Bestel 2001-09-17 10:04 ` [PATCH] lazy umount (1/4) Matthias Andree 2001-09-17 14:43 ` David Woodhouse 5 siblings, 2 replies; 30+ messages in thread From: Alex Stewart @ 2001-09-16 16:37 UTC (permalink / raw) To: Alexander Viro; +Cc: linux-kernel Alexander Viro wrote: > It's _very_ useful in a lot of situations - basically, that's what > umount -f should have been. Actually, I personally would still like a 'umount -f' (or 'umount --yes-I-know-what-Im-doing-and-I-really-mean-it-f' or whatever) that actually works for something other than NFS. In this age of hot-pluggable (and warm-pluggable) storage it's increasingly annoying to me that I should have to reboot the whole system to fix an otherwise hot-fixable hardware problem just because some processes got stuck in a disk-wait state before the problem was detected. I want an operation that will: 1. Interrupt/Abort any processes disk-waiting on the filesystem 2. Unmount the filesystem, immediately and always. 3. Release any filesystem-related holds on the underlying device. 4. Allow me to mount it again later (when problems are fixed). Basically, I want a 'kill -KILL' for filesystems. Now, admittedly, this is only something one would want to do in a last resort, but currently when one gets to that point of last resort, linux has no tools available for them. This is one of the areas that I've always considered linux (and most unixes) to have a gaping hole in the "sysadmin should be able to control their system, not vice-versa" philosophy, and really is needed in addition to any nifty tricks with "lazy umounting", etc. IMO (though the lazy umount thing is kinda nifty, and I can see other uses for it). Just my $.02.. 
-alex ^ permalink raw reply [flat|nested] 30+ messages in thread
* Forced umount (was lazy umount) 2001-09-16 16:37 ` Alex Stewart @ 2001-09-17 6:57 ` Ville Herva 2001-09-17 7:03 ` Aaron Lehmann 2001-09-17 8:29 ` Xavier Bestel 1 sibling, 1 reply; 30+ messages in thread From: Ville Herva @ 2001-09-17 6:57 UTC (permalink / raw) To: linux-kernel On Sun, Sep 16, 2001 at 09:37:40AM -0700, you [Alex Stewart] claimed: > Alexander Viro wrote: > > Actually, I personally would still like a 'umount -f' (or 'umount > --yes-I-know-what-Im-doing-and-I-really-mean-it-f' or whatever) that > actually works for something other than NFS. In this age of > hot-pluggable (and warm-pluggable) storage it's increasingly annoying to > me that I should have to reboot the whole system to fix an otherwise > hot-fixable hardware problem just because some processes got stuck in a > disk-wait state before the problem was detected. > > I want an operation that will: > > 1. Interrupt/Abort any processes disk-waiting on the filesystem > 2. Unmount the filesystem, immediately and always. > 3. Release any filesystem-related holds on the underlying device. > 4. Allow me to mount it again later (when problems are fixed). > > Basically, I want a 'kill -KILL' for filesystems. This gets my vote too... It would be interesting to hear if there are large obstacles that make this impossible or hard to implement or whether it's just that nobody has coded it yet. -- v -- v@iki.fi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Forced umount (was lazy umount)
From: Aaron Lehmann @ 2001-09-17 7:03 UTC
To: Ville Herva; +Cc: linux-kernel

On Mon, Sep 17, 2001 at 09:57:47AM +0300, Ville Herva wrote:
> > Basically, I want a 'kill -KILL' for filesystems.
>
> This gets my vote too...

<aol>Me too</aol>
* Re: Forced umount (was lazy umount)
From: Alexander Viro @ 2001-09-17 8:38 UTC
To: Aaron Lehmann; +Cc: Ville Herva, linux-kernel

On Mon, 17 Sep 2001, Aaron Lehmann wrote:
> On Mon, Sep 17, 2001 at 09:57:47AM +0300, Ville Herva wrote:
> > > Basically, I want a 'kill -KILL' for filesystems.
> >
> > This gets my vote too...
>
> <aol>Me too</aol>

Look at it that way: we have two actions that need to be done upon umount.
1) detach it from the mountpoint(s)
2) shut it down

For the latter we need to have no active IO on that fs _and_ nothing that could initiate such IO. We can separate #1 and #2, letting fs shutdown happen when it's no longer busy. That's what MNT_DETACH does.

What you are asking for is different - you want fs-wide revoke(). That's all nice and dandy, but it's an independent problem and it will take a _lot_ of work. Including work in fs drivers. It _is_ worth doing, but it's 2.5 stuff (along with normal revoke(2)).

IMNSHO we really should separate the stuff acting on the mount tree from the stuff acting on filesystems. A lot of confusion comes from the places where we don't do that - see the "per-mountpoint read-only" thread a couple of weeks ago for other examples.
* Re: Forced umount (was lazy umount)
From: Matthias Andree @ 2001-09-17 10:21 UTC
To: linux-kernel

On Mon, 17 Sep 2001, Alexander Viro wrote:
> > On Mon, Sep 17, 2001 at 09:57:47AM +0300, Ville Herva wrote:
> > > > Basically, I want a 'kill -KILL' for filesystems.

Me, too, similarly.

> Look at it that way: we have two actions that need to be done upon umount.
> 1) detach it from the mountpoint(s)
> 2) shut it down
>
> For the latter we need to have no active IO on that fs _and_ nothing
> that could initiate such IO. We can separate #1 and #2, letting fs
> shutdown happen when it's no longer busy. That's what MNT_DETACH
> does.
>
> What you are asking for is different - you want fs-wide revoke().
> That's all nice and dandy, but it's an independent problem and it
> will take a _lot_ of work. Including work in fs drivers. It _is_
> worth doing, but it's 2.5 stuff (along with normal revoke(2)).

Well, I think there's another way; not sure if that's feasible, but it looks like it is. You say "no active IO and nothing that initiates it". So add a MNT_KILLBUSY flag that sends SIGKILL to all processes that have resources on the file system and wakes them from "D" state. That way, processes holding resources will be nuked right away, and the kernel can let go of the file system.

Possibly needed: patch other parts of the kernel to allow SIGKILL to kill a process in 'D' state, interrupting its operations.

FreeBSD can do umount -f for anything except /; not sure how it implements that, never looked at the source. unmount(2) says "Active special devices continue to work, but any further accesses to any other active files result in errors even if the filesystem is later remounted."

That'd be ok for me; it'd be some sort of stale local file handle for the processes that got their filesystem pulled from beneath their feet, and it's probably the only way to prevent corruption. Newly started processes after a mount should be unaffected by all this and behave as though the file system had never been (lazily or forcibly) unmounted before.

However, thanks for the first step in the right direction.

-- 
Matthias Andree

"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: Forced umount (was lazy umount)
From: Tigran Aivazian @ 2001-09-17 10:47 UTC
To: Matthias Andree; +Cc: linux-kernel

On Mon, 17 Sep 2001, Matthias Andree wrote:
> On Mon, 17 Sep 2001, Alexander Viro wrote:
> > > On Mon, Sep 17, 2001 at 09:57:47AM +0300, Ville Herva wrote:
> > > > > Basically, I want a 'kill -KILL' for filesystems.
>
> Me, too, similarly.

Hi,

The forced umount patch has been available for ages and the latest version can be downloaded from:

http://www.moses.uklinux.net/patches/forced-umount-2.4.9.patch

If there are any issues with it please let me know.

Regards,
Tigran
* Re: Forced umount (was lazy umount)
From: Alex Stewart @ 2001-09-17 23:21 UTC
To: Tigran Aivazian; +Cc: linux-kernel

Tigran Aivazian wrote:
> The forced umount patch has been available for ages and the latest version
> can be downloaded from:
>
> http://www.moses.uklinux.net/patches/forced-umount-2.4.9.patch
>
> If there are any issues with it please let me know.

Admittedly, I haven't tried it yet, but the one thing I can see that looks like an issue in the context of my original request is this: if a filesystem's underlying device is having IO problems (bad hardware, etc.), a forced umount using this patch will potentially also lock up (in a D state) trying to close everything up cleanly before unmounting, contributing to the problem instead of fixing it.

I certainly understand (and tend to agree with) Alexander Viro's opinion that this is a 2.5 issue (my original post was just to make sure it was pointed out that we do still need to work on this), mainly for the following reason: I see no reason why a properly functioning system should ever need to truly force a umount. Under normal conditions, if one really needs to do an emergency umount, it should be possible to use fuser/kill/etc. to clean up any processes using the filesystem from userland and then perform a normal umount to cleanly unmount the filesystem in question (or, with lazy umount, this could conceivably even be done in the reverse order). The only reason for really-I-mean-it-forcing a umount is if there is some problem which has caused one or more processes to get permanently stuck in a state where they can't be killed (i.e. D state), and thus can't release their hold on the filesystem.

Ignoring NFS for the moment, and assuming that the block device drivers are written correctly, there should be no way for anything to get stuck in disk-wait for an extended period of time unless there is an actual physical hardware problem preventing IO (I believe.. correct me if I'm wrong). If there is a physical failure preventing IO to the underlying device, then it is very likely that any attempts made by the umount call to read from or write to the device will also block (unless there are some special hooks into the block device drivers to avoid this, which I assume there aren't). Therefore, if a forced umount is actually required, it must not attempt to do any IO to the filesystem in question either; it must instead just tear down the kernel's structures associated with it, leaving the filesystem dirty on the disk and possibly losing data in the process. This is why I had said in my first message that this should really only be a very last resort.

Now, a version of this patch which didn't attempt to actually do any IO on the device, and which modified umount (and presumably the various fs drivers) so it doesn't do any flushing, fs structure cleanup, etc., might be able to adequately do this, but given the degree of unchartedness in this territory, I can certainly sympathize with not wanting to put it into 2.4.

That's not to say that what the forced umount patch does isn't kinda nifty and convenient, and I would like to see this sort of functionality too, but it still doesn't really address the problem I was bringing up..

-alex
* Re: Forced umount (was lazy umount)
From: Xavier Bestel @ 2001-09-17 23:23 UTC
To: Alex Stewart; +Cc: Tigran Aivazian, Linux Kernel Mailing List

On Tue 18-09-2001 at 01:21, Alex Stewart wrote:
[...]
> I see no reason why a properly functioning system should ever need to
> truly force a umount. Under normal conditions, if one really needs to
> do an emergency umount, it should be possible to use fuser/kill/etc to
> clean up any processes using the filesystem from userland and then
> perform a normal umount to cleanly unmount the filesystem in question
> (or, with lazy umount, this could conceivably even be done in the
> reverse order). The only reason for really-I-mean-it-forcing a umount
> is if there is some problem which has caused one or more processes to
> get permanently stuck in a state where they can't be killed (i.e. D
> state), and thus can't release their hold on the filesystem.

Imagine you have a cdrom mounted with a process reading it. You may want to eject this cdrom without killing all the processes - just let them know that there's an error somewhere so they go read something else. That way it won't kill your shells, Nautilus/Konqueror, etc.

Xav
* Re: Forced umount (was lazy umount)
From: Alex Stewart @ 2001-09-18 1:04 UTC
To: Xavier Bestel; +Cc: Linux Kernel Mailing List

Xavier Bestel wrote:
> On Tue 18-09-2001 at 01:21, Alex Stewart wrote:
> [...]
>
>> I see no reason why a properly functioning system should ever need to
>> truly force a umount. Under normal conditions, if one really needs to
>> do an emergency umount, it should be possible to use fuser/kill/etc to
>> clean up any processes using the filesystem from userland and then
>> perform a normal umount to cleanly unmount the filesystem in question
> [...]
>
> Imagine you have a cdrom mounted with a process reading it. You may want
> to eject this cdrom without killing all the processes - just let them
> know that there's an error somewhere so they go read something else.
> That way it won't kill your shells, Nautilus/Konqueror, etc.

Ok, I should have made my terms more clear. I see no reason why a properly functioning system should *need* to force a umount. There's a difference between "need" and "want". What you're talking about is a convenience (and I admitted that the patch would make some things more convenient), but not a necessity. With decently written software you should be able to simply go to the relevant programs and tell them to stop using the filesystem before you unmount it. All this does is make that process a little less tedious.

My point was that I agree that the proposed patch is nice, and I'd like to see something like it included, but considering it's primarily a convenience rather than addressing something you can't do other ways, I think it can probably wait until 2.5 at this point (at least assuming 2.6 doesn't take as long to get out the door as 2.4 did).

As far as fixing the real problem I was bringing up originally (which the patch doesn't do), I also think it'll require a large enough change that, although I'd like to see it sooner, I can understand holding off until 2.5.

-alex
* Re: Forced umount (was lazy umount)
From: Pavel Machek @ 2001-09-18 20:19 UTC
To: Alex Stewart, Xavier Bestel; +Cc: Linux Kernel Mailing List

Hi!

> >>I see no reason why a properly functioning system should ever need to
> >>truly force a umount. Under normal conditions, if one really needs to
> >>do an emergency umount, it should be possible to use fuser/kill/etc to
> >>clean up any processes using the filesystem from userland and then
> >>perform a normal umount to cleanly unmount the filesystem in question
> [...]
>
> > Imagine you have a cdrom mounted with process reading it. You may want
> > to eject this cdrom without killing all processes, but just make them
> > know that there's an error somewhere, go read something else.
> > So it won't kill your shells, Nautilus/Konqueror, etc.
>
> Ok, I should have made my terms more clear. I see no reason why a
> properly functioning system should *need* to force a umount. There's a
> difference between "need" and "want". What you're talking about is a
> convenience (and I admitted that the patch would make some things more
> convenient), but not a necessity. With decently written software you
> should be able to simply go to the relevant programs and tell them to
> stop using the filesystem before you unmount it. All this does is make
> that process a little less tedious.

...so... it means that my kwintv (TV-in-a-window) application should have a menu option to chdir somewhere else? Imagine (a common error for me):

cd /cdrom
kwintv &
[work]

I now want to umount the cdrom. How do I do it? Do you suggest each app have a "cd /" menu entry?

Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org
* Re: Forced umount (was lazy umount)
From: Xavier Bestel @ 2001-09-17 8:29 UTC
To: Linux Kernel Mailing List

Alexander Viro wrote:
> I want an operation that will:
>
> 1. Interrupt/Abort any processes disk-waiting on the filesystem

Why? Can't you just return -EBADHANDLE, -E(NX)IO or something similar? Aborting should be reserved for mmap()ing processes.

Xav
* Re: Forced umount (was lazy umount)
From: Alexander Viro @ 2001-09-17 8:39 UTC
To: Xavier Bestel; +Cc: Linux Kernel Mailing List

On 17 Sep 2001, Xavier Bestel wrote:
> Alexander Viro wrote:

No, I hadn't. Watch the attributions.

> > I want an operation that will:
> >
> > 1. Interrupt/Abort any processes disk-waiting on the filesystem
>
> Why? Can't you just return -EBADHANDLE, -E(NX)IO or something similar?
> Aborting should be reserved for mmap()ing processes.
* Re: [PATCH] lazy umount (1/4)
From: Matthias Andree @ 2001-09-17 10:04 UTC
To: linux-kernel

On Fri, 14 Sep 2001, Alexander Viro wrote:
> It's _very_ useful in a lot of situations - basically, that's what
> umount -f should have been. E.g. suppose that /usr is kept busy
> by something (NFS hard mount/hung process/fs bug/whatever). Right now
> we can't do anything about that - it will keep mountpoint busy.
> umount("/usr", MNT_DETACH) will do the following:
> a) detach the damned thing from /usr. Nothing is mounted here
> anymore.
> b) umount /usr/local, etc. - no matter what state /usr is in and
> how badly it's b0rken.
> c) as soon as that fs becomes not busy it will be deactivated
> (put_super(), etc.)
> d) if /usr/local wasn't busy - fine, it gets deactivated
> immediately. If it was - no problem, it will be deactivated as soon
> as it isn't busy anymore.

Well, from a practical point of view, two things would really help Linux:

1) Being able to kill -9 processes in "D" state.
2) Force-unmounting busy file systems and kill -9ing all related processes.

Your idea is getting close to 2), which would relieve the dire need for it somewhat, but if your patch really prevents that beast from being remounted until it's flushed, then in the above scenario a reboot will be required if I want /usr or /usr/local back.

My personal experience from production use: I automount NFS filesystems onto /net from several other hosts. When one of these NFS servers goes down, there is ABSOLUTELY NO WAY of getting rid of the mounts without losing unrelated data (i.e. unmount in background, killall -9 rpciod - will possibly lose data written to other servers).

Now, when the server is back up and I have unmounted the old beast, I need to be able to remount that file system without a reboot. Looks like a deeply sleeping (state == 'D') process might prevent that, and that'd render the whole good idea no good.

So, if I e.g. umount("/usr", MNT_DETACH), I must be able to reattach it - otherwise MNT_DETACH just gives me yet another reason to reboot, which would probably need to be done through mount -o remount,ro -a -v ; sync ; reboot -f to overcome those "busy" locks that don't recover.

If I could nuke away a process from "D" state with kill -9, all that would hardly matter: the killall5 before unmount would get rid of all those hung processes and the shutdown would always succeed. Whether the reboot succeeds - well, that's a matter of whether you use hard disk write caches or journalling/phase-tree file systems, BIOS, and weather...

If FreeBSD had client-side NFS locking support, I'd have switched long ago, just because it does umount -f. Linux's ever-rising load with stuck processes really annoys me and has brought one of my production machines down more than once. Soft NFS mounts are not really an option.
* Re: [PATCH] lazy umount (1/4)
From: Alan Cox @ 2001-09-17 12:13 UTC
To: Matthias Andree; +Cc: linux-kernel

> Well, from a practical point of view two things that would really help
> Linux:
>
> 1) Be able kill -9 processes from "D" state.

Won't happen.

> 2) Force unmount busy file systems and kill -9 all related processes.

umount -f

> down, there is ABSOLUTELY NO WAY of getting rid of the mounts besides
> losing unrelated data (i. e. unmount in background, killall -9 rpciod -
> will possibly lose data written to other servers).

umount -f.

> Now, then the server is back up and I unmounted the old beast, I need to
> be able to remount that file system without reboot. Looks like a deeply
> sleeping (state == 'D') process might prevent that, and that'd render
> the whole good idea no good.

Not with the lazy umount stuff.

> ago, just because it does umount -f and Linux' ever-rising load with
> stuck processes really annoys me and has brought one of my production
> machines down more than once. Soft NFS mounts are not really an option.

The 'D' state stuff is not "load" - it didn't bring your box down, something else did. It's reported in the load average, so the stick-your-finger-in-the-air-and-guess-three-magic-numbers overall load view reflects I/O load. It and the D state go back to the earliest days of Unix, and the same issues occur in any OS.

Alan
* Re: [PATCH] lazy umount (1/4)
From: Alex Stewart @ 2001-09-18 0:24 UTC
To: Alan Cox; +Cc: Matthias Andree, linux-kernel

Alan Cox [quoting Matthias Andree] wrote:
>> Well, from a practical point of view two things that would really help
>> Linux:
>>
>> 1) Be able kill -9 processes from "D" state.

Please note that there is a reason why the "D" state exists: there are certain times when interrupting a process can have significant consequences for the integrity of the entire filesystem (or other global resource), and it must not be allowed, for consistency. As it happens, most of the conditions which cause processes to get "stuck" in disk-wait state (usually because of hardware issues) happen to be exactly the places where it's most difficult to work around this (at least for physically-backed filesystems, less so for NFS et al.). This, I assume, is at least part of the reason why Alan says:

> Won't happen.

(and I would tend to agree)

>> 2) Force unmount busy file systems and kill -9 all related processes.
>
> umount -f

...doesn't do anything for non-NFS filesystems, though. There isn't even a hook for it in any of the other FS drivers.

>> down, there is ABSOLUTELY NO WAY of getting rid of the mounts besides
>> losing unrelated data (i. e. unmount in background, killall -9 rpciod -
>> will possibly lose data written to other servers).
>
> umount -f.

(for NFS) does work most of the time. I'm not quite sure why, but in some cases I've needed to combine this with mounting things 'intr' so I could manually kill processes off.

>> ago, just because it does umount -f and Linux' ever-rising load with
>> stuck processes really annoys me and has brought one of my production
>> machines down more than once. Soft NFS mounts are not really an option.
>
> The 'D' state stuff is not "load" - it didn't bring your box down, something
> else did.

Well, yes and no. It's not _CPU_ load, but the stuck processes can consume other limited resources (memory, file descriptors, etc.) to the point that the system is unable to function properly if enough of them accumulate. I have also had this happen.

-alex
* Re: [PATCH] lazy umount (1/4)
From: Matthias Andree @ 2001-09-18 0:39 UTC
To: Alex Stewart; +Cc: Alan Cox, Matthias Andree, linux-kernel

On Mon, 17 Sep 2001, Alex Stewart wrote:
> Please note that there is a reason why the "D" state exists, and it is
> because there are certain times when interrupting a process can have
> significant consequences on the integrity of the entire filesystem (or
> other global resource) and must not be allowed for consistency. As it

Well, you cannot tell your local power plant "you must not fail this very moment" either. Of course, data will be lost when a process is killed from "D" state, but if the admin can tell the data will be lost either way, ...

Anyways, I'm going to try the forced umount patch real soon now.

> happens, most of the conditions which cause processes to get "stuck" in
> disk-wait state (usually because of hardware issues) happen to be
> exactly the places where it's most difficult to work around this (at
> least for physically-backed filesystems, less so for NFS et al).

Well, with journals and phase trees/soft updates (hint!) that's less of an issue.

> Well, yes and no. It's not _CPU_ load, but the stuck processes can
> consume other limited resources (memory, file descriptors, etc) to the
> point that the system is unable to function properly if enough of them
> accumulate. I have also had this happen.

You were faster than me with this post ;) With unpatched 2.2, I once had a machine stuck with an exhausted process table.

-- 
Matthias Andree

"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] lazy umount (1/4)
From: Alexander Viro @ 2001-09-18 8:56 UTC
To: Matthias Andree; +Cc: Alex Stewart, Alan Cox, linux-kernel

On Tue, 18 Sep 2001, Matthias Andree wrote:
> Well, you cannot tell your local power plant "you must not fail this
> very moment" either. Of course, data will be lost when a process is
> killed from "D" state, but if the admin can tell the data will be lost
> either way, ...

Gaack... Just how do you kill a process that holds a bunch of semaphores and got blocked on an attempt to take one more? It's not about lost data, it's about a completely screwed kernel.
* Re: [PATCH] lazy umount (1/4)
From: Matthias Andree @ 2001-09-18 9:08 UTC
To: Alexander Viro; +Cc: Matthias Andree, Alex Stewart, Alan Cox, linux-kernel

On Tue, 18 Sep 2001, Alexander Viro wrote:
> > Well, you cannot tell your local power plant "you must not fail this
> > very moment" either. Of course, data will be lost when a process is
> > killed from "D" state, but if the admin can tell the data will be lost
> > either way, ...
>
> Gaack... Just how do you kill a process that holds a bunch of semaphores
> and got blocked on attempt to take one more? It's not about lost data,
> it's about completely screwed kernel.

Well, if that process holds semaphores and blocks trying to get one more, something is wrong with the process and it's prone to deadlocks. Even if kill -9 just meant "fail all further syscalls instantly" in such cases, that'd be fine. Something like a "BEING KILLED" state for processes.

-- 
Matthias Andree

"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin
* Re: [PATCH] lazy umount (1/4)
From: Alan Cox @ 2001-09-18 13:03 UTC
To: Matthias Andree; +Cc: Alexander Viro, Matthias Andree, Alex Stewart, Alan Cox, linux-kernel

> something is wrong with the process and it's prone to deadlocks. Even if
> kill -9 just means "fail all further syscalls instantly" in such

It will already do that. The moment it gets to the syscall return, it's history.
* Re: [PATCH] lazy umount (1/4)
From: David Woodhouse @ 2001-09-18 9:07 UTC
To: Alex Stewart; +Cc: Alan Cox, Matthias Andree, linux-kernel

alex@foogod.com said:
> Please note that there is a reason why the "D" state exists, and it is
> because there are certain times when interrupting a process can have
> significant consequences on the integrity of the entire filesystem (or
> other global resource) and must not be allowed for consistency. As
> it happens, most of the conditions which cause processes to get
> "stuck" in disk-wait state (usually because of hardware issues)
> happen to be exactly the places where it's most difficult to work
> around this (at least for physically-backed filesystems, less so for
> NFS et al)

What you say is true - implementing proper cleanup code for the case where an operation is interrupted is complex and not always reasonably possible. But that's an exceedingly poor excuse for not bothering to do so in many situations.

-- 
dwmw2

Filesystems are hard. Let's go shopping.
* Re: [PATCH] lazy umount (1/4)
From: David Woodhouse @ 2001-09-17 14:43 UTC
To: Matthias Andree; +Cc: linux-kernel

matthias.andree@stud.uni-dortmund.de said:
> Well, from a practical point of view two things that would really
> help Linux:
> 1) Be able kill -9 processes from "D" state.

'D' state means _uninterruptible_ sleep. To be interruptible, we need to have appropriate cleanup code at the point at which the code sleeps. Often, parts of the kernel sleep in 'D' state instead of 'S' state just because someone's been too lazy to implement the cleanup. Each one of those bugs needs to be fixed individually - and many need core changes.

Fixing read_inode() so that well-behaved filesystems can deal with being interrupted during its operation is on the list for 2.5. Others will be required too, I'm sure.

-- 
dwmw2